Mamba and state-space models #
Mamba is a selective state-space model (SSM) architecture for sequence modeling. It reaches transformer-class quality on language tasks while its compute scales linearly with sequence length and its inference-time state stays a fixed size. Attention, by contrast, needs compute that is quadratic in sequence length and a key-value cache that grows linearly with it.
Definition #
State-space models are a family of sequence models drawn from classical control theory. They maintain a continuous-time hidden state that evolves through linear differential equations, summarizing the entire sequence history in a fixed-size representation. Mamba (Gu and Dao, 2023) made SSMs competitive with transformers by introducing selectivity: the parameters of the state-space dynamics depend on the input, allowing the model to selectively retain or forget information per token.
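To make the idea of selectivity concrete, here is a minimal sketch of a selective SSM recurrence in plain Python. It is a toy, not the Mamba implementation: it assumes a single input channel, a diagonal state matrix `A`, and hypothetical weights `w_B`, `w_C`, `w_delta`, and `b_delta` that make the input matrix, output matrix, and step size depend on the current token. The discretization used here (decay `exp(delta * A)`, input term `delta * B_t * x`) is one common choice and is assumed for illustration.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, A, w_B, w_C, w_delta, b_delta=0.0):
    """Toy single-channel selective SSM with a diagonal state matrix.

    xs: list of scalar inputs, one per token.
    A: list of negative floats (diagonal of the state matrix).
    w_B, w_C: hypothetical per-state weights that make B_t and C_t depend on x_t.
    w_delta, b_delta: hypothetical weights for the input-dependent step size.
    """
    h = [0.0] * len(A)  # fixed-size state, independent of len(xs)
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)  # input-dependent step size
        B_t = [wb * x for wb in w_B]  # input-dependent input matrix
        C_t = [wc * x for wc in w_C]  # input-dependent output matrix
        # Discretized update: decay the old state, then write the new input.
        h = [math.exp(delta * a) * h_i + delta * b * x
             for a, h_i, b in zip(A, h, B_t)]
        ys.append(sum(c * h_i for c, h_i in zip(C_t, h)))
    return ys, h


ys, h = selective_scan([1.0, -0.5, 2.0, 0.0],
                       A=[-1.0, -0.5], w_B=[1.0, 0.5], w_C=[1.0, 1.0],
                       w_delta=1.0)
print(len(ys), len(h))
```

The loop does one constant-cost update per token, and the state `h` has the same length no matter how many tokens have been seen. A larger `delta` decays the old state more strongly and weights the current input more heavily, which is the retain-or-forget behavior described above.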
Why it matters #
Pure-transformer attention is O(n²) in sequence length and requires a key-value (KV) cache that grows linearly with context. At long contexts (128K, 256K, 1M tokens), this becomes economically punishing both in memory and in compute.
Mamba and related selective SSMs are O(n) in sequence length with constant per-step memory. For long-context workloads, the throughput advantage is substantial.
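The memory contrast can be checked with back-of-the-envelope arithmetic. The sketch below assumes hypothetical model dimensions (32 layers, 8 KV heads of dimension 128, an SSM inner width of 8192 with state size 16, 2 bytes per element) chosen only for illustration; they do not describe any particular released model. The SSM estimate also ignores the small convolution buffer that Mamba-style layers keep.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Keys and values (factor 2) are stored for every layer and every past token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def ssm_state_bytes(n_layers, d_inner, d_state, bytes_per_elem=2):
    # One (d_inner x d_state) recurrent state per layer; there is no seq_len term.
    return n_layers * d_inner * d_state * bytes_per_elem


GIB = 2**30
MIB = 2**20

# Hypothetical dimensions, per sequence (batch size 1).
for n in (131_072, 262_144, 1_048_576):
    kv = kv_cache_bytes(n, n_layers=32, n_kv_heads=8, head_dim=128)
    print(f"{n:>9} tokens: KV cache {kv / GIB:.0f} GiB")

state = ssm_state_bytes(n_layers=32, d_inner=8192, d_state=16)
print(f"SSM state: {state / MIB:.0f} MiB at any length")
```

Under these assumed dimensions the KV cache comes to 16 GiB per sequence at 128K tokens and 128 GiB at 1M tokens, while the recurrent state stays at 8 MiB regardless of length.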
Hybrid is the production frontier #
Pure Mamba models underperform pure transformers on some short-context retrieval-style tasks, where attention’s ability to directly query any prior token is the right primitive. The 2025 to 2026 production frontier is hybrid: transformer attention layers interleaved with Mamba layers, often with Mixture-of-Experts on top.
The flagship example is Jamba 1.5 Large (AI21): 398B total parameters, 94B active, 256K-token context, with attention and Mamba layers at a 1:7 ratio (one attention layer for every seven Mamba layers) and an MoE layer in place of the dense MLP every two layers. Mamba-3 was published in 2026.
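The interleaving pattern can be sketched as a simple layer schedule. The function below is illustrative only: it assumes one attention layer per group of eight layers (the rest Mamba) and a Mixture-of-Experts feed-forward on every second layer. The parameter names and the offsets that decide where the attention and MoE layers sit within each group are hypothetical, not taken from AI21's code.

```python
def build_layout(n_layers, attn_period=8, attn_offset=4,
                 moe_period=2, moe_offset=1):
    """Return a list of (mixer, feed_forward) pairs, one per layer.

    attn_period=8 gives one attention layer per seven Mamba layers.
    The offsets (which slot in each group is used) are assumptions.
    """
    layout = []
    for i in range(n_layers):
        mixer = "attention" if i % attn_period == attn_offset else "mamba"
        ffn = "moe" if i % moe_period == moe_offset else "mlp"
        layout.append((mixer, ffn))
    return layout


for i, (mixer, ffn) in enumerate(build_layout(8)):
    print(i, mixer, ffn)
```

With this schedule, only one layer in eight carries a KV cache, which is how a hybrid keeps most of the memory savings of an SSM while retaining some direct token-to-token lookup.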
Tradeoffs #
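- Pros. Compute is linear in sequence length. Inference-time memory is smaller: a pure SSM keeps a fixed-size state instead of a KV cache, and a hybrid keeps a KV cache only for its few attention layers. Up to 5x throughput vs an equivalent pure transformer at long context.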
- Cons. Pure SSMs lag on tasks requiring sharp attention to specific prior tokens. Hybrid architectures recover this at the cost of architectural complexity.
Related #
- The Inference Stack in 2026. Section 5 covers hybrid architectures and Jamba 1.5 in detail.
- Edge AI silicon: CV5 vs Jetson vs Hexagon. Where memory constraints make linear-complexity sequence models particularly valuable.
References #
- AI21. Jamba 1.5: Hybrid Transformer-Mamba MoE Models.
- Gu, A. and Dao, T. (2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces.