Research Program / emerging

Distributed Runtimes

Runtime systems, state, scheduling, observability, and reliability for AI workloads.

Questions

Topic Links

Linked Artifacts

Field Note / May 3, 2026

The Inference Stack in 2026.

A field note on token economics, runtime systems, model architecture, and the stack changes behind public LLM API price compression.

Definition / Undated

Speculative Decoding

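A decoding-time acceleration technique in which a smaller draft model proposes several tokens ahead and a larger target model verifies them in a single parallel pass, keeping the tokens it agrees with and correcting the first one it rejects.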
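
The draft-then-verify loop can be illustrated with a minimal sketch of the greedy variant. Everything named here is an assumption made for illustration: `target_model` and `draft_model` are hypothetical toy functions standing in for real networks, and `k` is the number of drafted tokens per round. Production systems verify all drafted positions in one batched forward pass of the larger model and, when sampling, use a rejection-sampling acceptance rule rather than the exact-match check shown here.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]

VOCAB = 50  # hypothetical toy vocabulary size


def target_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the larger model: a deterministic toy rule."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def draft_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the smaller model.

    It agrees with the target except when the prefix length is a multiple of 4.
    """
    guess = target_model(prefix)
    return (guess + 1) % VOCAB if len(prefix) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft k tokens, keep the verified prefix."""
    tokens = list(prompt)
    goal = len(prompt) + n
    while len(tokens) < goal:
        # 1. Draft k tokens autoregressively with the small model.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. Verify each drafted position against the target model.
        #    (A real runtime scores all k positions in one batched pass;
        #    this sketch loops for clarity.)
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + proposal[:i])
            if proposal[i] == expected:
                accepted.append(proposal[i])
            else:
                # First mismatch: take the target's token and discard the rest.
                accepted.append(expected)
                break
        else:
            # Every draft was accepted, so the target's pass yields a bonus token.
            accepted.append(target(tokens + proposal))

        tokens.extend(accepted)
    return tokens[:goal]


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast = speculative_decode(target_model, draft_model, prompt, n=20)
    slow = greedy_decode(target_model, prompt, n=20)
    print(fast == slow)
```

In this greedy setting the output matches what the target model would have produced on its own, so any speedup comes from accepting several drafted tokens per target-model pass, not from changing the result.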