Research Program / emerging

Distributed Runtimes

Runtime systems, state, scheduling, observability, and reliability for AI workloads.

Questions

Field Note / May 3, 2026

A field note on token economics, runtime systems, model architecture, and the stack changes behind public LLM API price compression.

Definition / Undated

A decoding-time acceleration technique that drafts tokens with a smaller model and verifies them with a larger model.
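
The definition above matches what is commonly called speculative decoding. A minimal sketch of the greedy variant follows, assuming both models are exposed as plain functions that map a token sequence to the next token. The names `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical and exist only for illustration; a real system would verify all drafted positions in one batched forward pass of the larger model rather than in a Python loop.

```python
from typing import Callable, List

# Hypothetical interface: a model is a function from a token sequence
# to its greedy next token. Real models return logits, not a token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one call to the large model per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new: int,
    k: int = 4,
) -> List[int]:
    """Greedy draft-and-verify decoding; output matches greedy_decode."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. Draft: the small model proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))

        # 2. Verify: keep drafted tokens while the large model agrees.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if expected != tok:
                # First disagreement: take the large model's token and stop.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every drafted token was accepted; the verification pass also
            # yields one extra token from the large model, if there is room.
            if len(out) + len(accepted) < goal:
                accepted.append(target(out + accepted))

        out.extend(accepted)
    return out[:goal]


# Toy models (assumptions for illustration): the target counts modulo 10,
# and the draft agrees except right after token 4.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10
```

In this greedy form the result is identical to decoding with the larger model alone; the gain comes from accepting several tokens per verification step whenever the draft agrees. Sampling-based variants replace the equality check with a probabilistic acceptance rule, which this sketch does not cover.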