Field Notes

Public field notes on inference economics, verification economics, and AI systems engineering. Field Notes #1 to #3 form the May 2026 inference and verification economics sequence; Field Note #4 extends the archive into AI-system failure analysis; Field Note #5 opens a daily-review cadence of fresh external work.

Note on register. These are field notes, not peer-reviewed research; each piece synthesizes published literature and adds an analytical decomposition. Original measurement is forthcoming. The original research papers live at /papers.

Featured. Field Note. May 17, 2026

The Power-Cap Illusion. SM Clock Locking and the Real Decode Lever.

A daily field note on Ma, Afzal, Eitzinger, and Wellein (arXiv:2605.11999). Across GQA, Multi-head Latent Attention, Gated DeltaNet, and Mamba2 on NVIDIA H200, autoregressive decode draws only 137 to 300 W on a 700 W GPU, and no power cap ever triggers. The cap sits above the natural power ceiling of a memory-bound workload, which saturates HBM bandwidth rather than compute. SM clock locking is the lever that is actually on the critical path: it Pareto-dominates power capping, recovering up to 32% of decode energy at minimal throughput loss. The paper identifies three architecture-dependent DVFS behavioral classes and reports a prefill-decode energy crossover that halves total request energy relative to GQA at production batch sizes. The economic consequence is a tightened decode-cost term in Cost-correct and a shift in the inference-frontier threshold in favor of memory-efficient attention replacements.
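
The power-cap argument above reduces to simple arithmetic, sketched here in Python. This is an illustration under stated assumptions, not the paper's method: the 137 to 300 W draw, the 700 W limit, and the 32% figure come from the summary above, while the function names, the 400 W and 250 W example caps, and the steady-state model of a cap (it binds only when the uncapped draw exceeds it) are hypothetical simplifications introduced for this note.

```python
# Figures quoted in the summary above.
DECODE_DRAW_W = (137.0, 300.0)   # observed decode power range
BOARD_LIMIT_W = 700.0            # GPU power limit
MAX_DECODE_SAVING = 0.32         # "up to 32%" from SM clock locking


def cap_binds(cap_w: float, draw_w: float) -> bool:
    """Assumed steady-state model: a cap throttles only if draw exceeds it."""
    return draw_w > cap_w


def headroom_fraction(draw_w: float, limit_w: float = BOARD_LIMIT_W) -> float:
    """Fraction of the board limit left unused at a given draw."""
    return 1.0 - draw_w / limit_w


def energy_after_saving(baseline_j: float, saving: float) -> float:
    """Energy remaining after a fractional saving is applied."""
    return baseline_j * (1.0 - saving)


if __name__ == "__main__":
    peak = DECODE_DRAW_W[1]
    # Hypothetical caps chosen for illustration only.
    for cap in (400.0, 250.0):
        print(f"cap {cap:.0f} W binds at {peak:.0f} W draw: {cap_binds(cap, peak)}")
    print(f"unused headroom at peak decode draw: {headroom_fraction(peak):.1%}")
    print(f"per 100 J of decode, at most {energy_after_saving(100.0, MAX_DECODE_SAVING):.0f} J remain")
```

Under this model, any cap set above the peak decode draw is inert, which is why the note treats the clock, not the cap, as the operative control.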

inference economics, ai systems engineering

Earlier notes

Related

Original research papers sit alongside. Definitions live under the glossary. Active investigative lines live under Programs.