Public field notes on inference economics, verification economics, and AI
systems engineering. Field Notes #1 to #3 form the May 2026
inference and verification economics sequence; Field Note #4 extends the
archive into AI-system failure analysis; Field Note #5 opens a
daily-review cadence of fresh external work.
Note on register. These are field notes, not peer-reviewed
research; each piece synthesizes published literature and adds an
analytical decomposition. Original measurement is forthcoming. The
original research papers live at /papers.
A daily field note on Ma, Afzal, Eitzinger, and Wellein (arXiv:2605.11999). Across GQA, Multi-head Latent Attention, Gated DeltaNet, and Mamba2 on an NVIDIA H200, autoregressive decode draws only 137 to 300 W on a 700 W GPU, and no power cap ever triggers. The cap sits above the natural power ceiling of a memory-bound workload, which saturates HBM bandwidth rather than compute. SM clock locking is the lever actually on the critical path: it Pareto-dominates power capping and recovers up to 32% of decode energy at minimal throughput loss. The paper identifies three architecture-dependent DVFS behavioral classes and reports a prefill-decode energy crossover that halves total request energy relative to GQA at production batch sizes. The economic consequence is a tightened decode-cost term in Cost-correct and a shift in the inference-frontier threshold in favor of memory-efficient attention replacements.
A daily field note on Gao, Zhao, Muhtar et al.'s ROSE. Cooperative elasticity for agentic RL rollouts on idle serving GPUs. Why the rollout-cost term in Cost-correct can be priced at the marginal-of-idle rate, and what that does to the inference-frontier threshold.
A daily field note on Mei, Li, Chen, Pan, Wu, Miao, Jia, and Rashmi's Coral. Cost-efficient multi-LLM serving over heterogeneous cloud GPUs. Why the fragmentation of the LLM market and the heterogeneity of GPU supply make joint allocation the binding cost lever.
A daily field note on Lai, Feng, Teh, and Miao's VHG. Three-party setter-solver-verifier self-play. Why the verifier's job in the production lifecycle just expanded from two places to three.
A field note on NVIDIA's Ising-Decoding release. Why the AI pre-decoder paired with correlated PyMatching stops improving logical error rate at distance 17 and above, and what to do about it.