Field Note May 17, 2026 published
A daily field note on the work of Ma, Afzal, Eitzinger, and Wellein. Power capping does not bite in memory-bound LLM decode on NVIDIA H200. SM clock locking recovers up to 32% of decode energy. Why the standard energy lever moves the wrong knob, and what that does to the decode-cost term in Cost-correct.
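A toy sketch of why a lower SM clock can cut decode energy when decode is memory-bound. Everything here is a hypothetical model, not the paper's. Step time is the larger of a compute term that stretches as the clock drops and a memory term that does not depend on it. Power is a static term plus a dynamic term that is assumed to scale with the cube of the clock ratio. The function names and every parameter value are made up, and they do not reproduce the reported 32%.

```python
def step_time_s(clock_ratio: float, compute_s: float, memory_s: float) -> float:
    # Roofline-style toy: compute time stretches as the SM clock drops;
    # memory time (HBM-bound) is assumed independent of the SM clock.
    return max(compute_s / clock_ratio, memory_s)


def power_w(clock_ratio: float, static_w: float, dynamic_w: float) -> float:
    # Assumption: dynamic power scales with the cube of the clock ratio.
    return static_w + dynamic_w * clock_ratio ** 3


def energy_per_step_j(clock_ratio: float, compute_s: float = 0.2, memory_s: float = 1.0,
                      static_w: float = 200.0, dynamic_w: float = 500.0) -> float:
    # Energy = power x time for one decode step (all defaults are hypothetical).
    t = step_time_s(clock_ratio, compute_s, memory_s)
    return power_w(clock_ratio, static_w, dynamic_w) * t


full = energy_per_step_j(1.0)    # unlocked clock
locked = energy_per_step_j(0.6)  # locked clock: same step time, lower power
print(f"full clock: {full:.0f} J, locked clock: {locked:.0f} J")
```

While the step stays memory-bound, locking the clock lowers power without lengthening the step. Once `compute_s / clock_ratio` exceeds `memory_s`, step time grows, and in this toy the energy per step starts climbing again.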
Paper May 16, 2026 published
Research Paper #1 in the AI systems engineering wedge. A closed-form decomposition of cost per SLO-compliant served token into a prefill term, a decode term, and a KV-transfer tax. Re-derives the published throughput figures of five 2023–2025 systems papers (PagedAttention, Sarathi-Serve, DistServe, Splitwise, Mooncake) in a common frame, plots the first cross-system Pareto frontier under explicit p99 TTFT and p99 TPOT contracts, and solves the break-even surface between colocated and disaggregated architectures. The frontier partitions cleanly.
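A minimal accounting sketch of the quantity being decomposed. The paper's closed form is not reproduced here. This sketch assumes the prefill, decode, and KV-transfer terms are additive dollar amounts over one measurement window, and it treats a window that misses either p99 contract as producing no SLO-compliant tokens. That all-or-nothing rule is a simplification. `ServingRun`, `p99`, and `cost_per_slo_token` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ServingRun:
    # Hypothetical accounting record for one measurement window.
    prefill_usd: float
    decode_usd: float
    kv_transfer_usd: float  # the KV-transfer tax; zero for a colocated deployment
    output_tokens: int
    ttft_s: List[float]     # time to first token, one sample per request
    tpot_s: List[float]     # time per output token, one sample per request


def p99(samples: List[float]) -> float:
    # Nearest-rank 99th percentile.
    ordered = sorted(samples)
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]


def cost_per_slo_token(run: ServingRun, ttft_slo_s: float, tpot_slo_s: float) -> float:
    # Assumption: a window that misses either p99 contract yields no SLO-compliant tokens.
    if p99(run.ttft_s) > ttft_slo_s or p99(run.tpot_s) > tpot_slo_s:
        return math.inf
    total_usd = run.prefill_usd + run.decode_usd + run.kv_transfer_usd
    return total_usd / run.output_tokens


run = ServingRun(2.0, 6.0, 0.5, 1_000_000, [0.2] * 100, [0.03] * 100)
print(cost_per_slo_token(run, ttft_slo_s=0.5, tpot_slo_s=0.05))
```

Setting `kv_transfer_usd` to zero models a colocated deployment. A disaggregated one pays the transfer term in exchange for easier compliance with the p99 contracts.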
Field Note May 16, 2026 published
A daily field note on Gao, Zhao, Muhtar et al.'s ROSE. Cooperative elasticity for agentic RL rollouts on idle serving GPUs. Why the rollout-cost term in Cost-correct can be priced at the marginal-of-idle rate, and what that does to the inference-frontier threshold.
Paper May 15, 2026 published
Research Paper #2 in the inference-economics wedge. Derives a closed-form threshold under the Cost-correct decomposition for when the marginal compute dollar reduces cost-per-correct-answer faster on the inference channel than on the training channel. Calibrated against rStar-Math, DeepSeek-R1, and test-time-compute curves; matches the observed frontier-vs-commodity market split.
Paper May 15, 2026 published
Research Paper #3 in the inference-economics wedge. Derives a closed-form threshold under the Cost-correct decomposition for when conditioning inference compute on a noisy difficulty estimate reduces cost-per-correct-answer: routing pays iff κ·Δ > γ, where κ is classifier calibration, Δ is workload heterogeneity, and γ is classifier overhead. Unifies five published patterns (speculative decoding, cascades, adaptive self-consistency, complexity-aware exploration, early exit) as one allocation rule, and calibrates against six deployed systems with every operating point on the positive side of the threshold.
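The routing threshold stated above reduces to a one-line predicate. This sketch assumes κ, Δ, and γ are already expressed in commensurate units, as the paper defines them. The helper `break_even_calibration`, a hypothetical name, solves κ·Δ = γ for the calibration at which routing stops losing money.

```python
def routing_pays(kappa: float, delta: float, gamma: float) -> bool:
    # Threshold from the abstract: difficulty-conditioned routing pays iff kappa * delta > gamma.
    return kappa * delta > gamma


def break_even_calibration(delta: float, gamma: float) -> float:
    # Calibration kappa at which kappa * delta == gamma (hypothetical helper).
    if delta <= 0:
        return float("inf")  # a homogeneous workload leaves nothing for routing to exploit
    return gamma / delta


print(routing_pays(0.8, 0.5, 0.3))        # well-calibrated classifier on a mixed workload
print(routing_pays(0.5, 0.5, 0.3))        # same workload, weaker classifier
print(break_even_calibration(0.5, 0.3))   # calibration needed to break even
```

The break-even form makes the trade-off explicit. A more heterogeneous workload (larger Δ) lowers the calibration a difficulty classifier needs before its overhead γ is worth paying.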
Paper May 11, 2026 published
Research Paper #2 in the verification-economics wedge. Per-verifier strictly proper elicitation does not compose. Pipeline miscalibration under any monotone Boolean composition rule equals the within-instance verifier-disagreement covariance exactly. A joint scoring-rule mechanism on the cross-product report space restores DSIC and minimax-optimal regret of order sqrt((log K_1 + log K_2) / N). Per-component procurement records are insufficient evidence under the August 2026 EU AI Act high-risk obligations on composed pipelines.
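The paper's result covers any monotone Boolean composition rule and a within-instance covariance. This sketch shows only the simplest case it generalizes: two verifiers composed with AND, using empirical frequencies over a sample of instances. If the pipeline's pass rate is priced as the product of the per-verifier pass rates, the error is exactly the covariance of their verdicts (E[XY] − E[X]E[Y] = Cov(X, Y)). The function name is hypothetical, and this is not the paper's derivation.

```python
def and_gate_gap(x, y):
    """Compare the AND-composed pass rate with the product of marginal pass rates.

    x, y: 0/1 verdicts of two verifiers on the same instances (toy data).
    Returns (gap, covariance); they coincide because E[XY] - E[X]E[Y] = Cov(X, Y).
    """
    n = len(x)
    px = sum(x) / n
    py = sum(y) / n
    p_and = sum(a * b for a, b in zip(x, y)) / n            # observed rate at which both pass
    gap = p_and - px * py                                   # error of the independence shortcut
    cov = sum((a - px) * (b - py) for a, b in zip(x, y)) / n  # population covariance
    return gap, cov


gap, cov = and_gate_gap([1, 1, 0, 0], [1, 0, 0, 0])
print(gap, cov)
```

When the verifiers' verdicts are correlated, as they are whenever both verifiers struggle on the same instances, per-component calibration says nothing about this gap. Only joint data on the composed pipeline can measure it.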
Field Note May 11, 2026 published
A daily field note on Mei, Li, Chen, Pan, Wu, Miao, Jia, and Rashmi's Coral. Cost-efficient multi-LLM serving over heterogeneous cloud GPUs. Why the fragmentation of the LLM market and the heterogeneity of GPU supply make joint allocation the binding cost lever.
Field Note May 10, 2026 published
A daily field note on Lai, Feng, Teh, and Miao's VHG. Three-party setter-solver-verifier self-play. Why the verifier's job in the production lifecycle just expanded from two places to three.
Paper May 10, 2026 published
Original research paper. Posted-price markets for verification-as-a-service collapse to the worst verifier under unobservable quality. A scoring-rule mechanism on adversarially constructed grounded probes is dominant-strategy incentive-compatible, with matching minimax regret bounds of order sqrt(log K / N).
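The mechanism rests on strictly proper scoring rules, under which a verifier maximizes its expected score only by reporting its true belief. This sketch demonstrates that property with the (negated) Brier score, one standard strictly proper rule. It does not implement the paper's mechanism, which also depends on adversarially constructed grounded probes. `expected_score` and `best_report` are hypothetical names.

```python
def expected_score(report: float, true_prob: float) -> float:
    # Negated Brier loss, averaged over the binary outcome under the verifier's true belief.
    # Algebraically equal to -((report - true_prob) ** 2 + true_prob * (1 - true_prob)).
    return -(true_prob * (1 - report) ** 2 + (1 - true_prob) * report ** 2)


def best_report(true_prob: float, grid: int = 101) -> float:
    # Search a grid of possible reports for the one with the highest expected score.
    reports = [i / (grid - 1) for i in range(grid)]
    return max(reports, key=lambda r: expected_score(r, true_prob))


print(best_report(0.7))  # truthful reporting maximizes expected score
```

Because the expected score is a negative squared distance from the true belief plus a constant, any misreport strictly lowers it. That strictness is what "dominant-strategy incentive-compatible" leans on.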
Field Note May 7, 2026 published
A field note on NVIDIA's Ising-Decoding release. Why the AI pre-decoder paired with correlated PyMatching stops improving logical error rate at distance 17 and above, and what to do about it.
Field Note May 6, 2026 published
A field note showing why verifier accept rate can dominate the other levers in cost-per-correct-answer economics.
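A toy model showing where the verifier accept rate enters cost-per-correct-answer. This is an assumption, not the note's model. Each attempt pays for one generation and one verification, attempts are resampled until the verifier accepts (so the expected number of attempts is 1 / accept rate), and a fraction `precision` of accepted answers is actually correct. All names and numbers are hypothetical.

```python
def cost_per_correct(gen_usd: float, verify_usd: float,
                     accept_rate: float, precision: float) -> float:
    # Hypothetical model: geometric resampling until acceptance, then a precision haircut.
    if not (0 < accept_rate <= 1 and 0 < precision <= 1):
        raise ValueError("accept_rate and precision must lie in (0, 1]")
    cost_per_accepted = (gen_usd + verify_usd) / accept_rate  # mean attempts = 1 / accept_rate
    return cost_per_accepted / precision


base = cost_per_correct(0.01, 0.002, accept_rate=0.5, precision=0.9)
stricter = cost_per_correct(0.01, 0.002, accept_rate=0.25, precision=0.9)
print(base, stricter)
```

The accept rate divides the entire per-attempt cost, so every cut to it multiplies spend on generation and verification alike.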
Field Note May 6, 2026 published
A field note on why the operational unit of LLM inference economics is shifting from cost-per-token to cost-per-correct-answer.
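A worked comparison of the two units. It assumes every attempt is paid for and a fraction `accuracy` of attempts is correct, so spend per correct answer is cost per attempt divided by accuracy. Both models and all prices, token counts, and accuracies are hypothetical.

```python
def cost_per_correct_answer(usd_per_million_tokens: float,
                            tokens_per_attempt: int, accuracy: float) -> float:
    # Total spend divided by the number of correct answers, per attempt.
    cost_per_attempt = usd_per_million_tokens * tokens_per_attempt / 1_000_000
    return cost_per_attempt / accuracy


cheap = cost_per_correct_answer(1.0, 2000, 0.4)   # hypothetical model A: lower price per token
strong = cost_per_correct_answer(3.0, 1000, 0.9)  # hypothetical model B: 3x the price per token
print(f"A: ${cheap:.4f} per correct answer, B: ${strong:.4f} per correct answer")
```

Model A wins on cost-per-token and loses on cost-per-correct-answer. Accuracy and tokens per attempt both sit outside the per-token price.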
Field Note May 3, 2026 published
A field note on token economics, runtime systems, model architecture, and the stack changes behind public LLM API price compression.
Definition Undated published
A post-training quantization method that protects high-signal weights using activation statistics.
Definition Undated published
The constrained compute layer for running inference close to sensors, robots, drones, and devices.
Definition Undated published
Navigation under unreliable or unavailable satellite positioning, using onboard sensing and inference.
Definition Undated published
A family of sequence models that replace quadratic attention with selective state-space mechanisms.
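A single-channel sketch of a selective state-space recurrence. The definition does not name a specific model. The discretization used here follows the Mamba style (zero-order hold for A, a simplified Euler step for B), and that choice is an assumption. The state update runs once per token, so cost grows linearly with sequence length. The "selective" part is that the step size `delta` depends on the current input. Real models use vector states, learned per-channel projections, and a parallel scan. `selective_scan` and its parameters are hypothetical.

```python
import math


def softplus(z: float) -> float:
    return math.log1p(math.exp(z))


def selective_scan(xs, A=-1.0, B=1.0, C=1.0, w_delta=1.0):
    """Scalar selective SSM (toy): h_t = a_bar_t * h_{t-1} + b_bar_t * x_t, y_t = C * h_t."""
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)  # input-dependent step size: how much to update the state
        a_bar = math.exp(delta * A)    # zero-order-hold discretization of A (A < 0 gives decay)
        b_bar = delta * B              # simplified Euler discretization of B
        h = a_bar * h + b_bar * x      # constant work per token, no pairwise attention
        ys.append(C * h)
    return ys


print(selective_scan([0.5, -1.0, 2.0]))
```

A large `delta` forgets the old state quickly and writes the current token strongly. A small `delta` carries the state forward almost unchanged.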
Definition Undated published
A decoding-time acceleration technique that drafts tokens with a smaller model and verifies them with a larger model.
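A minimal sketch of the greedy variant of speculative decoding, using integer toy "models" in place of real networks. The draft proposes `k` tokens. The target keeps the longest prefix it agrees with, substitutes its own token at the first disagreement, and adds one bonus token if every draft survives. Sampling-based variants replace the equality check with a rejection-sampling rule. The function and both toy models are hypothetical.

```python
from typing import Callable, List

# A "model" here is a greedy next-token function over integer token ids (toy stand-in).
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken, k: int) -> List[int]:
    """One greedy speculative step; returns the tokens to append to `prefix`."""
    # 1. Draft k tokens autoregressively with the cheap model.
    drafted: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. Verify with the large model. Real systems score all k positions in one
    #    batched forward pass; this loop calls `target` per position for clarity.
    out: List[int] = []
    ctx = list(prefix)
    for tok in drafted:
        expected = target(ctx)
        if expected != tok:
            out.append(expected)  # first disagreement: take the target's token and stop
            return out
        out.append(tok)
        ctx.append(tok)

    # 3. All k drafts accepted: the target's next prediction is a free bonus token.
    out.append(target(ctx))
    return out


def target_next(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10  # toy "large" model: count upward mod 10


def draft_next(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10  # toy draft that errs after a 5


print(speculative_step([3], draft_next, target_next, k=4))
```

Every kept token is checked against the target's own greedy choice, so the output matches plain greedy decoding with the target. The speedup comes from verifying several drafted tokens per expensive target pass.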
Definition Undated published
The study of cost-per-correct-answer and the verifier as a cost-and-value lever in reasoning systems.