Research Papers

Original research papers. Single author, defensible, reproducible. Each piece states a sharp claim and walks the reader to a defensible answer, with the math and citations carried through.

Note on register. Research papers are first-party original work, peer-review-quality, single author. Distinct from the field notes, which synthesize external work and add an analytical decomposition. Both share the same citation graph and the same voice, with different scope.

Featured. Paper. May 28, 2026

The Exploit Tax. Why Verifier-Guided Reasoning Needs a Transfer Audit.

Verifier-guided reasoning improves measured performance when the verifier captures the target task. It fails economically when optimization transfers to the verifier surface rather than the objective. This paper defines the exploit tax, separates training-verifier acceptance from held-out true success, and proposes a verifier transfer audit for RLVR systems and tool agents.

verification economics, rlvr, verifier failure, tool agent, reward hacking, verifier transfer audit

Read paper Raw source
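
As a rough illustration of the separation the abstract describes, the sketch below computes training-verifier acceptance and held-out true success over the same episodes and reports the difference. The record fields and the use of a plain difference are assumptions made for illustration; the paper's formal definition of the exploit tax is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical fields, not taken from the paper.
    accepted_by_training_verifier: bool  # did the training-time verifier accept the output?
    truly_successful: bool               # held-out ground-truth judgment of the same output


def acceptance_gap(episodes):
    """Return (acceptance_rate, true_success_rate, gap) over a list of episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    acceptance_rate = sum(e.accepted_by_training_verifier for e in episodes) / n
    true_success_rate = sum(e.truly_successful for e in episodes) / n
    # A positive gap means the verifier accepts more often than the task is actually solved.
    return acceptance_rate, true_success_rate, acceptance_rate - true_success_rate


episodes = [
    Episode(True, True),
    Episode(True, False),
    Episode(True, False),
    Episode(False, False),
]
acc, succ, gap = acceptance_gap(episodes)
print(acc, succ, gap)
```

A gap near zero suggests optimization is landing on the objective; a large positive gap suggests it is landing on the verifier surface.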

Earlier papers

Paper May 16, 2026

Disaggregated or Colocated? The Cost-Frontier of LLM Serving Under SLO Contracts.

Research Paper #1 in the AI systems engineering wedge. A closed-form decomposition of cost per SLO-compliant served token into a prefill term, a decode term, and a KV-transfer tax. Re-derives published throughput from five 2023–2025 systems papers (PagedAttention, Sarathi-Serve, DistServe, Splitwise, Mooncake) into a common frame, plots the first cross-system Pareto frontier under explicit p99 TTFT and p99 TPOT contracts, and solves the break-even surface between colocated and disaggregated architectures. The frontier partitions cleanly.

ai systems engineering, inference economics

Read paper PDF Raw source

Paper May 15, 2026

The Inference-Time Compute Frontier. A Cost-Correct Threshold for Training Versus Test-Time Allocation.

Research Paper #2 in the inference-economics wedge. Derives a closed-form threshold under the Cost-correct decomposition for when the marginal compute dollar reduces cost-per-correct-answer faster on the inference channel than on the training channel. Calibrated against rStar-Math, DeepSeek-R1, and test-time-compute curves; matches the observed frontier-vs-commodity market split.

inference economics, verification economics

Read paper PDF Raw source

Paper May 15, 2026

The Routing Premium. An Economic Threshold for Difficulty-Conditional Inference Compute.

Research Paper #3 in the inference-economics wedge. Derives a closed-form threshold under the Cost-correct decomposition for when conditioning inference compute on a noisy difficulty estimate reduces cost-per-correct-answer: routing pays iff κ·Δ > γ, where κ is classifier calibration, Δ is workload heterogeneity, and γ is classifier overhead. Unifies five published patterns (speculative decoding, cascades, adaptive self-consistency, complexity-aware exploration, early exit) as one allocation rule, and calibrates against six deployed systems with every operating point on the positive side of the threshold.

inference economics

Read paper PDF Raw source
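
The threshold stated in the abstract, routing pays iff κ·Δ > γ, can be sketched as a one-line check. The function names and example values are hypothetical, and the sketch assumes the three quantities are already expressed in commensurable units, which the abstract does not specify.

```python
def routing_margin(kappa: float, delta: float, gamma: float) -> float:
    """Return kappa * delta - gamma; positive means routing pays under the stated rule.

    kappa: classifier calibration
    delta: workload heterogeneity
    gamma: classifier overhead
    All three are assumed to share one normalized scale (an assumption of this sketch).
    """
    return kappa * delta - gamma


def routing_pays(kappa: float, delta: float, gamma: float) -> bool:
    """Apply the rule from the abstract: routing pays iff kappa * delta > gamma."""
    return routing_margin(kappa, delta, gamma) > 0.0


# Illustrative values only, not calibrated against any deployed system.
well_calibrated = routing_pays(kappa=0.8, delta=0.5, gamma=0.1)
poorly_calibrated = routing_pays(kappa=0.2, delta=0.3, gamma=0.1)
print(well_calibrated, poorly_calibrated)
```

The product form makes the trade-off visible: a well-calibrated classifier is worth little on a homogeneous workload, and a heterogeneous workload is worth little to a poorly calibrated classifier.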

Paper May 11, 2026

Calibration Drift Under Verifier Composition. A Joint Scoring-Rule Mechanism for Pipeline-Level Cost-Correct Minimization.

Research Paper #2 in the verification-economics wedge. Per-verifier strictly proper elicitation does not compose. Pipeline miscalibration under any monotone Boolean composition rule equals the within-instance verifier-disagreement covariance exactly. A joint scoring-rule mechanism on the cross-product report space restores dominant-strategy incentive compatibility (DSIC) and minimax-optimal regret of order sqrt((log K_1 + log K_2) / N). Per-component procurement records are insufficient evidence under the August 2026 EU AI Act high-risk obligations on composed pipelines.

verification economics

Read paper PDF Raw source
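
For intuition on why per-verifier calibration does not compose, the sketch below works through the AND rule for two binary verifiers. It assumes the pipeline naively reports the product of the two marginal acceptance probabilities; under that assumption the reporting error equals the covariance of the two verdicts. This is an elementary identity offered as intuition, not the paper's theorem or proof, and the joint distribution is made up for the example.

```python
def and_rule_gap(joint):
    """joint maps (a, b) in {0, 1}^2 to the probability that verifier 1 outputs a
    and verifier 2 outputs b on the same instance.

    Returns (gap, covariance), where gap is the true AND-acceptance probability
    minus the naive product of marginals.
    """
    total = sum(joint.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")
    p1 = sum(p for (a, _), p in joint.items() if a == 1)  # marginal acceptance, verifier 1
    p2 = sum(p for (_, b), p in joint.items() if b == 1)  # marginal acceptance, verifier 2
    p_both = joint.get((1, 1), 0.0)                       # true P(both accept)
    naive = p1 * p2                                       # what independent composition reports
    # Covariance computed from its definition, independently of the gap.
    cov = sum(p * (a - p1) * (b - p2) for (a, b), p in joint.items())
    return p_both - naive, cov


# Two verifiers that agree more often than independence would predict.
joint = {(1, 1): 0.5, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.3}
gap, cov = and_rule_gap(joint)
print(gap, cov)
```

Each verifier can be perfectly calibrated on its own marginal and the composed report is still off by the covariance term, which is zero only when the verdicts are uncorrelated.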

Paper May 10, 2026

Verifier Procurement Under Unobservable Quality. A Scoring-Rule Mechanism for Cost-Correct Minimization.

Original research paper. Posted-price markets for verification-as-a-service collapse to the worst verifier under unobservable quality. A scoring-rule mechanism on adversarially constructed grounded probes is dominant-strategy incentive-compatible, with matching minimax regret bounds of order sqrt(log K / N).

verification economics

Read paper PDF Raw source
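
To make the scoring-rule idea concrete, the sketch below uses the quadratic (Brier) score, a standard strictly proper rule, and checks numerically that a verifier maximizes its expected score by reporting its true acceptance probability. The choice of the Brier score, the grid search, and the function names are assumptions for illustration; the paper's mechanism on adversarially constructed grounded probes is not reproduced here.

```python
def expected_brier_score(p: float, q: float) -> float:
    """Expected negative squared error when the event occurs with probability p
    and the verifier reports probability q. Higher is better."""
    return -(p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2)


def best_report(p: float, grid_size: int = 100) -> float:
    """Grid-search the report q in [0, 1] that maximizes the expected score."""
    candidates = [i / grid_size for i in range(grid_size + 1)]
    return max(candidates, key=lambda q: expected_brier_score(p, q))


truth = 0.7
best = best_report(truth)
print(best)
```

Because the expected score is strictly lower for any report other than the true probability, a payment tied to this score gives the verifier no reason to overstate or understate its quality.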

The field notes archive sits alongside. Active investigative lines live under Programs. Definitions live under the glossary.

