Complete Archive

A single index for public artifacts. Chronology matters less than durability, but this page keeps the corpus inspectable from one place.

13 papers

6 definitions

0 systems

Public Corpus

Paper May 16, 2026 published

Disaggregated or Colocated? The Cost-Frontier of LLM Serving Under SLO Contracts.

Research Paper #1 in the AI systems engineering wedge. A closed-form decomposition of cost per SLO-compliant served token into a prefill term, a decode term, and a KV-transfer tax. Re-derives published throughput from five 2023–2025 systems papers (PagedAttention, Sarathi-Serve, DistServe, Splitwise, Mooncake) into a common frame, plots the first cross-system Pareto frontier under explicit p99 TTFT and p99 TPOT contracts, and solves the break-even surface between colocated and disaggregated architectures. The frontier partitions cleanly.

Paper May 15, 2026 published

The Inference-Time Compute Frontier. A Cost-Correct Threshold for Training Versus Test-Time Allocation.

Research Paper #2 in the inference-economics wedge. Derives a closed-form threshold under the Cost-correct decomposition for when the marginal compute dollar reduces cost-per-correct-answer faster on the inference channel than on the training channel. Calibrated against rStar-Math, DeepSeek-R1, and test-time-compute curves; matches the observed frontier-vs-commodity market split.

Paper May 15, 2026 published

The Routing Premium. An Economic Threshold for Difficulty-Conditional Inference Compute.

Research Paper #3 in the inference-economics wedge. Derives a closed-form threshold under the Cost-correct decomposition for when conditioning inference compute on a noisy difficulty estimate reduces cost-per-correct-answer: routing pays iff κ·Δ > γ, where κ is classifier calibration, Δ is workload heterogeneity, and γ is classifier overhead. Unifies five published patterns (speculative decoding, cascades, adaptive self-consistency, complexity-aware exploration, early exit) as one allocation rule, and calibrates against six deployed systems with every operating point on the positive side of the threshold.
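The threshold stated above, routing pays iff κ·Δ > γ, can be sketched as a one-line check. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the three inputs are assumed to be expressed on a common scale, and the numeric values are made up for the example rather than taken from the paper's calibration.

```python
def routing_margin(kappa: float, delta: float, gamma: float) -> float:
    """Signed distance from the break-even surface: kappa * delta - gamma.

    kappa: classifier calibration (assumed here to be a unitless score)
    delta: workload heterogeneity
    gamma: classifier overhead, assumed to be in the same units as kappa * delta
    """
    return kappa * delta - gamma


def routing_pays(kappa: float, delta: float, gamma: float) -> bool:
    """Routing pays iff kappa * delta > gamma (strict inequality)."""
    return routing_margin(kappa, delta, gamma) > 0.0


if __name__ == "__main__":
    # Illustrative values only.
    print(routing_pays(kappa=0.8, delta=0.5, gamma=0.1))  # well-calibrated, mixed workload
    print(routing_pays(kappa=0.2, delta=0.3, gamma=0.1))  # weak classifier, near-uniform workload
```

Read this way, a better-calibrated classifier or a more heterogeneous workload raises the left-hand side, while a more expensive classifier raises the bar it has to clear.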

Paper May 11, 2026 published

Calibration Drift Under Verifier Composition. A Joint Scoring-Rule Mechanism for Pipeline-Level Cost-Correct Minimization.

Research Paper #2 in the verification-economics wedge. Per-verifier strictly proper elicitation does not compose. Pipeline miscalibration under any monotone Boolean composition rule equals the within-instance verifier-disagreement covariance exactly. A joint scoring-rule mechanism on the cross-product report space restores dominant-strategy incentive compatibility (DSIC) and minimax-optimal regret of order sqrt((log K_1 + log K_2) / N). Per-component procurement records are insufficient evidence under the August 2026 EU AI Act high-risk obligations on composed pipelines.

Definition Undated published

AWQ Quantization

A post-training quantization method that protects high-signal weights using activation statistics.

Definition Undated published

Edge AI Silicon

The constrained compute layer for running inference close to sensors, robots, drones, and devices.

Definition Undated published

GPS-Denied Navigation

Navigation under unreliable or unavailable satellite positioning, using onboard sensing and inference.

Definition Undated published

Mamba And State-Space Models

A family of sequence models that replace quadratic attention with selective state-space mechanisms.

Definition Undated published

Speculative Decoding

A decoding-time acceleration technique that drafts tokens with a smaller model and verifies them with a larger model.
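The draft-then-verify loop in this definition can be sketched with toy models. This is a simplified greedy variant under stated assumptions: `draft_next` and `target_next` are hypothetical stand-ins for the smaller and larger model, each mapping a token list to its next token, and verification is written as sequential calls for readability, whereas real systems score all drafted tokens in one batched pass of the larger model and often use a probabilistic acceptance rule.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # hypothetical model interface


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[int]:
    """Draft k tokens with the small model, then verify them with the large one."""
    # 1. Draft: the small model proposes k tokens autoregressively.
    drafted: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        token = draft_next(ctx)
        drafted.append(token)
        ctx.append(token)

    # 2. Verify: keep drafted tokens while they match the large model's choice.
    accepted: List[int] = []
    ctx = list(prefix)
    for token in drafted:
        expected = target_next(ctx)
        if token != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(token)
        ctx.append(token)

    # All k drafts matched, so the large model contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[int], draft_next: NextToken,
             target_next: NextToken, k: int, max_new: int) -> List[int]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[:len(prefix) + max_new]
```

In this greedy form the output is identical to decoding with the larger model alone; the benefit is that each step can emit several tokens whenever the drafts agree.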

Definition Undated published

Verification Economics

The study of cost-per-correct-answer and the verifier as a cost-and-value lever in reasoning systems.