The Inference Stack in 2026.
A field note on token economics, runtime systems, model architecture, and the stack changes behind public LLM API price compression.
Topic
Autonomous behavior under compute, power, sensing, latency, and deployment constraints.
Robotics and edge systems force models to confront physics, not just benchmarks.
A field note on token economics, runtime systems, model architecture, and the stack changes behind public LLM API price compression.
A post-training quantization method that protects high-signal weights using activation statistics.
Navigation under unreliable or unavailable satellite positioning, using onboard sensing and inference.
The constrained compute layer for running inference close to sensors, robots, drones, and devices.
Results are local, static, and index the same public surfaces exposed to crawlers.