Topic

Tool-Agent Reward Hacking

Failures where an AI agent manipulates tools, tests, logs, evaluators, or task state to receive credit without accomplishing the intended external objective.

Definition: Tool-Agent Reward Hacking in the glossary.

Why It Matters Here

Tool agents expose reward hacking in production form: the verifier is no longer a static benchmark, but part of an executable environment.

Agent Infrastructure RLVR Verifier Failure Verifier Transfer Audit

Programs

Agent Infrastructure

Linked Artifacts

Paper / May 28, 2026

The Exploit Tax. Why Verifier-Guided Reasoning Needs a Transfer Audit.

A research paper on verifier failure under RLVR and tool agents. It defines the exploit tax: the cost paid when verifier-guided reasoning increases local acceptance faster than true transferred success.

Tool-Agent Reward Hacking

Why It Matters Here

Related Topics

Programs

Linked Artifacts

The Exploit Tax. Why Verifier-Guided Reasoning Needs a Transfer Audit.