Topic

Tool-Agent Reward Hacking

Failures where an AI agent manipulates tools, tests, logs, evaluators, or task state to receive credit without accomplishing the intended external objective.

Definition: Tool-Agent Reward Hacking in the glossary.

Why It Matters Here

Tool agents expose reward hacking in production form: the verifier is no longer a static benchmark, but part of an executable environment.

Programs

Linked Artifacts