Verifier transfer audit
A verifier transfer audit is an evaluation protocol that compares training-verifier acceptance, held-out verifier acceptance, tool-grounded success, and human-validated success for the same reasoning or agent system.
Definition
Verifier-guided systems are usually reported against the verifier that trained, selected, or scored them. A transfer audit asks whether the measured gain survives outside that verifier. The minimum audit separates four quantities: acceptance by the original verifier, acceptance by a held-out verifier, success against external tool state, and success under human or domain-grounded review.
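As a minimal sketch of the four quantities, assume each audited task yields one record with four boolean outcomes. The record type, its field names, and the sample data below are hypothetical illustrations, not part of any established tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    """One task's outcome under each of the four checks (hypothetical schema)."""
    task_id: str
    train_verifier_accepts: bool    # verifier used to train, select, or score
    heldout_verifier_accepts: bool  # verifier the system never optimized against
    tool_state_success: bool        # checked against external tool state
    human_validated: bool           # human or domain-grounded review


def audit_rates(records):
    """Return the fraction of tasks passing each of the four checks."""
    if not records:
        raise ValueError("audit needs at least one record")
    n = len(records)
    return {
        "train_verifier": sum(r.train_verifier_accepts for r in records) / n,
        "heldout_verifier": sum(r.heldout_verifier_accepts for r in records) / n,
        "tool_grounded": sum(r.tool_state_success for r in records) / n,
        "human_validated": sum(r.human_validated for r in records) / n,
    }


# Illustrative data only: four tasks, all accepted by the training verifier.
records = [
    AuditRecord("t1", True, True, True, True),
    AuditRecord("t2", True, True, True, True),
    AuditRecord("t3", True, True, False, False),
    AuditRecord("t4", True, False, False, False),
]
rates = audit_rates(records)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Keeping the four rates separate, rather than folding them into one score, is what lets a reader see where acceptance stops tracking real success.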
Why this matters
Without a transfer audit, a gain measured by a verifier can be an improvement in the measurement rather than in the capability. The system may overfit to the benchmark, the unit tests, the judge model, or the tool environment while becoming less reliable on the real task.
Production signal
Run the audit on the same task distribution before and after verifier-guided optimization. Report the transfer gap and accepted exploit incidence alongside any reward or benchmark lift.
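The before/after comparison could be sketched as follows. This page does not define the two reported metrics, so the sketch assumes working definitions: the transfer gap is taken to be the training-verifier acceptance rate minus the human-validated success rate, and accepted exploit incidence is taken to be the share of verifier-accepted outputs that fail human validation. The function names and the sample data are hypothetical.

```python
def acceptance_rate(outcomes):
    """Fraction of tasks accepted by the training verifier."""
    return sum(accepted for accepted, _ in outcomes) / len(outcomes)


def validated_rate(outcomes):
    """Fraction of tasks that pass human or domain-grounded review."""
    return sum(valid for _, valid in outcomes) / len(outcomes)


def transfer_gap(outcomes):
    """Assumed definition: verifier acceptance minus human-validated success."""
    return acceptance_rate(outcomes) - validated_rate(outcomes)


def accepted_exploit_incidence(outcomes):
    """Assumed definition: share of accepted outputs that fail human review."""
    accepted = [valid for was_accepted, valid in outcomes if was_accepted]
    if not accepted:
        return 0.0
    return sum(not valid for valid in accepted) / len(accepted)


def audit_report(before, after):
    """Summarize both runs over the same task distribution."""
    def summarize(outcomes):
        return {
            "acceptance": acceptance_rate(outcomes),
            "validated": validated_rate(outcomes),
            "transfer_gap": transfer_gap(outcomes),
            "exploit_incidence": accepted_exploit_incidence(outcomes),
        }
    return {
        "before": summarize(before),
        "after": summarize(after),
        "acceptance_lift": acceptance_rate(after) - acceptance_rate(before),
        "validated_lift": validated_rate(after) - validated_rate(before),
    }


# Illustrative data only: (accepted_by_training_verifier, human_validated)
# for the same ten tasks before and after verifier-guided optimization.
before = [(True, True)] * 5 + [(True, False)] * 1 + [(False, False)] * 4
after = [(True, True)] * 5 + [(True, False)] * 4 + [(False, False)] * 1

report = audit_report(before, after)
print(report)
```

In this made-up data, verifier acceptance rises while human-validated success stays flat, so the whole lift shows up as a wider transfer gap and a higher exploit incidence. That is the pattern the audit is meant to surface.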
Related
- RLVR verifier failure
- Tool-agent reward hacking
- Verifier transfer coefficient
- Accepted exploit incidence
- The Exploit Tax
References
- Pan, A. et al. LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking. arXiv, 2026.
- Li, J. et al. Reward Hacking Benchmark: Measuring Exploits in LLM Reasoning. arXiv, 2026.