Verifier transfer audit

A verifier transfer audit is an evaluation protocol that compares training-verifier acceptance, held-out verifier acceptance, tool-grounded success, and human-validated success for the same reasoning or agent system.

Definition

Verifier-guided systems are usually reported against the same verifier that trained, selected, or scored them. A transfer audit asks whether the measured gain survives outside that verifier. The minimum audit separates four quantities: acceptance by the original (training) verifier, acceptance by a held-out verifier, success checked against external tool state, and success under human or domain-grounded review.
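The four quantities can be kept apart by recording one row per system output and computing a separate rate for each check. The following is a minimal sketch, not a reference implementation: the AuditRecord type, its field names, and the audit_rates function are hypothetical names introduced here for illustration, and each check is assumed to reduce to a pass/fail boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    """One system output scored four ways (all field names are hypothetical)."""
    task_id: str
    training_verifier_ok: bool  # accepted by the verifier used to train/select/score
    heldout_verifier_ok: bool   # accepted by a verifier the system was not optimized against
    tool_grounded_ok: bool      # success checked against external tool state
    human_validated_ok: bool    # passed human or domain-grounded review


def audit_rates(records: list[AuditRecord]) -> dict[str, float]:
    """Return the fraction of records that pass each of the four checks."""
    if not records:
        raise ValueError("audit needs at least one record")
    n = len(records)
    return {
        "training_verifier": sum(r.training_verifier_ok for r in records) / n,
        "heldout_verifier": sum(r.heldout_verifier_ok for r in records) / n,
        "tool_grounded": sum(r.tool_grounded_ok for r in records) / n,
        "human_validated": sum(r.human_validated_ok for r in records) / n,
    }
```

Reporting the four rates side by side, rather than a single blended score, is what makes a divergence between the training verifier and the other three checks visible.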

Why this matters

Without a transfer audit, a gain in verifier acceptance can be a measurement improvement rather than a capability improvement. The system may learn the quirks of the benchmark, the unit tests, the judge model, or the tool environment while becoming less reliable on the real task.

Production signal

Run the audit on the same task distribution before and after verifier-guided optimization. Report the transfer gap and accepted exploit incidence alongside any reward or benchmark lift.
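A before/after report along these lines can be sketched as follows. The text does not give formulas for the two reported quantities, so this sketch assumes working definitions: the transfer gap is taken to be the training-verifier acceptance rate minus the held-out verifier acceptance rate, and accepted exploit incidence is taken to be the share of training-verifier-accepted outputs that fail grounded (tool or human) review. The Outcome type and all function names are hypothetical.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    """Pass/fail results for one output (hypothetical field names)."""
    training_ok: bool  # accepted by the training verifier
    heldout_ok: bool   # accepted by a held-out verifier
    grounded_ok: bool  # passed tool-grounded or human-validated review


def _rate(flags: list[bool]) -> float:
    if not flags:
        raise ValueError("need at least one outcome")
    return sum(flags) / len(flags)


def transfer_gap(outcomes: list[Outcome]) -> float:
    """Assumed definition: training acceptance rate minus held-out acceptance rate."""
    return _rate([o.training_ok for o in outcomes]) - _rate(
        [o.heldout_ok for o in outcomes]
    )


def accepted_exploit_incidence(outcomes: list[Outcome]) -> float:
    """Assumed definition: fraction of training-accepted outputs failing grounded review."""
    accepted = [o for o in outcomes if o.training_ok]
    if not accepted:
        return 0.0  # nothing was accepted, so nothing accepted was an exploit
    return sum(not o.grounded_ok for o in accepted) / len(accepted)


def audit_report(before: list[Outcome], after: list[Outcome]) -> dict[str, float]:
    """Compare the same task distribution before and after optimization."""
    return {
        "training_lift": _rate([o.training_ok for o in after])
        - _rate([o.training_ok for o in before]),
        "transfer_gap_before": transfer_gap(before),
        "transfer_gap_after": transfer_gap(after),
        "exploit_incidence_before": accepted_exploit_incidence(before),
        "exploit_incidence_after": accepted_exploit_incidence(after),
    }
```

Under these assumed definitions, a large training lift paired with a widening transfer gap or a rising exploit incidence suggests the gain did not transfer beyond the original verifier.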

