An evaluation protocol that measures whether verifier-guided reasoning transfers from the training verifier to held-out verifiers, external tools, and human-validated outcomes.
A verifier that improves in-domain acceptance can still raise exploit incidence out of distribution. Transfer auditing is the missing bridge between RLVR papers and agent reliability.
A research paper on verifier failure under RLVR and tool agents. It defines the exploit tax: the cost paid when verifier-guided reasoning increases local acceptance faster than true transferred success.
Archive command
Search the public field-note archive.
Results are local, static, and index the same public surfaces exposed to crawlers.