The Verifier's Dividend
How much task horizon does a checker buy an agent? A closed-form answer, three interactive instruments, and one uncomfortable thing the METR data has been saying all along.
agents, reliability, math
Field notes
What we learn building agentic systems, written down while it's still sharp. Eval design, failure taxonomies, tooling — and the occasional detour into the weeds.