Warden
The codebase custodian
The problem
Every engineering team carries an invisible second job: the custody of its own codebase. Issue triage, bug reproduction, dependency patching, flaky-test forensics, the long tail of "known, shallow, and nobody's priority." It is the work that makes everything else possible, and it loses the prioritization argument every single sprint.
Coding agents made it cheap to generate changes. They did not make it cheap to trust them. A pull request without a reproduction, a failing-then-passing test, and a scoped diff is not help — it's review burden wearing a helpful expression.
The bet
Warden is built on a simple inversion: the agent's job is not to write code, it's to produce evidence. Code is a byproduct.
For every piece of work it takes on, Warden assembles a case file:
- Reproduce. The bug is confirmed in a disposable sandbox, with a minimal failing test checked in before any fix is attempted.
- Fix narrowly. The smallest diff that turns the test green. No drive-by refactors, no opportunistic cleanup.
- Prove. Full suite, lints, type checks, and a risk note: what was touched, what could regress, what wasn't verified.
- Stand for review. The PR opens with the case file attached. A reviewer reads evidence, not vibes.
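As a rough illustration of the loop above, a case file can be modeled as a record that refuses to go to review until every piece of evidence is present. This is a minimal sketch, not Warden's actual data model: the `CaseFile` class, its field names, and the wording of the gap messages are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CaseFile:
    """Hypothetical evidence record for one piece of work (names are assumptions)."""
    issue: str
    failed_before_fix: bool = False  # Reproduce: the new test fails on unfixed code
    passed_after_fix: bool = False   # Fix narrowly: the same test passes after the diff
    files_changed: list[str] = field(default_factory=list)
    suite_green: bool = False        # Prove: full suite, lints, and type checks pass
    risk_note: str = ""              # Prove: touched / could regress / not verified

    def missing_evidence(self) -> list[str]:
        """Return a description of each gap; an empty list means the case is complete."""
        gaps = []
        if not self.failed_before_fix:
            gaps.append("no failing test reproducing the bug")
        if not self.passed_after_fix:
            gaps.append("test does not pass after the fix")
        if not self.files_changed:
            gaps.append("no diff recorded")
        if not self.suite_green:
            gaps.append("suite, lints, or type checks not green")
        if not self.risk_note.strip():
            gaps.append("no risk note")
        return gaps

    def ready_for_review(self) -> bool:
        """Stand for review: only when nothing is missing."""
        return not self.missing_evidence()


case = CaseFile(
    issue="example-issue",
    failed_before_fix=True,
    passed_after_fix=True,
    files_changed=["parser.py"],
    suite_green=True,
)
print(case.missing_evidence())  # ['no risk note']
```

The point of the shape is that the PR is a function of the evidence: a fix with a green suite but no risk note is still not reviewable.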
The same loop covers dependency custody: supply-chain-aware updates in which Warden quarantines new releases, reads their changelogs and diffs, and ships upgrades with proof that nothing observable changed.
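One way to picture the quarantine step is as a simple eligibility check. The sketch below assumes quarantine means a fixed waiting period after publication; the text does not say how long releases are held or what lifts the hold, so the seven-day window, the function name, and its parameters are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed quarantine window; the real policy is not specified in the text.
QUARANTINE = timedelta(days=7)


def eligible_for_upgrade(
    published_at: datetime,
    now: datetime,
    changelog_reviewed: bool,
    diff_reviewed: bool,
    quarantine: timedelta = QUARANTINE,
) -> bool:
    """A release is a candidate only once it has aged out of quarantine
    and both its changelog and its diff have been read."""
    aged_out = now - published_at >= quarantine
    return aged_out and changelog_reviewed and diff_reviewed


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
fresh = datetime(2024, 1, 8, tzinfo=timezone.utc)   # published 2 days earlier
older = datetime(2024, 1, 1, tzinfo=timezone.utc)   # published 9 days earlier

print(eligible_for_upgrade(fresh, now, True, True))   # False
print(eligible_for_upgrade(older, now, True, True))   # True
print(eligible_for_upgrade(older, now, True, False))  # False
```

Eligibility is only the entry condition. The upgrade itself would still go through the same reproduce, fix, prove, and review loop described above.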
What makes it different
Most autonomous-coding products chase breadth: more languages, bigger tasks, flashier demos. Warden chases custody: the narrow, deep set of maintenance work where correctness is checkable and autonomy is therefore provable. Its autonomy budget is explicit: actions are tiered, every tier is gated on its own track record, and a tier's gate moves only when that tier's eval record shows the move has been earned.
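A tiered autonomy budget of this kind could be sketched as a gate that opens per tier, based only on that tier's own record. The tier names, the minimum sample size, and the success-rate threshold below are illustrative assumptions, not Warden's actual policy.

```python
from dataclasses import dataclass


@dataclass
class TierRecord:
    """Hypothetical eval record for one action tier."""
    name: str
    attempts: int
    successes: int


def gate_open(
    record: TierRecord,
    min_attempts: int = 50,          # assumed: too little history means no autonomy
    min_success_rate: float = 0.98,  # assumed threshold
) -> bool:
    """Decide from this tier's record alone; other tiers' results never count."""
    if record.attempts < min_attempts:
        return False
    return record.successes / record.attempts >= min_success_rate


open_pr = TierRecord("open_pr", attempts=120, successes=119)
merge = TierRecord("merge", attempts=10, successes=10)

print(gate_open(open_pr))  # True
print(gate_open(merge))    # False
```

In this sketch a perfect but short record does not open the gate, which is one way to read "earned": a tier needs enough evidence as well as good evidence.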
Where it stands
Warden runs against the lab's own repositories today — this site included — and is in development toward a private beta with a small set of teams who maintain serious production codebases.