On March 30, 2026, I ran a twenty-minute QA audit on software I had been building for six weeks.
It found over a hundred critical bugs. Broken mutations. Nil UUIDs going to production. Stored XSS in input fields. Timezone corruption on every write. Dead routes that returned 200s and did nothing. An IDOR that let any user read any other organization’s data by guessing a URL.
The test suite had 4,800 passing tests. None of them had caught any of it.
I deleted all 4,800 tests that weekend.
Why the tests were fake
The technical diagnosis took an afternoon. Every test mocked everything below it. vi.mock("./repository", () => ({ fetch: () => ({ id: "abc" }) })) set up a canned return value, the test called the procedure under test, the procedure called the mocked repository, got the canned value back, and the test then asserted that the canned value matched itself. The production code path was never executed. If the entire repository had been deleted, the tests would still pass.
This is the circular-mock pattern and it is produced reliably by AI coding agents when asked to write tests. The agent reads the procedure under test, sees it depends on a repository, mocks the repository to return something plausible, and writes an assertion against the mock’s return value. The resulting test is a tautology dressed as verification. It never fails because it cannot fail — the only code it exercises is its own setup.
The reason it passes human review is that the test file looks correct. The imports are right. The describe and it blocks are named reasonably. The assertions use real matchers. Only a reader who traces the mock dependency back to production code notices that nothing real is being tested. Most reviewers don’t do that trace, especially under time pressure.
The diagnostic that catches this at write-time
Two ast-grep rules, codified over the following week, closed off this class of failure.
id: require-caller-in-tests
language: TypeScript
rule:
  all:
    - pattern: it($DESC, $FN)
    - not:
        has:
          any:
            - pattern: createCaller($$$)
            - pattern: createMockCaller($$$)
message: |
  Every it() block must call createCaller(). A test that mocks everything
  and asserts on the mock is a tautology. See fake-tests-weekend incident.
The first rule, require-caller-in-tests, scans every it() block and fails if the block does not contain a call to createCaller(...) or equivalent — the tRPC helper that produces a real, ctx-bound caller against the actual router. A test that doesn’t construct a caller is not testing production code.
The second rule, require-router-import-in-tests, fails if a test file does not import a tRPC router. A test file with no router import cannot be testing the procedures in any router, which means it’s testing mocks only. Both rules run at gate-time via pnpm gate. Neither can be silenced short of disabling the gate.
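The second rule's file is not reproduced in this post; a sketch of what it could look like in ast-grep's rule syntax follows. The glob, import shape, and "router" regex are assumptions, not the project's actual configuration:

```yaml
# Hypothetical sketch -- file glob and import path are illustrative.
id: require-router-import-in-tests
language: TypeScript
files:
  - "**/*.test.ts"
rule:
  kind: program
  not:
    has:
      stopBy: end
      pattern: import { $ROUTER } from $SOURCE
constraints:
  SOURCE:
    regex: "router"
message: |
  Test files must import a tRPC router. A suite with no router import
  is exercising mocks, not procedures.
```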
What the incident produced
The 4,800 deleted tests were replaced over three weeks by approximately 1,800 tests that followed the new constraints. The new suite found eight production bugs in its first week — bugs that the old suite had missed for months. The delta between “tests that run production code” and “tests that run mocks” is not gradual. It is binary. Either the test exercises the real path or it doesn’t.
The broader shift began that weekend. Spec-driven development — the methodology I had been following — places the specification alongside the code as peer artifacts, to be kept in sync through review. But review decays. Agents drift. A spec that is not mechanically coupled to the build gate does not survive contact with a year of commits. The methodology that replaced it is documented in the manifesto: domain rules as machine-testable specifications, source code with zero authority, mechanical enforcement at every boundary where drift could occur.
The fake-tests incident is the kind of failure the new methodology exists to prevent. It did not prevent this one — it was produced by this one. Every mechanical rule in the system is a frozen incident. This is its frame.