Drift Log: How a Polyglot Monorepo Catches Architectural Lies Before They Compound
Last quarter I grepped our monorepo for services/agent_pool.py. We'd promised to delete it two milestones earlier. There it was: four importers, eight sockets at module import, completely untouched. Nobody had lied. The deletion just slipped at the deadline, and no mechanism noticed.
I've watched this play out three times in two years. The architecture document keeps making promises. The repo keeps falling short. Code review can't catch the gap, because reviewers look at the diff, and the drift lives in what the diff didn't touch.
The fix turned out to be a small markdown table, a drift log, wired into a milestone close gate. This article walks through the shape of the log, where it sits in a polyglot monorepo, the gate that enforces it, and three patterns it catches that nothing else does.
Why architecture docs lie by default
Here is the uncomfortable truth: a target architecture document starts decaying the moment a milestone ships. It is a snapshot of intent; the repo is a snapshot of reality, and the two diverge on every subsequent milestone unless something forces them back together.
Three drift sources show up over and over:
- Scope cuts mid-milestone. A milestone promised to delete three legacy modules, but the deadline approached and the operator cut the deletion scope to ship the new feature on time. The new feature lands. The legacy modules stay. Nothing in the merge commit says "we cut scope."
- Cross-milestone dependencies. Milestone A promised to move file X to layer L3. Milestone B promised to delete file Y, which imports X. A slipped, B shipped anyway by adding a workaround import. Now X is in two layers and Y has a workaround that nobody scheduled to remove.
- Lazy deprecation paths. A new component shipped. The old one was "kept for one milestone for backwards compatibility." That milestone passed two quarters ago.
None of these show up in code review. Reviewers look at the diff. The diff is correct. The drift is in what the diff did NOT touch.
Anatomy of a useful drift log
Why six columns and not five or seven? Six is the number where any row carries enough context to act on without re-reading the original milestone notes, and any more turns the log into a form that nobody fills out.
| Discovered | Drifted milestone | Expected | Actual | Resolution milestone | Status |
|---|---|---|---|---|---|
| 2026-04-28 | Y2 | services/agent_pool.py deleted | Still alive, 4 importers | Y4-cleanup | OPEN |
| 2026-04-28 | Y2 | routes/memory.py uses canonical handler | BUG-ROUTE bypass added for hotfix | Y3-routes-fix | OPEN |
| 2026-05-06 | S4 | Lighthouse baselines captured | Smoke test gated by operator | S4-resume | PAUSED |
Six columns, six rules:
- Discovered is the date the drift was noticed, NOT the date the original milestone shipped. The age of a row is a signal: a six-month-old OPEN row is a different problem from a six-day-old one.
- Drifted milestone is the milestone that made the promise. That lets a future reader open the right plan document to understand what was originally intended.
- Expected is the target architecture state, copied verbatim from the original plan or the target architecture document. No paraphrasing. If you can't find the exact sentence, the drift isn't real; it's a vibe.
- Actual is what the repo currently does, with file paths or component names. "Still has the old pool" is useless. "services/agent_pool.py still allocates 8 sockets at module import" is useful.
- Resolution milestone is the milestone where the drift will be fixed. Every OPEN row must point to a real milestone in the roadmap, even if that milestone is "Y4-cleanup" and contains nothing but drift fixes. A row without a resolution milestone is a wishlist, not a debt.
- Status is one of OPEN, PAUSED, or RESOLVED (plus WONTFIX, which the monthly audit can assign). Resolved rows stay in the log forever: they're the audit trail.
The log lives at the bottom of your ROADMAP.md (or wherever your milestone DAG lives). One file, one source of truth.
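The six column rules are mechanical enough to check in code. Here is a minimal sketch of a row parser that enforces them; the `DriftRow` and `parse_row` names are hypothetical, and it assumes rows are written exactly like the table above, with no pipe characters inside a cell.

```python
from dataclasses import dataclass
from datetime import date

# Statuses named in this article; WONTFIX comes from the monthly audit section.
VALID_STATUSES = {"OPEN", "PAUSED", "RESOLVED", "WONTFIX"}


@dataclass(frozen=True)
class DriftRow:
    discovered: date
    drifted_milestone: str
    expected: str
    actual: str
    resolution_milestone: str
    status: str


def parse_row(line: str) -> DriftRow:
    """Parse one markdown table row and enforce the column rules."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != 6:
        raise ValueError(f"expected 6 columns, got {len(cells)}")
    discovered, drifted, expected, actual, resolution, status = cells
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if status == "OPEN" and not resolution:
        raise ValueError("an OPEN row must name a resolution milestone")
    # date.fromisoformat rejects anything that is not YYYY-MM-DD.
    return DriftRow(
        date.fromisoformat(discovered), drifted, expected, actual, resolution, status
    )


row = parse_row(
    "| 2026-04-28 | Y2 | services/agent_pool.py deleted "
    "| Still alive, 4 importers | Y4-cleanup | OPEN |"
)
print(row.resolution_milestone, row.status)
```

A parser like this cannot check the "copied verbatim" rule for the Expected column; that one still needs a human.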
Where the log fits in a polyglot monorepo
Last week a colleague asked me why we couldn't "just use the Rust linter" to catch the drift our Python services were causing, and that question is exactly the trap. Each language enforces architecture through its own machinery: Rust uses pub/pub(crate), Python relies on __init__.py imports plus external linters, TypeScript leans on tsconfig.json paths. None of these talk to each other, and the drift always lives in the seam between them.
Here's the structure I've settled on for your_project/:
    your_project/
        docs/
            architecture/
                TARGET-ARCHITECTURE.md    # the promises
                COMPONENTS.md             # the inventory
                ROADMAP.md                # milestone DAG + Drift Log (at the bottom)
                LAYERS.md                 # per-component layer definitions
                RULES.md                  # compact pre-flight checklist
        workflows/
            {service-name}/
                PLAN.md                   # current milestone tasks
                WORK.md                   # current session context
        scripts/
            check_arch.py                 # Python layer linter
            check_layers.rs               # Rust layer check (cargo xtask)
The drift log sits inside ROADMAP.md because the resolution milestone column has to link back to other rows of the roadmap. Splitting the log into its own file means every update touches two files instead of one, which is exactly the friction that kills logs.
TARGET-ARCHITECTURE.md and COMPONENTS.md are upstream of the log: they're what the log compares against. RULES.md is downstream: it codifies the gate that reads the log at milestone close.
The milestone close gate
A drift log without a gate is a wiki page. A gate without a log is a checklist that grows stale. The pair is what works.
The close gate is a single script that runs when a milestone is marked done. It has three jobs:
- Read the milestone's exit criteria from ROADMAP.md.
- Verify each exit criterion against the current repo state.
- For each unmet exit criterion, BLOCK the close unless the drift log has a row with that criterion in the Expected column and a resolution milestone.
Here's a sketch in Python you can adapt. It assumes exit criteria are written as bullet points with a [VERIFY: <command>] suffix:
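    import re
    import subprocess
    import sys
    from pathlib import Path

    ROADMAP = Path("docs/architecture/ROADMAP.md")
    # A drift log row is a table line whose first cell is an ISO date.
    DRIFT_PATTERN = re.compile(r"^\|\s*\d{4}-\d{2}-\d{2}\s*\|.*$", re.MULTILINE)
    # An exit criterion is a bullet that ends in "[VERIFY: <command>]".
    CRITERION_PATTERN = re.compile(
        r"^[ \t]*-[ \t]+(.+?)[ \t]*\[VERIFY:[ \t]*(.+?)\][ \t]*$", re.MULTILINE
    )

    def extract_milestone_section(text: str, milestone_id: str) -> str:
        # Assumption: each milestone starts at a "## <id>" heading and runs to
        # the next "## " heading. Adapt this to your own roadmap layout.
        pattern = rf"^## {re.escape(milestone_id)}(?![\w-]).*?(?=^## |\Z)"
        match = re.search(pattern, text, re.MULTILINE | re.DOTALL)
        return match.group(0) if match else ""

    def parse_exit_criteria(section: str) -> list[dict[str, str]]:
        return [
            {"description": description, "verify_cmd": verify_cmd}
            for description, verify_cmd in CRITERION_PATTERN.findall(section)
        ]

    def extract_drift_log(text: str) -> list[list[str]]:
        # Split every dated table row into its six cells.
        return [
            [cell.strip() for cell in row.strip().strip("|").split("|")]
            for row in DRIFT_PATTERN.findall(text)
        ]

    def verify_milestone(milestone_id: str) -> list[str]:
        text = ROADMAP.read_text(encoding="utf-8")
        section = extract_milestone_section(text, milestone_id)
        failures = []
        for criterion in parse_exit_criteria(section):
            # shell=True is tolerable only because the commands come from a
            # reviewed file in the repo, never from untrusted input.
            result = subprocess.run(
                criterion["verify_cmd"], shell=True, capture_output=True, text=True
            )
            if result.returncode != 0:
                failures.append(criterion["description"])
        return failures

    def check_drift_coverage(failures: list[str], rows: list[list[str]]) -> list[str]:
        # Covered means: an OPEN row whose Expected cell matches the criterion
        # and whose Resolution milestone cell is filled in.
        covered = {
            row[2] for row in rows if len(row) == 6 and row[4] and row[5] == "OPEN"
        }
        return [failure for failure in failures if failure not in covered]

    if __name__ == "__main__":
        milestone_id = sys.argv[1] if len(sys.argv) > 1 else "Y2"
        failures = verify_milestone(milestone_id)
        rows = extract_drift_log(ROADMAP.read_text(encoding="utf-8"))
        uncovered = check_drift_coverage(failures, rows)
        if uncovered:
            print("BLOCKED: drift not logged:")
            for u in uncovered:
                print(f"  - {u}")
            raise SystemExit(1)
        print("Milestone close approved")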
Two design choices are worth calling out. First, exit criteria carry their own verify command: a [VERIFY: test ! -f path/to/deleted-file] next to the bullet. The command must exit 0 when the criterion is met, so a deletion check uses the negated test. That avoids a brittle "interpret the English of the criterion" step and lets you mix grep, file existence checks, layer linters, and unit tests in the same gate. Second, the gate doesn't auto-resolve drift. It only checks whether unmet criteria have a corresponding OPEN row. The actual fix lives in the resolution milestone.
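To make the [VERIFY: ...] convention concrete, here is a small sketch of how such bullets might be picked apart. The roadmap excerpt, the "## Y2" heading, and the two commands are invented for illustration. Note that a deletion criterion needs the negated form, test ! -f, because the gate treats a non-zero exit code as an unmet criterion.

```python
import re

# Hypothetical ROADMAP.md excerpt; the heading and bullets are invented.
SAMPLE_SECTION = """\
## Y2
Exit criteria:
- services/agent_pool.py deleted [VERIFY: test ! -f services/agent_pool.py]
- routes/memory.py uses canonical handler [VERIFY: ! grep -q BUG-ROUTE routes/memory.py]
- Release notes drafted
"""

# A bullet qualifies only if it ends with a [VERIFY: <command>] suffix.
CRITERION = re.compile(
    r"^-[ \t]+(?P<description>.+?)[ \t]*\[VERIFY:[ \t]*(?P<verify_cmd>.+?)\][ \t]*$",
    re.MULTILINE,
)


def parse_exit_criteria(section: str) -> list[dict[str, str]]:
    """Return description/command pairs for every verifiable bullet."""
    return [match.groupdict() for match in CRITERION.finditer(section)]


criteria = parse_exit_criteria(SAMPLE_SECTION)
for criterion in criteria:
    print(f"{criterion['description']!r} -> {criterion['verify_cmd']!r}")
```

The third bullet has no verify command, so it is skipped rather than guessed at. Whether an unverifiable bullet should instead fail the gate is a policy choice this sketch does not make.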
A useful comparison: a CI status check enforces that the code compiles. A drift log gate enforces that the architecture compiles. Both are pass/fail. Both block the merge (or the milestone close). The difference is the unit of work: CI runs on commits, the drift gate runs on milestones.
Three drift patterns the log catches
The log earns its keep on a small number of recurring patterns. Here are three I keep seeing across polyglot monorepos.
Pattern 1: the zombie module
A milestone promised to delete a module. The deletion was descoped. The module stayed importable. New code keeps importing it because grep finds it and the IDE autocompletes it.
The drift log catches this because the close gate runs python3 scripts/check_arch.py --no-zombie services/agent_pool.py as an exit criterion. The check fails on close. The drift gets a row. The resolution milestone is named Y4-cleanup and contains exactly two tasks: delete the file and update the importers.
The reason the log is the right tool here, rather than "just delete it later," is that the zombie module is visible. Anyone landing in the codebase sees services/agent_pool.py and assumes it's the canonical path. They write new code against it. Six months later the deprecation is harder, not easier. The log keeps the deletion on someone's plate.
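The internals of the --no-zombie check aren't shown here, so the following is only a guess at one way such a check might find importers: parse each Python file with the standard ast module and report lines that import the doomed module. The function name and the sample source are hypothetical, and relative imports are deliberately ignored to keep the sketch short.

```python
import ast


def find_zombie_imports(source: str, zombie_module: str) -> list[int]:
    """Return line numbers in `source` that import `zombie_module` or a submodule."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # "from services import agent_pool" also imports services.agent_pool.
            names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue
        if any(n == zombie_module or n.startswith(zombie_module + ".") for n in names):
            hits.append(node.lineno)
    return sorted(hits)


sample = """\
import os
from services import agent_pool
import services.agent_pool as pool
from services.agent_pool import AgentPool
from services import router
"""
zombie_lines = find_zombie_imports(sample, "services.agent_pool")
print(zombie_lines)
```

Run over every file in the repo, a non-empty result would be the "4 importers" figure that goes into the Actual column.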
Pattern 2: the bypass that became permanent
A hotfix introduced a workaround. The workaround had a # TODO: remove after Y3 comment. Y3 shipped, the comment is still there, the workaround is now load-bearing because three other features were built on top of it assuming the workaround stayed.
This is the worst kind of drift because the original promise is buried in a code comment that nobody reads. A drift log row pulls that comment out into a tracked location:
| 2026-04-28 | Y2 | routes/memory.py routes through canonical handler | BUG-ROUTE bypass at line 47 routes directly to legacy adapter | Y3-routes-fix | OPEN |
The win isn't just visibility; it's the resolution milestone. Without one, the bypass slowly attracts callers. With one, the bypass has a removal date, and any new caller of the bypass forces an explicit conversation: "are you aware this is scheduled for removal in Y3, and are you adding work to that milestone?"
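Buried removal comments can also be surfaced mechanically before they become load-bearing. This is a minimal sketch, assuming comments follow the "# TODO: remove after <milestone>" wording quoted above and that you can supply the set of shipped milestones yourself; the function name and sample source are hypothetical.

```python
import re

# Matches comments such as "# TODO: remove after Y3"; the wording is an assumption.
TODO_PATTERN = re.compile(
    r"#\s*TODO:?\s*remove after\s+(?P<milestone>[\w-]+)", re.IGNORECASE
)


def find_stale_todos(source: str, shipped: set[str]) -> list[tuple[int, str]]:
    """Return (line_number, milestone) for removal TODOs whose milestone has shipped."""
    stale = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match and match.group("milestone") in shipped:
            stale.append((lineno, match.group("milestone")))
    return stale


sample = """\
def route(request):
    # TODO: remove after Y3
    return legacy_adapter(request)  # BUG-ROUTE bypass
    # TODO: remove after Y5
"""
stale_todos = find_stale_todos(sample, shipped={"Y1", "Y2", "Y3"})
print(stale_todos)
```

Each hit is a candidate drift log row, not an automatic one: someone still has to copy the original promise into the Expected column and pick a resolution milestone.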
Pattern 3: the half-migrated layer
A milestone promised to move all files from services/ to a 7-layer structure (L1 domain, L2 ports, L3 application, L4 adapters, L5 infrastructure, L6 composition, L7 interfaces). The migration was 80% done at milestone close. The remaining 20% sits in four files that each have non-trivial dependencies.
The log catches this with a row per remaining file, all pointing at the same resolution milestone. A layer linter as the verify command makes that concrete:
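    from pathlib import PurePosixPath

    def classify_file(file_path: str) -> str:
        # Hypothetical rule: the top-level directory names the layer ("L3/...").
        return PurePosixPath(file_path).parts[0]

    def check_layer_compliance(file_path: str, expected_layer: str) -> bool:
        # True when the file already sits in the layer the plan promised.
        return classify_file(file_path) == expected_layer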
The judgment call: should the milestone close be blocked at 80% complete? Most teams say no: the new architecture is better than the old even when partially migrated, and gating on 100% creates pressure to ship a rushed deletion. But the drift log makes the partial state explicit. Without the log, "the migration is done" becomes the assumed state by next quarter, and the remaining 20% becomes invisible debt.
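The one-row-per-file idea can be sketched as a small generator of drift log rows. Everything specific here is invented for illustration: the file names, the Y5 and Y5-layers milestone names, and the wording of the Expected cell, which in a real log should be copied verbatim from the plan rather than generated.

```python
from datetime import date
from pathlib import PurePosixPath


def unmigrated_rows(
    expected_layers: dict[str, str],
    drifted_milestone: str,
    resolution_milestone: str,
    discovered: date,
) -> list[str]:
    """Build one drift log row per file that still lives under services/."""
    rows = []
    for path, layer in sorted(expected_layers.items()):
        if PurePosixPath(path).parts[:1] != ("services",):
            continue  # already moved out of the legacy tree
        rows.append(
            f"| {discovered.isoformat()} | {drifted_milestone} "
            f"| {path} moved to {layer} | Still under services/ "
            f"| {resolution_milestone} | OPEN |"
        )
    return rows


layer_rows = unmigrated_rows(
    {
        "services/billing.py": "L3",
        "L4/http_client.py": "L4",
        "services/cache.py": "L5",
    },
    drifted_milestone="Y5",
    resolution_milestone="Y5-layers",
    discovered=date(2026, 5, 6),
)
print("\n".join(layer_rows))
```

All generated rows share one resolution milestone, which keeps the remaining 20% visible as a single scheduled piece of work rather than four unrelated chores.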
Anti-patterns to avoid
A few ways the log goes wrong:
- Aspirational rows. A row that says "we should consider deleting X" with no resolution milestone is noise. Either commit to a resolution milestone or don't log the row. The log isn't a backlog.
- Vague Expected columns. "Cleaner architecture" isn't an expectation you can verify. The expectation has to be copy-pasted from a real architecture document. If you can't find the source sentence, the drift isn't real.
- Resolution milestones that never run. A row pointing at a milestone that's perpetually three quarters out is the same as no resolution milestone. The log audit (a 10-minute review during planning sessions) should flag rows whose resolution milestone has slipped twice.
- Auto-closing rows. Resist the temptation to write a script that auto-marks rows RESOLVED when the file disappears. The log is written for humans, not just for the gate: the audit trail's value is in the human-readable record of what drifted and why. A scripted close erases the why.
When you do not need a drift log
A drift log is overhead. It pays off in two specific situations.
The first is polyglot monorepos where no single linter sees the whole picture. A pure-Rust workspace can lean on cargo check, module visibility, and clippy to enforce architecture. A pure-TypeScript workspace can lean on tsconfig.json path restrictions and ESLint boundaries. When you mix three languages with three sets of conventions, the linters can't cover the cross-language seams, and the drift log fills the gap.
The second is repos with a documented target architecture. If your only architecture doc is the src/ tree as it exists today, there's nothing to drift FROM: your code IS the architecture. The log only makes sense once you've written down what the architecture should be, separately from what it is.
Below that bar, the log is bureaucracy without payoff. Stick to commit messages and code review.
The 10-minute monthly audit
The log itself is cheap. The discipline that keeps it useful is a 10-minute monthly review:
- Walk every OPEN row.
- For each row, check whether the resolution milestone is still on the roadmap and still scheduled within two quarters.
- If the milestone has slipped or been removed, the row gets either a new resolution milestone or it gets marked WONTFIX with a one-sentence rationale.
A WONTFIX row isn't a failure; it's an explicit decision to accept the drift forever. That's a different state from OPEN, and it's the right state for some drift. Some legacy code is genuinely fine to keep around. Calling it out as WONTFIX prevents the row from haunting future audits.
The audit itself takes longer to schedule than to run. Pin it to the same calendar as your monthly retrospective and it disappears into existing process.
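The mechanical part of the audit can be scripted so the ten minutes go to decisions rather than lookups. This is a sketch under stated assumptions: rows are already parsed into dictionaries, the roadmap is a mapping from milestone name to planned date, and "two quarters" is approximated as 182 days. The planned date and today's date below are invented.

```python
from datetime import date

TWO_QUARTERS_DAYS = 182  # rough approximation of two quarters


def audit_open_rows(
    rows: list[dict[str, str]], roadmap: dict[str, date], today: date
) -> list[str]:
    """Return one finding per OPEN row whose resolution milestone needs a decision."""
    findings = []
    for row in rows:
        if row["status"] != "OPEN":
            continue
        age_days = (today - date.fromisoformat(row["discovered"])).days
        planned = roadmap.get(row["resolution"])
        if planned is None:
            problem = "resolution milestone is not on the roadmap"
        elif (planned - today).days > TWO_QUARTERS_DAYS:
            problem = "resolution milestone is more than two quarters out"
        else:
            continue  # still scheduled in time, nothing to decide
        findings.append(f"{row['resolution']} ({age_days} days open): {problem}")
    return findings


rows = [
    {"discovered": "2026-04-28", "resolution": "Y4-cleanup", "status": "OPEN"},
    {"discovered": "2026-04-28", "resolution": "Y3-routes-fix", "status": "OPEN"},
    {"discovered": "2026-05-06", "resolution": "S4-resume", "status": "PAUSED"},
]
roadmap = {"Y4-cleanup": date(2026, 9, 1)}  # hypothetical planned date
findings = audit_open_rows(rows, roadmap, today=date(2026, 6, 1))
print("\n".join(findings))
```

The script only produces the list of rows to talk about. Choosing between a new resolution milestone and WONTFIX stays a human decision, for the same reason rows should not be auto-closed.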
What you ship after one quarter
A polyglot monorepo that's been running a drift log for one quarter has three artifacts the codebase didn't have before:
- A list of every architectural promise that was deferred, with dates and resolution milestones
- A history of which milestones consistently produce drift (usually 1-2 milestones do, the rest are clean)
- A short set of WONTFIX rows that document deliberate exceptions to the target architecture
The first is the obvious win. The third is the underrated one. A WONTFIX row is the only place in your codebase where you can say "we know this violates the architecture, here's why, and we're not fixing it." That single sentence saves the next engineer a week of trying to "clean up the inconsistency."
The drift log isn't the architecture. The architecture is the code. The log is the receipt that says you noticed when the code stopped matching the documents, and what you intend to do about it.