
Docker Multistage Build Cache Uv Bun Cargo

If you maintain a polyglot repo where Rust, Python, and TypeScript live side by side, you have probably watched a Docker build spend four minutes recompiling the same crate graph, re-downloading the same wheels, and re-fetching the same node-style packages that have not changed since yesterday. The pain compounds in CI: every push triggers a cold pull on a fresh runner, every PR rebuild looks identical to the last one, and the cache layer you carefully crafted gets invalidated the moment someone touches a single source file. The naive Dockerfile that worked when the repo had one language no longer scales when three toolchains fight over the same layer cache.

This article walks through turning that slow, repetitive build into a multistage Dockerfile that uses BuildKit cache mounts, lockfile-first copies, and a registry-backed GitHub Actions cache to keep warm builds in the seconds-not-minutes range. You will end with a working polyglot skeleton that exercises uv for Python, bun for TypeScript, and cargo workspaces for Rust, plus a Dockerfile that splits dependency resolution into per-toolchain stages, mounts the cargo registry and target directory, caches the uv wheel store and the bun install directory, and finishes with a slim runtime image running as a non-root user behind tini and a healthcheck. The final piece is a GitHub Actions workflow that pushes BuildKit cache layers to a registry so every runner, not just the lucky warm one, benefits from previous builds.

This is for engineers maintaining mixed-language monorepos who have outgrown a single-stage Dockerfile but have not yet wired BuildKit cache mounts end to end. By the end you will have a reproducible build recipe, a measurable cold-versus-warm baseline, and a CI pipeline that stops paying for the same work twice.

Step 1: Anchoring a Polyglot Monorepo with uv, bun, and cargo Workspaces

Before any of the Docker multistage caching tricks in later steps can pay off, the repository itself needs a shape that three different package managers will recognize as their own. This step bootstraps a single git repository where uv, bun, and cargo each see a workspace root they can drive, and where each toolchain owns exactly one starter member service.

We start from an empty git repo and finish with three running test suites (pytest, bun test, and cargo test --workspace), all green. Nothing here depends on Docker yet, but the layout we pick now is what the multistage Dockerfile in step 3 will key off to slice cache layers cleanly per language.

Setup

The end state of this step is a repository with three workspace anchors at the root and three member services nested under services/. We create the following files:

  • pyproject.toml — declares a [tool.uv.workspace] with services/api as the lone Python member, plus a dev group that pins pytest.
  • package.json — declares a private workspaces array containing services/web, so bun treats the root as its workspace anchor.
  • Cargo.toml — declares a [workspace] with services/edge as the only Rust member and pins edition = "2021" plus rust-version = "1.75" at the workspace level.
  • services/api/ — a uv-managed Python package built with hatchling, exposing a tiny greet/workspace_anchor API and its pytest suite.
  • services/web/ — a TypeScript module run by bun test, mirroring the same greet/workspaceAnchor shape so each language stays comparable.
  • services/edge/ — a Rust library crate with inline #[cfg(test)] tests for the same two functions.
  • .gitignore and README.md at the root to keep build artefacts out of git and document the layout.

The only third-party dependency in this step is pytest>=8.0, and it is a development dependency (declared under the uv dev group), not a runtime one. The TypeScript side relies on bun:test, which ships with the bun runtime, and the Rust side uses the built-in test harness. Both stay dependency-free on purpose.

Implementation

The Python workspace root is the smallest interesting piece. package = false tells uv this pyproject.toml is a workspace anchor rather than an installable package, and the [tool.uv.workspace] table lists the members uv should treat as part of the same lockfile universe.

[project]
name = "polyglot-monorepo"
version = "0.1.0"
description = "Polyglot monorepo skeleton driven by uv, bun, and cargo workspaces."
requires-python = ">=3.10"

[tool.uv]
package = false

[tool.uv.workspace]
members = ["services/api"]

[tool.uv.sources]
api = { workspace = true }

[dependency-groups]
dev = ["pytest>=8.0"]

[tool.pytest.ini_options]
testpaths = ["services/api/tests"]
pythonpath = ["services/api/src"]
addopts = "-ra -q"

Two non-obvious choices live here. First, we centralise [tool.pytest.ini_options] at the root rather than inside the member so a developer can simply run pytest from the repo root without cd-ing. Second, the pythonpath injection avoids requiring an editable install on every uv sync — useful when later steps rebuild the image often and want pytest to discover the source tree directly.

The TypeScript anchor is similarly small. bun reads the workspaces array exactly the way npm and yarn do, and the test script forwards to the per-member test runner so the root command is uniform across languages.

{
  "name": "polyglot-monorepo",
  "version": "0.1.0",
  "description": "Polyglot monorepo skeleton driven by uv, bun, and cargo workspaces.",
  "private": true,
  "workspaces": ["services/web"],
  "scripts": {
    "test": "bun test services/web"
  }
}

Marking the root "private": true is important: it tells bun (and any future npm publish accident) that this manifest is a workspace shell, not a publishable package. The member at services/web/package.json keeps "type": "module" and points its module field at src/index.ts, which lets bun test resolve imports without a bundler step.

The Rust anchor pulls workspace-wide fields up into [workspace.package] so member crates can inherit them with version.workspace = true rather than restating each value.

[workspace]
members = ["services/edge"]
resolver = "2"

[workspace.package]
version = "0.1.0"
edition = "2021"
rust-version = "1.75"
license = "MIT"

[workspace.lints.clippy]
all = "warn"

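Setting resolver = "2" explicitly matters because a virtual workspace has no root package whose edition cargo could infer it from; without the key cargo falls back to the version 1 resolver and prints a warning. Resolver 2 stops features that are enabled only for dev-dependencies, build-dependencies, or other targets from being unified into normal builds, which starts to matter once we add a second crate. The shared [workspace.lints.clippy] block costs nothing now, and any member crate can inherit it by adding [lints] workspace = true to its own manifest, so every future crate can start with clippy::all at warn level. It is a small bit of forward-friction that keeps the project honest.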

Each member service exposes the same two-function shape — greet(name) and an anchor accessor that returns the workspace tool name — so the three test suites are structurally identical. The Python side lives at services/api/src/api/greeting.py:

SERVICE_NAME = "api"
WORKSPACE_TOOL = "uv"


def greet(name: str) -> str:
    return f"hello, {name}, from the {SERVICE_NAME} service"


def workspace_anchor() -> str:
    return WORKSPACE_TOOL

The TypeScript mirror at services/web/src/index.ts keeps the same shape under camelCase conventions, and the Rust crate at services/edge/src/lib.rs keeps it under snake_case with #[must_use] annotations. Picking parallel shapes makes the later Docker build comparable: we will be able to point at one identical "tiny library + tiny test" payload per language when reasoning about layer reuse.
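The article does not reproduce the pytest suite itself. A minimal sketch of what services/api/tests/test_greeting.py could look like follows; the three test names are assumptions chosen by analogy with the bun and cargo test names in the output further down, and the two functions are inlined (rather than imported from api.greeting) only so the sketch runs on its own.

```python
# Hypothetical sketch of services/api/tests/test_greeting.py.
# In the repository the next two functions would be imported with
# `from api.greeting import greet, workspace_anchor`; they are inlined
# here so the example is self-contained.
SERVICE_NAME = "api"
WORKSPACE_TOOL = "uv"


def greet(name: str) -> str:
    return f"hello, {name}, from the {SERVICE_NAME} service"


def workspace_anchor() -> str:
    return WORKSPACE_TOOL


# Test names below are assumed, mirroring the bun and cargo suites.
def test_greet_includes_target_name() -> None:
    assert "ada" in greet("ada")


def test_greet_mentions_api_service() -> None:
    assert "api service" in greet("ada")


def test_workspace_anchor_identifies_uv() -> None:
    assert workspace_anchor() == "uv"
```

Run from the repository root, pytest would pick these up through the testpaths and pythonpath settings in the root pyproject.toml.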

The .gitignore keeps every toolchain's build directory out of history (target/, node_modules/, __pycache__/, .pytest_cache/) so the next step's Docker context stays small.

Verification

With the three roots in place, each toolchain runs its own test suite from the repository root. The expected output of the full triplet is:

pytest && bun test && cargo test --workspace
============================= test session starts ==============================
collected 3 items

services/api/tests/test_greeting.py ...                                  [100%]

============================== 3 passed in 0.01s ===============================
bun test v1.x
services/web/tests/greeting.test.ts:
(pass) web service greeting > greet includes the target name
(pass) web service greeting > greet mentions the web service
(pass) web service greeting > workspace anchor identifies bun
 3 pass
 0 fail
 3 expect() calls
Ran 3 tests across 1 files.
   Compiling edge v0.1.0 (services/edge)
    Finished test [unoptimized + debuginfo] target(s) in 0.42s
     Running unittests src/lib.rs (target/debug/deps/edge-...)

running 3 tests
test tests::greet_includes_target_name ... ok
test tests::greet_mentions_edge_service ... ok
test tests::workspace_anchor_identifies_cargo ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Three suites, nine tests, zero failures. One shell line runs the full triplet, and any one suite failing aborts the chain; that is the contract later steps build on once the repository moves into a Dockerfile.

What we built

We now have a single repository where three package managers each see a real workspace, not a vestigial config file. uv sync resolves the root and the api member into one shared lockfile; bun install links the web member into the root node_modules/; cargo build --workspace compiles the edge crate against the workspace's shared edition and rust-version settings.

Every member is the smallest possible meaningful unit: a tiny library, a handful of tests, and exactly enough manifest noise to be discoverable. That deliberate smallness matters because step 3 will treat each member as a cache-layer boundary in a multistage Dockerfile, and we want a clean signal, not a forest of incidental dependencies.

The repository also enforces two invariants we will lean on later. Every language has a single test command runnable from the repo root, and every language's build artefacts are gitignored — meaning the Docker build context will only contain source, not stale caches from a developer machine.

What this unlocks is the ability to start writing a Dockerfile that has a real polyglot project to build, with a real test gate to satisfy. In step 2 we introduce a deliberately naive single-stage Dockerfile that installs all three toolchains and resolves every member's dependencies, and we record the cold/warm build baseline the later multistage steps are measured against.

Repository

The state of the code after this step: 6925497

Step 2: Baking a Naive Single-Stage Dockerfile and Recording the Cold/Warm Build Baseline

Step 1 left us with a repository that three package managers happily call their own — uv, bun, and cargo each see a workspace root with one tiny member service. The natural next move is to wrap that repository in a container, but before reaching for multistage tricks or BuildKit cache mounts we want a number to beat. This step writes the dumbest possible Dockerfile: one stage, one COPY . ., three install commands, and no cache discipline at all.

The deliverable is twofold. First, a Dockerfile at the repository root that produces a single image containing every toolchain plus the full source tree. Second, a small Python utility under tools/ that drives docker build twice — once cold, once warm — and serialises the timings to disk so subsequent steps can quote real before/after numbers rather than vibes.

Setup

The new files this step introduces live in three places. At the repo root we add a Dockerfile. Under tools/ we add tools/baseline.py (the timing harness), tools/dockerfile_shape.py (tiny structural lints that prove the Dockerfile is in fact single-stage), and a tools/tests/ folder with pytest coverage for both modules. We also extend the root pyproject.toml so pytest discovers tests under tools/tests and exposes tools/ on the import path:

[tool.pytest.ini_options]
testpaths = ["services/api/tests", "tools/tests"]
pythonpath = ["services/api/src", "tools"]
addopts = "-ra -q"

No new third-party dependencies are required. The baseline harness uses only subprocess, argparse, dataclasses, json, and time from the standard library. The Dockerfile inherits its toolchain installers from upstream (astral.sh/uv, bun.sh, and sh.rustup.rs), so the toolchain versions are unpinned: the image gets whatever the public install scripts resolve at build time. That looseness is deliberate: this step's job is to be the worst-case baseline, not a hardened production image.

Implementation

The Dockerfile is intentionally short and intentionally wasteful. It starts from debian:12-slim, installs the OS packages each toolchain installer needs (curl, build-essential, unzip, git, ca-certificates), then runs three separate curl | sh lines for uv, bun, and rustup. Finally it copies the whole repository in one shot and runs every dependency-resolution command in a single RUN.

# syntax=docker/dockerfile:1.7
FROM debian:12-slim

ENV DEBIAN_FRONTEND=noninteractive
ENV PATH=/root/.cargo/bin:/root/.local/bin:/root/.bun/bin:$PATH

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates curl build-essential unzip git \
    && rm -rf /var/lib/apt/lists/*

RUN curl -LsSf https://astral.sh/uv/install.sh | sh
RUN curl -fsSL https://bun.sh/install | bash
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable

WORKDIR /workspace
COPY . .

RUN uv sync \
    && bun install \
    && cargo fetch

Every choice here is the wrong one for production, and that is the point. COPY . . before any dependency resolution means a one-character edit to services/api/src/api/greeting.py invalidates every downstream layer, including the three-minute cargo fetch. Combining uv sync, bun install, and cargo fetch into one RUN means any one of them failing trashes the work of the other two. There are no --mount=type=cache directives, no separate cargo-chef-style planner stage, and the apt cache is wiped without being remounted — exactly the things later steps will fix.
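The invalidation chain described above can be illustrated with a toy model. This is a simplification, not BuildKit's real cache-key algorithm: it assumes each layer's key is a hash of the parent key, the instruction text, and the contents of any files the instruction copies. The function names and the abbreviated instruction list are hypothetical.

```python
import hashlib


def layer_keys(steps, files):
    """Return one cache key per step; each key chains the previous one."""
    keys = []
    parent = ""
    for instruction, inputs in steps:
        digest = hashlib.sha256()
        digest.update(parent.encode())
        digest.update(instruction.encode())
        for path in sorted(inputs):
            digest.update(path.encode())
            digest.update(files[path].encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def invalidated(steps, before, after):
    """List the instructions whose key differs between two file snapshots."""
    old_keys = layer_keys(steps, before)
    new_keys = layer_keys(steps, after)
    return [
        instruction
        for (instruction, _), old, new in zip(steps, old_keys, new_keys)
        if old != new
    ]


# Abbreviated stand-in for the naive Dockerfile: (instruction, copied files).
NAIVE = [
    ("RUN install toolchains", []),
    ("COPY . .", ["Cargo.toml", "greeting.py"]),
    ("RUN uv sync && bun install && cargo fetch", []),
]

before = {"Cargo.toml": "[workspace]", "greeting.py": "v1"}
after = {"Cargo.toml": "[workspace]", "greeting.py": "v2"}  # one source edit

print(invalidated(NAIVE, before, after))
# ['COPY . .', 'RUN uv sync && bun install && cargo fetch']
```

The toolchain layer keeps its key because nothing above it changed, while the resolver layer is rebuilt purely because its parent changed, even though no manifest did.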

The timing harness in tools/baseline.py keeps the timing logic pure so it can be unit-tested without ever touching docker. The shell-out is injected as a Runner callable and the wall clock as a Timer callable; production code passes the real subprocess.run and time.monotonic, while tests pass fakes.

def measure_baseline(
    image_tag: str,
    context: str,
    *,
    runner: Runner = default_runner,
    timer: Timer = time.monotonic,
) -> list[BuildTiming]:
    cold_cmd = build_command(image_tag, context, no_cache=True)
    warm_cmd = build_command(image_tag, context, no_cache=False)
    cold = measure_one(cold_cmd, runner, timer, "cold")
    warm = measure_one(warm_cmd, runner, timer, "warm")
    return [cold, warm]

The split between build_command, measure_one, and measure_baseline exists so each piece can be tested in isolation. build_command confirms --no-cache is present for the cold run and absent for the warm run. measure_one validates that elapsed time is computed from the timer delta, not from the runner's own clock. measure_baseline ties the two together and locks in the ordering: cold first, then warm — because reversing them would hand the cold run a populated cache and silently zero out the gap we care about measuring.
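The two helpers that measure_baseline calls are not shown in the article. The following is a minimal sketch of how they might look, together with the fakes a unit test could inject; the BuildTiming field names, the Runner and Timer aliases, and the exact docker build argument order are assumptions rather than the repository's actual code.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed shapes: a runner takes argv and returns an exit code,
# a timer returns seconds from a monotonic clock.
Runner = Callable[[list[str]], int]
Timer = Callable[[], float]


@dataclass(frozen=True)
class BuildTiming:
    label: str
    seconds: float
    exit_code: int


def build_command(image_tag: str, context: str, *, no_cache: bool) -> list[str]:
    """Assemble the docker build argv; --no-cache only for the cold run."""
    cmd = ["docker", "build", "-t", image_tag]
    if no_cache:
        cmd.append("--no-cache")
    cmd.append(context)
    return cmd


def measure_one(cmd: list[str], runner: Runner, timer: Timer, label: str) -> BuildTiming:
    """Time one build using the injected timer, never the runner's clock."""
    start = timer()
    exit_code = runner(cmd)
    return BuildTiming(label, timer() - start, exit_code)


# Fakes: record the command instead of running docker, and replay fixed ticks.
calls: list[list[str]] = []


def fake_runner(cmd: list[str]) -> int:
    calls.append(cmd)
    return 0


ticks = iter([10.0, 12.5])
cold_cmd = build_command("polyglot:naive", ".", no_cache=True)
timing = measure_one(cold_cmd, fake_runner, lambda: next(ticks), "cold")
print(timing)  # BuildTiming(label='cold', seconds=2.5, exit_code=0)
```

Because docker is never invoked, a test built this way runs in milliseconds and can assert both the flag handling and the elapsed-time arithmetic.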

The structural lints in tools/dockerfile_shape.py are eight short functions that read the Dockerfile, strip comments, and answer four boolean questions: does the file exist, does it have exactly one FROM, does it use exactly one COPY, and does it mention each of uv, bun, and rustup/cargo? These run under pytest so any accidental drift of the baseline Dockerfile away from its single-stage shape fails loudly. That guard matters more than it first appears: the entire pedagogical value of the next steps depends on this Dockerfile staying recognisable as "the naive one".
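As an illustration of what such lints can look like, here is a small sketch that operates on Dockerfile text directly. The function names are hypothetical and not taken from tools/dockerfile_shape.py, and the embedded Dockerfile is an abbreviated stand-in for the real one.

```python
import re


def instruction_lines(text: str) -> list[str]:
    """Drop blank lines and comment lines; return the rest, stripped."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


def count_instruction(lines: list[str], keyword: str) -> int:
    """Count lines that start with the given Dockerfile instruction."""
    pattern = re.compile(rf"^{re.escape(keyword)}\s", re.IGNORECASE)
    return sum(1 for line in lines if pattern.match(line))


def mentions_all(lines: list[str], words: list[str]) -> bool:
    """True when every word appears somewhere in the instruction lines."""
    joined = "\n".join(lines).lower()
    return all(word in joined for word in words)


NAIVE = """\
# syntax=docker/dockerfile:1.7
FROM debian:12-slim
RUN curl -LsSf https://astral.sh/uv/install.sh | sh
RUN curl -fsSL https://bun.sh/install | bash
RUN curl -sSf https://sh.rustup.rs | sh -s -- -y
WORKDIR /workspace
COPY . .
RUN uv sync && bun install && cargo fetch
"""

lines = instruction_lines(NAIVE)
print(count_instruction(lines, "FROM"))             # 1
print(count_instruction(lines, "COPY"))             # 1
print(mentions_all(lines, ["uv", "bun", "cargo"]))  # True
```

A pytest wrapper around these would simply assert the two counts equal one and the mention check is true.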

Verification

The full local check from the repository root is a single pytest invocation: the harness tests, the shape lints, and the step-1 service tests all run together.

pytest
................                                                         [100%]
16 passed in 0.07s

For the timing run itself, the entry point is python tools/baseline.py --image polyglot:naive --context ., which writes a baseline.json and prints a two-line summary. A representative run on a laptop with a freshly pruned builder cache looks like this:

python tools/baseline.py --image polyglot:naive --context .
cold: 318.42s (exit 0)
warm: 4.17s (exit 0)

The cold-to-warm ratio is the headline number — roughly 76x on this hardware. That gap is almost entirely the three toolchain installer layers plus the unified uv sync && bun install && cargo fetch step. Both halves are exactly what the multistage refactor in later steps will attack.
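The headline ratio is plain arithmetic on the two timings. The sketch below shows one way the harness could serialise its results and derive that number; the BuildTiming fields and the JSON layout are assumptions, since the article does not show the contents of baseline.json.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BuildTiming:
    # Field names are assumed from the "cold: 318.42s (exit 0)" summary line.
    label: str
    seconds: float
    exit_code: int


def to_json(timings: list[BuildTiming]) -> str:
    """Serialise timings as a JSON array of objects (assumed layout)."""
    return json.dumps([asdict(t) for t in timings], indent=2)


def cold_warm_ratio(timings: list[BuildTiming]) -> float:
    """Divide the cold build time by the warm build time."""
    by_label = {t.label: t.seconds for t in timings}
    return by_label["cold"] / by_label["warm"]


timings = [BuildTiming("cold", 318.42, 0), BuildTiming("warm", 4.17, 0)]
print(to_json(timings))
print(f"{cold_warm_ratio(timings):.1f}x")  # 76.4x
```

Keeping both timings in one file is what lets later steps recompute the same ratio against a new Dockerfile without re-deriving the baseline.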

What we built

We now have a containerised build of the polyglot monorepo that actually works end to end. Running docker build -t polyglot:naive . produces an image with uv, bun, and a stable Rust toolchain installed, every workspace member's dependencies fetched, and the source tree available at /workspace ready for the test commands from step 1.

Alongside that image, the repository now ships a reproducible way to measure how slow the build is. tools/baseline.py records cold and warm timings into a JSON file that lives next to the Dockerfile, and the pytest suite for the harness guarantees the measurement code itself does not silently regress. The lints in tools/dockerfile_shape.py pin the baseline Dockerfile to its naive shape so the comparison stays honest across future commits.

Two invariants are now load-bearing. First, the baseline Dockerfile has exactly one FROM and exactly one COPY — a property the test suite asserts, not just one we hope holds. Second, every measurement run captures both cold and warm times in the same execution, so we never compare cold-from-Monday against warm-from-Friday on a different machine state.

What this unlocks is the entire argument arc of the rest of the tutorial. From step 3 onward we can introduce one multistage technique at a time — separating dependency resolution from source copy, adding BuildKit --mount=type=cache for each package manager, factoring a per-language builder stage — and quote the cold/warm delta against this same baseline.json. Without this step, every later "X% faster" claim would be unfalsifiable.

Repository

The state of the code after this step: 813905c

Step 3: Carving the Dockerfile into Per-Toolchain Dependency Stages So One Source Edit Stops Invalidating Three Installers

Step 2 left us with a single-stage Dockerfile that bundles three toolchain installers and one giant COPY . . into one image, plus a baseline harness that proved the warm rebuild is fast only because nothing changed. The interesting question, and the one the rest of the tutorial is built around, is what happens when something does change. With the naive Dockerfile, editing a single Python source file invalidates the COPY . . layer, which invalidates the combined uv sync && bun install && cargo fetch layer, which means the next build re-runs cargo fetch even though no Cargo.toml was touched. This step fixes the structural part of that problem without yet introducing any cache mounts.

The deliverable is a multistage Dockerfile with one shared base stage that installs the three toolchains exactly once, three sibling dependency stages (uv-deps, bun-deps, cargo-deps) that each copy only the manifest files their package manager actually reads, and a final runtime stage that pulls the resolved-deps directory out of each sibling. The shape lints from step 2 are extended so the new structure is a tested invariant rather than a hopeful comment.

Setup

No new third-party dependencies are introduced. The Dockerfile at the repository root is rewritten in place, and tools/dockerfile_shape.py grows three new helpers — stage_names, has_stage, and copy_from_sources — that the test suite needs to assert the multistage shape. The existing tests in tools/tests/test_dockerfile_shape.py are replaced with multistage-shaped ones: the old test_dockerfile_is_single_stage would now fail by design, so it is removed and a test_dockerfile_is_multistage plus per-stage assertions take its place.

Nothing changes in the services/ tree. The toolchain installers, the apt prerequisites, and the PATH exports keep their step-2 form — they just move from the body of the Dockerfile into the new base stage so each downstream stage starts from a single shared image rather than re-running three curl | sh lines. The baseline.py harness is untouched: it will keep producing comparable cold / warm numbers against the new Dockerfile in later steps, which is exactly the point of having pinned it in step 2.

Implementation

The new Dockerfile starts by naming the existing setup as a reusable stage. FROM debian:12-slim AS base does almost nothing at the file level but unlocks the structural split: every later stage can write FROM base AS … and pick up the toolchains for free without repeating the install lines.

# syntax=docker/dockerfile:1.7
FROM debian:12-slim AS base

ENV DEBIAN_FRONTEND=noninteractive
ENV PATH=/root/.cargo/bin:/root/.local/bin:/root/.bun/bin:$PATH

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates curl build-essential unzip git \
    && rm -rf /var/lib/apt/lists/*

RUN curl -LsSf https://astral.sh/uv/install.sh | sh
RUN curl -fsSL https://bun.sh/install | bash
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable

WORKDIR /workspace

The three sibling dependency stages are the heart of this step. Each one copies only the manifest files its package manager reads — never source code, never sibling toolchain manifests — and then runs its resolver. The blast radius of a source edit is now bounded by which manifest file changed, which is almost always "none of them".

FROM base AS uv-deps
COPY pyproject.toml ./
COPY services/api/pyproject.toml services/api/pyproject.toml
RUN uv sync --no-install-project --no-dev || uv sync --no-install-project

FROM base AS bun-deps
COPY package.json ./
COPY services/web/package.json services/web/package.json
RUN bun install --no-save

FROM base AS cargo-deps
COPY Cargo.toml Cargo.lock ./
COPY services/edge/Cargo.toml services/edge/Cargo.toml
RUN cargo fetch

Three details matter more than they look. The uv sync line passes --no-install-project so uv installs only the resolved dependencies and skips the project itself, whose source tree has not been copied in yet; the || fallback retries without --no-dev, so the stage still builds if the dev-less sync fails. The --no-save flag on bun install stops bun from writing package.json or a lockfile during the build; we want this stage to be a pure function of the manifests it copied in. cargo fetch is used instead of cargo build because at this point we still have no source: the goal is to populate ~/.cargo/registry, not to compile anything. Cargo still parses every member manifest to do this, so a member that relies on target auto-discovery may need a placeholder src/lib.rs in this stage.

The runtime stage is what stitches the three sibling stages back into one image. It starts from the same base (so it inherits the toolchain binaries), then uses COPY --from=<stage> to pull each resolver's output directory across without re-running the resolver, and finally drops the full source tree in.

FROM base AS runtime
WORKDIR /workspace

COPY --from=uv-deps /workspace/.venv /workspace/.venv
COPY --from=bun-deps /workspace/node_modules /workspace/node_modules
COPY --from=cargo-deps /root/.cargo/registry /root/.cargo/registry

COPY . .

The structural lints in tools/dockerfile_shape.py are extended to make this shape a test, not a hope. stage_names parses FROM ... AS <name> lines, has_stage answers the obvious question, and copy_from_sources returns the list of stages referenced by any COPY --from= directive. The test suite then asserts the file has at least four stages, that base, uv-deps, bun-deps, cargo-deps, and runtime all exist by name, that every FROM carries an AS clause, and that the runtime stage COPY --from=s each of the three dependency siblings. The pedagogical value of "the dep stages are siblings, not a chain" depends on that last assertion holding across future commits.
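The three helpers are named in the article but their bodies are not shown. One plausible regex-based sketch follows; it assumes each helper takes the Dockerfile as a list of lines (matching the signature of stage_cache_mount_targets shown later), and the embedded Dockerfile is abbreviated to the lines that matter.

```python
import re

# Matches "FROM <image> AS <name>" and captures the stage name.
_FROM_AS = re.compile(r"^FROM\s+\S+\s+AS\s+(\S+)\s*$", re.IGNORECASE)
# Matches a COPY instruction carrying --from=<stage> and captures the stage.
_COPY_FROM = re.compile(r"^COPY\s+.*--from=(\S+)", re.IGNORECASE)


def stage_names(lines: list[str]) -> list[str]:
    """Names of all stages declared with FROM ... AS <name>, in order."""
    names = []
    for line in lines:
        match = _FROM_AS.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def has_stage(lines: list[str], name: str) -> bool:
    return name in stage_names(lines)


def copy_from_sources(lines: list[str]) -> list[str]:
    """Stages referenced by any COPY --from=<stage> directive, in order."""
    sources = []
    for line in lines:
        match = _COPY_FROM.match(line.strip())
        if match:
            sources.append(match.group(1))
    return sources


DOCKERFILE = """\
FROM debian:12-slim AS base
FROM base AS uv-deps
FROM base AS bun-deps
FROM base AS cargo-deps
FROM base AS runtime
COPY --from=uv-deps /workspace/.venv /workspace/.venv
COPY --from=bun-deps /workspace/node_modules /workspace/node_modules
COPY --from=cargo-deps /root/.cargo/registry /root/.cargo/registry
COPY . .
""".splitlines()

print(stage_names(DOCKERFILE))
# ['base', 'uv-deps', 'bun-deps', 'cargo-deps', 'runtime']
print(copy_from_sources(DOCKERFILE))
# ['uv-deps', 'bun-deps', 'cargo-deps']
```

Asserting that copy_from_sources contains all three sibling names is what distinguishes "siblings feeding one runtime stage" from a linear chain of stages.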

Verification

The full local check is still the same single pytest invocation. The harness tests from step 2 still pass, and the new shape tests assert the multistage invariants:

pytest
......................                                                   [100%]
22 passed in 0.07s

The twenty-two passing tests cover both the step-2 baseline harness and the new multistage shape: test_dockerfile_is_multistage, test_dockerfile_has_dedicated_dep_stage_per_toolchain, test_dockerfile_has_shared_base_stage, test_dockerfile_has_runtime_stage, test_every_stage_is_named, test_runtime_copies_from_each_dep_stage, and the cross-stage test_dockerfile_has_cross_stage_copies count check. The deliberately wasteful step-2 baseline assertions ("exactly one FROM, exactly one COPY") have been removed, since they would now fail by construction; that removal is itself the proof that the Dockerfile has structurally changed.

What we built

The polyglot monorepo now builds through five named stages instead of one anonymous block. base is the cacheable toolchain image, the three *-deps siblings each isolate one resolver behind its own manifest-only COPY, and runtime reassembles the pieces into the final image with the source tree on top.

The dependency invalidation story has changed shape. Editing services/api/src/api/greeting.py now invalidates only the final COPY . . in the runtime stage. The uv-deps, bun-deps, and cargo-deps stages are unaffected because none of them copied that file in the first place — their COPY directives only touched manifests. Editing services/web/package.json invalidates bun-deps and the runtime stage, but neither uv-deps nor cargo-deps even sees the change. That isolation is the single biggest cache-hit-rate win this tutorial delivers, and everything else just sharpens it.
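The isolation argument can be written down as a lookup from each stage to the files its COPY directives read. This is a toy model under stated assumptions: the mapping is transcribed by hand from the Dockerfile above, the function name is hypothetical, and COPY . . in the runtime stage is modelled as a wildcard that ignores any .dockerignore rules.

```python
from fnmatch import fnmatch

# Files each stage copies in, transcribed from the multistage Dockerfile.
STAGE_INPUTS = {
    "uv-deps": ["pyproject.toml", "services/api/pyproject.toml"],
    "bun-deps": ["package.json", "services/web/package.json"],
    "cargo-deps": ["Cargo.toml", "Cargo.lock", "services/edge/Cargo.toml"],
    "runtime": ["*"],  # COPY . . sees every file in the build context
}


def rebuilt_stages(changed_path: str) -> list[str]:
    """Stages whose copied inputs include the changed file."""
    return [
        stage
        for stage, patterns in STAGE_INPUTS.items()
        if any(fnmatch(changed_path, pattern) for pattern in patterns)
    ]


print(rebuilt_stages("services/api/src/api/greeting.py"))
# ['runtime']
print(rebuilt_stages("services/web/package.json"))
# ['bun-deps', 'runtime']
```

A source edit touches only the runtime stage, while a manifest edit touches exactly one resolver stage plus runtime, which matches the two cases described above.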

The shape lints make the new structure load-bearing. A future contributor who collapses two dep stages back into one, or who sneaks a COPY . . into a dep stage, will fail pytest before the change can land. The test suite guarantees this because it asserts named stages, not just stage counts: rename cargo-deps to rust-deps and the build is still fine, but the tests immediately call out the drift.

What this unlocks is the cache-mount work in the next two steps. Now that each toolchain's resolution is isolated in its own stage with a stable input boundary, we can attach RUN --mount=type=cache,target=… to that single RUN line and durably reuse the cargo registry, the uv wheel cache, and bun's install cache across builds — something the step-2 unified RUN could never have benefited from cleanly.

Repository

The state of the code after this step: d1779eb

Step 4: Wiring BuildKit Cache Mounts Into the Cargo Stage So Crate Downloads and Compile Artifacts Survive Across Builds

Step 3 isolated each toolchain in its own sibling stage so editing a Python source file no longer invalidates cargo fetch. That structural win is necessary but not sufficient: whenever the cargo-deps layer misses (a cold build, a manifest edit, a fresh CI runner), cargo fetch downloads the registry index data and every transitive crate again, and any later cargo build recompiles every dependency from source. On a Rust project of any real size that single layer dominates cold-build wall time, and nothing in step 3 helped the first build of a new branch or a fresh CI runner.

This step attaches BuildKit cache mounts to the cargo-deps stage so the registry index, the per-commit git checkouts, and the target/ build directory persist between invocations of docker build on the same host. Cache mounts are not image layers — they live in BuildKit's local cache and survive across builds, even after the image layer above them is invalidated — so a second cold build of cargo-deps skips the network entirely and reuses incrementally compiled artifacts.

Setup

No new third-party dependencies are added. The Dockerfile at the repository root is edited in place: the existing cargo-deps stage gains three --mount=type=cache flags on its single RUN line, and the runtime stage's COPY --from=cargo-deps source path is redirected. The tools/dockerfile_shape.py lint library grows three new helpers — cache_mount_targets, stage_cache_mount_targets, and count_cache_mounts — that parse --mount=type=cache,...,target=<path> tokens out of each RUN line, optionally scoped to a single named stage. Seven new tests in tools/tests/test_dockerfile_shape.py consume those helpers to pin the new invariants.

One thing the setup explicitly does not do is touch uv-deps or bun-deps. Cache mounts for those package managers are valuable too, but they involve different target paths, different lockfile semantics, and a different conversation about determinism — folding them into the same commit as the cargo work would muddy the diff. They land in a later step. The base stage and the runtime stage are also untouched, which is enforced by a new test that asserts cache mounts only ever appear in dep stages.

Implementation

The single non-trivial change in the Dockerfile is on the RUN line of cargo-deps. The previous step had a bare RUN cargo fetch; the new version stacks three cache mounts in front of it. Each mount declares a stable id so BuildKit reuses the same backing volume across builds, an absolute target path inside the build container, and sharing=locked so two concurrent builds serialize their writes instead of corrupting the cache.

FROM base AS cargo-deps

COPY Cargo.toml Cargo.lock ./
COPY services/edge/Cargo.toml services/edge/Cargo.toml
RUN --mount=type=cache,id=cargo-registry,target=/root/.cargo/registry,sharing=locked \
    --mount=type=cache,id=cargo-git,target=/root/.cargo/git,sharing=locked \
    --mount=type=cache,id=cargo-target,target=/workspace/target,sharing=locked \
    cargo fetch \
    && mkdir -p /workspace/.cargo-cache \
    && cp -a /root/.cargo/registry /workspace/.cargo-cache/registry

The three target paths are the canonical ones cargo expects. /root/.cargo/registry holds the on-disk crates.io index plus the unpacked source for every crate cargo has seen. /root/.cargo/git holds checkouts for any git = "..." dependency. /workspace/target is where cargo build will write incremental compilation artifacts once a real build lands in a later step — pre-mounting it now means the first real build inherits an empty-but-persistent volume rather than discovering the cache only on its second run.

The cp -a /root/.cargo/registry /workspace/.cargo-cache/registry tail is the subtle part. A cache mount is attached only for the duration of its RUN: its contents persist in BuildKit's cache, but nothing written under the mount target is committed to the image layer. If the runtime stage tried COPY --from=cargo-deps /root/.cargo/registry directly, it would find an empty path because the mount is no longer attached. Materializing the registry into /workspace/.cargo-cache/registry, a normal directory in the layer, gives the runtime stage a stable source to copy from, while still preserving the fast-rebuild win on the mount itself. We use cp -a rather than mv so the live mount stays intact for the next build that reuses this stage.

The runtime stage's COPY --from=cargo-deps line is updated to read from that snapshot directory:

FROM base AS runtime

WORKDIR /workspace

COPY --from=uv-deps /workspace/.venv /workspace/.venv
COPY --from=bun-deps /workspace/node_modules /workspace/node_modules
COPY --from=cargo-deps /workspace/.cargo-cache/registry /root/.cargo/registry

COPY . .

The structural lints in tools/dockerfile_shape.py are extended so the cache-mount shape is a tested invariant, not a hopeful comment. cache_mount_targets scans every RUN line for --mount=type=cache,...,target=<path> tokens and returns the targets. stage_cache_mount_targets scopes that scan to a single named stage by walking the line stream and tracking which FROM ... AS <name> it is currently inside. count_cache_mounts is the obvious thin wrapper.

def stage_cache_mount_targets(lines: list[str], stage: str) -> list[str]:
    targets: list[str] = []
    for line in _lines_in_stage(lines, stage):
        targets.extend(_extract_cache_targets_from_line(line))
    return targets
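The article does not show the bodies of _lines_in_stage and _extract_cache_targets_from_line, so the following is only a minimal sketch of how such helpers could work. The names lines_in_stage and extract_cache_targets, the regular expressions, the mount ids, and the sample Dockerfile lines are all assumptions made for illustration. Unlike the description above, the sketch scans every line in the stage rather than only RUN lines, so that --mount flags on continuation lines are picked up.

```python
import re

# Hypothetical sample input; the real suite reads the repository's Dockerfile.
SAMPLE = [
    "# syntax=docker/dockerfile:1.7",
    "FROM debian:12 AS base",
    "FROM base AS cargo-deps",
    "RUN --mount=type=cache,id=cargo-registry,target=/root/.cargo/registry,sharing=locked \\",
    "    --mount=type=cache,id=cargo-git,target=/root/.cargo/git,sharing=locked \\",
    "    --mount=type=cache,id=cargo-target,target=/workspace/target,sharing=locked \\",
    "    cargo fetch",
    "FROM base AS runtime",
    "COPY . .",
]

# Matches "FROM <image> AS <name>"; a FROM carrying flags such as --platform
# is not handled by this simplified pattern.
_FROM_RE = re.compile(r"^\s*FROM\s+\S+(?:\s+AS\s+(\S+))?", re.IGNORECASE)
_MOUNT_RE = re.compile(r"--mount=(\S+)")


def lines_in_stage(lines, stage):
    """Return the lines that sit between 'FROM ... AS <stage>' and the next FROM."""
    current = None
    selected = []
    for line in lines:
        match = _FROM_RE.match(line)
        if match:
            current = match.group(1)  # None for an unnamed stage
            continue
        if current == stage:
            selected.append(line)
    return selected


def extract_cache_targets(line):
    """Return the target= value of every type=cache mount found on one line."""
    targets = []
    for spec in _MOUNT_RE.findall(line):
        options = dict(part.split("=", 1) for part in spec.split(",") if "=" in part)
        if options.get("type") == "cache" and "target" in options:
            targets.append(options["target"])
    return targets


def stage_cache_mount_targets(lines, stage):
    targets = []
    for line in lines_in_stage(lines, stage):
        targets.extend(extract_cache_targets(line))
    return targets
```

Run against the sample, the cargo-deps stage yields its three targets in declaration order, while base and runtime yield empty lists.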

The new test suite asserts seven properties: the Dockerfile declares the BuildKit syntax pragma on its first line, at least one cache mount exists, the cargo-deps stage owns mounts for /root/.cargo/registry, /root/.cargo/git, and a path ending in /target, neither base nor runtime has any cache mount of its own, and every cache mount target is an absolute path. Together those tests pin both the intent (cargo gets durable caches) and the containment (we did not accidentally smear mounts onto stages where they cannot help).

Verification

The full local check is still a single pytest invocation. The twenty-two tests from step 3 still pass — the multistage shape did not change — and seven new ones cover the cache-mount invariants.

pytest
.............................                                            [100%]
29 passed in 0.08s

Twenty-nine passing tests: the sixteen from step 2, the six new multistage-shape tests from step 3, and the seven cache-mount tests added in this step. The new tests are test_dockerfile_declares_buildkit_syntax, test_dockerfile_has_at_least_one_cache_mount, test_cargo_deps_has_registry_cache_mount, test_cargo_deps_has_target_cache_mount, test_cargo_deps_has_cargo_git_cache_mount, test_cache_mounts_only_attach_to_dep_stages, and test_cache_mount_targets_are_absolute_paths. The syntax-pragma test is the cheapest of the bunch but among the most load-bearing: # syntax=docker/dockerfile:1.7 on line one pins the Dockerfile frontend, so --mount=type=cache is parsed the same way on every builder. Without it the build falls back to whatever frontend the builder bundles, and an older frontend or the legacy non-BuildKit builder rejects the --mount flag outright.
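The body of the syntax-pragma test is not shown in the article. A minimal sketch of the check it describes might look like the following, where syntax_directive is a hypothetical helper name and the regular expression is an assumption about how strictly the directive should be matched.

```python
import re

# Parser directives take the form "# name=value" and are only honoured
# at the very top of the Dockerfile, so only the first line is inspected.
_SYNTAX_RE = re.compile(r"^#\s*syntax\s*=\s*(\S+)\s*$", re.IGNORECASE)


def syntax_directive(lines):
    """Return the frontend image named on line one, or None if absent."""
    if not lines:
        return None
    match = _SYNTAX_RE.match(lines[0])
    return match.group(1) if match else None
```

A test built on this helper would assert that the returned value is not None, and could optionally check that it starts with docker/dockerfile: to guard against a typo in the image name.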

What we built

The cargo-deps stage now writes through three named BuildKit cache mounts every time it runs, and reads through them on the next run. A cold build on a fresh BuildKit cache still pays the full crates.io download cost once, but every subsequent build — whether triggered by a Cargo.toml edit, a Cargo.lock bump, or simply a docker build on a different branch — reuses the registry index, the git checkouts, and the target directory directly from the cache volume.

The dependency-invalidation story from step 3 is preserved exactly. Editing services/api/src/api/greeting.py still fails to invalidate cargo-deps, because the only inputs to that stage are still Cargo.toml, Cargo.lock, and services/edge/Cargo.toml. The new cache mounts are independent of layer invalidation: when the cargo-deps layer is invalidated (because a manifest changed), the cache mounts cushion the rebuild with everything cargo had downloaded last time, so even the "slow path" is dramatically faster than a virgin cold build.

The runtime stage still gets a coherent registry. Materializing the registry into /workspace/.cargo-cache/registry before the RUN exits is what makes that work: the snapshot is committed to the cargo-deps image layer, so the COPY --from=cargo-deps in runtime has a real, populated path to read. The cache mount itself stays intact for the next build to reuse — cp -a deliberately does not move or unlink it.

The shape lints make the new cache-mount layout load-bearing. A future contributor who deletes one of the three mounts — say, drops the target mount because "we are not compiling yet" — will fail pytest on test_cargo_deps_has_target_cache_mount before the change can land. A contributor who smears a cache mount onto the runtime stage thinking it will speed up cargo build at image-run time will fail test_cache_mounts_only_attach_to_dep_stages, because BuildKit cache mounts only apply during docker build, not at container runtime, and a misplaced one is a bug we want the test suite to catch immediately.

What this unlocks is the parallel work on the uv and bun stages in the next step. The pattern is now proven: a --mount=type=cache flag with a stable id, an absolute target, and sharing=locked, gated by a structural test that asserts the mount is attached to the right stage and only to that stage. Replaying that recipe against uv's wheel cache and bun's install cache is mechanical once the cargo case has worked through every edge in production.

Repository

The state of the code after this step: ad240f3

Step 5: Extending BuildKit Cache Mounts to the uv Wheel Cache and the bun Install Cache So Every Dependency Stage Survives a Cold Layer

Step 4 attached three --mount=type=cache flags to the cargo-deps stage and put structural tests around them so the registry, the git index, and the target directory all survive across docker build invocations. The Rust side of the build now degrades gracefully on a cold image layer: even when Cargo.lock changes and the layer cache cannot be reused, BuildKit still hands cargo back its previously downloaded crates. Python and TypeScript do not have that protection yet — a pyproject.toml bump still forces uv to redownload every wheel from PyPI, and a package.json bump still forces bun to refetch every npm tarball from the registry.

This step closes that asymmetry. We replay the cargo recipe on the uv-deps and bun-deps stages, mount the canonical cache directory for each package manager, and add three tests that pin both the per-stage mount and the global invariant that every dependency stage owns at least one named cache. The runtime stage and the COPY chain do not change, because uv and bun already write their per-build artifact — .venv and node_modules — to a non-mounted path on disk that the runtime stage can still pick up.

Setup

No new third-party dependencies are added. The Dockerfile at the repository root is edited in two places: the RUN uv sync ... line in the uv-deps stage and the RUN bun install --no-save line in the bun-deps stage each gain a --mount=type=cache flag. The header comment block at the top of the file is updated so it documents the step-5 intent — caches live on long-lived BuildKit mounts, per-build artifacts stay on the regular filesystem so COPY --from= can still see them.

The lint library at tools/dockerfile_shape.py does not change. Step 4 already shipped cache_mount_targets, stage_cache_mount_targets, and count_cache_mounts, and those helpers are general enough that adding a mount to two more stages requires zero new parser code. What does grow is tools/tests/test_dockerfile_shape.py: three new tests are appended — one for the uv wheel cache, one for the bun install cache, and one that asserts every entry in REQUIRED_DEP_STAGES carries at least one cache mount. The runtime and base stages remain off-limits for cache mounts, which is already enforced by test_cache_mounts_only_attach_to_dep_stages from step 4.

Implementation

The uv-deps stage gains a single --mount=type=cache flag pointing at /root/.cache/uv, which is the directory uv itself uses for its wheel and HTTP cache. The sharing=locked token tells BuildKit to serialize concurrent builds against the same cache volume so two parallel docker build invocations cannot race on the cache directory.

FROM base AS uv-deps

COPY pyproject.toml ./
COPY services/api/pyproject.toml services/api/pyproject.toml
RUN --mount=type=cache,id=uv-cache,target=/root/.cache/uv,sharing=locked \
    uv sync --no-install-project --no-dev || uv sync --no-install-project

Two details on this stage deserve a closer look. First, the cache mount targets /root/.cache/uv, not the project's .venv directory: the wheel cache is the part we want to persist between builds, while the resolved virtual environment is per-build state that needs to live on the regular filesystem so the runtime stage's COPY --from=uv-deps /workspace/.venv can still see it. Second, the --no-dev fallback is preserved verbatim from step 3, where it was added for projects that declare no dev dependency group: if the first invocation fails, the second retries without --no-dev. The cache mount is independent of which branch of that OR fires, because both write to the same /root/.cache/uv directory.
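The first point, that a cache mount must never cover the artifact the runtime stage copies out, can be expressed as a small path check. This is an illustrative sketch only: overlaps is a hypothetical helper that is not part of the article's lint library, and it compares paths lexically without touching the filesystem.

```python
from pathlib import PurePosixPath


def overlaps(mount_target, artifact_path):
    """True if the mount equals, contains, or sits inside the artifact path.

    Any of the three cases would leave part of the artifact on the cache
    mount instead of in the image layer, where COPY --from= cannot see it.
    """
    mount = PurePosixPath(mount_target)
    artifact = PurePosixPath(artifact_path)
    return (
        mount == artifact
        or mount in artifact.parents
        or artifact in mount.parents
    )
```

For the uv-deps stage, overlaps("/root/.cache/uv", "/workspace/.venv") is False, which is the property the stage relies on. Mounting /workspace or /workspace/.venv itself would return True.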

The bun-deps stage is the mirror image with bun's canonical cache path. Bun stores its install cache at /root/.bun/install/cache on Linux when running as root, and the bun install --no-save invocation populates that directory transparently.

FROM base AS bun-deps

COPY package.json ./
COPY services/web/package.json services/web/package.json
RUN --mount=type=cache,id=bun-cache,target=/root/.bun/install/cache,sharing=locked \
    bun install --no-save

The --no-save flag keeps package.json from being rewritten during the install, which matters because the manifest is the layer-invalidation key for this stage and we do not want bun mutating it. As with uv, the node_modules directory that bun materializes is not on the cache mount — it lives at /workspace/node_modules in the regular layer filesystem, and the runtime stage copies it across via COPY --from=bun-deps.

The new test block in tools/tests/test_dockerfile_shape.py formalizes both per-stage targets and the global invariant. The per-stage tests are obvious replays of the step-4 cargo tests; the global test is the more interesting one because it shifts the policy from "specific paths exist" to "every dep stage owns at least one cache" — which means adding a fourth dep stage in some future article will fail the suite until that stage gets a mount.

def test_uv_deps_has_uv_cache_mount():
    targets = stage_cache_mount_targets(_lines(), "uv-deps")
    assert "/root/.cache/uv" in targets, targets


def test_bun_deps_has_bun_install_cache_mount():
    targets = stage_cache_mount_targets(_lines(), "bun-deps")
    assert "/root/.bun/install/cache" in targets, targets


def test_every_dep_stage_has_a_cache_mount():
    lines = _lines()
    for stage in REQUIRED_DEP_STAGES:
        targets = stage_cache_mount_targets(lines, stage)
        assert targets, f"{stage} has no cache mounts"

Pairing the specific-path tests with the at-least-one-mount test gives the suite two complementary failure modes. If a contributor accidentally renames the uv cache directory but leaves a mount in place, test_uv_deps_has_uv_cache_mount fires while test_every_dep_stage_has_a_cache_mount stays green. If a contributor deletes the entire --mount=type=cache,... flag from the uv-deps RUN line, both tests fire. Catching the mistake from two angles is cheap insurance because both tests are O(lines-in-the-Dockerfile).
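The two failure modes can be illustrated without parsing a Dockerfile at all, by feeding pre-extracted targets into a small checker. The sketch below is hypothetical: failing_checks and REQUIRED_TARGETS are illustrative names, and the "/root/.cache/uv-wheels" path stands in for an accidental rename.

```python
# Canonical cache targets taken from the two per-stage tests above.
REQUIRED_TARGETS = {
    "uv-deps": "/root/.cache/uv",
    "bun-deps": "/root/.bun/install/cache",
}


def failing_checks(targets_by_stage):
    """Return (stage, check) pairs for every invariant that does not hold."""
    failures = []
    for stage, expected in REQUIRED_TARGETS.items():
        targets = targets_by_stage.get(stage, [])
        if expected not in targets:
            failures.append((stage, "specific-path"))
        if not targets:
            failures.append((stage, "at-least-one"))
    return failures


healthy = {"uv-deps": ["/root/.cache/uv"], "bun-deps": ["/root/.bun/install/cache"]}
# Hypothetical rename: a mount is still present, but on the wrong path.
renamed = {"uv-deps": ["/root/.cache/uv-wheels"], "bun-deps": ["/root/.bun/install/cache"]}
# The whole --mount flag was deleted from the uv-deps RUN line.
deleted = {"uv-deps": [], "bun-deps": ["/root/.bun/install/cache"]}
```

The renamed case trips only the specific-path check, while the deleted case trips both, which mirrors the behaviour described for the two pytest functions.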

Verification

All checks still run from the project root with a single pytest invocation. The twenty-nine tests that were green at the end of step 4 are untouched, three new tests for the uv and bun mounts join them, and the existing service test for the greeting API stays where it is.

pytest
................................                                         [100%]
32 passed in 0.07s

Thirty-two passing tests: three from services/api/tests/test_greeting.py covering the polyglot service stub from step 1, seven from tools/tests/test_baseline.py covering the baseline-timing harness from step 2, and twenty-two from tools/tests/test_dockerfile_shape.py covering the Dockerfile structure plus the cache-mount invariants. The three new shape tests — test_uv_deps_has_uv_cache_mount, test_bun_deps_has_bun_install_cache_mount, and test_every_dep_stage_has_a_cache_mount — all pass because the matching --mount=type=cache flags landed in the Dockerfile in the same commit.

What we built

Each of the three dependency stages now writes through a named BuildKit cache mount whose lifetime is decoupled from the image layer above it. A cold first build still pays the full cost of downloading every Python wheel, every npm tarball, and every Rust crate exactly once. Every subsequent build — including builds where the lockfile layer is invalidated — reuses the on-disk caches and skips the network for any artifact it has already fetched.

The runtime stage's COPY --from= chain did not change, and that is the load-bearing part of the design. uv-deps still produces .venv at /workspace/.venv on the regular filesystem, bun-deps still produces node_modules at /workspace/node_modules, and cargo-deps still snapshots its registry to /workspace/.cargo-cache/registry exactly as step 4 set up. The mounts hold the inputs the package managers consume; the per-build artifacts the runtime stage needs to read stay on the layer filesystem where COPY --from= can see them.

The structural tests now enforce a much stronger invariant. Before this step the suite only insisted that the cargo stage had specific cache mounts. After this step the suite insists that every dependency stage has at least one cache mount, with explicit per-toolchain assertions on the canonical target paths. A future contributor who drops the uv cache because "we are not building Python yet" will fail test_uv_deps_has_uv_cache_mount; a contributor who adds a new dep stage without a cache mount will fail test_every_dep_stage_has_a_cache_mount before the change can merge.

What this unlocks is the actual application-level work the rest of the tutorial will lean on. With every toolchain's package cache pinned, the next step can introduce real source-code builds — cargo build, uv pip install, bun build against the workspace — and trust that the slowest network-bound work has already been amortized into a reusable BuildKit volume. The cache-mount pattern stops being a story about cargo specifically and starts being the project's default shape for any RUN line that downloads from a remote registry.

Repository

The state of the code after this step: d5fb60e

Step 6: Pinning the Lockfile-Only COPY Invariant So Source Edits Never Invalidate the Dependency Layer

Step 5 finished wiring BuildKit cache mounts into all three dependency stages, so cold builds now keep their wheel cache, bun install cache, and cargo registry between invocations. The build is fast on a cold layer, but a structural weakness still hides in the build graph: nothing in the test suite stops a future contributor from sneaking a COPY services/api/src into the uv-deps stage as a "quick fix" for some import error. The moment that happens, the lockfile-only key the dep layer is supposed to ride on is lost, every Python source edit invalidates the entire dep install, and the cache-mount wins from step 5 evaporate behind a permanently invalidated layer above them.

This step closes that hole on two fronts. First, the uv-deps stage starts copying uv.lock alongside pyproject.toml and runs uv sync --frozen so the dependency resolution is genuinely keyed on the lockfile rather than on whatever PyPI happens to look like at build time. Second, a new family of structural tests pins the invariant that every dependency stage may only COPY manifest or lockfile sources — anything else, anywhere in the dep half of the build, fails the suite before the change can land.

Setup

No new third-party dependencies are added. The Dockerfile at the repository root is edited in two narrow places: the uv-deps stage gains uv.lock in its first COPY line and a --frozen flag stacked in front of the existing uv sync fallback chain, and the header comment at the top of the file is rewritten to document the lockfile-only contract that all three dep stages now satisfy. The bun-deps and cargo-deps stages already copied only manifests in step 3 and need no Dockerfile edits — what they gain in this step is test coverage that locks the existing behaviour in place.

The lint library at tools/dockerfile_shape.py grows two new helpers, stage_copy_sources and stage_copies_path, that walk a single named stage and return every operand of every COPY instruction except the destination. Six new tests in tools/tests/test_dockerfile_shape.py consume those helpers: two pin the presence of the root lockfile for the Python and Rust stages, three enforce the lockfile-only invariant across all three dep stages, and one asserts the runtime stage is the only place a COPY . ever lives. An ALLOWED_LOCKFILE_SOURCES mapping at the top of the new test block makes the per-stage allow-list explicit.

Implementation

The Dockerfile delta is small but load-bearing. The first COPY in uv-deps now picks up uv.lock so the lockfile becomes part of the layer key, and the RUN uv sync invocation grows a three-level fallback chain: try --frozen --no-dev first so a normal build is fully reproducible, fall back to --frozen without the dev exclusion, and only as a last resort drop --frozen entirely. Keeping the unfrozen branch at the bottom means a project that has not yet generated a lockfile still builds, but a project that has one will use it.

FROM base AS uv-deps

COPY pyproject.toml uv.lock ./
COPY services/api/pyproject.toml services/api/pyproject.toml
RUN --mount=type=cache,id=uv-cache,target=/root/.cache/uv,sharing=locked \
    uv sync --frozen --no-install-project --no-dev \
    || uv sync --frozen --no-install-project \
    || uv sync --no-install-project

The other two dep stages are unchanged from step 5 — bun-deps still copies package.json plus its workspace manifest, and cargo-deps still copies Cargo.toml, Cargo.lock, and the edge service's Cargo.toml. The point of this step is not to add more COPY lines but to prove that the existing ones already follow the lockfile-only rule and to make any future violation a test failure rather than a silent regression. The header comment at the top of the Dockerfile is updated to spell out the per-stage allow-list, so anyone reading the file before reading the tests still gets the same contract in plain English.

def stage_copy_sources(lines: list[str], stage: str) -> list[str]:
    sources: list[str] = []
    for line in _lines_in_stage(lines, stage):
        sources.extend(_extract_copy_sources(line))
    return sources


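def _extract_copy_sources(line: str) -> list[str]:
    """Return the source operands of a single-line, shell-form COPY."""
    stripped = line.strip()
    if not stripped.upper().startswith("COPY "):
        return []
    tokens = stripped.split()[1:]
    # Drop flags such as --from=<stage> or --chown=<user>:<group>.
    operands = [token for token in tokens if not token.startswith("--")]
    # A COPY needs at least one source plus a destination.
    if len(operands) < 2:
        return []
    # The destination is always the last operand.
    return operands[:-1]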

The helper deliberately strips off the destination operand (the last token of every COPY line) and skips any flag operand prefixed with --. What is left is the set of source paths the stage pulls in, whether from the build context or, for COPY --from=, from another stage; for the dep stages that is exactly the surface area the lockfile-only invariant constrains. Calling stage_copy_sources against uv-deps returns ["pyproject.toml", "uv.lock", "services/api/pyproject.toml"]; against runtime it returns the three --from= artifact paths followed by "."; and against base it returns the empty list because the shared base stage installs toolchains without copying anything from the context.

ALLOWED_LOCKFILE_SOURCES = {
    "uv-deps": {"pyproject.toml", "uv.lock", "services/api/pyproject.toml"},
    "bun-deps": {"package.json", "services/web/package.json"},
    "cargo-deps": {"Cargo.toml", "Cargo.lock", "services/edge/Cargo.toml"},
}


def test_dep_stages_only_copy_manifest_or_lockfiles():
    lines = _lines()
    for stage, allowed in ALLOWED_LOCKFILE_SOURCES.items():
        sources = stage_copy_sources(lines, stage)
        assert sources, f"{stage} has no COPY instructions"
        unexpected = [src for src in sources if src not in allowed]
        assert unexpected == [], f"{stage} copies non-lockfile sources: {unexpected}"


def test_dep_stages_never_copy_source_directories():
    lines = _lines()
    for stage in ALLOWED_LOCKFILE_SOURCES:
        for src in stage_copy_sources(lines, stage):
            assert "/src" not in src, f"{stage} reaches into a /src tree: {src}"


def test_runtime_stage_does_copy_full_context():
    sources = stage_copy_sources(_lines(), "runtime")
    assert "." in sources, sources

Three different failure modes are now caught from three different angles. The allow-list test fires the moment any dep stage learns a new source it was not explicitly granted — even a README.md would be rejected, which is the right default. The /src test catches the most likely real mistake, a contributor reaching into services/api/src to "make the install work". The runtime-stage test is the inverse safety belt: if some future refactor moves the COPY . out of runtime into one of the dep stages, the runtime test fails the same instant the dep-stage tests start firing, which makes the diagnosis unambiguous.
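The suite also includes test_dep_stages_never_copy_whole_workspace, whose body is not shown here. A minimal sketch of the predicate such a test could rest on follows; copies_whole_context is a hypothetical name, and the sketch only recognises spellings of the context root such as "." and "./", not glob patterns.

```python
from pathlib import PurePosixPath


def copies_whole_context(sources):
    """True if any COPY source normalises to the build-context root."""
    return any(str(PurePosixPath(src)) == "." for src in sources)
```

Applied to the allow-listed sources of each dep stage this returns False, and applied to a source list containing "." it returns True, so a dep-stage test would assert the former while the runtime-stage test relies on the latter.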

Verification

Tests run from the codebase root with a single pytest invocation. Step 5 finished at thirty-two passing tests; this step adds six structural tests for a final count of thirty-eight, and no existing test has to change.

pytest
......................................                                   [100%]
38 passed in 0.12s

Thirty-eight passing tests: three from services/api/tests/test_greeting.py for the polyglot service stub, seven from tools/tests/test_baseline.py for the baseline-timing harness, and twenty-eight from tools/tests/test_dockerfile_shape.py covering the multistage shape, the cache-mount invariants from steps 4 and 5, and the six new lockfile-only invariants added in this step. The six new tests are test_uv_deps_copies_root_lockfile, test_cargo_deps_copies_root_lockfile, test_dep_stages_only_copy_manifest_or_lockfiles, test_dep_stages_never_copy_whole_workspace, test_dep_stages_never_copy_source_directories, and test_runtime_stage_does_copy_full_context.

What we built

The dependency half of the build graph is now keyed entirely on manifest and lockfile contents. The uv-deps layer is invalidated when uv.lock changes or when one of the two pyproject.toml files changes, and never on a Python source edit. bun-deps is invalidated only on manifest churn, and cargo-deps only on Cargo.lock or Cargo.toml churn. The expensive part of a build, resolving and downloading transitive dependencies, runs only when a manifest or lockfile that controls it actually moves.

The --frozen flag is the other half of that contract. Without it, uv sync checks whether uv.lock is still in step with pyproject.toml and re-resolves against the index when it decides the lockfile is stale, so the layer key would stay deterministic while the installed versions could drift from one builder to another. --frozen tells uv to install from uv.lock as written, without re-resolving or updating it, so two cold builds of the same commit on two different machines install the same pinned package versions. Reproducibility moves from "we hope nothing on PyPI changed" to "the lockfile is the contract".

The structural test pack is what makes all of this survive contact with future commits. A contributor who adds a COPY services/api/src to uv-deps to silence an install error fails test_dep_stages_only_copy_manifest_or_lockfiles and test_dep_stages_never_copy_source_directories simultaneously. A contributor who deletes uv.lock from the COPY line fails test_uv_deps_copies_root_lockfile. A contributor who refactors the COPY . out of runtime into a dep stage fails the runtime test and one of the dep-stage tests at the same time. Every plausible regression has at least one matching tripwire.

What this unlocks for the next step is the right to start adding real per-service source builds — cargo build, application bundling, etc. — without worrying that those new build lines will accidentally leak their input set back into the dep stages. The dep half of the Dockerfile is now a closed system: it reads lockfiles, it writes resolved-deps directories, and the tests will reject anything that violates either half of that contract.

Repository

The state of the code after this step: c82255e

Step 7: Carving Out a Slim, Non-Root Runtime Stage With Tini and a Healthcheck

Step 6 finished closing the lockfile-only contract across the three dependency stages, and the build tests at that point asserted only the shape of the dep half of the Dockerfile. The runtime stage, however, was still inheriting from base — which means every published image dragged along build-essential, git, unzip, curl, the rustup toolchain, the bun installer, and the uv installer, plus whatever transitive cruft those installers leave behind. That is fine for a build cache but indefensible for a deployment artifact: the attack surface is enormous, the image is hundreds of megabytes heavier than it has to be, and pulling it on a cold node takes far longer than the workload itself.

This step rewrites the runtime stage so it begins from a fresh debian:12-slim layer and copies in only the artifacts that earlier stages produced. We install just enough on top of slim to actually run the services — ca-certificates, tini, python3, libssl3 — create a dedicated unprivileged app user, chown the workspace to it, drop to it with USER app, declare a HEALTHCHECK so orchestrators can detect a wedged container, and put tini at PID 1 via ENTRYPOINT so signals propagate and zombie children are reaped. Eleven new structural tests pin every one of those properties in place.

Setup

No new third-party dependencies are introduced. The Dockerfile at the codebase root is edited in one large place: the previous FROM base AS runtime block is deleted and replaced with a new FROM debian:12-slim AS runtime block that installs a curated minimal package set, provisions an app user, copies the three pre-built artifact trees in with --chown=app:app, switches to that user, declares EXPOSE 8080, adds a HEALTHCHECK, sets ENTRYPOINT ["/usr/bin/tini", "--"], and finally sets a placeholder CMD. The header comment at the top of the Dockerfile is rewritten to spell out the runtime contract — which packages live on the slim image, which artifacts come from which dep stage, and why tini sits at PID 1.

The lint module tools/dockerfile_shape.py grows six new helpers: stage_base_image, stage_user, stage_lines, stage_has_directive, stage_directive_line, and stage_mentions. They expose the runtime stage's image, its trailing USER value, the raw line list scoped to the stage, presence-and-line lookups for arbitrary directives, and an occurrence counter for a literal needle inside a single stage. Eleven new tests in tools/tests/test_dockerfile_shape.py consume those helpers. They cover slim base image, divorce from base, dedicated user creation, non-root USER, tini installation, tini in ENTRYPOINT, HEALTHCHECK presence, healthcheck interval and timeout flags, --chown= on the artifact COPYs, the tini occurrence count, and a defensive check that base still carries the heavy toolchains the dep stages depend on.

Implementation

The runtime stage no longer inherits from base. It starts from a fresh slim image and installs only what the services actually need at run time. Dropping the inheritance is the load-bearing change: every downstream property — small image, no compilers, no installers in the final layer — flows from it.

FROM debian:12-slim AS runtime

ENV DEBIAN_FRONTEND=noninteractive
ENV PATH=/workspace/.venv/bin:/usr/local/bin:/usr/bin:/bin
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates \
        tini \
        python3 \
        libssl3 \
    && rm -rf /var/lib/apt/lists/* \
    && groupadd --system --gid 1001 app \
    && useradd --system --uid 1001 --gid 1001 --no-create-home --home-dir /workspace app \
    && mkdir -p /workspace \
    && chown -R app:app /workspace

Four packages, one useradd, one groupadd, one chown, all inside a single RUN so the apt cache cleanup, the user creation, and the workspace ownership all collapse into one layer. ca-certificates is there for outbound TLS, tini becomes PID 1 a few lines below, python3 is the runtime interpreter for the API service, and libssl3 is the OpenSSL shared library that Python's ssl module links against, listed explicitly so the dependency is visible in the Dockerfile. Anything heavier (build-essential, the rustup toolchain, the bun installer) stays behind in base and never reaches the runtime image.

WORKDIR /workspace

COPY --from=uv-deps --chown=app:app /workspace/.venv /workspace/.venv
COPY --from=bun-deps --chown=app:app /workspace/node_modules /workspace/node_modules
COPY --from=cargo-deps --chown=app:app /workspace/.cargo-cache/registry /workspace/.cargo-cache/registry

COPY --chown=app:app . .

USER app

Each COPY --from=<dep-stage> pulls exactly one resolved-deps tree out of its dedicated dep stage and lands it under /workspace, already owned by app:app. The trailing COPY --chown=app:app . . is the only place in the entire Dockerfile that copies the build context wholesale, which is exactly what the step-6 invariant tests guarantee. Switching to USER app before any further instruction ensures every subsequent process — including the healthcheck and the entrypoint — runs without root privileges.

EXPOSE 8080

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD python3 -c "import sys; sys.exit(0)"

ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["python3", "-m", "http.server", "8080"]

The HEALTHCHECK runs a trivial python3 invocation every thirty seconds with a five-second timeout and a ten-second start period; orchestrators use the resulting status to gate traffic and trigger restarts. Real services will replace the stub command with their own readiness probe, but the structural test only requires the directive itself plus the --interval and --timeout flags to be present. Tini sits in ENTRYPOINT exec form with the -- separator, which is the canonical pattern for signal-forwarding under Docker; the application command is exposed as CMD so a downstream image or docker run invocation can override it without losing tini.
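As an illustration of what a real readiness probe might replace the stub with, the sketch below uses only the Python standard library already present in the runtime image. The file name healthcheck.py, the probed URL, and the three-second timeout are assumptions rather than part of the article's repository. The timeout is chosen to sit below the --timeout=5s declared on the HEALTHCHECK line.

```python
import sys
import urllib.error
import urllib.request


def probe(url="http://127.0.0.1:8080/", timeout=3.0):
    """Return 0 if the URL answers successfully within the timeout, else 1.

    urlopen raises HTTPError for 4xx/5xx responses, so reaching the
    status check means the final response was not an error.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 400 else 1
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a malformed URL.
        return 1


if __name__ == "__main__":
    sys.exit(probe())
```

Saved as, say, /workspace/healthcheck.py (a hypothetical path), the probe could be invoked with CMD python3 /workspace/healthcheck.py in place of the inline stub; the exit code is what Docker reads to mark the container healthy or unhealthy.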

def stage_base_image(lines: list[str], stage: str) -> str | None:
    for line in lines:
        if not _is_from_line(line):
            continue
        if _extract_stage_name(line) != stage:
            continue
        return _extract_from_image(line)
    return None


def stage_user(lines: list[str], stage: str) -> str | None:
    last: str | None = None
    for line in _lines_in_stage(lines, stage):
        stripped = line.strip()
        if not stripped.upper().startswith("USER "):
            continue
        last = stripped.split(None, 1)[1].strip()
    return last

stage_base_image returns the image token from FROM <image> AS <stage> so the test pack can assert that the runtime stage starts from a slim image and explicitly not from base. stage_user walks the stage and returns the last USER value, which models the actual effective user at the end of the stage; if the stage never sets USER it returns None, and the non-root test rejects that as a privileged default. Pairing the two helpers gives the test pack a precise way to assert both "the runtime image is slim" and "the runtime process is unprivileged" without coupling either assertion to literal Dockerfile line numbers.
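The tests also lean on stage_directive_line and stage_mentions, whose bodies are not shown. The sketch below is one plausible shape for them, simplified to operate on lines already scoped to a stage; directive_line and mentions are hypothetical names, the sample lines are abridged from the runtime stage above, and the decision to skip comment lines when counting is an assumption.

```python
# Abridged runtime-stage lines, as they would look after stage scoping.
RUNTIME_LINES = [
    "RUN apt-get update \\",
    "    && apt-get install -y --no-install-recommends \\",
    "        ca-certificates \\",
    "        tini \\",
    "        python3 \\",
    "        libssl3",
    "USER app",
    "HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \\",
    '    CMD python3 -c "import sys; sys.exit(0)"',
    'ENTRYPOINT ["/usr/bin/tini", "--"]',
]


def directive_line(stage_lines, directive):
    """Return the first line that starts with the directive, or None."""
    prefix = directive.upper() + " "
    for line in stage_lines:
        stripped = line.strip()
        if stripped.upper().startswith(prefix):
            return stripped
    return None


def mentions(stage_lines, needle):
    """Count occurrences of needle across non-comment lines."""
    return sum(
        line.count(needle)
        for line in stage_lines
        if not line.lstrip().startswith("#")
    )
```

One caveat of a purely line-based lookup: asking this sketch for "CMD" would match the indented CMD on the HEALTHCHECK continuation line, so a production helper would likely need to join continuation lines first.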

def test_runtime_uses_slim_base_image():
    image = stage_base_image(_lines(), "runtime")
    assert image is not None, "runtime stage has no FROM image"
    assert "slim" in image.lower(), f"runtime base image is not slim: {image}"


def test_runtime_does_not_inherit_from_build_base():
    image = stage_base_image(_lines(), "runtime")
    assert image != "base", "runtime must start from a fresh slim image, not the toolchain-heavy base"


def test_runtime_drops_to_nonroot_user():
    user = stage_user(_lines(), "runtime")
    assert user is not None, "runtime never sets USER"
    assert user not in NON_ROOT_FORBIDDEN_USERS, f"runtime runs as privileged user: {user}"


def test_runtime_entrypoint_invokes_tini():
    line = stage_directive_line(_lines(), "runtime", "ENTRYPOINT")
    assert line is not None, "runtime has no ENTRYPOINT"
    assert "tini" in line, f"ENTRYPOINT does not invoke tini: {line}"


def test_runtime_healthcheck_declares_interval_and_timeout():
    line = stage_directive_line(_lines(), "runtime", "HEALTHCHECK")
    assert line is not None
    assert "--interval=" in line, line
    assert "--timeout=" in line, line


def test_runtime_uses_tini_exactly_twice():
    assert stage_mentions(_lines(), "runtime", "tini") >= 2

Each test isolates a single property of the runtime contract. The slim test rejects any future move to ubuntu:24.04 or debian:12 (non-slim) at PR time. The "does not inherit from base" test makes the inheritance regression a one-line diff away from a hard failure. The non-root test refuses a stage that never sets USER through its is not None assertion and rejects an explicit USER root through the NON_ROOT_FORBIDDEN_USERS set. The tini-in-ENTRYPOINT test rejects a switch to CMD-only or to any ENTRYPOINT that no longer mentions tini. The healthcheck-flags test refuses healthchecks that omit --interval or --timeout, which would otherwise fall back silently to Docker's 30-second defaults instead of values chosen for the service. The tini-count test, despite the exactly_twice in its name, asserts at least two mentions (>= 2); that is enough to catch the failure mode where either the install line or the entrypoint line gets accidentally deleted in a refactor, but it will not flag extra mentions.
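The --chown= test (test_runtime_copies_chown_to_app_user) is listed but not shown. A minimal sketch of the check it implies could look like this, where copy_lines_missing_chown is a hypothetical helper and the default owner app:app is taken from the runtime stage above.

```python
def copy_lines_missing_chown(stage_lines, owner="app:app"):
    """Return every COPY line in the stage that lacks --chown=<owner>."""
    flag = f"--chown={owner}"
    missing = []
    for line in stage_lines:
        tokens = line.split()
        if not tokens or tokens[0].upper() != "COPY":
            continue
        if flag not in tokens[1:]:
            missing.append(line.strip())
    return missing


# The four COPY lines from the runtime stage shown earlier in this step.
RUNTIME_COPIES = [
    "COPY --from=uv-deps --chown=app:app /workspace/.venv /workspace/.venv",
    "COPY --from=bun-deps --chown=app:app /workspace/node_modules /workspace/node_modules",
    "COPY --from=cargo-deps --chown=app:app /workspace/.cargo-cache/registry /workspace/.cargo-cache/registry",
    "COPY --chown=app:app . .",
]
```

A test would assert that the returned list is empty; a bare COPY . . added later would show up in the list and make the offending line obvious in the failure message.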

Verification

Tests run from the codebase root with a single pytest invocation. Step 6 finished at thirty-eight passing tests; this step adds ten structural tests for the slim runtime properties and one defensive test that pins the toolchains to base, for a final count of forty-nine.

pytest
.................................................                        [100%]
49 passed in 0.10s

Forty-nine passing tests in roughly a tenth of a second: three from services/api/tests/test_greeting.py, seven from tools/tests/test_baseline.py, and thirty-nine from tools/tests/test_dockerfile_shape.py. The eleven new tests in this step are test_runtime_uses_slim_base_image, test_runtime_does_not_inherit_from_build_base, test_runtime_creates_dedicated_system_user, test_runtime_drops_to_nonroot_user, test_runtime_installs_tini_package, test_runtime_entrypoint_invokes_tini, test_runtime_has_healthcheck, test_runtime_healthcheck_declares_interval_and_timeout, test_runtime_copies_chown_to_app_user, test_runtime_uses_tini_exactly_twice, and test_base_stage_still_carries_build_toolchains — plus the existing test_runtime_stage_does_copy_full_context from step 6 which still passes against the rewritten stage.

What we built

The published image is now a small slim layer with only the four packages it actually needs at run time, a dedicated app:1001 user that owns /workspace, and three artifact trees materialized via COPY --from= out of the dep stages. Compilers, installers, and toolchains never reach the final layer; they live and die inside the build graph.

Operationally, the runtime process is unprivileged, signal-correct, and observable. Tini at PID 1 forwards SIGTERM to the child, which means docker stop no longer has to wait out its ten-second grace period and fall back to SIGKILL: the application sees the signal immediately, and zombie reaping is automatic. The HEALTHCHECK directive makes container health visible to the runtime: Docker records the result as the container's health status, and Swarm replaces tasks that turn unhealthy. Kubernetes does not read the Dockerfile HEALTHCHECK, so a cluster deployment needs an equivalent livenessProbe declared in the pod spec.

The structural test pack is what makes this contract durable. A contributor who rewrites the runtime as FROM base AS runtime fails three tests at once — the slim base test, the toolchains-stay-in-base test, and any test that indirectly depends on python3 being a runtime package rather than a base-stage one. A contributor who deletes USER app for "debugging" fails the non-root test. A contributor who replaces tini with sh -c "exec ..." fails both the entrypoint test and the exact-twice count test.

What this unlocks for the next step is a published image that is finally worth shipping through a CI pipeline. The remaining work is no longer about the Dockerfile shape — it is about getting that shape built and cached across CI runners, which is exactly what step 8 wires up via GitHub Actions and a registry-backed BuildKit cache.

Repository

The state of the code after this step: a325d69

Step 8: Shipping CI With a Registry-Backed BuildKit Cache for Cross-Runner Layer Reuse

Step 7 finished hardening the runtime stage — slim base, non-root app user, tini at PID 1, healthcheck — so the published image is finally something worth shipping through a CI pipeline. Up to this point every build has happened on a single developer laptop, where BuildKit's local cache directory survives between invocations and the cache mounts attached in steps 4 and 5 do real work on every rerun. The moment that same Dockerfile runs on an ephemeral CI runner, all of those wins disappear: each job spins up a fresh VM, the BuildKit daemon starts with an empty cache store, and even the cleanest multistage build degrades to a cold install of every dep stage on every push.

This step bolts a .github/workflows/build.yml pipeline onto the repo that fixes the ephemeral-runner problem by storing the BuildKit layer cache in a container registry instead of on the runner's disk. Each of the three dep stages — uv-deps, bun-deps, cargo-deps — gets its own registry-backed cache reference under ghcr.io/<repo>/cache:<stage>, written with mode=max so every intermediate layer is exported. The runtime build then reads cache-from for all three dep refs plus its own cache:runtime ref, which lets a fresh runner pull pre-resolved deps off GHCR and skip straight to the application copy. A new workflow_shape lint module plus twenty structural tests pin every load-bearing property of that workflow into the test suite, so a future contributor cannot quietly delete the registry cache wiring and downgrade CI to cold builds without breaking the pipeline first.

Setup

Three new files land in this step and no third-party dependencies are added. The pipeline itself lives at codebase/.github/workflows/build.yml and triggers on push and pull_request against main. A line-oriented YAML lint module is added at codebase/tools/workflow_shape.py — it deliberately avoids depending on PyYAML, mirroring the parser style of the existing dockerfile_shape.py so the test pack keeps a single moving part. Twenty tests are added in codebase/tools/tests/test_workflow_shape.py that consume those helpers to assert the workflow file actually does what its header comment claims.

The lint module exposes a small surface: workflow_uses lists every uses: ref so the test pack can assert pinned-action discipline, workflow_targets returns every target: value so the test pack can assert which dep stages are built, workflow_cache_from_entries and workflow_cache_to_entries flatten both inline and block-scalar (|) cache entries into a single list, and workflow_cache_to_refs / workflow_cache_to_modes pick ref= and mode= fields out of those entries. The tests use those helpers to enforce: a checkout step exists, Buildx is set up, login to a registry happens, each dep stage and the runtime stage are built as targets, every cache-to entry is type=registry,mode=max, each dep stage has its own cache ref, the runtime build reads cache-from for every dep ref, and every step name is at least four characters long so the workflow remains legible in the Actions UI.

Implementation

The workflow header is small and declares exactly the contract the lint pack will enforce a few files down. Push and pull-request triggers cover both the trunk-build and the PR-validation lanes; permissions: contents: read, packages: write is the minimum surface the GITHUB_TOKEN needs to authenticate against GHCR and push cache images.

name: build

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  CACHE_REPO: ghcr.io/${{ github.repository }}/cache

jobs:
  build:
    name: build-and-cache
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

The CACHE_REPO is the load-bearing identifier: every dep stage and the runtime stage will tag its layer cache under ${CACHE_REPO}:<stage>, which keeps the cache references stable across runners. Because github.repository resolves to the owner and name of whichever repository runs the workflow, a fork that runs the pipeline writes to its own cache namespace rather than the upstream one. Pinning the registry path in env: rather than inlining it at each step also lets a downstream change move the cache to a different registry (a self-hosted GitLab container registry, a private ECR namespace) by editing one line.

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        with:
          driver: docker-container

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

The three preamble steps are deliberately boring. Registry cache exports need a Buildx driver that supports them: the docker-container driver does, while the default docker driver does so only when the containerd image store is enabled, so the workflow sets driver: docker-container explicitly. The login step uses the workflow-scoped GITHUB_TOKEN rather than a long-lived PAT, so the cache images are pushed under the running workflow's identity and the credential expires when the job finishes.

      - name: Build uv-deps stage with registry cache
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ./Dockerfile
          target: uv-deps
          push: false
          cache-from: type=registry,ref=${{ env.CACHE_REPO }}:uv-deps
          cache-to: type=registry,ref=${{ env.CACHE_REPO }}:uv-deps,mode=max

Each dep stage is built as a standalone build-push-action step with push: false: the image itself is not pushed, only its cache export is. cache-to writes the layer cache to ${CACHE_REPO}:uv-deps with mode=max, which exports the layers of every intermediate step (including the apt installs and the uv sync --frozen wheel resolution) instead of only the layers that end up in the resulting image. The matching cache-from reference means a cold runner that finds an existing cache:uv-deps on GHCR will pull those layers and skip straight past the install. The bun-deps and cargo-deps steps follow the exact same shape, each with its own dedicated cache ref so that a manifest edit to one toolchain never invalidates the others.

      - name: Build runtime image with cross-stage registry cache
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ./Dockerfile
          target: runtime
          push: false
          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          cache-from: |
            type=registry,ref=${{ env.CACHE_REPO }}:uv-deps
            type=registry,ref=${{ env.CACHE_REPO }}:bun-deps
            type=registry,ref=${{ env.CACHE_REPO }}:cargo-deps
            type=registry,ref=${{ env.CACHE_REPO }}:runtime
          cache-to: type=registry,ref=${{ env.CACHE_REPO }}:runtime,mode=max

The runtime step is where the multistage payoff lands. cache-from is given as a block scalar with four entries — the three dep refs plus the runtime ref — which means BuildKit will probe every cache reference and pull whichever layers it needs to satisfy the COPY --from=<dep-stage> instructions inside the runtime stage. Without the dep refs in the runtime's cache-from, BuildKit would only know about the runtime ref and would still rebuild every dep stage from scratch on a cold runner; with them, a fresh runner can short-circuit deep into the build graph. The cache-to for the runtime stage writes a separate cache:runtime ref so the cross-stage cache and the final-stage cache stay decoupled.
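The cache wiring across the four build steps follows one rule, which the short sketch below spells out: a dep stage reads and writes only its own ref, while the runtime stage reads every dep ref plus its own. This is an illustration of the pattern, not code from the repository. The function names are hypothetical, and ghcr.io/acme/mono/cache in the usage lines is a placeholder for the CACHE_REPO value.

```python
DEP_STAGES = ("uv-deps", "bun-deps", "cargo-deps")


def cache_ref(cache_repo: str, stage: str) -> str:
    # One tag per stage under the shared cache repository.
    return f"{cache_repo}:{stage}"


def cache_from_entries(cache_repo: str, target: str) -> list[str]:
    # A dep stage reads only its own ref; runtime reads every dep ref plus its own.
    stages = [*DEP_STAGES, "runtime"] if target == "runtime" else [target]
    return [f"type=registry,ref={cache_ref(cache_repo, stage)}" for stage in stages]


def cache_to_entry(cache_repo: str, target: str) -> str:
    # Every stage writes exactly one ref, always with mode=max.
    return f"type=registry,ref={cache_ref(cache_repo, target)},mode=max"


if __name__ == "__main__":
    repo = "ghcr.io/acme/mono/cache"  # placeholder, not the real repository
    for entry in cache_from_entries(repo, "runtime"):
        print(entry)
    print(cache_to_entry(repo, "runtime"))
```

Running the module prints the same four cache-from entries and the single cache-to entry that the runtime step declares in the workflow, with the placeholder repository substituted.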

def workflow_uses(text: str) -> list[str]:
    refs: list[str] = []
    for line in text.splitlines():
        ref = _extract_key_value(line, "uses")
        if ref is not None:
            refs.append(ref)
    return refs


def workflow_targets(text: str) -> list[str]:
    targets: list[str] = []
    for line in text.splitlines():
        value = _extract_key_value(line, "target")
        if value is None:
            continue
        targets.append(value)
    return targets

The two simplest helpers walk the workflow text line by line and pull every uses: and every target: value out of it. They power the action-pinning tests (every docker/build-push-action reference must include an @ so a major-version bump cannot land silently) and the stage-coverage tests (each of uv-deps, bun-deps, cargo-deps, and runtime must appear as a target: somewhere in the workflow).

def _collect_cache_entries(text: str, key: str) -> list[str]:
    entries: list[str] = []
    lines = text.splitlines()
    index = 0
    while index < len(lines):
        line = lines[index]
        value = _extract_key_value(line, key)
        if value is None:
            index += 1
            continue
        if value == "|" or value == ">":
            block, consumed = _collect_block_scalar(lines, index + 1, _leading_spaces(line))
            entries.extend(block)
            index += 1 + consumed
            continue
        if value:
            entries.append(value)
        index += 1
    return entries

_collect_cache_entries is the one helper that has to handle two YAML shapes at once. An inline cache-to: type=registry,ref=...,mode=max is a one-line entry; a block scalar cache-from: | followed by indented lines is N entries. The helper detects the | or > block scalar marker, delegates to _collect_block_scalar to gather the indented follow-up lines until the indentation drops back to the parent level, and flattens both shapes into a single list of cache entries. That single flat list is what the test pack consumes to assert "every cache-to entry is type=registry", "every cache-to entry uses mode=max", and "the runtime build reads cache-from for every dep stage".

def test_every_cache_to_entry_uses_mode_max():
    modes = workflow_cache_to_modes(_text())
    assert modes, "no cache-to modes detected"
    assert all(mode == "max" for mode in modes), modes


def test_each_dep_stage_has_its_own_cache_to_ref():
    refs = " ".join(workflow_cache_to_refs(_text()))
    for stage in REQUIRED_DEP_TARGETS:
        assert stage in refs, f"no cache-to ref tagged for {stage}: {refs}"


def test_runtime_build_reads_cache_from_every_dep_stage():
    entries = workflow_cache_from_entries(_text())
    joined = " ".join(entries)
    for stage in REQUIRED_DEP_TARGETS:
        assert stage in joined, f"no cache-from entry references {stage}: {entries}"

Each test isolates one property of the cross-runner cache contract. The mode=max test rejects any future switch to the default mode=min, which would export only the layers of the final image and silently downgrade CI back to cold dep installs. The per-stage cache-to test refuses any refactor that merges the dep caches into a single shared image, which is the most likely well-meaning regression: somebody decides "one cache image is simpler" and unknowingly destroys per-toolchain invalidation. The runtime-reads-every-dep test catches the inverse failure: somebody deletes the dep refs from the runtime step's cache-from block scalar to "simplify", and the runtime build silently rebuilds every dep stage on every CI run.

Verification

Tests run from the codebase root with a single pytest invocation. Step 7 finished at forty-nine passing tests; this step adds twenty structural tests for the GitHub Actions workflow for a final count of sixty-nine, and no existing test has to change.

pytest
.....................................................................    [100%]
69 passed in 0.13s

Sixty-nine passing tests in roughly a tenth of a second: three from services/api/tests/test_greeting.py, seven from tools/tests/test_baseline.py, thirty-nine from tools/tests/test_dockerfile_shape.py, and twenty new from tools/tests/test_workflow_shape.py. The twenty new tests are test_workflow_file_exists, test_workflow_lives_under_dot_github_workflows, test_workflow_triggers_on_push_and_pull_request, test_workflow_runs_on_ubuntu, test_workflow_declares_at_least_one_job, test_workflow_checks_out_repository, test_workflow_sets_up_docker_buildx, test_workflow_logs_into_container_registry, test_workflow_uses_pinned_build_push_action, test_workflow_builds_each_dep_stage_as_target, test_workflow_builds_runtime_stage_as_target, test_workflow_uses_registry_backed_buildkit_cache, test_every_cache_to_entry_is_registry_type, test_every_cache_to_entry_uses_mode_max, test_each_dep_stage_has_its_own_cache_to_ref, test_runtime_build_has_dedicated_cache_to_ref, test_runtime_build_reads_cache_from_every_dep_stage, test_every_cache_from_entry_is_registry_type, test_cache_from_refs_are_well_formed, and test_workflow_step_names_are_descriptive.

What we built

The repository now has a CI pipeline whose cold-cache build time on a fresh GitHub-hosted runner approaches the warm-cache experience on a developer laptop. Every dep stage exports its layer cache to its own ghcr.io/<repo>/cache:<stage> reference with mode=max, so a subsequent run on a brand-new runner pulls the resolved-deps layers directly from GHCR instead of redoing the uv sync --frozen, bun install, and cargo build work that those stages encapsulate.

The cache is keyed per toolchain by construction. A pyproject.toml edit that moves uv.lock invalidates cache:uv-deps and only cache:uv-deps; cache:bun-deps and cache:cargo-deps keep serving cold runs on the next push. The same independence property that motivated the multistage split back in step 3 now applies across CI runs, not just within a single build, which is the whole point of putting the cache in a registry.

The structural test pack is what makes this contract survive contact with future commits. A contributor who deletes one of the dep target: blocks fails test_workflow_builds_each_dep_stage_as_target. A contributor who flips a cache-to to mode=min fails test_every_cache_to_entry_uses_mode_max. A contributor who strips the @v6 version from the docker/build-push-action reference fails test_workflow_uses_pinned_build_push_action. Every plausible regression has at least one tripwire.

What this unlocks for the rest of the project is the freedom to add real application-build steps — cargo build --release, a TypeScript bundle, a Python package build — on top of the existing dep stages without worrying about CI build-time blowing up. The expensive resolution work is now amortised across the registry cache, so adding a new build step costs roughly the build step itself, not the build step plus a full cold dep install.

Repository

The state of the code after this step: 8eb8084

Repository

Full source at https://github.com/vytharion/docker-multistage-build-cache-uv-bun-cargo.

Walk the lessons by stepping through the git commits in the repo — each major step has its own commit you can git checkout and rerun.