Mirror schemas across Rust + Python + TypeScript — single source of truth, codegen options, drift detection.

Sharing a Schema Between Rust, Python, and TypeScript: How to Mirror Types Across a Polyglot Monorepo

The first time I shipped a polyglot stack, the drift took eleven days to surface. A teammate added one optional field to a Rust enum on a Tuesday, the Python orchestrator started silently rejecting that variant on Wednesday, and we only noticed the following Saturday when a customer asked why their job had never finished. The Rust side compiled. The Python side served traffic. Nobody had broken anything visible. The contract had cracked in the gap between three languages that all thought they agreed.

Let me show you the four strategies I've watched teams use to close that gap, plus the drift-detection layers that actually catch the problem before a customer does.

Why this gets hard fast

Why split a stack across three languages when one would compile? Rust handles the daemon that opens raw sockets. Python handles the orchestrator that calls LLM APIs. TypeScript handles the web client that ships to browsers. The moment those three start exchanging JSON over WebSocket or HTTP, a quieter problem appears: the schema lives in three places at once, and any one of them can drift without breaking the others until production.

The cost of that drift is not theoretical. Add one optional field to a Rust enum variant, forget to update the Pydantic model in the Python service, and, if that model is configured to reject unknown fields, the orchestrator silently rejects 100% of messages from that variant. The Rust side still compiles. The Python side still serves traffic. The contract has cracked, and only an end-to-end test catches it. So let's walk through the four realistic strategies for keeping a schema shared across Rust, Python, and TypeScript, plus the drift-detection patterns that make the mirror durable.
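To make that failure concrete, here is a minimal sketch in plain Python. It assumes the receiving side rejects unknown fields (the behavior you get from Pydantic's extra="forbid"); the JobResult shape follows the example used later in this article, while retry_after_s and parse_strict are hypothetical names invented for the illustration.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class JobResult:
    """Stale Python mirror: it predates the new field on the Rust side."""
    job_id: str
    ok: bool
    error: Optional[str] = None


def parse_strict(raw: str) -> JobResult:
    """Parse a JobResult, rejecting any field the mirror does not know."""
    data = json.loads(raw)
    known = {f.name for f in fields(JobResult)}
    unknown = sorted(set(data) - known)
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    return JobResult(**data)


old_wire = '{"job_id": "j1", "ok": true}'
# Hypothetical: the Rust side now also emits retry_after_s.
new_wire = '{"job_id": "j1", "ok": false, "retry_after_s": 30}'

print(parse_strict(old_wire))      # still parses
try:
    parse_strict(new_wire)
except ValueError as exc:
    print("rejected:", exc)        # every new-style message ends up here
```

Nothing in either codebase fails to build; the rejection only shows up at runtime, on the messages that carry the new field.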

The four strategies, ranked by how much ceremony they demand

Most teams pick a schema-sharing strategy by defaulting to their team's favorite language, and that is almost always the wrong starting point.

  1. Rust-canonical with codegen out. Define every shared struct in Rust, annotate it for a code generator, and emit Python + TypeScript artifacts as build outputs. Smallest cognitive overhead if you already model your domain in Rust.
  2. JSON Schema as neutral hub. Write JSON Schema files in your_project/schemas/, generate Rust + Pydantic + TypeScript from those files. No language is privileged. Highest tooling cost.
  3. Protobuf / FlatBuffers / Cap'n Proto. Schema is .proto files. Codegen for all three languages exists upstream. Adds a wire format you may not want.
  4. OpenAPI as the contract. Define request and response bodies in OpenAPI YAML, generate clients and server stubs for all three. Natural fit if HTTP is your only transport.

Strategy 1 wins for internal monorepos where the same team owns all three sides. Strategy 4 wins when external consumers want a documented API. Strategy 2 is the most flexible but the least batteries-included. Strategy 3 is the right call for binary wire formats or RPC-heavy systems. I'll walk through each.

Strategy 1: Rust as source of truth with typeshare

A colleague once called Rust 'the teammate who insists on the awkward design-review question', and that instinct is what you want writing your schema. Rust's type system is unforgiving enough that getting the schema right in Rust forces you to confront optionality and exhaustiveness up front, before the wire format is set. Tools like typeshare read annotated Rust structs and enums and emit equivalent TypeScript, Kotlin, Swift, and Scala. For Python, a complementary tool is schemars, which derives JSON Schema from the same Rust types (via its JsonSchema derive); you then feed that schema into a Pydantic generator.

Here is the typical shape:

// your_project/packages/shared-types/src/lib.rs
use serde::{Serialize, Deserialize};
use typeshare::typeshare;

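#[typeshare]
#[derive(Serialize, Deserialize)]
// Internally tagged on "kind". Verify that your typeshare version accepts this
// representation; typeshare has required tag + content on enums that carry data.
#[serde(tag = "kind", rename_all = "snake_case")]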
pub enum WsMessage {
    AgentHello {
        agent_id: String,
        version: String,
    },
    JobDispatch {
        job_id: String,
        payload: serde_json::Value,
    },
    JobResult {
        job_id: String,
        ok: bool,
        error: Option<String>,
    },
}

#[typeshare]
#[derive(Serialize, Deserialize)]
pub struct AgentSpec {
    pub agent_id: String,
    pub capabilities: Vec<String>,
    pub max_concurrent: u32,
}

A typeshare invocation in your build pipeline produces a TypeScript file the web client imports directly:

// generated: your_project/packages/shared-types/dist/index.ts
export type WsMessage =
  | { kind: "agent_hello"; agent_id: string; version: string }
  | { kind: "job_dispatch"; job_id: string; payload: unknown }
  | { kind: "job_result"; job_id: string; ok: boolean; error: string | null };

export interface AgentSpec {
  agent_id: string;
  capabilities: string[];
  max_concurrent: number;
}

For Python, the workflow is a hop longer. With #[derive(JsonSchema)] added to the same types, schemars emits JSON Schema, then datamodel-code-generator turns that into Pydantic v2 models:

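# generated: your_project/packages/py-shared/wsmessage.py
# (illustrative shape; exact output depends on the generator version)
from typing import Annotated, Any, Literal, Optional, Union
from pydantic import BaseModel, Field

class AgentHello(BaseModel):
    kind: Literal["agent_hello"]
    agent_id: str
    version: str

class JobDispatch(BaseModel):
    kind: Literal["job_dispatch"]
    job_id: str
    payload: Any  # serde_json::Value can hold any JSON value, not only objects

class JobResult(BaseModel):
    kind: Literal["job_result"]
    job_id: str
    ok: bool
    error: Optional[str] = None

# Discriminated union: Pydantic selects the variant from the "kind" tag.
WsMessage = Annotated[
    Union[AgentHello, JobDispatch, JobResult],
    Field(discriminator="kind"),
]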
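Consuming the tagged union looks roughly like this. To stay dependency-free, the sketch swaps the Pydantic models for stdlib dataclasses, so it shows the dispatch on kind and not Pydantic's own validation; VARIANTS and parse_ws_message are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class AgentHello:
    agent_id: str
    version: str


@dataclass
class JobDispatch:
    job_id: str
    payload: Any


@dataclass
class JobResult:
    job_id: str
    ok: bool
    error: Optional[str] = None


WsMessage = Union[AgentHello, JobDispatch, JobResult]

# Maps the wire-level "kind" tag to the class for that variant.
VARIANTS = {
    "agent_hello": AgentHello,
    "job_dispatch": JobDispatch,
    "job_result": JobResult,
}


def parse_ws_message(raw: str) -> WsMessage:
    """Pick the variant from the "kind" tag, then build it from the rest."""
    data = json.loads(raw)
    kind = data.pop("kind", None)
    if kind not in VARIANTS:
        # Fail loudly: an unknown kind usually means the mirror is stale.
        raise ValueError(f"unknown message kind: {kind!r}")
    return VARIANTS[kind](**data)


msg = parse_ws_message('{"kind": "job_result", "job_id": "j1", "ok": true}')
print(msg)
```

The explicit error on an unknown kind is the useful part: a variant added on the Rust side surfaces as a named failure instead of a message that quietly disappears.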

This three-step chain has a measurable cost. In my experience, a cargo build plus typeshare plus datamodel-code-generator run takes about 8 to 12 seconds on a warm cache for ~50 shared types. That feels slow, but compare it to the alternative of hand-syncing three files and you recover the time inside one sprint. Choose Rust-canonical when the Rust crate is already the most thoroughly tested piece of your stack, which is the common case if the Rust component is a long-running daemon.
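The chain can be wrapped in one small driver script. This is a sketch under stated assumptions: the paths are placeholders, export-schema is a hypothetical binary in the shared crate that calls schemars and writes the schema to the path it receives, and the CLI flags are as I understand the typeshare and datamodel-code-generator command lines, so check them against the versions you pin.

```python
import shlex
import subprocess
from typing import List

# Placeholder paths; adjust to your repository layout.
CRATE = "packages/shared-types"
TS_OUT = "packages/shared-types/dist/index.ts"
SCHEMA_OUT = "packages/shared-types/dist/schema.json"
PY_OUT = "packages/py-shared/wsmessage.py"


def codegen_commands() -> List[List[str]]:
    """Return the three codegen steps in the order they must run."""
    return [
        # 1. Rust -> TypeScript via the typeshare CLI.
        ["typeshare", CRATE, "--lang=typescript", f"--output-file={TS_OUT}"],
        # 2. Rust -> JSON Schema via a hypothetical export-schema binary.
        ["cargo", "run", "--quiet", "--bin", "export-schema", "--", SCHEMA_OUT],
        # 3. JSON Schema -> Pydantic v2 models.
        [
            "datamodel-codegen",
            "--input", SCHEMA_OUT,
            "--input-file-type", "jsonschema",
            "--output", PY_OUT,
            "--output-model-type", "pydantic_v2.BaseModel",
        ],
    ]


def run_codegen(dry_run: bool = True) -> None:
    """Print each command; execute it too unless dry_run is set."""
    for cmd in codegen_commands():
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run_codegen(dry_run=True)
```

Keeping the commands in one list means local runs and CI execute the same steps in the same order.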

Strategy 2: JSON Schema as the neutral hub

When no language has a credible claim to ownership, JSON Schema sits in the middle. You write .schema.json files in a shared directory, and every language consumes them through its own generator. Rust uses schemafy or typify. Python uses datamodel-code-generator directly. TypeScript uses quicktype or json-schema-to-typescript.

The advantage is symmetry. No language is privileged, no language carries the codegen toolchain weight. The disadvantage is that JSON Schema is verbose and expressively weaker than Rust's type system. Discriminated unions require a oneOf whose branches each pin the tag with a const (the discriminator keyword belongs to OpenAPI, not core JSON Schema), recursive types need $ref, and the resulting Rust often comes out less ergonomic than hand-written code.

A reasonable rule of thumb I've landed on: JSON Schema runs about 3× the line count of an equivalent Rust enum. For a project with ~100 shared types, that translates to roughly 600 lines of Rust versus 1800 lines of JSON Schema. The schema files are the source of truth so the bloat is paid once, but reviewing them in PRs is slower and less pleasant than reviewing Rust.
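For a sense of what the tagged union costs in JSON Schema, here is the WsMessage enum from Strategy 1 written as a schema, held in a Python dict so it can be printed and measured. The variant helper is a hypothetical convenience; in a real repository this would be a hand-written .schema.json file, and the draft URL is simply the one I would reach for.

```python
import json


def variant(kind: str, properties: dict, required: list) -> dict:
    """Build one oneOf branch, pinned to its tag by a const "kind"."""
    return {
        "type": "object",
        "properties": {"kind": {"const": kind}, **properties},
        "required": ["kind", *required],
        "additionalProperties": False,
    }


STRING = {"type": "string"}

WS_MESSAGE_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "WsMessage",
    "oneOf": [
        variant(
            "agent_hello",
            {"agent_id": STRING, "version": STRING},
            ["agent_id", "version"],
        ),
        variant(
            "job_dispatch",
            {"job_id": STRING, "payload": {}},  # {} accepts any JSON value
            ["job_id", "payload"],
        ),
        variant(
            "job_result",
            {
                "job_id": STRING,
                "ok": {"type": "boolean"},
                "error": {"type": ["string", "null"]},
            },
            ["job_id", "ok"],
        ),
    ],
}

rendered = json.dumps(WS_MESSAGE_SCHEMA, indent=2)
schema_lines = rendered.count("\n") + 1
print(rendered)
print(f"{schema_lines} lines of JSON Schema for one three-variant enum")
```

The exact line count depends on formatting, but the shape of the problem is visible: every variant repeats its tag, its property list, and its required list.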

Pick this strategy when at least two of your three languages have an existing convention to consume JSON Schema (Python via Pydantic's model_json_schema() round-trip, TypeScript via OpenAPI tooling). It is also the right call when an external system already publishes its types as JSON Schema and you are consuming rather than authoring.

Strategy 3: Protobuf when you also want a wire format

If you need a binary wire format, Protobuf is the path of least resistance. The .proto file is the canonical definition. The protoc compiler and its plugin ecosystem (documented at protobuf.dev) produce Rust (via prost), Python, and TypeScript (via protobuf-ts or ts-proto). A single .proto becomes three matching modules.

// your_project/schemas/ws_message.proto
syntax = "proto3";
package agent.v1;

message AgentHello {
  string agent_id = 1;
  string version = 2;
}

message JobDispatch {
  string job_id = 1;
  bytes payload = 2;
}

message JobResult {
  string job_id = 1;
  bool ok = 2;
  optional string error = 3;
}

message WsMessage {
  oneof kind {
    AgentHello agent_hello = 1;
    JobDispatch job_dispatch = 2;
    JobResult job_result = 3;
  }
}

Strengths: field numbering gives you forward and backward compatibility as long as you never reuse or renumber a tag. Adding a new field to AgentHello does not break older clients, which skip fields they do not recognize. The wire format is often 30% to 50% smaller than equivalent JSON, depending on payload shape, which matters if your daemon emits thousands of messages per second.
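The compatibility claim is easier to trust once you see the wire format. The sketch below hand-encodes AgentHello with a tiny stdlib encoder, purely for illustration; real code would use the generated classes. The third field (region) is hypothetical, added to play the role of a newer sender, and the decoder only handles the two wire types this message needs.

```python
import json
from typing import Dict, Tuple


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit means 'more'."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_string(field_number: int, value: str) -> bytes:
    """Length-delimited field (wire type 2): key, byte length, UTF-8 data."""
    data = value.encode("utf-8")
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(data)) + data


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next position) for the varint starting at pos."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_strings(buf: bytes, known: Dict[int, str]) -> Dict[str, str]:
    """Decode the string fields listed in `known`; skip everything else."""
    out: Dict[str, str] = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:        # varint: read and discard
            _, pos = decode_varint(buf, pos)
        elif wire_type == 2:      # length-delimited
            length, pos = decode_varint(buf, pos)
            chunk, pos = buf[pos:pos + length], pos + length
            if field_number in known:
                out[known[field_number]] = chunk.decode("utf-8")
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
    return out


# A newer sender adds hypothetical field 3 (region) to AgentHello.
new_wire = (
    encode_string(1, "a1") + encode_string(2, "0.3.0") + encode_string(3, "eu-west")
)
# An older receiver knows only fields 1 and 2 and skips the rest.
old_view = decode_strings(new_wire, {1: "agent_id", 2: "version"})
print(old_view)

as_json = json.dumps(
    {"agent_id": "a1", "version": "0.3.0", "region": "eu-west"},
    separators=(",", ":"),
)
print(len(new_wire), "bytes as Protobuf vs", len(as_json.encode()), "bytes as JSON")
```

The older receiver gets the two fields it understands and steps over the third because each field carries its own number and length. The size gap on a toy message like this mostly reflects JSON repeating the key names; real savings depend on the payload.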

Weaknesses: Protobuf imposes a wire format whether you wanted one or not. If your stack is exclusively HTTP JSON, you now have a parsing layer that converts JSON to Protobuf and back, which negates most of the size win. Protobuf's oneof is also less ergonomic on the Python side than Rust's native enums. Pick Protobuf when you already have or want gRPC, or when you genuinely care about wire-size and CPU on the encode and decode path.

Strategy 4: OpenAPI for HTTP-only systems

If your shared types are entirely "what does this HTTP endpoint accept and return," skip language-level codegen and write an OpenAPI 3.1 spec. The Rust side uses utoipa or paperclip to either generate the spec from annotated handlers or vice versa. Python uses fastapi-codegen or, more commonly, FastAPI's built-in OpenAPI emission. TypeScript uses openapi-typescript to generate types from the spec, with openapi-fetch as a thin typed client on top of those types.

The trade-off worth thinking about: Strategy 4 covers HTTP request and response bodies but not WebSocket payloads, internal queue messages, or types that flow through Redis or a database. If 80% of your shared schemas are HTTP, OpenAPI buys you the highest leverage per unit of tooling investment. If the split is closer to 30% HTTP and 70% WebSocket, OpenAPI leaves a hole that Strategy 1 or Strategy 2 fills better.

Drift detection: the part everyone underestimates

Here's the uncomfortable truth I learned the hard way: codegen prevents drift only if codegen actually runs. The failure mode is predictable. A Rust developer edits a struct, forgets to run the regenerator, commits the Rust change alone. CI compiles the Rust crate (which still passes), the Python service deploys (still works against last week's generated artifact), and the contract cracks a week later when a new variant goes through. Three layers of defense are worth wiring up.

Layer 1: regenerate in CI and diff. Run codegen as a CI step. If the working tree differs from the committed artifact, fail the build. This is one shell line per language and, in my experience, catches roughly 90% of drift.
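The shell version of this gate is to run the generators and then git diff --exit-code over the generated paths. The same check can be sketched without git by regenerating into a scratch directory and comparing hashes; digest_tree and find_drift are hypothetical helper names, and the temporary directories stand in for real artifact folders.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def digest_tree(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to a SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_drift(committed: Path, regenerated: Path) -> List[str]:
    """Return paths that are missing, extra, or different between the trees."""
    old, new = digest_tree(committed), digest_tree(regenerated)
    return sorted(
        path for path in old.keys() | new.keys() if old.get(path) != new.get(path)
    )


# Demo: throwaway directories stand in for committed and fresh artifacts.
with tempfile.TemporaryDirectory() as tmp:
    committed = Path(tmp, "committed")
    fresh = Path(tmp, "fresh")
    committed.mkdir()
    fresh.mkdir()
    (committed / "index.ts").write_text(
        "export interface AgentSpec { agent_id: string }\n"
    )
    (fresh / "index.ts").write_text(
        "export interface AgentSpec { agent_id: string; region: string }\n"
    )
    drift = find_drift(committed, fresh)

print("drift:", drift)
# In CI, exit non-zero when the list is non-empty so the build fails.
```

Reporting the drifting paths, not only a pass or fail, tells the contributor which generator they forgot to run.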

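Layer 2: snapshot test the wire format. For each shared type, serialize a canonical example to JSON in each language and assert that every language produces the same document as a committed fixture. Compare parsed or canonicalized JSON (sorted keys, fixed separators) rather than raw bytes, because key order and whitespace differ across serializers. This catches subtler drift like a field rename that the codegen handled correctly on the Rust side but stalled on the Pydantic side because of a stale generator version.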
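A sketch of the Python half of such a snapshot test follows. Raw serializer output tends to differ across languages in key order and whitespace, so this version canonicalizes both sides before comparing. FIXTURE stands in for a committed file that the Rust and TypeScript tests would read too, and python_side_output is a hypothetical stand-in for serializing the real model.

```python
import json


def canonical_json(value) -> str:
    """Serialize with sorted keys and fixed separators for stable comparison."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


# Hypothetical committed fixture, shared by the tests in every language.
FIXTURE = '{"error":null,"job_id":"j1","kind":"job_result","ok":false}'


def python_side_output() -> str:
    """Stand-in for dumping the Python model of the canonical example."""
    return json.dumps(
        {"kind": "job_result", "job_id": "j1", "ok": False, "error": None}
    )


def assert_matches_fixture(produced: str, fixture: str) -> None:
    """Raise AssertionError when the two documents differ after canonicalizing."""
    got = canonical_json(json.loads(produced))
    want = canonical_json(json.loads(fixture))
    if got != want:
        raise AssertionError(f"wire format drift:\n  got:  {got}\n  want: {want}")


assert_matches_fixture(python_side_output(), FIXTURE)
print("snapshot ok")
```

Because each language compares against the same fixture, a rename that lands in only one generator shows up as a failure in exactly the language that fell behind.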

Layer 3: end-to-end contract test. Spin up the actual services in docker-compose, send a message round-trip through all three languages, assert the payload survives. Slower (10 to 30 seconds per test) but catches the cases the other two miss, like wire-format compatibility regressions across Protobuf revisions.

The numbers worth budgeting: layer 1 costs about 5 seconds of CI time and prevents most regressions. Layer 2 costs about 2 seconds per type and prevents tooling drift. Layer 3 costs 10 to 30 seconds per test plus the docker-compose warmup and prevents the last 5% of cross-language compatibility bugs. Most teams stop at layer 1 and pay for it in production. Three layers feel excessive until the day layer 3 saves a customer.

Which strategy to pick

A short decision tree:

  • All three languages owned by the same small team, types are mostly internal? Strategy 1 with Rust as canonical. Fewest moving parts, best type ergonomics on the Rust side.
  • External API consumers? Strategy 4 with OpenAPI. The spec is also the documentation.
  • Need a binary wire format or gRPC? Strategy 3 with Protobuf. The wire format earns its keep.
  • Schemas authored by people who do not write Rust, or external JSON Schema already in flight? Strategy 2 with JSON Schema as the hub.
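The same tree can be written down as a function, which is handy as a checklist in a repo's contributing notes. The field names are hypothetical, and the order in which the questions are asked is my assumption; the list above does not rank them.

```python
from dataclasses import dataclass


@dataclass
class Stack:
    """Answers to the decision-tree questions; names are illustrative."""
    external_api_consumers: bool = False
    needs_binary_wire_or_grpc: bool = False
    non_rust_schema_authors: bool = False
    consumes_external_json_schema: bool = False


def pick_strategy(stack: Stack) -> str:
    """Return the first matching strategy; the check order is an assumption."""
    if stack.external_api_consumers:
        return "Strategy 4: OpenAPI"
    if stack.needs_binary_wire_or_grpc:
        return "Strategy 3: Protobuf"
    if stack.non_rust_schema_authors or stack.consumes_external_json_schema:
        return "Strategy 2: JSON Schema hub"
    # Same small team owns everything and the types are mostly internal.
    return "Strategy 1: Rust-canonical"


print(pick_strategy(Stack()))
print(pick_strategy(Stack(needs_binary_wire_or_grpc=True)))
```

A function that returns a single answer cannot express a hybrid, so treat the result as the strategy for one group of types, not for the whole repository.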

Most polyglot monorepos end up running two strategies at once: Strategy 1 for internal WebSocket and queue payloads, Strategy 4 for HTTP-exposed endpoints. That hybrid is fine, but document which strategy owns which file so contributors do not invent a third.

The thing that makes any of these work long-term is not the tool. It is the CI gate that fails the build when the working tree drifts. Pick the strategy that matches your wire transport, wire the gate before the second contributor lands, and you'll spend the rest of the project tweaking types instead of debugging silent contract breaks.
