Code Crunch Labs · Tier II
Sub-brand · Mesh
24 weeks · semester
GPL-3.0

Crunch Mesh.

Twenty-four weeks of production-grade distributed systems. From CAP, PACELC, and Raft on day one to a polyglot, multi-region platform on graduation: gRPC and Protobuf as the typed contract surface, Kafka and NATS for the event spine, Temporal for the long-running workflows, Istio and Linkerd at the network edge, OpenTelemetry through every seam, sagas with exactly-once consumers, and an active-active deployment that survives a region.

24 weeks
Program length
864 hrs
Total workload
24+1
Labs + capstone
$0
Tuition · always

§ I · The Program

Architectures that hold.

Crunch Mesh is the microservices and distributed-systems specialization of the Code Crunch academy — the Labs-tier course for engineers who have already shipped backend systems and now need to design the platforms those systems live on. It begins with the literature: CAP, PACELC, FLP, Raft, Lamport, CRDTs. It ends with a polyglot platform spread across two regions, instrumented end-to-end, and submitted to chaos drills with a postmortem.

You author Protobuf as a moral position, not a serialization format. You operate Kafka, Redpanda, and NATS JetStream side by side and pick a partition key with evidence. You run Temporal for the workflows that orchestration owns and sagas for the workflows that choreography owns. You stand Istio up, migrate services into it, and roll a weighted canary while a 200 ms fault is injected at the mesh layer. You watch a trace cross a Kafka boundary and a metric exemplar jump to the span that produced it. By week 24 you can defend an architecture at staff level — with the diagram, the tradeoffs, and the evidence.

"Distributed-systems engineering is dominated by reasoning errors more than coding errors, and the corrective is reading."— Crunch Mesh, course charter

§ II · Who It's For

Four engineers, one platform.

Mesh is opinionated about its audience. C16 (Crunch Pro Backend) is the floor — you should have shipped a non-trivial gRPC or REST service and been on-call for it in production.

No. 01

The Senior Backend, Leveling to Staff

You ship features, you own services, you mentor — but you have not yet been the person in the room who decides whether the system is correct under partition. Mesh closes that gap.

No. 02

The SRE Bridging Into Architecture

You run the platform and you know where it bleeds. Mesh gives you the design vocabulary, the consensus-and-replication theory, and the contract design skills to drive architecture forward.

No. 03

The Cloud Platform Engineer

You know how to wire managed services together. Now you need to know why each one behaves the way it does — and what to build on bare Kubernetes when you are the one shipping the equivalent.

No. 04

The Principal-Track New Grad

If the goal is staff-track at a FAANG-scale or hyper-scale backend, you need this material before the interview loop, not after. Mesh is the loop's preparation, not its consolation.

§ III · Four Phases

From the literature to the live region.

The program runs in four phases of six weeks, each building on the last like load on a leader.

Phase I · Wk. 01–06

Theory & Single-Service

The literature first: CAP, PACELC, FLP, consensus, Lamport and vector clocks, CRDTs, leases. Then microservice fundamentals — bounded contexts, Conway's law, decomposition heuristics. Then your first hardened single service in Go and Python, with typed gRPC contracts and baseline OpenTelemetry.
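To make the clock material concrete, here is a minimal vector-clock sketch in Python. It is an illustration only, not the course's lab code: the names `tick`, `merge`, and `compare` are hypothetical, and clocks are modeled as plain dictionaries from node name to counter.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node name -> count of events seen from that node


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with this node's counter advanced (a local event)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local: VectorClock, received: VectorClock, node: str) -> VectorClock:
    """On message receipt: element-wise max of both clocks, then tick the receiver."""
    nodes = local.keys() | received.keys()
    merged = {n: max(local.get(n, 0), received.get(n, 0)) for n in nodes}
    return tick(merged, node)


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two clocks as 'equal', 'before', 'after', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"   # a happened-before b
    if b_le_a:
        return "after"    # b happened-before a
    return "concurrent"   # neither dominates: a genuine conflict
```

Two events on different nodes with no message between them compare as `concurrent`. Once one node receives the other's clock through `merge`, the sender's event compares as `before` the receiver's next event. A Lamport timestamp alone cannot make that distinction.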

Phase II · Wk. 07–12

Service Mesh & Eventing

Multiple services. The network as a substrate. Envoy, BFFs, API gateways. Istio in anger, Linkerd as the alternative, Cilium on eBPF for comparison. Kafka and Redpanda; NATS JetStream and Pulsar. Exactly-once via outbox plus idempotency. Temporal for long-running workflows and orchestrated sagas.
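The phrase "exactly-once via outbox plus idempotency" can be sketched in a few lines. The following is a simplified illustration under stated assumptions: SQLite from the Python standard library stands in for Postgres, a loop stands in for the Debezium and Kafka relay, producer and consumer share one database for brevity, and every table and function name is hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """In-memory stand-in for the service database (an assumption, not Postgres)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE orders    (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL);
        CREATE TABLE outbox    (event_id TEXT PRIMARY KEY, topic TEXT NOT NULL,
                                payload TEXT NOT NULL);
        CREATE TABLE processed (event_id TEXT PRIMARY KEY);
        CREATE TABLE revenue   (id INTEGER PRIMARY KEY CHECK (id = 1),
                                cents INTEGER NOT NULL);
        INSERT INTO revenue VALUES (1, 0);
    """)
    return db


def place_order(db: sqlite3.Connection, order_id: str, total_cents: int) -> None:
    """Producer side: the state change and its event commit in one transaction."""
    with db:  # commits on success, rolls back on any exception
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        db.execute(
            "INSERT INTO outbox VALUES (?, ?, ?)",
            (f"order.placed:{order_id}", "order.placed.v1", str(total_cents)),
        )


def pending_events(db: sqlite3.Connection) -> list:
    """Relay side: read outbox rows in insertion order for publishing."""
    return db.execute("SELECT event_id, payload FROM outbox ORDER BY rowid").fetchall()


def handle_event(db: sqlite3.Connection, event_id: str, payload: str) -> bool:
    """Consumer side: apply the effect once; return False for a redelivery."""
    try:
        with db:
            # The primary key on processed.event_id is the idempotency check.
            db.execute("INSERT INTO processed VALUES (?)", (event_id,))
            db.execute(
                "UPDATE revenue SET cents = cents + ? WHERE id = 1", (int(payload),)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate key: the effect was already applied
```

Delivery stays at-least-once; the effect becomes exactly-once because recording the idempotency key and applying the effect share a transaction. Delivering the same event twice to `handle_event` changes `revenue` only the first time.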

Phase III · Wk. 13–18

Data & Reliability

Postgres logical replication and partitioning. Debezium CDC. CQRS and event sourcing. Iceberg on Trino as the lakehouse contract. Caching with Redis and Dragonfly. The OpenTelemetry pipeline through Prometheus, Thanos, Tempo, Loki, Grafana. SLI/SLO discipline, error budgets, circuit breakers, the Universal Scalability Law.
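Error budgets are arithmetic before they are policy. A minimal sketch of that arithmetic follows, using the common definitions (budget = 1 − SLO; burn rate = observed error rate ÷ budget). The function names are hypothetical and the numbers are illustrative, not course targets.

```python
def _check_slo(slo: float) -> None:
    """An SLO of exactly 0 or 1 leaves no meaningful budget to reason about."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full unavailability the SLO tolerates over the window."""
    _check_slo(slo)
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the request-based budget still unspent (negative once overspent)."""
    _check_slo(slo)
    allowed_failures = (1.0 - slo) * total_requests
    if allowed_failures == 0:
        raise ValueError("no requests in the window, so no budget to measure")
    return 1.0 - failed_requests / allowed_failures


def burn_rate(slo: float, observed_error_rate: float) -> float:
    """1.0 spends exactly the budget over the window; 10.0 spends it ten times faster."""
    _check_slo(slo)
    return observed_error_rate / (1.0 - slo)
```

For a 99.9% SLO over 30 days, `error_budget_minutes(0.999)` is about 43.2 minutes. With one million requests and 250 failures, roughly three quarters of the budget remains. A sustained 1% error rate against that SLO burns at about ten times the sustainable pace.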

Phase IV · Wk. 19–24

Production & Capstone

Multi-region active-active with quorum-aware writes. CRDT conflict resolution in production. Zero-trust networking with SPIFFE, SPIRE, and OPA. Chaos engineering with chaos-mesh; a 90-minute gameday. Pact contract testing across the polyglot surface. Capstone: the Polyglot Marketplace Backbone, defended in front of two external reviewers.
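For a sense of what CRDT conflict resolution means in code, here is a minimal state-based OR-set (observed-remove set) sketch. It is written in Python for brevity and is not the course's cart implementation; the class layout and the tombstone representation are assumptions chosen for clarity, and tombstones are never garbage-collected here.

```python
import uuid
from dataclasses import dataclass, field
from typing import Set, Tuple

Tagged = Tuple[str, str]  # (element, unique tag minted at add time)


@dataclass
class ORSet:
    adds: Set[Tagged] = field(default_factory=set)
    removes: Set[Tagged] = field(default_factory=set)  # tombstoned (element, tag) pairs

    def add(self, element: str) -> None:
        """Each add mints a fresh tag, so a later re-add is a distinct entry."""
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element: str) -> None:
        """Tombstone only the tags this replica has observed for the element."""
        self.removes |= {pair for pair in self.adds if pair[0] == element}

    def merge(self, other: "ORSet") -> "ORSet":
        """Join of two replica states: union of adds and union of tombstones."""
        return ORSet(self.adds | other.adds, self.removes | other.removes)

    def value(self) -> Set[str]:
        """Elements with at least one add tag that has not been tombstoned."""
        return {element for (element, _tag) in self.adds - self.removes}
```

The behavior to watch is the concurrent case. If one replica removes an item while another re-adds it during a partition, the merged set keeps the item, because the remove only tombstoned the tags it had observed. Merge is a set union on both components, so it is commutative, associative, and idempotent, which is what lets replicas converge regardless of delivery order.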

§ IV · The Curriculum

Twenty-four weeks, week by week.

Each entry corresponds to a folder in the GitHub repository with lecture notes, the primary reading list, a hands-on lab, a quiz, and a reflective writeup. Detailed acceptance criteria live in the syllabus.

01

The Literature: CAP, PACELC, and FLP

CAP theorem · PACELC reformulation · FLP impossibility · linearizable / sequential / causal / eventual · safety vs liveness.

Lab 01

Two-node toy register in Go with simulated partition

02

Time, Order, and Consensus

Physical vs logical clocks · Lamport timestamps · vector clocks · leases and fencing tokens · Raft deeply · Paxos overview · etcd, ZooKeeper, Consul.

Lab 02

Vector-clock chat log + 3-node etcd on Kind

03

CRDTs & Conflict Resolution

State-based vs operation-based · G-counter, PN-counter, OR-set, LWW-register · Riak, AntidoteDB · when LWW is a footgun.

Lab 03

OR-set shopping cart in Rust; 3-way partition heal

04

Microservice Fundamentals & Decomposition

Bounded contexts · Conway's law and the inverse maneuver · decomposition heuristics · anti-patterns: distributed monolith, shared database, the entity service.

Lab 04

Decompose a 40 kLOC Django monolith · written memo

05

API Contracts: gRPC & Protobuf

Protobuf wire format · schema evolution · gRPC unary, streaming, bidi · gRPC-Web · interceptors · the cost of REST drift.

Lab 05

cart.v1 Protobuf · Go server · Python client · grpcurl

06

The Single Hardened Service

Twelve-factor reviewed · structured JSON logs · graceful shutdown · readiness/liveness · baseline OpenTelemetry · the runbook as deliverable.

Lab 06

cart-service to production-ready · Helm chart · runbook

07

BFFs, Gateways, and Envoy

API gateway vs mesh ingress · Envoy filters, listeners, clusters, EDS/CDS/LDS/RDS · the BFF pattern · gRPC-Web vs Connect · rate limiting.

Lab 07

Envoy ingress · rate limit · hedged retries · BFF in Go

08

Istio in Production

istiod · sidecar vs ambient · mTLS by default · AuthorizationPolicy · VirtualService and DestinationRule · weighted canary · mesh-layer fault injection.

Lab 08

Istio on Kind · 10/90 → 50/50 canary · 200 ms fault

09

Linkerd, Cilium & the Alternatives

linkerd2-proxy in Rust · sidecar vs sidecar-less · Cilium service mesh on eBPF · honest comparison · when each one wins.

Lab 09

Three meshes benchmarked · 1-page ADR

10

Eventing: Kafka and Redpanda

The log abstraction · partitions, offsets, consumer groups · ISR · Raft-per-partition · retention and compaction · partition key design.

Lab 10

3-broker Strimzi Kafka · order.placed.v1 · Redpanda compare

11

NATS JetStream, Pulsar & Exactly-Once

NATS core vs JetStream · Pulsar tiered storage · idempotency keys · transactional outbox vs Kafka transactions · the pragmatic possibility of EOS.

Lab 11

Outbox in Postgres → Debezium → Kafka → idempotent Rust consumer

12

Temporal & Workflow Orchestration

Temporal frontend/history/matching/worker · deterministic replay · signals, queries, child workflows · orchestration vs choreography for sagas.

Lab 12

Temporal saga in Go · reserve → charge → ship · compensation

13

Postgres at Scale

Logical vs physical replication · declarative partitioning · HOT updates and bloat · pg_stat_statements · pgBouncer · Citus and CockroachDB.

Lab 13

Primary + 2 logical replicas · partition orders by month · 50 M rows

14

CDC, CQRS & Event Sourcing

Debezium connectors · CDC vs dual-write · CQRS in earnest · event sourcing and its costs · materialized views · aggregate vs event-driven.

Lab 14

Debezium → Kafka → Elasticsearch read model + Iceberg sink

15

The Modern Lakehouse: Iceberg & Trino

Row-store vs column-store · OLAP/OLTP boundaries · Iceberg table format · Trino as a federated engine · the role of dbt.

Lab 15

MinIO + Nessie + Iceberg + Trino · dbt revenue rollup

16

Caching: Redis, Memcached, Dragonfly

Look-aside, read-through, write-through, write-back · request coalescing · probabilistic early expiration · Redis Cluster hash slots · the licensing saga.

Lab 16

Cart-read cache · k6 stampede · migrate to Dragonfly

17

Observability: OpenTelemetry End-to-End

OTel SDKs in Go/Python/Rust · context propagation across HTTP, gRPC, Kafka · RED metrics · exemplars · Prometheus + Thanos · Tempo · Loki · Grafana.

Lab 17

Full OTel pipeline · exemplar dashboard · trace across Kafka

18

Reliability: SLOs, Patterns & the USL

SLIs that mean something · error budgets · circuit breakers · bulkheads · timeouts · retries with jitter · backpressure · load shedding · HPA + KEDA · Universal Scalability Law · tail latency.

Lab 18

Cart SLOs · circuit breaker · KEDA on lag · saturation point

19

Multi-Region: Active-Active & Active-Passive

Quorum across regions · replication-lag budgets · geo-routing (DNS, anycast, GSLB) · session affinity · the data-gravity problem.

Lab 19

Two Kind regions · logical Postgres replication · 60 s RTO failover

20

CRDTs in Production

Production-grade CRDT stacks · LWW vs merge semantics by field · vector-clock-driven application-layer resolution · per-field consistency models.

Lab 20

Active-active cart on OR-set · 5-min partition · verify convergence

21

Zero-Trust: SPIFFE, SPIRE, OPA

SPIFFE workload identities · SPIRE deployment · SVID issuance · rotating mTLS without downtime · OPA / Gatekeeper · Kyverno alternatives.

Lab 21

SPIRE in both clusters · SVIDs via Istio · OPA admission policy

22

Chaos Engineering & the Gameday

Netflix's four principles · chaos-mesh, Litmus, Gremlin · gameday playbook · blameless postmortems · the five-whys debate.

Lab 22

Six chaos experiments · 90-minute gameday · postmortem per finding

23

Contracts, Properties & Capacity

Pact consumer-driven contracts · property-based testing (Hypothesis, gopter, proptest) · fault injection at the unit level · USL, Little's Law · cost-aware design.

Lab 23

Pact suite cart↔inventory↔payment · CRDT property tests · capacity memo

24

Capstone Integration & Architecture Review

Final integration · mock staff-engineer design review · 12-minute demo · published postmortem · capstone defense in front of external reviewers.

Capstone

Polyglot Marketplace Backbone · two regions · two chaos drills

§ V · The Toolchain

Open-source first, no managed shortcuts.

Every primary tool below is open-source. Managed services from GCP and AWS are taught as production scale paths — never as the only path. Every commercial vendor has a graded comparison against its open-source equivalent.

RPC
gRPC
HTTP/2 · unary, streaming, bidi
Contracts
Protobuf
typed wire · evolution rules
Event spine
Kafka
log · partitions · ISR
Mesh
Istio
mTLS · canary · ambient
Workflows
Temporal
deterministic replay · sagas
Database
Postgres
logical replication · partitioning
Observability
OpenTelemetry
traces · metrics · logs · exemplars
Substrate
Kubernetes
Kind · k3d · GKE · EKS
Metrics
Prometheus
RED + USE · Thanos long-term
Dashboards
Grafana
single pane · exemplar jumps
Proxy
Envoy
filters · EDS/CDS/LDS/RDS
Messaging
NATS JetStream
subjects · low-latency · EOS

§ VI · Skills You Will Carry

What you walk away with.

By the end of Week 24, you are able to do each of the following — credibly, on a live system, in front of a staff-level reviewer.

  • Decompose a monolithic backend into bounded contexts without creating a distributed monolith.
  • Design a polyglot topology in Go, Python, and Rust over a single typed Protobuf surface.
  • Author gRPC contracts with backward- and forward-compatible schema evolution.
  • Choose between Kafka, Redpanda, and NATS JetStream — and defend the choice on retention, throughput, and ops profile.
  • Implement exactly-once event processing with idempotency keys, outbox tables, and consumer offsets from first principles.
  • Operate Istio or Linkerd in production: mTLS strict, weighted canary, progressive delivery on SLO.
  • Author Temporal workflows for sagas and explain when orchestration beats choreography.
  • Run a Debezium CDC pipeline into both an OLTP read model and an Iceberg-on-Trino lakehouse.
  • Instrument a polyglot system end-to-end in OpenTelemetry with exemplars linking metrics to traces.
  • Define SLIs and SLOs that mean something and manage error budgets against product pressure.
  • Apply CAP, PACELC, FLP, and the consensus literature to design choices, not trivia.
  • Operate an active-active multi-region deployment with quorum-aware writes and CRDT state.
  • Stand up a zero-trust network with SPIFFE/SPIRE identities and OPA policy-as-code.
  • Run a 90-minute gameday with chaos-mesh and write a publishable blameless postmortem.
  • Author Pact consumer-driven contract tests across a polyglot system.
  • Lead a staff-engineer system-design conversation and defend an architecture under cross-examination.

§ VII · The Capstone

One platform. Two regions. Two chaos drills.

The final four weeks of the course are a single substantial system — the Polyglot Marketplace Backbone — built and run as if it were a real backend platform. Architecture document, live deploy with weighted canary, recorded demo, two chaos-drill postmortems.

Capstone Brief

Polyglot Marketplace Backbone

Build and operate the backend platform for a fictional online marketplace as a real two-region system. Services are polyglot by design — Rust for the CRDT cart, Go for the inventory, payment, and BFFs, Python for the order, search, and analytics services — all bound by a single typed Protobuf surface and a Kafka event spine with exactly-once consumers.

  • gRPC + Protobuf everywhere, with cart.v1, inventory.v1, payment.v1, order.v1 packages versioned independently.
  • Kafka event spine with exactly-once consumers via outbox tables and idempotency keys.
  • Temporal workflows for charge / refund / reversal and the order saga, with deterministic replay.
  • Postgres primary per region with logical replication and a Debezium CDC pipeline feeding Elasticsearch and Iceberg.
  • Istio service mesh with mTLS strict, SPIFFE identities via SPIRE, and OPA admission policy.
  • Progressive delivery: weighted canary with automatic rollback on SLO breach.
  • Active-active across two regions with an OR-set CRDT cart and geo-routed reads.
  • OpenTelemetry pipeline: traces to Tempo, metrics to Prometheus + Thanos, logs to Loki, Grafana with exemplars.
  • A published Pact contract test broker with green contracts across the polyglot surface.
  • Two mandatory chaos drills: region failover under 1k RPS, and Kafka broker loss mid-traffic — each with a blameless postmortem.
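
The progressive-delivery requirement in the list above reduces to a small decision loop. The sketch below shows only that loop and rests on several assumptions: `run_canary`, the step weights, and the `observe` callback are hypothetical, and in the capstone the traffic split itself would come from the mesh's routing weights rather than from Python.

```python
from typing import Callable, List, Tuple


def run_canary(
    steps: List[int],
    slo_error_rate: float,
    observe: Callable[[int], float],
) -> Tuple[str, List[int]]:
    """Walk the canary through traffic weights; roll back on the first SLO breach.

    steps          -- canary traffic percentages to try, in order
    slo_error_rate -- highest error rate the SLO tolerates (for example 0.01)
    observe        -- hypothetical hook returning the canary's error rate at a weight
    """
    history: List[int] = []
    for weight in steps:
        history.append(weight)           # shift this share of traffic to the canary
        if observe(weight) > slo_error_rate:
            history.append(0)            # breach: send all traffic back to stable
            return "rolled_back", history
    return "promoted", history
```

With steps of 10, 50, and 100, a canary that stays under the threshold is promoted after three steps. One that degrades at 50% is returned to 0% without ever taking full traffic. A real controller would also wait for a soak interval and require a minimum sample size before trusting `observe`.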

§ VIII · Getting Started

Three commands. Then begin.

The setup is intentionally lightweight. If you have a 16 GB laptop, Docker, and a local Kubernetes via Kind or k3d, you can begin Week 1 today. Cloud labs do not arrive until phase 3, and even then most can stay local.

# 1. Clone the curriculum repository
git clone https://github.com/CODE-CRUNCH-WORLDWIDE/C22-CRUNCH-MESH.git
cd C22-CRUNCH-MESH

# 2. Verify the local substrate (Docker + Kind or k3d + kubectl)
docker version
kind version          # or: k3d version
kubectl version --client

# 3. Open Week 1 README and begin
$EDITOR curriculum/week-01-cap-pacelc-flp/README.md

Need the full prerequisite quiz, the cloud budget plan, or the recommended reading list? See the README and the syllabus.

§ IX · Frequently Asked

Questions, anticipated.

Do I need a cloud account to start?

Not for most of the course. Phases 1, 2, and 3 run on Kind, k3d, or minikube locally — 16 GB of RAM is enough through week 11, and 32 GB is strongly preferred from week 12 onward. Phase 4 multi-region labs benefit from two GKE Autopilot or EKS clusters in different regions for 4–8 hours at a time; budget approximately USD 40–80 across phases 3 and 4. The capstone can stay fully local if you accept simulated multi-region (two Kind clusters with a routed control plane).

Why five weeks of groundwork before the first hardened service?

Because distributed-systems engineering is dominated by reasoning errors more than coding errors. CAP, PACELC, FLP, Lamport, Raft, and the CRDT literature are not trivia — they are the design vocabulary you reach for when the system is wrong under partition. By the time you write your first hardened service in week 6, you have the conceptual scaffolding to defend every decision. The ordering is intentional and is defended in the charter.

How does this relate to C16 (Crunch Pro Backend) and C15 (Crunch DevOps)?

C16 is the floor — you should have shipped a non-trivial gRPC or REST service in production and been on-call for it. C15 gives you the Docker, Kubernetes, CI/CD, and Terraform fluency that lets the mesh and eventing labs proceed without a parallel infrastructure course. The canonical path is C1 → C15 → C16 → C22. Two or more years of production backend experience is an accepted equivalent.

Why is the capstone polyglot — Go, Python, and Rust?

Because real platforms are polyglot, and the moment you accept that, contract design stops being a stylistic preference and becomes a load-bearing decision. The Rust cart-service forces you to confront CRDT semantics in a type system that takes them seriously. The Go services let you ship velocity where velocity is the point. The Python services exercise the boundary between control planes and analytical pipelines. The single Protobuf surface that binds them is the moral position the course makes you defend.

Why no LLM-branded architecture?

Crunch Mesh is a distributed-systems course, not a vector-store course. Inference pipelines, embeddings, and retrieval are first-class topics in C5 (Crunch AI & Data Science). What Mesh teaches is the substrate underneath them — exactly-once events, multi-region state, service mesh, observability — which is precisely the substrate every LLM platform team ends up building. The two courses pair naturally; one is not a substitute for the other.

Will this prepare me for a staff interview loop?

Yes. The course closes with a career engineering pack targeted at the staff-track loop: twelve mock system-design problems at FAANG scale (news feed, ride-hailing dispatch, payments ledger, ad auction, real-time gaming presence, and seven more), a 40-question oral quiz bank on CAP, PACELC, FLP, Raft, CRDTs, EOS, and OpenTelemetry semantics, three STAR-format on-call narratives from your gameday and capstone work, the capstone runbook revised into a portfolio artifact, and two live mocks with external reviewers.

§ X · Begin

Twenty-four weeks from now,
you will defend an architecture.

Open the repository. Read Week 1. The diagram is yours to draw.