The Literature: CAP, PACELC, and FLP
CAP theorem · PACELC reformulation · FLP impossibility · linearizable / sequential / causal / eventual · safety vs liveness.
Two-node toy register in Go with simulated partition
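Here is a minimal sketch of the Week 1 lab's idea. It uses an in-memory model rather than real networking, and it is not the lab's reference solution; `Cluster`, `Mode`, `Write`, and `Read` are illustrative names. It has two replicas of one integer register, a boolean that stands in for the partition, and a mode flag that picks the CAP tradeoff. With only two nodes, neither side of a partition holds a majority. The consistent choice is therefore to reject the write. The available choice is to accept it locally and let the replicas diverge.

```go
package main

import (
	"errors"
	"fmt"
)

// Mode selects what the toy cluster does when its two nodes cannot reach each other.
type Mode int

const (
	CP Mode = iota // reject writes that cannot reach both replicas
	AP             // accept writes locally and let the replicas diverge
)

// ErrUnavailable is returned by a CP cluster for writes attempted during a partition.
var ErrUnavailable = errors.New("partitioned: write rejected to stay consistent")

// Cluster holds two replicas (node 0 and node 1) of a single integer register.
type Cluster struct {
	mode        Mode
	values      [2]int
	partitioned bool // true simulates a severed link between the nodes
}

// Write stores v via the given node, replicating to the peer when the link is up.
func (c *Cluster) Write(node, v int) error {
	if !c.partitioned {
		c.values[0], c.values[1] = v, v // synchronous replication to both nodes
		return nil
	}
	if c.mode == CP {
		return ErrUnavailable
	}
	c.values[node] = v // AP: local write only; the peer keeps its old value
	return nil
}

// Read returns the value currently held by the given node.
func (c *Cluster) Read(node int) int { return c.values[node] }

func main() {
	for _, mode := range []Mode{CP, AP} {
		c := &Cluster{mode: mode}
		_ = c.Write(0, 1)    // healthy link: both nodes hold 1
		c.partitioned = true // sever the link
		err := c.Write(0, 2)
		fmt.Printf("mode=%d err=%v node0=%d node1=%d\n", mode, err, c.Read(0), c.Read(1))
	}
}
```

In CP mode, the second write fails and both nodes still read 1. In AP mode, it succeeds: node 0 reads 2 while node 1 still reads 1 until something reconciles them. That reconciliation is the problem the CRDT weeks take up.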
Twenty-four weeks of production-grade distributed systems. From CAP, PACELC, and Raft on day one to a polyglot, multi-region platform on graduation: gRPC and Protobuf as the typed contract surface, Kafka and NATS for the event spine, Temporal for the long-running workflows, Istio and Linkerd at the network edge, OpenTelemetry through every seam, sagas with exactly-once consumers, and an active-active deployment that survives a region.
§ I · The Program
Crunch Mesh is the microservices and distributed-systems specialization of the Code Crunch academy, the Labs-tier course for engineers who have already shipped backend systems and now need to design the platforms those systems live on. It begins with the literature: CAP, PACELC, FLP, Raft, Lamport, CRDTs. It ends with a polyglot platform spread across two regions, instrumented end-to-end, and subjected to chaos drills, each followed by a postmortem.
You author Protobuf as a moral position, not a serialization format. You operate Kafka, Redpanda, and NATS JetStream side by side and pick a partition key with evidence. You run Temporal for the workflows that orchestration owns and event-driven sagas for the workflows that choreography owns. You stand Istio up, migrate services into it, and roll a weighted canary while a 200 ms fault is injected at the mesh layer. You watch a trace cross a Kafka boundary and a metric exemplar jump to the span that produced it. By week 24 you can defend an architecture at staff level, with the diagram, the tradeoffs, and the evidence.
"Distributed-systems engineering is dominated by reasoning errors more than coding errors, and the corrective is reading."— Crunch Mesh, course charter
§ II · Who It's For
Mesh is opinionated about its audience. C16 (Crunch Pro Backend) is the floor — you should have shipped a non-trivial gRPC or REST service and been on-call for it in production.
You ship features, you own services, you mentor — but you have not yet been the person in the room who decides whether the system is correct under partition. Mesh closes that gap.
You run the platform and you know where it bleeds. Mesh gives you the design vocabulary, the consensus-and-replication theory, and the contract-design skills to drive architecture forward.
You know how to wire managed services together. Now you need to know why each one behaves the way it does — and what to build on bare Kubernetes when you are the one shipping the equivalent.
If your goal is a staff-track role on a FAANG-scale or hyperscale backend, you need this material before the interview loop, not after. Mesh is the loop's preparation, not its consolation.
§ III · Four Phases
The program runs in four six-week phases, and each builds on the last like load on a leader.
The literature first: CAP, PACELC, FLP, consensus, Lamport and vector clocks, CRDTs, leases. Then microservice fundamentals — bounded contexts, Conway's law, decomposition heuristics. Then your first hardened single service in Go and Python, with typed gRPC contracts and baseline OpenTelemetry.
Multiple services. The network as a substrate. Envoy, BFFs, API gateways. Istio in anger, Linkerd as the alternative, Cilium on eBPF for comparison. Kafka and Redpanda; NATS JetStream and Pulsar. Exactly-once via outbox plus idempotency. Temporal for long-running workflows and orchestrated sagas.
Postgres logical replication and partitioning. Debezium CDC. CQRS and event sourcing. Iceberg queried through Trino as the lakehouse contract. Caching with Redis and Dragonfly. The OpenTelemetry pipeline through Prometheus, Thanos, Tempo, Loki, and Grafana. SLI/SLO discipline, error budgets, circuit breakers, the Universal Scalability Law.
Multi-region active-active with quorum-aware writes. CRDT conflict resolution in production. Zero-trust networking with SPIFFE, SPIRE, and OPA. Chaos engineering with chaos-mesh; a 90-minute gameday. Pact contract testing across the polyglot surface. Capstone: the Polyglot Marketplace Backbone, defended in front of two external reviewers.
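The Lamport and vector clocks named in the first phase come down to a few operations. The following is a minimal sketch that treats a clock as a map from a node ID to an event counter; `VClock`, `Tick`, `Merge`, and `Compare` are illustrative names, not taken from the course repository. A node increments its own entry on each local event and folds an incoming clock in by element-wise maximum. Two clocks are concurrent when neither is less than or equal to the other in every entry.

```go
package main

import "fmt"

// VClock maps a node ID to the number of events that node has recorded.
type VClock map[string]uint64

// Tick records a local event on node id.
func (v VClock) Tick(id string) { v[id]++ }

// Merge folds a received clock into v by element-wise maximum;
// callers usually Tick afterwards to record the receive event itself.
func (v VClock) Merge(other VClock) {
	for id, n := range other {
		if n > v[id] {
			v[id] = n
		}
	}
}

// Copy returns an independent snapshot, e.g. to attach to an outgoing message.
func (v VClock) Copy() VClock {
	c := make(VClock, len(v))
	for id, n := range v {
		c[id] = n
	}
	return c
}

// Compare returns "before", "after", "equal", or "concurrent" for a relative to b.
func Compare(a, b VClock) string {
	aLess, bLess := false, false // aLess: some entry where a < b; bLess: some entry where b < a
	for id := range union(a, b) {
		switch {
		case a[id] < b[id]:
			aLess = true
		case a[id] > b[id]:
			bLess = true
		}
	}
	switch {
	case aLess && bLess:
		return "concurrent"
	case aLess:
		return "before"
	case bLess:
		return "after"
	default:
		return "equal"
	}
}

// union collects every node ID that appears in either clock (missing entries read as 0).
func union(a, b VClock) map[string]struct{} {
	ids := make(map[string]struct{}, len(a)+len(b))
	for id := range a {
		ids[id] = struct{}{}
	}
	for id := range b {
		ids[id] = struct{}{}
	}
	return ids
}

func main() {
	alice, bob := VClock{}, VClock{}
	alice.Tick("alice") // alice sends message m1
	m1 := alice.Copy()
	bob.Merge(m1)       // bob receives m1 ...
	bob.Tick("bob")     // ... and replies
	alice.Tick("alice") // meanwhile alice writes again without seeing the reply
	fmt.Println(Compare(m1, bob))
	fmt.Println(Compare(alice, bob))
}
```

Alice's first message happened before Bob's reply. Alice's second write and Bob's reply, however, are concurrent. A chat log ordered by these clocks must surface them as a conflict rather than silently pick one.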
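Exactly-once via outbox plus idempotency, from the second phase, depends on two things: an at-least-once transport and a consumer that ignores redeliveries. The following is a minimal sketch that assumes every event carries a unique ID (for example, the outbox row's primary key). `Event`, `Consumer`, and `Handle` are hypothetical names, not a Kafka or Debezium API. The in-memory map stands in for a processed-events table, which a real consumer would update in the same database transaction as its side effect.

```go
package main

import "fmt"

// Event is a hypothetical message shape; ID is unique per logical event.
type Event struct {
	ID     string
	Amount int
}

// Consumer applies each event's effect at most once, even if the broker redelivers it.
type Consumer struct {
	processed map[string]bool // stand-in for a processed_events table
	balance   int             // stand-in for the consumer's own state
}

func NewConsumer() *Consumer { return &Consumer{processed: map[string]bool{}} }

// Handle returns true if the event was applied, false if it was a duplicate.
// In production, the ID check, the state change, and the ID insert belong in one transaction.
func (c *Consumer) Handle(e Event) bool {
	if c.processed[e.ID] {
		return false // redelivery: acknowledge without re-applying the effect
	}
	c.balance += e.Amount
	c.processed[e.ID] = true
	return true
}

func main() {
	c := NewConsumer()
	// At-least-once delivery: "evt-1" arrives twice.
	for _, e := range []Event{{"evt-1", 10}, {"evt-2", 5}, {"evt-1", 10}} {
		fmt.Println(e.ID, "applied:", c.Handle(e))
	}
	fmt.Println("balance:", c.balance)
}
```

The duplicate `evt-1` is acknowledged but not applied, so the balance ends at 15 rather than 25. Without the shared transaction, a crash between applying the effect and recording the ID would reintroduce the duplicate.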
§ IV · The Curriculum
Each entry corresponds to a folder in the GitHub repository with lecture notes, the primary reading list, a hands-on lab, a quiz, and a reflective writeup. Detailed acceptance criteria live in the syllabus.
CAP theorem · PACELC reformulation · FLP impossibility · linearizable / sequential / causal / eventual · safety vs liveness.
Two-node toy register in Go with simulated partition
Physical vs logical clocks · Lamport timestamps · vector clocks · leases and fencing tokens · Raft in depth · Paxos overview · etcd, ZooKeeper, Consul.
Vector-clock chat log + 3-node etcd on Kind
State-based vs operation-based · G-counter, PN-counter, OR-set, LWW-register · Riak, AntidoteDB · when LWW is a footgun.
OR-set shopping cart in Rust; 3-way partition heal
Bounded contexts · Conway's law and the inverse maneuver · decomposition heuristics · anti-patterns: distributed monolith, shared database, the entity service.
Decompose a 40 kLOC Django monolith · written memo
Protobuf wire format · schema evolution · gRPC unary, streaming, bidi · gRPC-Web · interceptors · the cost of REST drift.
cart.v1 Protobuf · Go server · Python client · grpcurl
Twelve-factor reviewed · structured JSON logs · graceful shutdown · readiness/liveness · baseline OpenTelemetry · the runbook as deliverable.
cart-service to production-ready · Helm chart · runbook
API gateway vs mesh ingress · Envoy filters, listeners, clusters, EDS/CDS/LDS/RDS · the BFF pattern · gRPC-Web vs Connect · rate limiting.
Envoy ingress · rate limit · hedged retries · BFF in Go
istiod · sidecar vs ambient · mTLS by default · AuthorizationPolicy · VirtualService and DestinationRule · weighted canary · mesh-layer fault injection.
Istio on Kind · 10/90 then 50/50 canary · 200 ms fault
linkerd2-proxy in Rust · sidecar vs sidecar-less · Cilium service mesh on eBPF · honest comparison · when each one wins.
Three meshes benchmarked · 1-page ADR
The log abstraction · partitions, offsets, consumer groups · ISR · Raft-per-partition (Redpanda) · retention and compaction · partition key design.
3-broker Strimzi Kafka · order.placed.v1 · Redpanda compare
NATS core vs JetStream · Pulsar tiered storage · idempotency keys · transactional outbox vs Kafka transactions · the pragmatic possibility of exactly-once semantics (EOS).
Outbox in Postgres → Debezium → Kafka → idempotent Rust consumer
Temporal frontend/history/matching/worker · deterministic replay · signals, queries, child workflows · orchestration vs choreography for sagas.
Temporal saga in Go · reserve → charge → ship · compensation
Logical vs physical replication · declarative partitioning · HOT updates and bloat · pg_stat_statements · PgBouncer · Citus and CockroachDB.
Primary + 2 logical replicas · partition orders by month · 50 M rows
Debezium connectors · CDC vs dual-write · CQRS in earnest · event sourcing and its costs · materialized views · aggregate vs event-driven.
Debezium → Kafka → Elasticsearch read model + Iceberg sink
Row-store vs column-store · OLAP/OLTP boundaries · Iceberg table format · Trino as a federated engine · the role of dbt.
MinIO + Nessie + Iceberg + Trino · dbt revenue rollup
Look-aside, read-through, write-through, write-back · request coalescing · probabilistic early expiration · Redis Cluster hash slots · the licensing saga.
Cart-read cache · k6 stampede · migrate to Dragonfly
OTel SDKs in Go/Python/Rust · context propagation across HTTP, gRPC, Kafka · RED metrics · exemplars · Prometheus + Thanos · Tempo · Loki · Grafana.
Full OTel pipeline · exemplar dashboard · trace across Kafka
SLIs that mean something · error budgets · circuit breakers · bulkheads · timeouts · retries with jitter · backpressure · load shedding · HPA + KEDA · Universal Scalability Law · tail latency.
Cart SLOs · circuit breaker · KEDA on lag · saturation point
Quorum across regions · replication-lag budgets · geo-routing (DNS, anycast, GSLB) · session affinity · the data-gravity problem.
Two Kind regions · logical Postgres replication · 60 s RTO failover
Production-grade CRDT stacks · LWW vs merge semantics by field · vector-clock-driven application-layer resolution · per-field consistency models.
Active-active cart on OR-set · 5-min partition · verify convergence
SPIFFE workload identities · SPIRE deployment · SVID issuance · rotating mTLS without downtime · OPA / Gatekeeper · Kyverno as an alternative.
SPIRE in both clusters · SVIDs via Istio · OPA admission policy
Netflix's four principles · chaos-mesh, Litmus, Gremlin · gameday playbook · blameless postmortems · the five-whys debate.
Six chaos experiments · 90-minute gameday · postmortem per finding
Pact consumer-driven contracts · property-based testing (Hypothesis, gopter, proptest) · fault injection at the unit level · USL, Little's Law · cost-aware design.
Pact suite cart↔inventory↔payment · CRDT property tests · capacity memo
Final integration · mock staff-engineer design review · 12-minute demo · published postmortem · capstone defense in front of external reviewers.
Polyglot Marketplace Backbone · two regions · two chaos drills
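The leases and fencing tokens in the Week 2 entry can be sketched as follows. The sketch assumes a lock service that issues strictly increasing tokens and a storage layer that remembers the highest token it has accepted. All names are hypothetical, and lease expiry itself is not modelled. A client that paused past its lease still holds an old token, so its late write is rejected instead of overwriting newer data.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrStaleToken is returned when a write carries an older token than one already seen.
var ErrStaleToken = errors.New("stale fencing token")

// LockService grants leases, each with a strictly increasing fencing token.
type LockService struct{ next uint64 }

// Acquire grants a new lease and returns its token.
func (l *LockService) Acquire() uint64 {
	l.next++
	return l.next
}

// Storage accepts a write only if its token is at least the highest token seen so far.
type Storage struct {
	highest uint64
	data    string
}

func (s *Storage) Write(token uint64, value string) error {
	if token < s.highest {
		return ErrStaleToken
	}
	s.highest = token
	s.data = value
	return nil
}

func main() {
	locks, store := &LockService{}, &Storage{}
	tokenA := locks.Acquire() // client A takes the lease, then stalls (e.g. a long GC pause)
	tokenB := locks.Acquire() // A's lease has lapsed, so client B takes over
	fmt.Println(store.Write(tokenB, "from B"))
	fmt.Println(store.Write(tokenA, "from A")) // A resumes and writes with its old token
	fmt.Println(store.data)
}
```

The check has to live in the resource being protected. The lock service alone cannot stop a client that does not yet know its lease has expired.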
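The OR-set behind the Week 3 shopping cart and the Week 20 active-active cart can be shown in miniature. The labs build it in Rust. To keep one language across these sketches, the following is a minimal state-based, add-wins OR-set in Go. It assumes each replica mints unique tags from its replica ID and a local counter. Tombstones are kept forever here, and a production implementation would need to garbage-collect them. All names are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// Tag uniquely identifies one add operation: the replica that made it plus a local counter.
type Tag struct {
	Replica string
	Seq     int
}

// ORSet is a state-based observed-remove set of strings.
type ORSet struct {
	id      string
	seq     int
	adds    map[string]map[Tag]bool // element -> tags of the adds observed for it
	removed map[Tag]bool            // tombstoned tags
}

func NewORSet(id string) *ORSet {
	return &ORSet{id: id, adds: map[string]map[Tag]bool{}, removed: map[Tag]bool{}}
}

// Add records the element under a fresh, globally unique tag.
func (s *ORSet) Add(e string) {
	s.seq++
	if s.adds[e] == nil {
		s.adds[e] = map[Tag]bool{}
	}
	s.adds[e][Tag{s.id, s.seq}] = true
}

// Remove tombstones only the tags this replica has observed; concurrent adds survive.
func (s *ORSet) Remove(e string) {
	for t := range s.adds[e] {
		s.removed[t] = true
	}
}

// Contains reports whether e has at least one add tag that is not tombstoned.
func (s *ORSet) Contains(e string) bool {
	for t := range s.adds[e] {
		if !s.removed[t] {
			return true
		}
	}
	return false
}

// Merge unions both replicas' adds and tombstones. Union is commutative,
// associative, and idempotent, so replicas converge regardless of delivery order.
func (s *ORSet) Merge(o *ORSet) {
	for e, tags := range o.adds {
		if s.adds[e] == nil {
			s.adds[e] = map[Tag]bool{}
		}
		for t := range tags {
			s.adds[e][t] = true
		}
	}
	for t := range o.removed {
		s.removed[t] = true
	}
}

// Elements returns the visible elements in sorted order.
func (s *ORSet) Elements() []string {
	var out []string
	for e := range s.adds {
		if s.Contains(e) {
			out = append(out, e)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	a, b := NewORSet("a"), NewORSet("b")
	a.Add("milk")
	b.Merge(a) // both replicas now see milk
	// Partition: a removes milk while b concurrently re-adds it.
	a.Remove("milk")
	b.Add("milk")
	// Heal: exchange state in both directions.
	a.Merge(b)
	b.Merge(a)
	fmt.Println(a.Elements(), b.Elements())
}
```

Replica `a` removed milk while `b` concurrently re-added it. Because `a` tombstoned only the tag it had observed, the re-add survives on both replicas after merging. An LWW register could silently lose that add, depending on which timestamp happened to be later.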
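The USL and Little's Law in the Week 23 entry feed directly into the capacity memo. In the Universal Scalability Law, X(N) is throughput at concurrency N, σ models contention, κ models coherency cost, and throughput peaks near N*. Little's Law relates mean concurrency L, arrival rate λ, and mean time in system W. The symbol names follow common usage rather than the course notes, and the numbers below are an illustrative calculation, not a measurement.

```latex
X(N) = \frac{X(1)\, N}{1 + \sigma (N - 1) + \kappa N (N - 1)},
\qquad
N^{*} = \sqrt{\frac{1 - \sigma}{\kappa}}

\sigma = 0.05,\ \kappa = 0.001
\;\Rightarrow\;
N^{*} = \sqrt{950} \approx 31

L = \lambda W
\;\Rightarrow\;
2000\ \text{req/s} \times 0.05\ \text{s} = 100\ \text{requests in flight}
```

Past N*, adding concurrency lowers throughput. Little's Law then turns a latency target and a traffic forecast into the concurrency the service must actually sustain.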
§ V · The Toolchain
Every primary tool below is open-source. Managed services from GCP and AWS are taught as production scale paths — never as the only path. Every commercial vendor has a graded comparison against its open-source equivalent.
§ VI · Skills You Will Carry
By the end of Week 24, you are able to do each of the following — credibly, on a live system, in front of a staff-level reviewer.
§ VII · The Capstone
The final four weeks of the course are a single substantial system — the Polyglot Marketplace Backbone — built and run as if it were a real backend platform. Architecture document, live deploy with weighted canary, recorded demo, two chaos-drill postmortems.
Capstone Brief
Build and operate the backend platform for a fictional online marketplace as a real two-region system. Services are polyglot by design (Rust for the CRDT cart; Go for inventory, payment, and the BFFs; Python for the order, search, and analytics services), all bound by a single typed Protobuf surface and a Kafka event spine with exactly-once consumers.
The cart.v1, inventory.v1, payment.v1, and order.v1 packages are versioned independently.
§ VIII · Getting Started
The setup is intentionally lightweight. If you have a 16 GB laptop, Docker, and a local Kubernetes cluster via Kind or k3d, you can begin Week 1 today. Cloud labs do not arrive until phase 3, and even then most can stay local.
# 1. Clone the curriculum repository
git clone https://github.com/CODE-CRUNCH-WORLDWIDE/C22-CRUNCH-MESH.git
cd C22-CRUNCH-MESH

# 2. Verify the local substrate (Docker + Kind or k3d + kubectl)
docker version
kind version    # or: k3d version
kubectl version --client

# 3. Open Week 1 README and begin
$EDITOR curriculum/week-01-cap-pacelc-flp/README.md
Need the full prerequisite quiz, the cloud budget plan, or the recommended reading list? See the README and the syllabus.
§ IX · Frequently Asked
Not for most of the course. Phases 1, 2, and 3 run on Kind, k3d, or minikube locally — 16 GB of RAM is enough through week 11, and 32 GB is strongly preferred from week 12 onward. Phase 4 multi-region labs benefit from two GKE Autopilot or EKS clusters in different regions for 4–8 hours at a time; budget approximately USD 40–80 across phases 3 and 4. The capstone can stay fully local if you accept simulated multi-region (two Kind clusters with a routed control plane).
Because distributed-systems engineering is dominated more by reasoning errors than by coding errors. CAP, PACELC, FLP, Lamport, Raft, and the CRDT literature are not trivia; they are the design vocabulary you reach for when the system is wrong under partition. By the time you write your first hardened service in week 6, you have the conceptual scaffolding to defend every decision. The ordering is intentional and is defended in the charter.
C16 is the floor — you should have shipped a non-trivial gRPC or REST service in production and been on-call for it. C15 gives you the Docker, Kubernetes, CI/CD, and Terraform fluency that lets the mesh and eventing labs proceed without a parallel infrastructure course. The canonical path is C1 → C15 → C16 → C22. Two or more years of production backend experience is an accepted equivalent.
Because real platforms are polyglot, and the moment you accept that, contract design stops being a stylistic preference and becomes a load-bearing decision. The Rust cart-service forces you to confront CRDT semantics in a type system that takes them seriously. The Go services let you ship velocity where velocity is the point. The Python services exercise the boundary between control planes and analytical pipelines. The single Protobuf surface that binds them is the moral position the course makes you defend.
Crunch Mesh is a distributed-systems course, not a vector-store course. Inference pipelines, embeddings, and retrieval are first-class topics in C5 (Crunch AI & Data Science). What Mesh teaches is the substrate underneath them — exactly-once events, multi-region state, service mesh, observability — which is precisely the substrate every LLM platform team ends up building. The two courses pair naturally; one is not a substitute for the other.
Yes. The course closes with a career engineering pack targeted at the staff-track loop: twelve mock system-design problems at FAANG scale (news feed, ride-hailing dispatch, payments ledger, ad auction, real-time gaming presence, and seven more), a 40-question oral quiz bank on CAP, PACELC, FLP, Raft, CRDTs, EOS, and OpenTelemetry semantics, three STAR-format on-call narratives from your gameday and capstone work, the capstone runbook revised into a portfolio artifact, and two live mocks with external reviewers.
§ X · Begin
Open the repository. Read Week 1. The diagram is yours to draw.