System Design Guide

Overview

This guide covers the process of turning product requirements into deployable, observable, and resilient distributed system architectures. Use it for greenfield architecture, scaling existing systems, design reviews, architecture decision records, or monolith-to-services migrations.

Design Process

1. Requirements

Clarify functional needs, non-functional targets (latency, throughput, durability), read/write ratio, peak traffic patterns, and geographic distribution. If the stakeholder cannot provide traffic numbers, estimate from user count: assume a 10% DAU/MAU ratio, 5 requests per session, and 80% of daily traffic arriving within 8 hours. That concentration works out to roughly 2.4x the daily average rate, so use peak = 3x average as a rounded planning figure.
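The fallback estimate in this step can be sketched as a few lines of arithmetic. This is a minimal illustration, not the guide's own tooling: the function names are hypothetical, and sessions_per_user_per_day is an added assumption (the guide gives requests per session but not sessions per day). The second helper derives the peak-window multiplier from "80% of traffic in 8 hours", which comes out near 2.4x and is rounded to the 3x planning figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_traffic(monthly_active_users,
                     dau_mau_ratio=0.10,
                     requests_per_session=5,
                     sessions_per_user_per_day=1.0,  # assumption, not from the guide
                     peak_multiplier=3.0):
    """Estimate average and peak QPS from a monthly active user count."""
    daily_active_users = monthly_active_users * dau_mau_ratio
    requests_per_day = (daily_active_users
                        * sessions_per_user_per_day
                        * requests_per_session)
    average_qps = requests_per_day / SECONDS_PER_DAY
    return {
        "daily_active_users": daily_active_users,
        "requests_per_day": requests_per_day,
        "average_qps": average_qps,
        "peak_qps": average_qps * peak_multiplier,
    }


def peak_window_multiplier(traffic_share=0.80, window_hours=8.0):
    """Ratio of the request rate inside the busy window to the daily average rate."""
    window_rate = traffic_share / window_hours  # share of daily traffic per busy hour
    average_rate = 1.0 / 24.0                   # share of daily traffic per average hour
    return window_rate / average_rate


if __name__ == "__main__":
    estimate = estimate_traffic(1_000_000)
    for name, value in estimate.items():
        print(f"{name}: {value:,.2f}")
    print(f"peak window multiplier: {peak_window_multiplier():.2f}")
```

For 1,000,000 MAU the defaults give 100,000 DAU, 500,000 requests per day, an average of about 5.8 QPS, and a peak of about 17.4 QPS. Replace any default as soon as a measured number is available.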

2. Capacity estimation

Calculate QPS, storage growth, and bandwidth. Project each at 1x, 5x, and 10x load, and identify the bottleneck resource (the one that saturates first). Use scripts/capacity_calculator.py for calculations. Always show your math: never state a capacity figure without its derivation.
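As an illustration of the projection described in this step, the sketch below derives peak QPS, egress bandwidth, and daily storage growth at 1x, 5x, and 10x, then reports the resource with the highest utilization as the bottleneck. It is not the contents of scripts/capacity_calculator.py; the function name, the input parameters, and every number in EXAMPLE_LIMITS are hypothetical placeholders.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical capacity of the current deployment; substitute measured limits.
EXAMPLE_LIMITS = {
    "qps": 2_000,
    "egress_bytes_per_s": 10_000_000,
    "storage_bytes_per_day": 50_000_000_000,
}


def project_capacity(avg_qps, peak_multiplier, write_fraction,
                     avg_response_bytes, avg_record_bytes, limits,
                     load_multipliers=(1, 5, 10)):
    """Return one row per load multiplier with demand, utilization, and bottleneck."""
    rows = []
    for load in load_multipliers:
        peak_qps = avg_qps * load * peak_multiplier
        demand = {
            # Request rate and bandwidth are sized for the peak.
            "qps": peak_qps,
            "egress_bytes_per_s": peak_qps * avg_response_bytes,
            # Storage growth accumulates over the day, so it uses the average rate.
            "storage_bytes_per_day": (avg_qps * load * write_fraction
                                      * avg_record_bytes * SECONDS_PER_DAY),
        }
        utilization = {name: demand[name] / limits[name] for name in demand}
        bottleneck = max(utilization, key=utilization.get)
        rows.append({"load": load, **demand,
                     "utilization": utilization, "bottleneck": bottleneck})
    return rows


if __name__ == "__main__":
    for row in project_capacity(avg_qps=100, peak_multiplier=3, write_fraction=0.2,
                                avg_response_bytes=2_000, avg_record_bytes=500,
                                limits=EXAMPLE_LIMITS):
        print(f"{row['load']:>2}x  peak_qps={row['qps']:,.0f}  "
              f"egress={row['egress_bytes_per_s']:,.0f} B/s  "
              f"storage={row['storage_bytes_per_day']:,.0f} B/day  "
              f"bottleneck={row['bottleneck']} "
              f"({row['utilization'][row['bottleneck']]:.0%})")
```

With these placeholder inputs, request rate is the bottleneck at every load level and exceeds the assumed limit at 10x (3,000 peak QPS against 2,000). Each output value traces back to a visible multiplication, which keeps the derivation reviewable.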

3. High-level architecture

Map components, data stores, queues, caches, and external dependencies. Define sync vs async boundaries. Start with the fewest components possible — if 3 boxes solve it, do not draw 7.
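One way to make the sync vs async boundary explicit is to record the component map as labeled edges and compute what a request actually waits on. The sketch below is an assumption-laden illustration: the Edge type, the sync_dependencies helper, and the component names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    mode: str  # "sync": caller waits for the reply; "async": caller does not wait


# Hypothetical map: the API enqueues work synchronously, a worker consumes it later.
EXAMPLE_EDGES = [
    Edge("api", "cache", "sync"),
    Edge("api", "db", "sync"),
    Edge("api", "queue", "sync"),
    Edge("queue", "worker", "async"),
    Edge("worker", "email_provider", "sync"),
]


def sync_dependencies(edges, entry):
    """Components the entry point waits on, directly or transitively."""
    adjacency = {}
    for edge in edges:
        if edge.mode == "sync":
            adjacency.setdefault(edge.source, []).append(edge.target)
    seen, stack = set(), [entry]
    while stack:
        node = stack.pop()
        for target in adjacency.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


if __name__ == "__main__":
    print(sorted(sync_dependencies(EXAMPLE_EDGES, "api")))
```

In this example the API's request path depends on the cache, the database, and the queue, while the worker and the email provider sit behind the async boundary and cannot add to request latency.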

4. Component deep-dive

Specify technology choices with justification. Define partitioning, replication, consistency model, and cache invalidation per store. Every technology choice must answer: "Why this over the simpler alternative?"
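The per-store checklist in this step can be captured as a small record so that an unanswered item is easy to spot during review. This is a sketch only: StoreSpec, its field names, and the sample values are hypothetical, chosen to mirror the items listed above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StoreSpec:
    name: str
    technology: str
    simpler_alternative: str
    why_not_simpler: str  # answers "Why this over the simpler alternative?"
    partitioning: str
    replication: str
    consistency_model: str
    cache_invalidation: str


def unanswered(spec):
    """Names of fields left empty or whitespace-only."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]


if __name__ == "__main__":
    # Illustrative values only; the justification is deliberately left blank.
    draft = StoreSpec(
        name="orders",
        technology="sharded relational database",
        simpler_alternative="single relational instance",
        why_not_simpler="",
        partitioning="hash on customer_id",
        replication="one primary, two replicas",
        consistency_model="read-your-writes on primary",
        cache_invalidation="delete cache key on write",
    )
    print(unanswered(draft))
```

A non-empty result means the deep-dive for that store is not finished; here the draft is flagged for the missing justification.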
