Design WhatsApp (a Chat System) — System Design Interview Walkthrough

The thing that makes a chat system hard is real-time delivery: how do you get a message to a recipient instantly when they're online, reliably when they're offline, and to everyone when it's a group? Persistent connections (WebSocket) plus a connection registry and a durable message store answer all three. Everything else (receipts, ordering, history) hangs off that. This walkthrough applies the standard system design framework to a real-time problem.

Unlike a URL shortener (request/response, read-heavy), chat is stateful and push-based — the server has to reach out to clients. That's the whole challenge.

1. Requirements

Functional: 1:1 messaging, group messaging, online/last-seen presence, delivery and read receipts, message history, and push notifications when the recipient is offline. Out of scope (say so explicitly): voice/video calls and media handling. For media, mention only that it's stored in an object store behind a CDN like any other blob, with a URL sent in the message.

Non-functional: low latency (messages feel instant), highly available, durable (never lose a message), ordered within a conversation, and able to handle hundreds of millions of concurrent connections.

2. The core challenge: connections

Plain HTTP request/response can't push: the server can only answer requests the client initiates. So clients hold a persistent WebSocket to a fleet of connection (gateway) servers. The problem this creates: when User A sends to User B, which server holds B's connection?

- Maintain a session registry (e.g., Redis): userId → {gatewayServerId, connectionId}, updated on connect/disconnect.
- To deliver, look up B's gateway and route the message there (server-to-server); that gateway then pushes it down B's socket.
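
The registry described above can be sketched in a few lines. This is a minimal in-memory stand-in for the Redis map, not a production design: the class and method names (`SessionRegistry`, `on_connect`, `on_disconnect`, `lookup`) are hypothetical, and it assumes one active connection per user.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    gateway_server_id: str
    connection_id: str


class SessionRegistry:
    """In-memory stand-in for the map userId -> {gatewayServerId, connectionId}."""

    def __init__(self) -> None:
        self._sessions: dict[str, Session] = {}

    def on_connect(self, user_id: str, gateway_server_id: str, connection_id: str) -> None:
        # Assumption: one connection per user, so a newer connection replaces the old entry.
        self._sessions[user_id] = Session(gateway_server_id, connection_id)

    def on_disconnect(self, user_id: str, connection_id: str) -> None:
        # Remove the entry only if it still belongs to the closing connection,
        # so a late disconnect event cannot erase a newer session.
        current = self._sessions.get(user_id)
        if current is not None and current.connection_id == connection_id:
            del self._sessions[user_id]

    def lookup(self, user_id: str) -> Optional[Session]:
        # None means the user is offline as far as the registry knows.
        return self._sessions.get(user_id)
```

The check in `on_disconnect` is one way to handle a client that reconnects to a different gateway before the old socket's close event is processed; without it, the stale event would mark an online user as offline.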

This connection layer is the heart of the design — call it out first.

3. Sending a message (the happy path)

A's client sends the message over its WebSocket to its gateway.
The gateway persists the message (durability before delivery — so a crash doesn't lose it) and assigns a per-conversation sequence number.
Look up B in the session registry.
- B online: route to B's gateway → push down B's socket → B's client acks → mark delivered.
- B offline: the message is already stored; enqueue a push notification (APNs/FCM). B pulls undelivered messages on reconnect.
Receipts (sent → delivered → read) are just small ack messages flowing back the same way.
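
The happy path above can be sketched as a single `send` function. This is an illustration under simplifying assumptions: `ChatService` and its fields are hypothetical names, the registry is reduced to a plain dict of userId to gateway id, storage is an in-memory list, and "pushing" or "notifying" just records what would be sent.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StoredMessage:
    message_id: str
    conversation_id: str
    sender_id: str
    body: str
    sequence: int
    status: str = "sent"  # sent -> delivered -> read


class ChatService:
    def __init__(self, registry: dict[str, str]) -> None:
        self.registry = registry  # userId -> gatewayServerId (simplified registry)
        self.store: dict[str, list[StoredMessage]] = defaultdict(list)
        self._counters: dict[str, itertools.count] = {}
        self.pushed: list[tuple[str, StoredMessage]] = []  # (gateway, message) to push
        self.notifications: list[tuple[str, str]] = []  # (userId, messageId) for APNs/FCM

    def send(self, message_id: str, conversation_id: str, sender_id: str,
             recipient_id: str, body: str) -> StoredMessage:
        # Persist first, stamping the next per-conversation sequence number.
        counter = self._counters.setdefault(conversation_id, itertools.count(1))
        msg = StoredMessage(message_id, conversation_id, sender_id, body, next(counter))
        self.store[conversation_id].append(msg)

        # Then look up the recipient and either route or notify.
        gateway = self.registry.get(recipient_id)
        if gateway is not None:
            self.pushed.append((gateway, msg))  # online: route to recipient's gateway
        else:
            self.notifications.append((recipient_id, message_id))  # offline: push notification
        return msg

    def on_client_ack(self, conversation_id: str, message_id: str) -> None:
        # The recipient's client acknowledged receipt: mark delivered.
        for msg in self.store[conversation_id]:
            if msg.message_id == message_id:
                msg.status = "delivered"

    def undelivered(self, conversation_id: str, user_id: str) -> list[StoredMessage]:
        # What a reconnecting user pulls: messages from others not yet acked.
        return [m for m in self.store[conversation_id]
                if m.sender_id != user_id and m.status == "sent"]
```

Because the write happens before the registry lookup, a crash between the two steps leaves the message stored with status `sent`, and the recipient still receives it on the next reconnect.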

4. Data model & storage

Chat is write-heavy (every message is a write) with time-ordered reads per conversation — a textbook fit for a wide-column store (Cassandra/HBase):

| Data | Store | Key |
| --- | --- | --- |
| Messages | Wide-column (Cassandra) | partition by `conversationId`, clustered by `sequence`/timestamp |
| Session registry | In-memory (Redis) | `userId` → gateway/connection |
| User/group metadata | Relational or KV | `userId`, `groupId` → members |

Partitioning messages by conversationId keeps a conversation's history together and time-ordered, which is exactly the read pattern.
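
To make the access pattern concrete, here is a toy model of that layout: one sorted list of rows per partition key. It is only a sketch of the idea, not how Cassandra is implemented, and `MessageTable`, `insert`, and `read_after` are hypothetical names.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class MessageTable:
    """Toy wide-column table: partition key conversationId, clustering key sequence."""

    def __init__(self) -> None:
        # partition key -> rows kept sorted by (sequence, body)
        self._partitions: dict[str, list[tuple[int, str]]] = defaultdict(list)

    def insert(self, conversation_id: str, sequence: int, body: str) -> None:
        # Rows stay ordered by the clustering key within their partition.
        insort(self._partitions[conversation_id], (sequence, body))

    def read_after(self, conversation_id: str, after_sequence: int,
                   limit: int = 50) -> list[tuple[int, str]]:
        # A history read touches exactly one partition and is a contiguous slice.
        rows = self._partitions.get(conversation_id, [])
        # (n,) sorts before every (n, body), so this finds the first row
        # whose sequence is strictly greater than after_sequence.
        start = bisect_left(rows, (after_sequence + 1,))
        return rows[start:start + limit]
```

A client that has seen everything up to sequence 120 would call `read_after(conversation_id, 120)` on reconnect, which is the "pull undelivered messages" read expressed as a range scan.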

5. Ordering & delivery semantics

Ordering: guarantee it per conversation with a monotonic sequence number assigned server-side; don't trust client clocks. Global ordering across all conversations isn't needed.
Delivery: aim for at-least-once delivery plus idempotent client-side dedup (each message has a unique id), which is simpler and safer than exactly-once. The server may resend a message it hasn't seen acked; the client drops duplicates by id.
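
Both rules fit in a small client-side structure. The sketch below assumes each message arrives with a unique id and a server-assigned sequence number; `ConversationInbox` and its methods are hypothetical names for illustration.

```python
class ConversationInbox:
    """Client-side view of one conversation under at-least-once delivery."""

    def __init__(self) -> None:
        self._seen_ids: set[str] = set()
        self._by_sequence: dict[int, str] = {}

    def receive(self, message_id: str, sequence: int, body: str) -> bool:
        """Return True if the message is new, False if it is a duplicate."""
        if message_id in self._seen_ids:
            # Redelivery: drop it. The caller should still ack so the
            # server stops retrying.
            return False
        self._seen_ids.add(message_id)
        self._by_sequence[sequence] = body
        return True

    def ordered(self) -> list[str]:
        # Display order comes from the server-assigned sequence,
        # never from arrival order or client clocks.
        return [self._by_sequence[s] for s in sorted(self._by_sequence)]
```

Messages that arrive out of order or more than once still render correctly, which is what lets the server retry freely.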

6. Group messaging (the fan-out)

A group message is delivered to every member. For small groups (WhatsApp caps group size), fan out on send: write once, then deliver to each online member via their gateway and store-for-later for offline members. For very large broadcast groups you'd shift toward a pull model — but for typical group sizes, fan-out-on-send is fine. Naming the size threshold is the senior signal.
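
Fan-out on send can be sketched as a function that splits a group's members into online and offline recipients. The name `fan_out` and the dict-based registry are assumptions for illustration; grouping recipients by gateway is an optional optimization added here, not something the design requires.

```python
def fan_out(group_members: list[str], sender_id: str,
            online_gateways: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Split recipients into {gateway: [users to push to]} and [users to store for later].

    online_gateways maps userId -> gatewayServerId for connected users only.
    """
    by_gateway: dict[str, list[str]] = {}
    offline: list[str] = []
    for user_id in group_members:
        if user_id == sender_id:
            continue  # the sender already has the message
        gateway = online_gateways.get(user_id)
        if gateway is None:
            offline.append(user_id)  # stored message is pulled on reconnect
        else:
            # Batch per gateway so each gateway gets one server-to-server call.
            by_gateway.setdefault(gateway, []).append(user_id)
    return by_gateway, offline
```

The message itself is written once before this step; the work that grows with group size is the loop over members, which is why a size cap keeps fan-out on send cheap.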

7. Bottlenecks & trade-offs to name unprompted

Connection scale: hundreds of millions of long-lived sockets → many gateway servers + a fast session registry; connections are the main capacity constraint, not CPU.
Durability vs latency: persist-before-deliver guarantees no message loss at the cost of a write on the hot path; this is the right trade for a messaging app.
Presence cost: true real-time presence for everyone is expensive; most systems use periodic heartbeats and show "last seen" rather than exact status.
Thundering herd on reconnect: when a gateway dies, its clients reconnect en masse — spread them across servers and rehydrate undelivered messages lazily.
Hot conversations: a huge group concentrates its reads and writes on a single partition; cap group size or shard the conversation across partitions.
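
The heartbeat approach to presence mentioned above can be sketched as follows. The 30-second timeout is an arbitrary assumption, `PresenceTracker` is a hypothetical name, and the current time is passed in explicitly to keep the example deterministic.

```python
from typing import Optional


class PresenceTracker:
    """Heartbeat-based presence: online if seen recently, else 'last seen'."""

    def __init__(self, timeout_seconds: float = 30.0) -> None:
        self.timeout_seconds = timeout_seconds  # assumed value, tune per product
        self._last_heartbeat: dict[str, float] = {}

    def heartbeat(self, user_id: str, now: float) -> None:
        # Called periodically while the client's connection is alive.
        self._last_heartbeat[user_id] = now

    def last_seen(self, user_id: str) -> Optional[float]:
        return self._last_heartbeat.get(user_id)

    def status(self, user_id: str, now: float) -> str:
        last = self._last_heartbeat.get(user_id)
        if last is None:
            return "unknown"
        if now - last <= self.timeout_seconds:
            return "online"
        return f"last seen {int(now - last)}s ago"
```

Status is computed on read from one timestamp per user, so nothing is broadcast when a user goes offline; that is the cost saving compared with pushing every presence change to every contact.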

Why interviewers use this one

It moves you off the comfortable request/response model into stateful, push-based, real-time territory — connection management, delivery guarantees, and ordering — which a CRUD-only candidate hasn't thought about. It's the same 7-step framework, applied to a problem where the server must reach the client.

Written by Amit Singh — Senior SDE at Amazon, Claude Certified Architect, and founder of AlgoEngineer. We run live mock system-design interviews on exactly these problems in our System Design course.