System Design, Interview Tips, Distributed Systems, FAANG

Design WhatsApp (a Chat System) — System Design Interview Walkthrough

Amit Singh

June 25, 2026
13 min read

The thing that makes a chat system hard is real-time delivery: how do you get a message to a recipient instantly when they're online, reliably when they're offline, and to everyone when it's a group? Persistent connections (WebSocket) plus a connection registry and a durable message store answer all three. Everything else — receipts, ordering, history — hangs off that. This applies the standard system design framework to a real-time problem.

Unlike a URL shortener (request/response, read-heavy), chat is stateful and push-based — the server has to reach out to clients. That's the whole challenge.

1. Requirements

Functional: 1:1 messaging, group messaging, online/last-seen presence, delivery and read receipts, message history, and push notifications when the recipient is offline. Out of scope (say so explicitly): voice/video calls and media handling. For media, mention in passing that files go to an object store behind a CDN like any other blob, and only the URL travels in the message.

Non-functional: low latency (messages feel instant), highly available, durable (never lose a message), ordered within a conversation, and able to handle hundreds of millions of concurrent connections.

2. The core challenge: connections

Plain HTTP request/response is client-initiated, so the server has no way to push. Instead, each client holds a persistent WebSocket to one of a fleet of connection (gateway) servers. The problem this creates: when User A sends to User B, which server holds B's connection?

  • Maintain a session registry (e.g., Redis): userId → {gatewayServerId, connectionId}, updated on connect/disconnect.
  • To deliver, look up B's gateway and route the message there (server-to-server), which pushes it down B's socket.
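As a rough illustration of the registry described above, here is a minimal in-memory sketch. A plain dict stands in for Redis, and the class and method names (SessionRegistry, on_connect, on_disconnect, lookup) are hypothetical, not a real library API. It also assumes one active connection per user, which a multi-device product would need to relax.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Session:
    gateway_id: str      # which gateway server holds the socket
    connection_id: str   # which socket on that gateway


class SessionRegistry:
    """In-memory stand-in for a Redis-backed registry (hypothetical API)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def on_connect(self, user_id: str, gateway_id: str, connection_id: str) -> None:
        # Latest connection wins (assumes a single device per user).
        self._sessions[user_id] = Session(gateway_id, connection_id)

    def on_disconnect(self, user_id: str, connection_id: str) -> None:
        # Only remove the entry if it still points at the closing connection,
        # so a late disconnect cannot erase a newer reconnect.
        current = self._sessions.get(user_id)
        if current is not None and current.connection_id == connection_id:
            del self._sessions[user_id]

    def lookup(self, user_id: str) -> Optional[Session]:
        return self._sessions.get(user_id)


registry = SessionRegistry()
registry.on_connect("userB", "gateway-7", "conn-1")
registry.on_connect("userB", "gateway-2", "conn-2")  # B reconnects elsewhere
registry.on_disconnect("userB", "conn-1")            # stale disconnect is ignored
```

The guarded delete is a design choice worth mentioning in an interview: disconnect events can arrive after the user has already reconnected to another gateway.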

This connection layer is the heart of the design — call it out first.

3. Sending a message (the happy path)

  1. A's client sends the message over its WebSocket to its gateway.
  2. The gateway persists the message (durability before delivery — so a crash doesn't lose it) and assigns a per-conversation sequence number.
  3. Look up B in the session registry.
    • B online: route to B's gateway → push down B's socket → B's client acks → mark delivered.
    • B offline: the message is already stored; enqueue a push notification (APNs/FCM). B pulls undelivered messages on reconnect.
  4. Receipts (sent → delivered → read) are just small ack messages flowing back the same way.
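The happy path above can be sketched in a few lines. This is a simplified single-process model, not a production design: dicts and lists stand in for the durable store, the session registry, the gateway push, and the APNs/FCM queue, and every name (ChatBackend, send, pushed, notified) is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ChatBackend:
    # conversation_id -> [(seq, sender, body)]; stand-in for the durable store
    store: Dict[str, List[Tuple[int, str, str]]] = field(default_factory=dict)
    # user_id -> gateway_id; stand-in for the session registry
    sessions: Dict[str, str] = field(default_factory=dict)
    # (gateway_id, user_id, seq) pushes that would go down a live socket
    pushed: List[Tuple[str, str, int]] = field(default_factory=list)
    # (user_id, seq) notifications that would be handed to APNs/FCM
    notified: List[Tuple[str, int]] = field(default_factory=list)

    def send(self, conversation_id: str, sender: str, recipient: str, body: str) -> int:
        # Step 2: persist first and assign the next per-conversation sequence number.
        log = self.store.setdefault(conversation_id, [])
        seq = len(log) + 1
        log.append((seq, sender, body))

        # Step 3: route to the recipient's gateway if online, else notify.
        gateway = self.sessions.get(recipient)
        if gateway is not None:
            self.pushed.append((gateway, recipient, seq))
        else:
            self.notified.append((recipient, seq))
        return seq


backend = ChatBackend(sessions={"bob": "gateway-3"})
first = backend.send("alice:bob", "alice", "bob", "hi")             # bob online
backend.sessions.pop("bob")                                         # bob disconnects
second = backend.send("alice:bob", "alice", "bob", "still there?")  # bob offline
```

Both messages end up in the store regardless of whether bob was reachable, which is the point of persisting before delivering.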

4. Data model & storage

Chat is write-heavy (every message is a write) with time-ordered reads per conversation — a textbook fit for a wide-column store (Cassandra/HBase):

Data | Store | Key
Messages | Wide-column (Cassandra) | partition by conversationId, clustered by sequence/timestamp
Session registry | In-memory (Redis) | userId → gateway/connection
User/group metadata | Relational or KV | userId, groupId → members

Partitioning messages by conversationId keeps a conversation's history together and time-ordered, which is exactly the read pattern.
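To make the access pattern concrete, here is a toy model of "partition by conversation, cluster by sequence". It only imitates the layout in memory and is not how Cassandra is implemented; MessageStore, append, and read_after are hypothetical names.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class MessageStore:
    """Toy model: one sorted list of (seq, body) rows per conversation."""

    def __init__(self) -> None:
        self._partitions: DefaultDict[str, List[Tuple[int, str]]] = defaultdict(list)

    def append(self, conversation_id: str, seq: int, body: str) -> None:
        # Rows stay sorted by sequence number within their partition.
        bisect.insort(self._partitions[conversation_id], (seq, body))

    def read_after(self, conversation_id: str, after_seq: int, limit: int = 50) -> List[Tuple[int, str]]:
        # A history read touches one partition and scans a contiguous range.
        rows = self._partitions.get(conversation_id, [])
        start = bisect.bisect_left(rows, (after_seq + 1,))
        return rows[start:start + limit]


store = MessageStore()
store.append("c1", 2, "second")
store.append("c1", 1, "first")
store.append("c1", 3, "third")
store.append("c2", 1, "other chat")

recent = store.read_after("c1", 1)  # -> [(2, 'second'), (3, 'third')]
```

Fetching "everything after sequence N" is the same query a client would issue on reconnect to pull undelivered messages.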

5. Ordering & delivery semantics

  • Ordering: guarantee it per conversation with a monotonic sequence number assigned server-side; don't trust client clocks. Global ordering across all conversations isn't needed.
  • Delivery: aim for at-least-once + idempotent client dedup (each message has a unique id), which is simpler and safer than exactly-once. The client drops duplicates by id.
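A minimal sketch of the client side of these two rules, assuming each message carries a unique id and a server-assigned sequence number. ConversationView and its methods are illustrative names only.

```python
from typing import Dict, List, Set


class ConversationView:
    """Client-side view: drops duplicates by id, orders by server sequence."""

    def __init__(self) -> None:
        self._seen_ids: Set[str] = set()
        self._by_seq: Dict[int, str] = {}

    def receive(self, message_id: str, seq: int, body: str) -> bool:
        # At-least-once delivery means redeliveries are expected; ignore them.
        if message_id in self._seen_ids:
            return False
        self._seen_ids.add(message_id)
        self._by_seq[seq] = body
        return True

    def render(self) -> List[str]:
        # Display order comes from the server sequence, never from arrival
        # order or client clocks.
        return [self._by_seq[seq] for seq in sorted(self._by_seq)]


view = ConversationView()
view.receive("m2", 2, "world")                 # arrives first, out of order
view.receive("m1", 1, "hello")
was_new = view.receive("m2", 2, "world")       # redelivery after a lost ack
```

Because the receiver is idempotent, the server can retry freely whenever an ack goes missing.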

6. Group messaging (the fan-out)

A group message is delivered to every member. For small groups (WhatsApp caps group size), fan out on send: write once, then deliver to each online member via their gateway and store-for-later for offline members. For very large broadcast groups you'd shift toward a pull model — but for typical group sizes, fan-out-on-send is fine. Naming the size threshold is the senior signal.
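Fan-out on send can be sketched as a planning step that splits members into online and offline sets. Grouping online members by gateway, so each gateway receives one server-to-server call, is an assumption added here for illustration; plan_fan_out and its parameters are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def plan_fan_out(
    sender: str,
    members: Iterable[str],
    sessions: Dict[str, str],  # user_id -> gateway_id for online users
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (online members grouped by gateway, offline members)."""
    online: Dict[str, List[str]] = defaultdict(list)
    offline: List[str] = []
    for member in members:
        if member == sender:
            continue  # the sender already has the message
        gateway = sessions.get(member)
        if gateway is None:
            offline.append(member)          # stored for later + push notification
        else:
            online[gateway].append(member)  # pushed via that gateway
    return dict(online), offline


sessions = {"ann": "gw-2", "bo": "gw-1", "cy": "gw-1"}
by_gateway, offline = plan_fan_out("ann", ["ann", "bo", "cy", "di"], sessions)
# by_gateway -> {'gw-1': ['bo', 'cy']}, offline -> ['di']
```

The loop is linear in group size, which is why the approach holds up for capped groups and degrades for very large broadcast audiences.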

7. Bottlenecks & trade-offs to name unprompted

  • Connection scale: hundreds of millions of long-lived sockets → many gateway servers + a fast session registry; connections are the main capacity constraint, not CPU.
  • Durability vs latency: persist-before-deliver guarantees no message loss at the cost of a write on the hot path; this is the right trade for a messaging app.
  • Presence cost: true real-time presence for everyone is expensive; most systems use periodic heartbeats and show "last seen" rather than exact status.
  • Thundering herd on reconnect: when a gateway dies, its clients reconnect en masse — spread them across servers and rehydrate undelivered messages lazily.
  • Hot conversations: a huge group is a hot partition; cap group size or shard.
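As one example from the list above, heartbeat-based presence can be sketched as follows. The 30-second timeout is an arbitrary assumed value, timestamps are passed in explicitly to keep the example deterministic, and PresenceTracker is a hypothetical name.

```python
from typing import Dict, Optional


class PresenceTracker:
    """Tracks the last heartbeat per user; 'online' means a recent heartbeat."""

    def __init__(self, timeout_seconds: float = 30.0) -> None:
        self._timeout = timeout_seconds
        self._last_heartbeat: Dict[str, float] = {}

    def heartbeat(self, user_id: str, now: float) -> None:
        self._last_heartbeat[user_id] = now

    def is_online(self, user_id: str, now: float) -> bool:
        last = self._last_heartbeat.get(user_id)
        return last is not None and now - last <= self._timeout

    def last_seen(self, user_id: str) -> Optional[float]:
        # Shown as "last seen" once the user is no longer considered online.
        return self._last_heartbeat.get(user_id)


tracker = PresenceTracker(timeout_seconds=30.0)
tracker.heartbeat("bob", now=100.0)
online_soon_after = tracker.is_online("bob", now=120.0)  # within the timeout
online_later = tracker.is_online("bob", now=131.0)       # timeout exceeded
```

Nothing is broadcast when a user goes quiet: status is derived on read from the last heartbeat, which keeps the cost proportional to heartbeats rather than to the number of people watching.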

Why interviewers use this one

It moves you off the comfortable request/response model into stateful, push-based, real-time territory — connection management, delivery guarantees, and ordering — which a CRUD-only candidate hasn't thought about. It's the same 7-step framework, applied to a problem where the server must reach the client.


Written by Amit Singh — Senior SDE at Amazon, Claude Certified Architect, and founder of AlgoEngineer. We run live mock system-design interviews on exactly these problems in our System Design course.
