The key insight for designing YouTube is that it's two systems wearing one logo: a heavy video pipeline (upload → transcode → store → CDN) and a lightweight metadata service. Keep them separate, serve the bytes from a CDN (never your servers), and make transcoding asynchronous, and the design falls into place. This applies the standard system design framework to a storage- and bandwidth-bound problem.
The mistake candidates make is treating video like normal data. It isn't — a single file is gigabytes, viewers are global, and reads dwarf writes. So the design is dominated by storage, transcoding, and delivery, not by the database.
1. Requirements
Functional: upload a video; watch a video (smooth playback on any device/network); basic metadata (title, description); view counts; search by title. Out of scope (state it): recommendations, comments, monetization, live streaming.
Non-functional: extremely read-heavy (views ≫ uploads), global low-latency playback, high durability (never lose an uploaded video), and elastic capacity for transcoding spikes.
2. Back-of-the-envelope
The point is to show storage and bandwidth dominate. Assume 1M uploads/day at ~1 GB raw each → ~1 PB/day of raw ingest before transcoding (transcoding then produces several renditions per video, multiplying storage). Views might be 1,000× uploads. Conclusion: storage and egress bandwidth are the cost centers; the metadata DB is trivial by comparison. Say that out loud — it reframes the whole design.
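The arithmetic above can be sketched in a few lines. The upload count, raw size, and view ratio come from the estimate above; the rendition overhead and the average bytes streamed per view are illustrative assumptions, not measured figures, and "1,000× uploads" is read here as daily views per daily upload.

```python
# Back-of-the-envelope sizing. Values marked "assumed" are illustrative only.
UPLOADS_PER_DAY = 1_000_000     # from the estimate above
RAW_GB_PER_UPLOAD = 1.0         # from the estimate above
VIEWS_PER_UPLOAD = 1_000        # from the estimate above (views ~ 1,000x uploads)
RENDITION_OVERHEAD = 1.5        # assumed: all renditions together ~1.5x the raw size
AVG_GB_PER_VIEW = 0.1           # assumed: ~100 MB actually streamed per view

GB_PER_PB = 1_000_000
SECONDS_PER_DAY = 86_400


def daily_estimates():
    """Return rough daily storage and egress figures under the assumptions above."""
    raw_pb = UPLOADS_PER_DAY * RAW_GB_PER_UPLOAD / GB_PER_PB
    stored_pb = raw_pb * (1 + RENDITION_OVERHEAD)   # raw file plus its renditions
    views = UPLOADS_PER_DAY * VIEWS_PER_UPLOAD
    egress_pb = views * AVG_GB_PER_VIEW / GB_PER_PB
    # Convert PB/day to an average rate in Tbit/s.
    avg_egress_tbps = egress_pb * 1e15 * 8 / SECONDS_PER_DAY / 1e12
    return {
        "raw_pb_per_day": raw_pb,
        "stored_pb_per_day": stored_pb,
        "views_per_day": views,
        "egress_pb_per_day": egress_pb,
        "avg_egress_tbps": avg_egress_tbps,
    }


if __name__ == "__main__":
    for name, value in daily_estimates().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the model gives about 1 PB/day of raw ingest, 2.5 PB/day stored, and roughly 100 PB/day of egress (around 9 Tbit/s on average). Changing the two assumed constants moves the numbers, but not the conclusion that storage and egress dominate.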
3. The two subsystems
A. Write path — the upload/transcode pipeline (the crux)
- The client requests an upload URL, then uploads the raw file directly to object storage (S3), often as a resumable/multipart upload.
- Upload completion enqueues a transcoding job.
- A fleet of transcoding workers converts the raw file into multiple resolutions and formats, chunked for adaptive streaming (HLS/DASH), plus thumbnails. This is CPU-heavy and asynchronous — the video is "processing" until done.
- Transcoded renditions are written back to object storage and distributed to the CDN.
- Metadata (status, durations, rendition manifest URLs) is written to the metadata DB.
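The write path above can be sketched as a small in-memory simulation. This is a minimal sketch, not a real implementation: the dictionaries stand in for object storage and the metadata DB, `queue.Queue` stands in for a durable job queue, and every name here (`complete_upload`, `run_worker_once`, `fake_transcode`, the key layout, the rendition list) is hypothetical.

```python
import queue

RENDITIONS = ["360p", "720p", "1080p"]   # assumed rendition ladder

object_store = {}                 # stand-in for S3: key -> bytes
metadata_db = {}                  # stand-in for the metadata DB: video_id -> record
transcode_jobs = queue.Queue()    # stand-in for a durable job queue


def complete_upload(video_id, raw_bytes):
    """Runs when the client's direct-to-storage upload finishes."""
    raw_key = f"raw/{video_id}"
    object_store[raw_key] = raw_bytes
    # The video is visible in metadata but not yet watchable.
    metadata_db[video_id] = {"status": "processing", "renditions": {}}
    transcode_jobs.put({"video_id": video_id, "raw_key": raw_key})


def fake_transcode(raw_bytes, rendition):
    # Placeholder only: a real worker would run an encoder and emit
    # HLS/DASH segments plus a manifest for this rendition.
    return rendition.encode() + b":" + raw_bytes


def run_worker_once():
    """Process one queued job; return False if the queue was empty."""
    try:
        job = transcode_jobs.get_nowait()
    except queue.Empty:
        return False
    raw = object_store[job["raw_key"]]
    record = metadata_db[job["video_id"]]
    for rendition in RENDITIONS:
        key = f"renditions/{job['video_id']}/{rendition}/manifest"
        object_store[key] = fake_transcode(raw, rendition)
        record["renditions"][rendition] = key
    record["status"] = "ready"    # flip only after all renditions are written
    transcode_jobs.task_done()
    return True
```

The point of the shape is that `complete_upload` returns immediately while the expensive work happens in workers that can be scaled independently of the upload API.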
B. Read path — watching
- Client hits the metadata service, gets the video info + a manifest (the list of chunk/rendition URLs).
- The player streams chunks from the CDN using adaptive bitrate — it steps resolution up/down based on the viewer's bandwidth, so playback stays smooth.
- Your origin servers serve almost no video bytes; the CDN absorbs the traffic.
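The adaptive-bitrate step in the read path can be illustrated with a simple throughput-based rule. This is a deliberately simplified sketch: real players also weigh buffer level and switching stability, and the ladder values, the safety factor, and the name `pick_rendition` are assumptions made for illustration.

```python
# Hypothetical rendition ladder: (label, required bandwidth in kbit/s), lowest first.
LADDER = [("240p", 400), ("360p", 800), ("720p", 2500), ("1080p", 5000)]
SAFETY_FACTOR = 0.8   # assumed headroom so small throughput dips don't stall playback


def pick_rendition(measured_kbps, ladder=LADDER, safety=SAFETY_FACTOR):
    """Pick the highest rendition whose bitrate fits within the bandwidth budget."""
    budget = measured_kbps * safety
    choice = ladder[0][0]   # always fall back to the lowest rendition
    for label, required_kbps in ladder:
        if required_kbps <= budget:
            choice = label
    return choice
```

The player would re-run a rule like this before fetching each chunk, which is why every rendition must be cut into aligned chunks: the switch happens at a chunk boundary.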
4. Data model & storage
| Data | Store | Why |
|---|---|---|
| Raw + transcoded video | Object storage (S3) + CDN | Cheap, durable blob storage; CDN for global delivery |
| Video metadata | Relational or NoSQL | Small records; lookups by videoId; search index (Elasticsearch) on title |
| View counts | Async counters (queue → batch increment) | Avoid a hot-row write per view; eventual consistency is fine |
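The "queue → batch increment" idea for view counts can be sketched as follows. It is an in-process illustration under stated assumptions: in a real system the buffer would be a durable queue with a separate consumer, and `ViewCounterBuffer`, `flush_threshold`, and the `store` counter are hypothetical names standing in for those pieces.

```python
from collections import Counter


class ViewCounterBuffer:
    """Aggregate view events in memory and apply them as batched increments."""

    def __init__(self, flush_threshold=1000):
        self.pending = Counter()        # video_id -> views not yet persisted
        self.pending_events = 0
        self.flush_threshold = flush_threshold
        self.store = Counter()          # stand-in for the durable counts table

    def record_view(self, video_id):
        self.pending[video_id] += 1
        self.pending_events += 1
        if self.pending_events >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Apply one increment per video rather than one write per view."""
        writes = 0
        for video_id, delta in self.pending.items():
            self.store[video_id] += delta
            writes += 1
        self.pending.clear()
        self.pending_events = 0
        return writes
```

With a threshold of 1,000, a viral video that receives all 1,000 views costs a single write at flush time. The trade-off is that the displayed count lags behind, and unflushed views are lost if the buffer is not durable.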
5. Bottlenecks & trade-offs to name unprompted
- Transcoding cost & latency: it's the expensive, slow step → make it async, autoscale the worker fleet, and transcode the most commonly played renditions first so the video becomes watchable before the full set is finished.
- Storage explosion: every video becomes many renditions → tier cold/old videos to cheaper storage; don't pre-generate every resolution for unpopular videos.
- CDN is non-negotiable: serving petabytes of egress from origin is prohibitively expensive and operationally infeasible; the CDN (multi-tier, edge caching) is the read architecture.
- View counts: never run `UPDATE ... SET views = views + 1` on the hot path; buffer in a queue and batch, and accept eventual consistency.
- Hot videos (virality): a few videos take most traffic → the CDN handles it, but pre-warm edges for anticipated spikes.
- Thumbnails/manifests are tiny and cacheable — serve from CDN too.
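The storage-explosion trade-off above (tier cold videos, skip the full rendition set for unpopular ones) can be expressed as a small policy function. This is a sketch only: the thresholds, tier names, rendition lists, and the name `plan_storage` are all assumptions chosen for illustration, and a real policy would be tuned against actual access patterns and storage pricing.

```python
FULL_LADDER = ["240p", "360p", "720p", "1080p", "2160p"]   # assumed
BASE_LADDER = ["360p", "720p"]                             # assumed minimum watchable set
HOT_VIEWS_THRESHOLD = 10_000    # assumed: views in the last 30 days
COLD_AGE_DAYS = 365             # assumed: age after which unpopular videos go cold


def plan_storage(views_last_30d, days_since_upload):
    """Decide which renditions to keep and which storage tier to use."""
    if views_last_30d >= HOT_VIEWS_THRESHOLD:
        # Popular: keep every rendition on fast storage.
        return {"renditions": FULL_LADDER, "tier": "standard"}
    if days_since_upload > COLD_AGE_DAYS:
        # Old and unpopular: minimal renditions on cheaper storage.
        return {"renditions": BASE_LADDER, "tier": "cold"}
    # Recent but unpopular: minimal renditions, still on fast storage.
    return {"renditions": BASE_LADDER, "tier": "standard"}
```

Missing renditions for a video that later becomes popular can be generated on demand by enqueueing another transcoding job, at the cost of a delay before the higher resolutions appear.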
Why interviewers love this one
It forces the candidate to separate a write-heavy media pipeline from a read-heavy metadata service, to reach for object storage + CDN + async transcoding instead of a database, and to reason about cost at petabyte/terabit scale. It's the same 7-step framework; the bottleneck just moves from the DB (as in the URL shortener) to storage and bandwidth.
Written by Amit Singh, Senior SDE at Amazon, Claude Certified Architect, and founder of AlgoEngineer.