Design YouTube — System Design Interview Walkthrough

Amit Singh

June 25, 2026
13 min read

The key insight for designing YouTube is that it's two systems wearing one logo: a heavy video pipeline (upload → transcode → store → CDN) and a lightweight metadata service. Keep them separate, serve the bytes from a CDN (never your servers), and make transcoding asynchronous, and the design falls into place. This applies the standard system design framework to a storage- and bandwidth-bound problem.

The mistake candidates make is treating video like normal data. It isn't — a single file is gigabytes, viewers are global, and reads dwarf writes. So the design is dominated by storage, transcoding, and delivery, not by the database.

1. Requirements

Functional: upload a video; watch a video (smooth playback on any device/network); basic metadata (title, description); view counts; search by title. Out of scope (state it): recommendations, comments, monetization, live streaming.

Non-functional: extremely read-heavy (views ≫ uploads), global low-latency playback, high durability (never lose an uploaded video), and elastic capacity for transcoding spikes.

2. Back-of-the-envelope

The point is to show that storage and bandwidth dominate. Assume 1M uploads/day at ~1 GB raw each: 10^6 × 1 GB ≈ 1 PB/day of raw ingest before transcoding, and transcoding then produces several renditions per video, multiplying storage. Views might outnumber uploads by 1,000×, so egress bandwidth far exceeds ingest. Conclusion: storage and egress bandwidth are the cost centers; the metadata DB is trivial by comparison. Say that out loud, because it reframes the whole design.
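
The arithmetic above can be sketched in a few lines. The upload count, raw size, and view ratio come from the text; the storage factor and the bytes streamed per view are illustrative assumptions, not measured figures.

```python
# Back-of-the-envelope estimate. The first three constants come from the text;
# the last two are ASSUMPTIONS chosen only to make the arithmetic concrete.
UPLOADS_PER_DAY = 1_000_000
RAW_GB_PER_UPLOAD = 1
VIEWS_PER_UPLOAD = 1_000
STORAGE_FACTOR = 3        # assumption: raw + all renditions ~= 3x the raw size
AVG_MB_PER_VIEW = 100     # assumption: ~100 MB streamed per view

SECONDS_PER_DAY = 86_400
GB_PER_PB = 1_000_000


def estimate():
    raw_gb = UPLOADS_PER_DAY * RAW_GB_PER_UPLOAD
    views = UPLOADS_PER_DAY * VIEWS_PER_UPLOAD
    egress_gb = views * AVG_MB_PER_VIEW / 1_000
    return {
        "raw_pb_per_day": raw_gb / GB_PER_PB,
        "stored_pb_per_day": raw_gb * STORAGE_FACTOR / GB_PER_PB,
        # Average rates; multiply GB by 8 to get gigabits.
        "ingest_gbps": raw_gb * 8 / SECONDS_PER_DAY,
        "views_per_day": views,
        "egress_pb_per_day": egress_gb / GB_PER_PB,
        "egress_tbps": egress_gb * 8 / SECONDS_PER_DAY / 1_000,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the average ingest is about 93 Gbps while the average egress is roughly 9.3 Tbps, which is the gap that makes the CDN the center of the design.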

3. The two subsystems

A. Write path — the upload/transcode pipeline (the crux)

  1. Client requests an upload URL, then uploads the raw file directly to object storage (e.g. S3), often via a resumable/multipart upload.
  2. Upload completion enqueues a transcoding job.
  3. A fleet of transcoding workers converts the raw file into multiple resolutions and formats, chunked for adaptive streaming (HLS/DASH), plus thumbnails. This is CPU-heavy and asynchronous — the video is "processing" until done.
  4. Transcoded renditions are written back to object storage and distributed to the CDN.
  5. Metadata (status, durations, rendition manifest URLs) is written to the metadata DB.
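
The five steps above can be sketched as a small in-memory simulation. Everything here is a stand-in: the dictionaries play the role of object storage and the metadata DB, `queue.Queue` plays the role of a durable job queue, and the function names, key layout, and rendition list are hypothetical.

```python
import queue
from dataclasses import dataclass, field

RENDITIONS = ["360p", "720p", "1080p"]  # assumed rendition ladder


@dataclass
class Video:
    video_id: str
    status: str = "uploading"
    manifest: dict = field(default_factory=dict)


object_store = {}                 # stand-in for S3: key -> bytes
metadata_db = {}                  # stand-in for the metadata DB
transcode_queue = queue.Queue()   # stand-in for a durable job queue


def request_upload(video_id):
    """Step 1: register the video and hand back an upload target."""
    metadata_db[video_id] = Video(video_id)
    return f"raw/{video_id}"      # a real system would return a pre-signed URL


def complete_upload(video_id, key, data):
    """Step 2: the raw file has landed, so enqueue a transcoding job."""
    object_store[key] = data
    metadata_db[video_id].status = "processing"
    transcode_queue.put(video_id)


def transcode_worker_step():
    """Steps 3-5: take one job, write renditions, then publish metadata."""
    video_id = transcode_queue.get_nowait()
    raw = object_store[f"raw/{video_id}"]
    video = metadata_db[video_id]
    for name in RENDITIONS:
        key = f"renditions/{video_id}/{name}/playlist"
        object_store[key] = f"{name}:".encode() + raw  # placeholder "transcode"
        video.manifest[name] = key
    video.status = "ready"
    transcode_queue.task_done()


if __name__ == "__main__":
    upload_key = request_upload("v1")
    complete_upload("v1", upload_key, b"raw-bytes")
    print(metadata_db["v1"].status)   # still processing: the work is async
    transcode_worker_step()
    print(metadata_db["v1"].status, sorted(metadata_db["v1"].manifest))
```

The point of the sketch is the status transition: the upload call returns as soon as the job is queued, and the video only becomes watchable once a worker has written the renditions and the manifest.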

B. Read path — watching

  1. Client hits the metadata service, gets the video info + a manifest (the list of chunk/rendition URLs).
  2. The player streams chunks from the CDN using adaptive bitrate — it steps resolution up/down based on the viewer's bandwidth, so playback stays smooth.
  3. Your origin servers serve almost no video bytes; the CDN absorbs the traffic.
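
The adaptive-bitrate step can be illustrated with a minimal selection rule: pick the highest rendition whose bitrate fits within a safety margin of the measured bandwidth. The ladder values, the 0.8 margin, and the function name are assumptions for illustration; real players use more elaborate heuristics that also consider buffer level.

```python
# Assumed bitrate ladder: (rendition name, required kbps), sorted ascending.
LADDER = [
    ("240p", 400),
    ("360p", 800),
    ("480p", 1500),
    ("720p", 3000),
    ("1080p", 6000),
]


def pick_rendition(measured_kbps, safety=0.8):
    """Return the highest rendition that fits within the bandwidth budget.

    Falls back to the lowest rendition when nothing fits, so playback
    degrades in quality instead of stalling.
    """
    budget = measured_kbps * safety
    choice = LADDER[0][0]
    for name, required_kbps in LADDER:
        if required_kbps <= budget:
            choice = name
    return choice


if __name__ == "__main__":
    for kbps in (100, 2000, 5000, 10000):
        print(kbps, "kbps ->", pick_rendition(kbps))
```

The player would re-run this choice as its bandwidth estimate changes, requesting the next chunk from whichever rendition the rule selects.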

4. Data model & storage

| Data | Store | Why |
|---|---|---|
| Raw + transcoded video | Object storage (S3) + CDN | Cheap, durable blob storage; CDN for global delivery |
| Video metadata | Relational or NoSQL | Small records; lookups by videoId; search index (Elasticsearch) on title |
| View counts | Async counters (queue → batch increment) | Avoid a hot-row write per view; eventual consistency is fine |
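
The async view counter can be sketched as a buffer that aggregates views in memory and flushes them as one batched increment per video. The class name, the flush threshold, and the `totals` counter standing in for the durable store are all hypothetical; a production version would sit behind a queue and also flush on a timer.

```python
from collections import Counter


class ViewCounter:
    """Buffers view events and applies them to the store in batches."""

    def __init__(self, flush_threshold=1000):
        self.flush_threshold = flush_threshold
        self.pending = Counter()      # video_id -> views not yet persisted
        self.pending_events = 0
        self.totals = Counter()       # stand-in for the durable counter store
        self.db_writes = 0            # how many store writes were issued

    def record_view(self, video_id):
        self.pending[video_id] += 1
        self.pending_events += 1
        if self.pending_events >= self.flush_threshold:
            self.flush()

    def flush(self):
        # One write per video with buffered views, regardless of view count.
        for video_id, delta in self.pending.items():
            self.totals[video_id] += delta
            self.db_writes += 1
        self.pending.clear()
        self.pending_events = 0


if __name__ == "__main__":
    counter = ViewCounter(flush_threshold=1000)
    for _ in range(1000):
        counter.record_view("viral-video")
    print(counter.totals["viral-video"], counter.db_writes)
```

In this sketch a thousand views of one video collapse into a single store write, and the displayed count lags by at most one unflushed batch, which is the eventual consistency the table accepts.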

5. Bottlenecks & trade-offs to name unprompted

  • Transcoding cost & latency: it's the expensive, slow step → make it async, autoscale the worker fleet, and prioritize popular formats first so the video is watchable quickly.
  • Storage explosion: every video becomes many renditions → tier cold/old videos to cheaper storage; don't pre-generate every resolution for unpopular videos.
  • CDN is non-negotiable: serving petabytes of egress from origin is prohibitively expensive and operationally impractical; the CDN (multi-tier, edge caching) is the read architecture.
  • View counts: never UPDATE ... SET views = views+1 on the hot path — buffer in a queue and batch; accept eventual consistency.
  • Hot videos (virality): a few videos take most traffic → the CDN handles it, but pre-warm edges for anticipated spikes.
  • Thumbnails/manifests are tiny and cacheable — serve from CDN too.
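
The storage-explosion trade-off above can be made concrete with two small policy functions: one decides which renditions to generate based on popularity, the other decides which storage tier a video belongs in based on how recently it was watched. The thresholds, tier names, and rendition lists are illustrative assumptions, not recommended values.

```python
BASE_RENDITIONS = ["360p", "720p"]       # assumed minimum set for any video
EXTRA_RENDITIONS = ["1080p", "2160p"]    # assumed set reserved for popular videos


def plan_renditions(view_count, popular_threshold=1000):
    """Generate the full ladder only once a video proves popular."""
    if view_count >= popular_threshold:
        return BASE_RENDITIONS + EXTRA_RENDITIONS
    return list(BASE_RENDITIONS)


def choose_tier(days_since_last_view, hot_days=30, warm_days=180):
    """Map recency of the last view to a storage tier (names are hypothetical)."""
    if days_since_last_view <= hot_days:
        return "hot"
    if days_since_last_view <= warm_days:
        return "warm"
    return "cold"


if __name__ == "__main__":
    print(plan_renditions(view_count=12))
    print(plan_renditions(view_count=50_000))
    print(choose_tier(3), choose_tier(90), choose_tier(400))
```

A periodic background job could apply both rules, so the expensive renditions and the fast storage are spent only on videos that are actually being watched.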

Why interviewers love this one

It forces the candidate to separate a write-heavy media pipeline from a read-heavy metadata service, to reach for object storage + CDN + async transcoding instead of a database, and to reason about cost at petabyte/terabit scale. It's the same 7-step framework; the bottleneck just moves from the DB (as in the URL shortener) to storage and bandwidth.


Written by Amit Singh — Senior SDE at Amazon, Claude Certified Architect, and founder of AlgoEngineer. We teach these with live mock system-design interviews in our System Design course.
