Design Twitter / X

Requirements

Functional Requirements

Post tweets (text, images, videos)
Follow/unfollow users
Home timeline — see tweets from followed users
User timeline — see a user's tweets
Search tweets and users
Like and retweet

Non-Functional Requirements

Scale: 500M users, 200M DAU
Availability: 99.99% uptime
Latency: Timeline loads < 200ms
Throughput: ~12K tweets/sec on average (1B tweets/day ÷ 86,400 s), with several times that at peak

Estimation

200M DAU × 5 tweets/day = 1B tweets/day
Average tweet size: ~300 bytes
Storage: 1B × 300B = 300GB/day (text only)
With media: ~5TB/day
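The arithmetic above can be checked with a few lines of Python. This is only a back-of-envelope sketch: the constants come from the estimates above, while the per-second and per-year figures are derived here and are not stated in the original numbers.

```python
# Back-of-envelope numbers taken from the estimates above.
DAU = 200_000_000
TWEETS_PER_USER_PER_DAY = 5
AVG_TWEET_BYTES = 300
SECONDS_PER_DAY = 86_400

tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY           # 1 billion
text_bytes_per_day = tweets_per_day * AVG_TWEET_BYTES    # 300 GB (decimal units)
avg_tweets_per_sec = tweets_per_day / SECONDS_PER_DAY    # average write rate
text_tb_per_year = text_bytes_per_day * 365 / 1e12       # text-only growth per year

print(f"{tweets_per_day:,} tweets/day")
print(f"{text_bytes_per_day / 1e9:.0f} GB/day of text")
print(f"{avg_tweets_per_sec:,.0f} tweets/sec on average")
print(f"{text_tb_per_year:.1f} TB/year of text")
```

The average write rate (roughly 11.6K tweets/sec) is the useful number for sizing the write path; peak traffic would need extra headroom on top of it.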

High-Level Architecture

                    ┌──────────────┐
  Clients ─────────→│ API Gateway  │
                    └──────┬───────┘
            ┌──────────────┼──────────────┐
            ▼              ▼              ▼
      ┌──────────┐  ┌──────────┐  ┌──────────┐
      │ Tweet    │  │ Timeline │  │ Search   │
      │ Service  │  │ Service  │  │ Service  │
      └────┬─────┘  └────┬─────┘  └────┬─────┘
           │             │             │
      ┌────▼─────┐  ┌────▼─────┐  ┌────▼─────┐
      │ Tweet DB │  │ Timeline │  │ Search   │
      │ (MySQL)  │  │ Cache    │  │ Index    │
      └──────────┘  │ (Redis)  │  │ (Elastic)│
                    └──────────┘  └──────────┘

The Fan-Out Problem

The core challenge is timeline generation. When you open Twitter, you see tweets from everyone you follow, sorted by time.

Fan-Out on Write (Push Model)

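When a user tweets, push the tweet ID to the cached home timeline of every follower. Reads become a single cache lookup, but each write costs one update per follower, which is expensive for accounts with millions of followers.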
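A minimal in-memory sketch of the push model may help make the cost visible. The dictionaries stand in for the follow graph and the timeline cache, and the names `follow`, `post_tweet`, and `home_timeline` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # assumed cap on entries kept per timeline

# Hypothetical in-memory stand-ins for the follow graph and timeline cache.
followers = defaultdict(set)  # author_id -> set of follower ids
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))  # user_id -> newest-first tweet ids


def follow(follower_id, author_id):
    followers[author_id].add(follower_id)


def post_tweet(author_id, tweet_id):
    """Push the new tweet id to every follower's timeline: O(number of followers)."""
    for follower_id in followers[author_id]:
        # appendleft keeps newest first; the deque drops the oldest id when full.
        timelines[follower_id].appendleft(tweet_id)


def home_timeline(user_id, limit=20):
    """Reading is a cheap lookup because the work was done at write time."""
    return list(timelines[user_id])[:limit]


follow(2, 1)
follow(3, 1)
post_tweet(1, 100)
post_tweet(1, 101)
print(home_timeline(2))
```

The loop in `post_tweet` is the part that grows with follower count, which is why this model struggles with very popular accounts.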

Fan-Out on Read (Pull Model)

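When a user opens their timeline, fetch recent tweets from every followed user at that moment and merge them by time. Writes are cheap, but each read touches every followed account, which makes timeline loads slow for users who follow many accounts.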
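The pull model can be sketched as a k-way merge over per-author tweet lists. This is an illustrative in-memory version, assuming tweets are posted in increasing `created_at` order so each author's list stays sorted newest first; the function and variable names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import islice

# Hypothetical in-memory stand-ins for the follow graph and per-user tweet storage.
following = defaultdict(set)     # user_id -> set of followed author ids
user_tweets = defaultdict(list)  # author_id -> [(created_at, tweet_id), ...], newest first


def post_tweet(author_id, tweet_id, created_at):
    """Writing is cheap: only the author's own list is touched."""
    user_tweets[author_id].insert(0, (created_at, tweet_id))


def home_timeline(user_id, limit=20):
    """Merge every followed author's tweets at read time, newest first."""
    per_author = [user_tweets[author] for author in following[user_id]]
    # Each input list is sorted descending, so reverse=True yields a descending merge.
    merged = heapq.merge(*per_author, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]


following[1] = {2, 3}
post_tweet(2, 10, created_at=1.0)
post_tweet(3, 11, created_at=2.0)
post_tweet(2, 12, created_at=3.0)
print(home_timeline(1))
```

Here the read path does the work: `home_timeline` touches one list per followed account on every load.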

Hybrid Approach (Twitter's Actual Design)

Best of Both Worlds

Normal users (< 10K followers): Fan-out on write
Celebrities (≥ 10K followers): Fan-out on read

When loading the timeline, merge the pre-computed timeline with live queries for tweets from the celebrities the user follows.
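The merge step can be sketched as follows. This is a simplified in-memory illustration, not Twitter's implementation: the data structures, the function names, and the choice to treat an account with exactly 10K followers as a celebrity are assumptions made for the example. Tweets are assumed to arrive in increasing `created_at` order.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # follower cutoff from the text; boundary handling is an assumption

followers = defaultdict(set)     # author_id -> follower ids
following = defaultdict(set)     # user_id -> followed author ids
user_tweets = defaultdict(list)  # author_id -> [(created_at, tweet_id)], newest first
precomputed = defaultdict(list)  # user_id -> [(created_at, tweet_id)], newest first


def follow(follower_id, author_id):
    followers[author_id].add(follower_id)
    following[follower_id].add(author_id)


def is_celebrity(author_id, threshold=CELEBRITY_THRESHOLD):
    return len(followers[author_id]) >= threshold


def post_tweet(author_id, tweet_id, created_at, threshold=CELEBRITY_THRESHOLD):
    entry = (created_at, tweet_id)
    user_tweets[author_id].insert(0, entry)
    if not is_celebrity(author_id, threshold):
        # Normal user: fan out on write.
        for follower_id in followers[author_id]:
            precomputed[follower_id].insert(0, entry)


def home_timeline(user_id, limit=20, threshold=CELEBRITY_THRESHOLD):
    # Celebrity tweets are pulled at read time and merged with the pushed entries.
    celebrity_feeds = [user_tweets[a] for a in following[user_id]
                       if is_celebrity(a, threshold)]
    merged = heapq.merge(precomputed[user_id], *celebrity_feeds, reverse=True)
    seen, result = set(), []
    for _, tweet_id in merged:
        if tweet_id in seen:  # an author may have crossed the threshold mid-history
            continue
        seen.add(tweet_id)
        result.append(tweet_id)
        if len(result) == limit:
            break
    return result
```

For example, with a lowered threshold of 2 for demonstration, a user following one normal account and one "celebrity" gets both sets of tweets in a single time-ordered list, while only the normal account's tweet was written into the pre-computed timeline.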

Data Storage

Tweet Storage (MySQL/PostgreSQL)

Column	Type
tweet_id	BIGINT (Snowflake ID)
user_id	BIGINT
content	VARCHAR(280)
media_urls	JSON
created_at	TIMESTAMP

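Sharded by a hash of user_id: users spread evenly across shards, and all of a user's tweets live on one shard, so a user-timeline query hits a single shard. Very active accounts can still create hot shards.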
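One simple way to route a user to a shard is to hash the user ID and take it modulo the shard count. The sketch below assumes a fixed, hypothetical `NUM_SHARDS` of 64 and uses SHA-256 only as a convenient stable hash; a real deployment might instead use consistent hashing or a lookup table so that shards can be added without remapping most users.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 64  # hypothetical shard count, chosen for illustration


def shard_for_user(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user id to a shard index using a stable hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


# Every tweet by the same user lands on the same shard.
print(shard_for_user(42) == shard_for_user(42))

# Rough look at how evenly 10,000 sequential user ids spread over the shards.
counts = Counter(shard_for_user(uid) for uid in range(10_000))
print(min(counts.values()), max(counts.values()))
```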

Timeline Cache (Redis)

Each user's home timeline is a sorted set in Redis:

Key: timeline:{user_id}
Value: sorted set of tweet IDs, scored by creation time (last 800 tweets)
TTL: 7 days
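The cache layout above maps onto four Redis commands: ZADD to insert, ZREMRANGEBYRANK to trim to the newest 800 entries, EXPIRE to refresh the TTL, and ZREVRANGE to read newest first. The sketch below writes the logic against a client object `r` whose method names follow redis-py (`zadd`, `zremrangebyrank`, `expire`, `zrevrange`). The `InMemorySortedSets` class is a hypothetical stand-in included only so the example runs without a Redis server, and the function names are illustrative.

```python
TIMELINE_MAX = 800                    # entries kept per timeline
TIMELINE_TTL_SECONDS = 7 * 24 * 3600  # 7 days


def timeline_key(user_id):
    return f"timeline:{user_id}"


def push_to_timeline(r, user_id, tweet_id, created_at_ms, max_len=TIMELINE_MAX):
    key = timeline_key(user_id)
    r.zadd(key, {str(tweet_id): created_at_ms})  # score = creation time
    # Ranks are ascending by score, so this removes everything except the newest max_len.
    r.zremrangebyrank(key, 0, -(max_len + 1))
    r.expire(key, TIMELINE_TTL_SECONDS)


def read_timeline(r, user_id, limit=20):
    # Highest score (newest) first.
    return [int(member) for member in r.zrevrange(timeline_key(user_id), 0, limit - 1)]


class InMemorySortedSets:
    """Hypothetical stand-in for the four commands above (non-negative read ranges only)."""

    def __init__(self):
        self.data = {}  # key -> {member: score}
        self.ttl = {}   # key -> seconds

    def _ranked(self, key):
        # Ascending by score, then member, mirroring sorted-set rank order.
        return sorted(self.data.get(key, {}).items(), key=lambda kv: (kv[1], kv[0]))

    def zadd(self, key, mapping):
        self.data.setdefault(key, {}).update(mapping)

    def zremrangebyrank(self, key, start, stop):
        ranked = self._ranked(key)
        n = len(ranked)
        if start < 0:
            start = max(start + n, 0)
        if stop < 0:
            stop += n
        if stop < start:
            return
        for member, _ in ranked[start:stop + 1]:
            del self.data[key][member]

    def expire(self, key, seconds):
        self.ttl[key] = seconds

    def zrevrange(self, key, start, stop):
        members = [m for m, _ in reversed(self._ranked(key))]
        return members[start:stop + 1]


r = InMemorySortedSets()
for i in range(5):
    push_to_timeline(r, user_id=7, tweet_id=100 + i, created_at_ms=1000 + i, max_len=3)
print(read_timeline(r, 7))
```

With `max_len=3`, only the three newest tweet IDs survive the trim, which is the same behavior the 800-entry cap gives at full size.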

Media Storage

Object Storage (S3) for images and videos
CDN for serving media globally with low latency

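ID Generation (Snowflake)

Twitter created Snowflake to generate unique, time-sortable IDs across distributed systems without coordination. A 64-bit ID contains: unused sign bit (1 bit) + timestamp in milliseconds since a custom epoch (41 bits) + datacenter ID (5 bits) + machine ID (5 bits) + sequence number (12 bits).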
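The bit layout can be illustrated with a small generator and decoder. This is a sketch of the general technique, not Twitter's code: the class and function names are hypothetical, the epoch constant is the commonly cited Snowflake epoch and should be treated as an assumption, and clock-rollback handling is reduced to raising an error.

```python
import threading
import time

EPOCH_MS = 1288834974657  # commonly cited Snowflake epoch (Nov 2010); assumed here
DATACENTER_BITS, MACHINE_BITS, SEQUENCE_BITS = 5, 5, 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095 ids per millisecond per machine


class SnowflakeGenerator:
    def __init__(self, datacenter_id, machine_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= datacenter_id < (1 << DATACENTER_BITS):
            raise ValueError("datacenter_id out of range")
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self._datacenter_id = datacenter_id
        self._machine_id = machine_id
        self._clock = clock  # injectable so the example can be tested deterministically
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << 22)       # 41-bit timestamp
                    | (self._datacenter_id << 17)  # 5-bit datacenter id
                    | (self._machine_id << 12)     # 5-bit machine id
                    | self._sequence)              # 12-bit sequence


def decode(snowflake_id):
    """Split an id back into its four fields."""
    return {
        "timestamp_ms": (snowflake_id >> 22) + EPOCH_MS,
        "datacenter_id": (snowflake_id >> 17) & 0x1F,
        "machine_id": (snowflake_id >> 12) & 0x1F,
        "sequence": snowflake_id & 0xFFF,
    }


gen = SnowflakeGenerator(datacenter_id=3, machine_id=7)
first, second = gen.next_id(), gen.next_id()
print(first < second, decode(first)["machine_id"])
```

Because the timestamp occupies the high bits, sorting by ID approximates sorting by creation time, which is what lets timelines order tweets by `tweet_id` alone.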

Key Trade-offs

Decision	Choice	Reasoning
Fan-out strategy	Hybrid	Balances write cost vs read latency
Database	MySQL + Redis	Proven at scale, Redis for fast timelines
Search	Elasticsearch	Full-text search, real-time indexing
Media storage	S3 + CDN	Cost-effective, globally distributed
ID generation	Snowflake	Time-sortable, no coordination needed

Don't Forget

In the interview, always discuss: rate limiting, spam detection, content moderation, and how you'd handle trending topics.