Design Twitter / X
System design for a Twitter-like social media platform handling tweets, timelines, and real-time updates at scale.
Requirements
Functional Requirements
- Post tweets (text, images, videos)
- Follow/unfollow users
- Home timeline — see tweets from followed users
- User timeline — see a user's tweets
- Search tweets and users
- Like and retweet
Non-Functional Requirements
- Scale: 500M total users, 200M daily active users (DAU)
- Availability: 99.99% uptime
- Latency: Timeline loads < 200ms
- Throughput: 600K tweets/sec at peak
Estimation
- 200M DAU × 5 tweets/day = 1B tweets/day
- Average tweet size: ~300 bytes
- Storage: 1B × 300B = 300GB/day (text only)
- With media: ~5TB/day
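A few lines of arithmetic reproduce the text-storage estimate and turn the daily tweet count into an average write rate. This sketch uses only the inputs listed above and assumes decimal units (1 GB = 10^9 bytes). The per-second figure is a 24-hour average, so peak rates will be higher.

```python
# Back-of-envelope check of the estimates above (decimal units: 1 GB = 10**9 bytes).
DAU = 200_000_000             # daily active users
TWEETS_PER_USER_PER_DAY = 5
AVG_TWEET_BYTES = 300         # text only
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY
avg_tweets_per_sec = tweets_per_day / SECONDS_PER_DAY
text_gb_per_day = tweets_per_day * AVG_TWEET_BYTES / 10**9
text_tb_per_year = text_gb_per_day * 365 / 1000

print(f"tweets/day:        {tweets_per_day:,}")           # 1,000,000,000
print(f"avg tweets/sec:    {avg_tweets_per_sec:,.0f}")    # 11,574
print(f"text storage/day:  {text_gb_per_day:,.0f} GB")    # 300 GB
print(f"text storage/year: {text_tb_per_year:.1f} TB")    # 109.5 TB
```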
High-Level Architecture
                  ┌─────────────┐
Clients ─────────→│ API Gateway │
                  └──────┬──────┘
          ┌──────────────┼──────────────┐
          ▼              ▼              ▼
     ┌──────────┐   ┌──────────┐   ┌──────────┐
     │  Tweet   │   │ Timeline │   │  Search  │
     │ Service  │   │ Service  │   │ Service  │
     └────┬─────┘   └────┬─────┘   └────┬─────┘
          │              │              │
     ┌────▼─────┐   ┌────▼─────┐   ┌────▼─────┐
     │ Tweet DB │   │ Timeline │   │  Search  │
     │ (MySQL)  │   │  Cache   │   │  Index   │
     └──────────┘   │ (Redis)  │   │ (Elastic)│
                    └──────────┘   └──────────┘
The Fan-Out Problem
The core challenge is timeline generation. When you open Twitter, you see tweets from everyone you follow, sorted by time.
Fan-Out on Write (Push Model)
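When a user tweets, push the tweet ID into every follower's precomputed home timeline. Reads become a single cache lookup, but each tweet costs one write per follower, so an account with millions of followers triggers millions of writes.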
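A minimal sketch of the push model, using plain dictionaries as hypothetical stand-ins for the follower graph and the timeline cache. It assumes tweets arrive in creation order, so prepending keeps each timeline newest first. The point it illustrates is the cost profile: `post_tweet_push` does one write per follower, while the read is a slice of a precomputed list.

```python
from collections import defaultdict

TIMELINE_CAP = 800  # matches the 800-tweet cap used for the timeline cache

# Hypothetical in-memory stand-ins for the follower graph and the timeline cache.
followers: dict[int, set[int]] = defaultdict(set)          # author_id -> follower ids
home_timelines: dict[int, list[int]] = defaultdict(list)   # user_id -> tweet ids, newest first


def follow(follower_id: int, author_id: int) -> None:
    followers[author_id].add(follower_id)


def post_tweet_push(author_id: int, tweet_id: int) -> int:
    """Fan-out on write: copy the new tweet ID into every follower's home timeline.

    Returns the number of timeline writes, which equals the author's follower count.
    """
    writes = 0
    for follower_id in followers[author_id]:
        timeline = home_timelines[follower_id]
        timeline.insert(0, tweet_id)   # newest first
        del timeline[TIMELINE_CAP:]    # trim to the most recent entries
        writes += 1
    return writes


def read_home_timeline_push(user_id: int, limit: int = 20) -> list[int]:
    # The read is a single lookup because the work was done at write time.
    return home_timelines[user_id][:limit]


follow(1, 100)
follow(2, 100)
print(post_tweet_push(100, 5001))   # 2 (one write per follower)
print(read_home_timeline_push(1))   # [5001]
```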
Fan-Out on Read (Pull Model)
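When a user opens their timeline, fetch the recent tweets of every followed account at that moment and merge them by time. Writes are cheap (each tweet is stored once), but every timeline load has to query and merge many accounts, which makes reads slow and expensive for users who follow a lot of people.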
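The pull model is essentially a k-way merge of per-author lists at read time. This sketch assumes each author's tweets are kept newest first and that tweet IDs are time-sortable (as Snowflake IDs are), so Python's standard `heapq.merge` with `reverse=True` yields the combined timeline newest first. The store names and sample data are hypothetical.

```python
import heapq
import itertools

# Hypothetical in-memory stores. Tweet IDs are assumed time-sortable (as Snowflake
# IDs are), so a larger ID means a newer tweet.
user_tweets: dict[int, list[int]] = {   # author_id -> own tweet ids, newest first
    100: [9005, 9001],
    200: [9004, 9002],
    300: [9003],
}
following: dict[int, set[int]] = {1: {100, 200}, 2: {300}}


def read_home_timeline_pull(user_id: int, limit: int = 20) -> list[int]:
    """Fan-out on read: merge followed accounts' tweets at request time.

    Work per read grows with the number of accounts the user follows.
    """
    per_author = [user_tweets.get(a, []) for a in following.get(user_id, set())]
    # Each input list is sorted newest first, so merge in descending order.
    merged = heapq.merge(*per_author, reverse=True)
    return list(itertools.islice(merged, limit))


print(read_home_timeline_pull(1))           # [9005, 9004, 9002, 9001]
print(read_home_timeline_pull(1, limit=2))  # [9005, 9004]
```

Because `heapq.merge` is lazy, `islice` stops after `limit` items instead of materializing every followed account's full history.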
Hybrid Approach (Twitter's Actual Design)
Best of Both Worlds
- Normal users (< 10K followers): Fan-out on write
- Celebrities (≥ 10K followers): Fan-out on read
When loading a timeline, merge the pre-computed (cached) timeline with live queries for tweets from the celebrities the user follows.
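Putting the two models together: a minimal sketch, assuming the 10K-follower threshold from the list above and time-sortable tweet IDs. Every store and function name here is hypothetical. Normal accounts are pushed into followers' cached timelines at write time; celebrity accounts are skipped on write and merged in at read time.

```python
import heapq
import itertools

CELEBRITY_THRESHOLD = 10_000  # from the text: at or above this, use fan-out on read

# Hypothetical in-memory stores; tweet IDs are time-sortable (larger = newer).
follower_count: dict[int, int] = {}
followers: dict[int, set[int]] = {}        # author_id -> follower ids
following: dict[int, set[int]] = {}        # user_id -> followed author ids
author_tweets: dict[int, list[int]] = {}   # author_id -> own tweet ids, newest first
home_timelines: dict[int, list[int]] = {}  # user_id -> pushed tweet ids, newest first


def is_celebrity(author_id: int) -> bool:
    return follower_count.get(author_id, 0) >= CELEBRITY_THRESHOLD


def follow(user_id: int, author_id: int) -> None:
    if user_id not in followers.setdefault(author_id, set()):
        followers[author_id].add(user_id)
        following.setdefault(user_id, set()).add(author_id)
        follower_count[author_id] = follower_count.get(author_id, 0) + 1


def post_tweet(author_id: int, tweet_id: int) -> None:
    author_tweets.setdefault(author_id, []).insert(0, tweet_id)
    if is_celebrity(author_id):
        return  # no fan-out: followers pull celebrity tweets at read time
    for follower_id in followers.get(author_id, set()):
        home_timelines.setdefault(follower_id, []).insert(0, tweet_id)


def read_home_timeline(user_id: int, limit: int = 20) -> list[int]:
    pushed = home_timelines.get(user_id, [])
    pulled = [author_tweets.get(a, [])
              for a in following.get(user_id, set()) if is_celebrity(a)]
    merged = heapq.merge(pushed, *pulled, reverse=True)  # all inputs newest first
    return list(itertools.islice(merged, limit))


follower_count[999] = 50_000_000  # pretend account 999 is a celebrity
follow(1, 100)
follow(1, 999)
post_tweet(100, 7001)
post_tweet(999, 7002)
post_tweet(100, 7003)
print(home_timelines[1])       # [7003, 7001]: the celebrity tweet was not pushed
print(read_home_timeline(1))   # [7003, 7002, 7001]
```

One gap this sketch leaves open: when an account crosses the threshold, tweets pushed before the switch would appear both in the cache and in the pulled list, so a fuller version would deduplicate during the merge.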
Data Storage
Tweet Storage (MySQL/PostgreSQL)
| Column | Type |
|---|---|
| tweet_id | BIGINT (Snowflake ID) |
| user_id | BIGINT |
| content | VARCHAR(280) |
| media_urls | JSON |
| created_at | TIMESTAMP |
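Sharded by user_id (for example, by hashing it) so users spread evenly across shards. Each user's tweets then live on a single shard, which keeps user-timeline queries to one shard.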
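A sketch of shard routing, assuming a hypothetical fixed count of 16 shards and MD5 (from Python's standard `hashlib`) purely as a stable, well-mixed hash, not for security. Hashing first, rather than taking `user_id % num_shards` directly, avoids skew when IDs are not uniform in their low bits.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 16  # hypothetical shard count


def shard_for_user(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Route a user_id to a tweet-DB shard with a stable hash.

    All of a user's tweets map to the same shard, so a user-timeline query
    touches one shard. Hashing avoids skew if IDs are not uniform in their low
    bits, e.g. Snowflake IDs whose 12-bit sequence field is often 0 at low rates.
    """
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


counts = Counter(shard_for_user(uid) for uid in range(10_000))
print(sorted(counts.items()))  # roughly 625 users per shard
```

Plain modulo still remaps most users whenever the shard count changes; consistent hashing, or a fixed set of logical shards mapped onto physical hosts, are the usual ways around that.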
Timeline Cache (Redis)
Each user's home timeline is a sorted set in Redis:
- Key: timeline:{user_id}
- Value: the IDs of the most recent 800 tweets, ordered by time
- TTL: 7 days
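To show the cache semantics without a running Redis server, this sketch emulates one sorted set per user in memory. The Redis commands named in the comments (ZADD, ZREMRANGEBYRANK, EXPIRE, ZREVRANGE) are real, but the mapping is illustrative. Two choices are assumptions: the score is the tweet's creation time in milliseconds, and the TTL is refreshed on every push. Creation time is used instead of the tweet ID because Redis scores are double-precision floats, which cannot represent every 64-bit Snowflake ID exactly.

```python
import time
from typing import Callable, Dict, List, Optional

TIMELINE_MAX = 800                        # keep only the most recent 800 tweet IDs
TIMELINE_TTL_SECONDS = 7 * 24 * 60 * 60   # 7 days


class TimelineCache:
    """In-memory stand-in for one Redis sorted set per user, keyed "timeline:{user_id}".

    Member = tweet ID, score = creation time in milliseconds. Comments name the
    Redis command each step would correspond to in a real deployment.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._sets: Dict[str, Dict[int, int]] = {}   # key -> {tweet_id: score}
        self._expires_at: Dict[str, float] = {}      # key -> expiry time on self._clock
        self._clock = clock

    @staticmethod
    def _key(user_id: int) -> str:
        return f"timeline:{user_id}"

    def _get_live(self, key: str) -> Optional[Dict[int, int]]:
        # An expired key behaves as if it does not exist.
        if key in self._expires_at and self._clock() >= self._expires_at[key]:
            self._sets.pop(key, None)
            self._expires_at.pop(key, None)
        return self._sets.get(key)

    def push(self, user_id: int, tweet_id: int, created_ms: int) -> None:
        key = self._key(user_id)
        members = self._get_live(key)
        if members is None:
            members = self._sets[key] = {}
        members[tweet_id] = created_ms                        # ZADD key created_ms tweet_id
        while len(members) > TIMELINE_MAX:                    # ZREMRANGEBYRANK key 0 -801
            oldest = min(members, key=lambda t: (members[t], t))
            del members[oldest]
        self._expires_at[key] = self._clock() + TIMELINE_TTL_SECONDS  # EXPIRE key 604800

    def latest(self, user_id: int, count: int = 20) -> List[int]:
        members = self._get_live(self._key(user_id)) or {}
        newest_first = sorted(members, key=lambda t: (members[t], t), reverse=True)
        return newest_first[:count]                           # ZREVRANGE key 0 count-1


cache = TimelineCache()
cache.push(user_id=1, tweet_id=5001, created_ms=1_700_000_000_000)
cache.push(user_id=1, tweet_id=5002, created_ms=1_700_000_000_500)
print(cache.latest(1))  # [5002, 5001]
```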
Media Storage
- Object Storage (S3) for images and videos
- CDN for serving media globally with low latency
ID Generation
Snowflake IDs
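Twitter created Snowflake for generating unique, time-sortable IDs across distributed systems without central coordination. A 64-bit ID contains: 1 unused sign bit + timestamp (41 bits, milliseconds since a custom epoch) + datacenter ID (5 bits) + machine ID (5 bits) + sequence number (12 bits). Because the timestamp occupies the high bits, sorting IDs numerically sorts them by creation time.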
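A minimal single-process sketch of the bit layout above. It defaults to Twitter's published Snowflake epoch (1288834974657 ms), and when the clock moves backwards it simply refuses to issue IDs; the class, the decoder, and that error policy are illustrative assumptions, not Twitter's implementation. Each generator needs a unique (datacenter, machine) pair, which is assigned once, so generating an ID needs no coordination with other machines.

```python
import threading
import time

TWITTER_EPOCH_MS = 1288834974657  # Twitter's custom Snowflake epoch, in Unix ms

TIMESTAMP_SHIFT = 22   # 5 + 5 + 12 bits sit below the timestamp
DATACENTER_SHIFT = 17  # 5 + 12
MACHINE_SHIFT = 12
SEQUENCE_MASK = 0xFFF  # 12 bits


class SnowflakeGenerator:
    """Hypothetical Snowflake-style generator:
    [1 unused sign bit][41-bit ms timestamp][5-bit datacenter][5-bit machine][12-bit sequence]
    """

    def __init__(self, datacenter_id: int, machine_id: int, epoch_ms: int = TWITTER_EPOCH_MS):
        if not (0 <= datacenter_id < 32 and 0 <= machine_id < 32):
            raise ValueError("datacenter_id and machine_id must each fit in 5 bits")
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.epoch_ms = epoch_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:          # all 4096 IDs for this ms are used
                    while now <= self._last_ms:  # busy-wait for the next millisecond
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self.epoch_ms) << TIMESTAMP_SHIFT)
                | (self.datacenter_id << DATACENTER_SHIFT)
                | (self.machine_id << MACHINE_SHIFT)
                | self._sequence
            )


def decode(snowflake_id: int, epoch_ms: int = TWITTER_EPOCH_MS) -> dict:
    return {
        "timestamp_ms": (snowflake_id >> TIMESTAMP_SHIFT) + epoch_ms,
        "datacenter_id": (snowflake_id >> DATACENTER_SHIFT) & 0x1F,
        "machine_id": (snowflake_id >> MACHINE_SHIFT) & 0x1F,
        "sequence": snowflake_id & SEQUENCE_MASK,
    }


gen = SnowflakeGenerator(datacenter_id=3, machine_id=7)
first, second = gen.next_id(), gen.next_id()
print(first < second)   # True: later IDs sort after earlier ones
print(decode(first))
```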
Key Trade-offs
| Decision | Choice | Reasoning |
|---|---|---|
| Fan-out strategy | Hybrid | Balances write cost vs read latency |
| Database | MySQL + Redis | Proven at scale, Redis for fast timelines |
| Search | Elasticsearch | Full-text search, real-time indexing |
| Media storage | S3 + CDN | Cost-effective, globally distributed |
| ID generation | Snowflake | Time-sortable, no coordination needed |
Don't Forget
In the interview, always discuss: rate limiting, spam detection, content moderation, and how you'd handle trending topics.