system designintermediate 12m2026-06-09

News Feed / Timeline System — Fanout-on-Read vs Write, Ranking

Session 35 of the 48-session learning series.

Why this session matters

This is Session 35 of 48 in the System Design track. Every consumer app — Twitter, Instagram, LinkedIn, TikTok — has a feed at its heart. The architectural choice between fanout-on-write and fanout-on-read shapes the entire system's cost, latency, and scale ceiling. Interview-favourite for a reason.

Agenda

Push (fanout-on-write) vs pull (fanout-on-read) — when each wins
Hybrid fanout — the celebrity user problem
Feed ranking — chrono → weighted → ML-ranked
Read path optimisation — cache, materialised feeds
Write path scalability — sharding, async pipelines, fan-out service

Pre-read (skim before the session)

Deep dive

1. The two fundamental approaches

Fanout-on-write (push) — when user A posts, the system writes A's post into the feed-cache of every follower immediately. Read is then a cheap lookup.

Fanout-on-read (pull) — when user X opens their feed, the system queries posts from everyone X follows and merges them in real time.

PUSH                              PULL
post → fanout                     post → store
   ├→ follower1.feed                ↓
   ├→ follower2.feed              feed open?
   └→ ... N followers              ↓
                                  query all-followed
read = single cache hit           merge top-K

The classic trade-off:

Model	Write cost	Read cost	Stale risk
Push	O(followers)	O(1)	low
Pull	O(1)	O(followed × posts/sec)	high

2. Who uses what

Twitter (early) — pull for celebrities, push for everyone else. Hybrid. Instagram — primarily push, ranking layer on top. Facebook — pull-heavy with aggressive caching. LinkedIn — hybrid; ranks at read time.

In 2026, the trend is hybrid with ML ranking dominating the experience over raw chronology.

3. The celebrity user problem

Lady Gaga has 80M followers. If she posts and we push to all 80M feeds: 80M writes + 80M cache invalidations. Slow, expensive, error-prone.

Solutions:

Pull-on-celebrities, push-on-normals — define a threshold (e.g. >1M followers); their posts are pulled at read time, mixed with pushed posts from normal accounts.
Lazy push — only push to active followers (logged in last 7 days).
Async fanout with priority queues — push immediately to active power-users; trickle to inactive followers.
Materialise per-celebrity feed shard — followers don't get the post in their feed; their feed-render pulls the celebrity post separately.

4. The data model

Per user, conceptually:

inbox: deque[(post_id, ts, score)]  # max ~500–2000 items
following: set[user_id]
followers: set[user_id]

In practice:

inbox lives in Redis (or per-user partitioned Kafka).
following/followers in graph DB or sharded RDBMS.
post_body in object store; ID is the cache key.

5. Write path

user posts
   ↓
write to posts.db (durable)
   ↓
publish event to Kafka topic "post.created"
   ↓
fanout service consumes:
   - look up followers
   - for active followers: push post_id into their inbox cache
   - for inactive: skip (lazy-load on next visit)
   - if celebrity: skip fanout entirely

Fan-out service is the heavy-iron. Typically: thousands of consumers, sharded by follower-id, retry queue for failed pushes.

6. Read path

user opens feed
   ↓
fetch user.inbox from Redis (push results)
   ↓
fetch celebrity_posts where author in user.celebrities (pull)
   ↓
fetch newly-posted items from active follows not yet pushed
   ↓
merge + rank + paginate
   ↓
hydrate post bodies, author info, engagement counts

Latency budget for the whole thing: < 200 ms. Each step has tight sub-budgets.

7. Ranking — chronological → ML

Era 1: pure chronological. Easy, predictable, falls apart at scale (firehose).

Era 2: weighted recency:

score = engagement_score * exp(-age_hours / half_life)

Tune by content type; tweak constantly.

Era 3: ML-ranked (where everyone is now):

Retrieve candidates (push inbox + pull celebrities + suggested = ~1000).
Score each with a click-prediction model (S27).
Re-rank for diversity, freshness, advertiser slots.
Serve top-50.

Per-request: ~5 ms ranking budget. Pre-compute features; serve model with low-latency stack.

8. Counters, likes, replies

Engagement signals power ranking. Showing accurate counts to millions in real time is its own subproblem:

Sharded counters — partition by post_id % N; merge at read.
Approximate counts — HyperLogLog for unique likers if exact not needed.
Lazy increment — write to Kafka; aggregator updates counter every N seconds.
Caching with staleness — TTL 30s on "like count"; user perceives "live".

9. Storage tiering

Hot — last 24 hours. In Redis + memory. Reads bypass disk.
Warm — last 30 days. In SSD-backed RDBMS / NoSQL.
Cold — older. In object store; rarely queried.

Most reads hit hot. Cost of cold storage is negligible; throwing posts away makes legal/UX problems.

10. Failure modes

Fan-out service backlog — bursty viral post stalls behind queue. Mitigate with sharded queues, autoscaling, backpressure.
Inbox cache cold — Redis flush wipes feed; cold start storms backend. Mitigate with cache warmer + soft fallback.
Celebrity broadcast storm — every follower reloads at the same moment. Mitigate with stagger + CDN-cached celebrity post.
Schema migration — feed format change; old client crashes. Mitigate with version field; tolerate forever.

11. Notification fan-out

Same shape as feed fan-out but lower latency, higher priority. Push notifications go through:

APNS / FCM / WebPush adapters.
Per-user preferences (do not disturb).
De-dup (don't notify same like twice).
Rate limit per user.

Often a separate service from feed fanout, sharing the same event bus.

12. Reality check

Build order for a new feed:

Pull-only, paginated, no ranking — get product launched.
Add Redis cache for hot posts; TTL = a few minutes.
Add push fanout for active users; keep pull as fallback.
Add celebrity special-case at the threshold that bites you.
Add ML ranker once you have engagement data to learn from.

Don't over-engineer day 1. But know where the road leads.

Reading material

Books:

Designing Data-Intensive Applications — Martin Kleppmann (the Twitter / fanout case study in ch. 1; the canonical "push vs pull" reference)
System Design Interview, Vol. 1 — Alex Xu (the dedicated News Feed chapter)
System Design Interview, Vol. 2 — Alex Xu (the dedicated news-aggregator and notification-system chapters)
Web Scalability for Startup Engineers — Artur Ejsmont (the timeline / activity stream chapter)

Papers:

Feeding Frenzy: Selectively Materializing Users' Event Feeds — Silberstein et al. 2010 (SIGMOD, Yahoo!) — the original push/pull hybrid paper that everyone implements.
TAO: Facebook's Distributed Data Store for the Social Graph — Bronson et al. 2013 (USENIX ATC) — the storage layer that backs the News Feed.
Unicorn: A System for Searching the Social Graph — Curtiss et al. 2013 (Facebook, VLDB) — Facebook's search-over-graph engine.

Official docs:

Redis — Sorted Sets — the data structure 90% of feed materialisations are built on.
Apache Cassandra — modeling for time series — the canonical timeline backend.
Kafka — log compaction — how delete-then-rewrite timeline state actually works.

Blog posts:

Twitter Engineering — Real-time delivery architecture at Twitter — the canonical Twitter scale post.
Instagram Engineering — Powered by AI: Instagram's Explore recommender system — feed ranking with two-stage retrieval + ranking.
LinkedIn Engineering — A brief history of Scaling LinkedIn — how the feed system evolved through 4 architectures.
Pinterest Engineering — Building the Pinterest home feed — push/pull/hybrid in production.
Discord — How Discord stores trillions of messages — the message store under the feed.

In-depth research material

Apache Cassandra — github.com/apache/cassandra — ~9k ★, the OG wide-column store; the read-modify-write substrate for materialised feeds.
ScyllaDB — github.com/scylladb/scylladb — ~14k ★, the C++ Cassandra-compatible store Discord migrated trillions of messages to.
Apache Kafka — github.com/apache/kafka — ~29k ★, the canonical fanout-on-write delivery substrate.
Snowflake (Twitter) IDs — github.com/twitter-archive/snowflake — the time-ordered 64-bit ID generator everyone copies for feed ordering.
Twitter Engineering — Twitter's New Architecture: Manhattan + Storm — the storage rewrite.
Pinterest Engineering — Smartfeed: Pinterest's home feed — the hybrid push/pull ranked feed in detail.
Meta Engineering — How Facebook News Feed works (2023 update) — the modern AI-ranked feed.
Slack Engineering — Scaling Slack's Job Queue — the fanout substrate of "send notification to everyone in this channel."
Instagram Engineering — Instagram's Million Dollar Mobile Engineering Investment — the client-side feed caching strategy.
Tumblr Engineering — Building a real-time activity stream — the Disco-on-Kafka fanout system.

Videos

Design Twitter Timeline | System Design Interview — InfoQ — System Design Interview · 16 min — the canonical interview-style walkthrough.
Design a News Feed System — Hello Interview — Evan King · 56 min — a 60-minute simulated FAANG interview on the problem.
Real-time Delivery Architecture at Twitter — InfoQ — Twitter engineer · 47 min — the actual architecture from the people who run it.
Designing Instagram — System Design Mastery — System Design Mastery · 28 min — the feed + media variant of the same problem.
Building the News Feed Architecture — Facebook Engineering (F8) — F8 keynote · 31 min — the OG fanout-on-write explained by the people who invented the modern version.

LeetCode — Design Twitter

Link: https://leetcode.com/problems/design-twitter/
Difficulty: Medium
Why this problem: Implement post + follow + topK feed merge. The model problem for fanout systems. (Yes, we saw this in S27 — different lens here, system-design framing.)
Time-box: 30 minutes. Look up the editorial only after.

Post-session checklist

By the end of this session you should be able to:

Compare push vs pull fanout on write cost, read cost, staleness.
Identify the celebrity-user problem and 3 mitigations.
Sketch the write path including Kafka + fanout service + Redis inbox.
Sketch the read path including merge of pushed + pulled + suggested.
Explain why chronological-only feeds fail at scale.
Solve design-twitter — heap-based k-way merge of per-user post streams.

Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.

← previous

LLM Serving Part 2 — Speculative Decoding, Quantisation, Throughput

Multimodal LLMs — Vision, Language, Audio, Tool Use Combined