[Book Notes] System Design Interview — An Insider’s Guide (Volume 1)

Devin Z

5 min readDec 26, 2023

The precursor of a more recent best-seller.

System Design Interview - An insider's guide

www.amazon.com

Back-of-the-Envelope Estimation

Latency of main memory reference: 0.1 μs
Round-trip time within a datacenter: 0.5 ms
Round-trip time within a metro: 5 ms
Disk seek time: 10 ms
Round-trip time within a continent: 30 ms
Round-trip time across continents: 150 ms
Hard disk sequential read speed: 200 MB/s
SSD sequential read speed: 1 GB/s

Design Rate Limiter

Purpose: to avoid resource exhaustion and to reduce cost.
Where to put it?
- On the client side, which is unreliable.
- Right in the server.
- In an API gateway.
To throttle a request, return HTTP code 429 (too many requests).
A rate limiting algorithm needs to balance accuracy and memory efficiency.
Store runtime counters in Redis, which not only is fast but also supports time-based expiration and atomic operations.
- Multiple rate limiters reference the same data store to dispense with synchronization.
Customized rate limiting rules should be cached for fast read.

Design Unique ID Generator

UUIDs don’t naturally increase over time.
A single ticket server can become a single point of failure.
The Twitter snowflake approach:
- A sign bit + 41-bit timestamp + 5-bit datacenter ID + 5-bit machine ID + 12-bit sequence number.
- It requires clock synchronization across servers.

Design URL Shortener

Return HTTP code 301 (permanent redirect) to reduce server load, or return 302 (temporary redirect) for click rate tracking.
Two options for generating short URLs:
- Use hashing, and repeatedly append the input with a predefined string util there’s no collision.
- Use a unique ID generator to generate an integral ID and perform base-64 conversion.
CRC-32 is simple, efficient and is used for detecting random errors, whereas MD5 and SHA-1 are complex and are used to ensure security.

Design Web Crawler

The downloaded HTML files need to be validated and checked on duplication before being stored and parsed.
The extracted URLs need to be filtered and checked on duplication before being added to the URL frontier.
The URL frontier needs to ensure politeness and account for priorities.
- The front-queue selector bias towards high-priority URLs.
- The back-queue selector makes sure that URLs for the same host are dispatched to the same download thread.

Design Notification System

Three types of notifications:
- mobile notifications (iOS or Android)
- SMS messages
_ emails
Contact info and notification templates should be cached for fast read.
Use per-type message queues to decouple
- notification servers, which receive requests and hydrate notifications;
- works, who send notifications to third-party services with at-least-once semantics.
The total number of queued notifications needs to be monitored.

Design News Feed System

Take a hybrid approach:
- Fanout on write (push model) for most writers to guarantee read performance.
- Fanout on read (pull model) for celebrity writers.
The news feed cache only keeps post ids and user ids, and the post content is only hydrated at the serving time.

Design Chat System

Use WebSocket connections, which are persistent and bidirectional, to push messages to active clients.
Separate the stateless HTTP services (for login, signup, and user profile operations) from stateful WebSocket services (for messaging and online presence).
Upon a user login, ZooKeeper-based service discovery helps assign a chat server to the client.
Multiple devices of the same user connect to the same server to sync messages.
For group chats, one message fans out to the message sync queue of each recipient.
Use low-latency horizontally-scalable key-value stores (e.g. Cassandra) to store messages.
Message ids could be local to the pair of chatting friends or to the chat group.
Messages to offline recipients are sent to a notification system.
Clients send heartbeat messages to the presence server to maintain online status, and it fans out to all subscribers.

Design Search Autocomplete System

Cache top results at each node of a trie.
Use AJAX for sending requests without refreshing the whole web page.
Search events are down-sampled before being logged, and are aggregated before being used for updating the trie (DB and cache).
Shard the storage by prefix and balance the load per statistics.

Design YouTube

Assume the average video size is 300 MB.
Popular videos should be streamed from CDN.
A client needs to get a pre-signed URL before uploading the video to the cloud storage.
Metadata updates are served at the same time as the actual video is being uploaded.
A video is first uploaded to the original storage before being transcoded by a transcoding server.
- Transcoding refers to encoding a video into compatible bitrates and encoding formats.
- An encoding format includes both the container and the codecs.
- The transcoding server splits a video into chunks (if it’s not done on the client side) so that they can be processed in parallel.
- Use a DAG model to pipeline various processing tasks.
- Transcoded videos are stored in a separate cloud storage.
Use message queues to decouple
- video downloading from the original storage;
- video transcoding;
- video uploading to the transcoded storage.
On completion of a video upload, a completion handler updates the metadata for the video.

Design Google Drive

Three APIs:
- Upload a file (maybe resumable).
- Download a file.
- List versions of a file.
An uploaded file is chunked into blocks (e.g. 4MB) by block servers before being sent to cloud storage (S3).
- Skip further processing for unmodified blocks.
- Blocks are compressed and encrypted before being stored and synced to other subscribers.
- Metadata updates are processed at the same time as the file is being uploaded.
A file has a history of versions, and each file version consists of a sequence of blocks.
A subscribed client establishes a long polling connection to get notified of file changes.
- Right before a file upload, it gets notified so that it will not proceed to change the file to cause conflicts.
- On completion of an upload, it gets notified so that it can fetch the metadata and download the new blocks.
Infrequently accessed blocks can be moved to less expensive cold storage (S3 glacier).

References:

Alex Xu. System Design Interview — An Insider’s Guide (2rd edition). Byte Code LLC. 2020.

[Book Notes] System Design Interview — An Insider’s Guide (Volume 2)

A must-read when jobs in tech are at risk.

devinz1993.medium.com

[Book Notes] System Design Interview — An Insider’s Guide (Volume 1)

System Design Interview - An insider's guide

Back-of-the-Envelope Estimation

Design Rate Limiter

Design Unique ID Generator

Design URL Shortener

Design Web Crawler

Design Notification System

Design News Feed System

Design Chat System

Design Search Autocomplete System

Design YouTube

Design Google Drive

References:

[Book Notes] System Design Interview — An Insider’s Guide (Volume 2)

A must-read when jobs in tech are at risk.

Written by Devin Z