[Book Notes] System Design Interview — An Insider’s Guide (Volume 1)

Devin Z
5 min readDec 26, 2023

The precursor of a more recent best-seller.

Sunnyvale, December 22, 2023

Back-of-the-Envelope Estimation

  • Latency of main memory reference: 0.1 μs
  • Round-trip time within a datacenter: 0.5 ms
  • Round-trip time within a metro: 5 ms
  • Disk seek time: 10 ms
  • Round-trip time within a continent: 30 ms
  • Round-trip time across continents: 150 ms
  • Hard disk sequential read speed: 200 MB/s
  • SSD sequential read speed: 1 GB/s

Design Rate Limiter

  • Purpose: to avoid resource exhaustion and to reduce cost.
  • Where to put it?
    - On the client side, which is unreliable.
    - Right in the server.
    - In an API gateway.
  • To throttle a request, return HTTP code 429 (too many requests).
  • A rate limiting algorithm needs to balance accuracy and memory efficiency.
  • Store runtime counters in Redis, which not only is fast but also supports time-based expiration and atomic operations.
    - Multiple rate limiters reference the same data store to dispense with synchronization.
  • Customized rate limiting rules should be cached for fast read.

Design Unique ID Generator

  • UUIDs don’t naturally increase over time.
  • A single ticket server can become a single point of failure.
  • The Twitter snowflake approach:
    - A sign bit + 41-bit timestamp + 5-bit datacenter ID + 5-bit machine ID + 12-bit sequence number.
    - It requires clock synchronization across servers.

Design URL Shortener

  • Return HTTP code 301 (permanent redirect) to reduce server load, or return 302 (temporary redirect) for click rate tracking.
  • Two options for generating short URLs:
    - Use hashing, and repeatedly append the input with a predefined string util there’s no collision.
    - Use a unique ID generator to generate an integral ID and perform base-64 conversion.
  • CRC-32 is simple, efficient and is used for detecting random errors, whereas MD5 and SHA-1 are complex and are used to ensure security.

Design Web Crawler

  • The downloaded HTML files need to be validated and checked on duplication before being stored and parsed.
  • The extracted URLs need to be filtered and checked on duplication before being added to the URL frontier.
  • The URL frontier needs to ensure politeness and account for priorities.
    - The front-queue selector bias towards high-priority URLs.
    - The back-queue selector makes sure that URLs for the same host are dispatched to the same download thread.

Design Notification System

  • Three types of notifications:
    - mobile notifications (iOS or Android)
    - SMS messages
    _ emails
  • Contact info and notification templates should be cached for fast read.
  • Use per-type message queues to decouple
    - notification servers, which receive requests and hydrate notifications;
    - works, who send notifications to third-party services with at-least-once semantics.
  • The total number of queued notifications needs to be monitored.

Design News Feed System

  • Take a hybrid approach:
    - Fanout on write (push model) for most writers to guarantee read performance.
    - Fanout on read (pull model) for celebrity writers.
  • The news feed cache only keeps post ids and user ids, and the post content is only hydrated at the serving time.

Design Chat System

  • Use WebSocket connections, which are persistent and bidirectional, to push messages to active clients.
  • Separate the stateless HTTP services (for login, signup, and user profile operations) from stateful WebSocket services (for messaging and online presence).
  • Upon a user login, ZooKeeper-based service discovery helps assign a chat server to the client.
  • Multiple devices of the same user connect to the same server to sync messages.
  • For group chats, one message fans out to the message sync queue of each recipient.
  • Use low-latency horizontally-scalable key-value stores (e.g. Cassandra) to store messages.
  • Message ids could be local to the pair of chatting friends or to the chat group.
  • Messages to offline recipients are sent to a notification system.
  • Clients send heartbeat messages to the presence server to maintain online status, and it fans out to all subscribers.

Design Search Autocomplete System

  • Cache top results at each node of a trie.
  • Use AJAX for sending requests without refreshing the whole web page.
  • Search events are down-sampled before being logged, and are aggregated before being used for updating the trie (DB and cache).
  • Shard the storage by prefix and balance the load per statistics.

Design YouTube

  • Assume the average video size is 300 MB.
  • Popular videos should be streamed from CDN.
  • A client needs to get a pre-signed URL before uploading the video to the cloud storage.
  • Metadata updates are served at the same time as the actual video is being uploaded.
  • A video is first uploaded to the original storage before being transcoded by a transcoding server.
    - Transcoding refers to encoding a video into compatible bitrates and encoding formats.
    - An encoding format includes both the container and the codecs.
    - The transcoding server splits a video into chunks (if it’s not done on the client side) so that they can be processed in parallel.
    - Use a DAG model to pipeline various processing tasks.
    - Transcoded videos are stored in a separate cloud storage.
  • Use message queues to decouple
    - video downloading from the original storage;
    - video transcoding;
    - video uploading to the transcoded storage.
  • On completion of a video upload, a completion handler updates the metadata for the video.

Design Google Drive

  • Three APIs:
    - Upload a file (maybe resumable).
    - Download a file.
    - List versions of a file.
  • An uploaded file is chunked into blocks (e.g. 4MB) by block servers before being sent to cloud storage (S3).
    - Skip further processing for unmodified blocks.
    - Blocks are compressed and encrypted before being stored and synced to other subscribers.
    - Metadata updates are processed at the same time as the file is being uploaded.
  • A file has a history of versions, and each file version consists of a sequence of blocks.
  • A subscribed client establishes a long polling connection to get notified of file changes.
    - Right before a file upload, it gets notified so that it will not proceed to change the file to cause conflicts.
    - On completion of an upload, it gets notified so that it can fetch the metadata and download the new blocks.
  • Infrequently accessed blocks can be moved to less expensive cold storage (S3 glacier).

--

--