Basics of Database Systems

Devin Z
Jun 12, 2022

Curated learning notes on computer science topics (2/4).

(Cover photo: Filoli Garden, May 22, 2022)
  • Data independence: applications are insulated from how data is logically structured and physically stored.
  • Impedance mismatch: the disconnect between the object model in the application code and the relational model in the database system.
  • Driving forces behind NoSQL:
    - Need for greater scalability.
    - Widespread preference for free and open source software.
    - Specialized query operations.
    - More dynamic and expressive data models.
  • Compared with the relational model:
    - The document model is better for one-to-many relationships with more flexible schemas and performance gains due to storage locality.
    - The graph model is targeted at many-to-many relationships.
  • Motivation for normalization: human-meaningful information is prone to change, and duplicating it incurs write overheads and risks inconsistencies.
  • Motivation for denormalization: creating redundant data to improve read performance at the expense of write overheads.
  • Normal forms of the relational model:
    - First normal form (1NF): every attribute holds a single atomic value.
    - Second normal form (2NF): 1NF with no partial dependency.
    - Third normal form (3NF): 2NF with no transitive dependency.
    - Boyce-Codd normal form (BCNF): the left side of every non-trivial functional dependency must be a superkey.
    - For any relation, a lossless-join dependency-preserving decomposition into 3NF is always possible.
    - Decomposition into BCNF is not always dependency-preserving.
  • Schema-on-read vs schema-on-write:
    - Schema-on-read burdens consumers with data interpretation, while schema-on-write forces producers to put data in a standardized format.
    - Schema-on-read is similar to runtime type checking, whereas schema-on-write is similar to compile-time type checking.
  • Advantages of declarative query languages (vs imperative query languages):
    - Conciseness and ease of use.
    - Potential for performance optimizations and parallel execution.
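
As a toy contrast (using Python's built-in sqlite3; the users table and its data are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("ann", 35), ("bob", 17), ("cat", 42)])

# Declarative: state the result you want; the engine picks the plan.
adults_sql = conn.execute(
    "SELECT name FROM users WHERE age >= 18 ORDER BY name").fetchall()

# Imperative: spell out the iteration, filtering and ordering by hand.
rows = conn.execute("SELECT name, age FROM users").fetchall()
adults_manual = sorted(name for name, age in rows if age >= 18)

assert [n for (n,) in adults_sql] == adults_manual
```
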
  • Independent (primitive) operators of relational algebra:
    - renaming (ρ)
    - projection (π)
    - selection (σ)
    - product (×)
    - union (∪)
    - difference (-)
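
These six primitives suffice to derive the rest of the algebra (e.g. joins). A rough Python rendering, with relations as lists of dicts and all attribute names invented:

```python
def select(rel, pred):       # σ: keep tuples satisfying a predicate
    return [t for t in rel if pred(t)]

def project(rel, attrs):     # π: keep only the named attributes
    return [{a: t[a] for a in attrs} for t in rel]

def rename(rel, mapping):    # ρ: rename attributes
    return [{mapping.get(a, a): v for a, v in t.items()} for t in rel]

def product(r, s):           # ×: every pairing of tuples from r and s
    return [{**t, **u} for t in r for u in s]

def union(r, s):             # ∪: tuples appearing in either relation
    return r + [t for t in s if t not in r]

def difference(r, s):        # −: tuples in r but not in s
    return [t for t in r if t not in s]

emp = [{"name": "ann", "dept": 1}, {"name": "bob", "dept": 2}]
dept = [{"dept_id": 1, "dept_name": "db"}]
# A join is derivable as a selection over a product:
joined = select(product(emp, dept), lambda t: t["dept"] == t["dept_id"])
```
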
  • Some intermediate SQL features:
    - aggregates with HAVING clauses
    - nested queries
    - common table expressions (CTE)
    - window functions (e.g. LeetCode 512)
  • In SQL, a subquery is correlated if it references attributes from an enclosing query.
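
For example (a hypothetical emp table, run through sqlite3), the inner query below is correlated because it refers to e1.dept from the outer query and is conceptually re-evaluated per outer row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("ann", "db", 120), ("bob", "db", 90), ("cat", "os", 100)])

# Highest-paid employee per department via a correlated subquery.
top_paid = conn.execute("""
    SELECT name FROM emp e1
    WHERE salary = (SELECT MAX(salary) FROM emp e2
                    WHERE e2.dept = e1.dept)
""").fetchall()
print(top_paid)  # [('ann',), ('cat',)]
```
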
  • Indexes speed up read queries but slow down writes.
  • Two main schools of thought for OLTP systems:
    - Page-oriented storage engines (e.g. B-trees) write blocks in place to achieve consistent performance (no merging or compaction) and strong support for transactional semantics.
    - Log-structured storage engines (e.g. LSM-trees) leverage sequential writes to contain write amplification and disk fragmentation.
  • Advantages of SSTables over hash indexes:
    - No need to keep an index of all the keys in memory.
    - Simple and efficient segment merge.
    - Potential for compression of consecutive records.
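
Segment merging is just a k-way merge of sorted runs that keeps the newest value for each key. A minimal sketch (segment format and tombstones omitted):

```python
import heapq

def merge_segments(segments):
    """Merge sorted (key, value) runs; segments[0] is the newest."""
    # Tag each record with its segment's age so that, for equal keys,
    # the newest segment's record sorts first in the merge.
    runs = [[(k, age, v) for k, v in seg] for age, seg in enumerate(segments)]
    out, last_key = [], object()
    for k, age, v in heapq.merge(*runs):
        if k != last_key:            # first occurrence == newest version
            out.append((k, v))
            last_key = k
    return out

old = [("a", 1), ("c", 3), ("d", 4)]
new = [("a", 9), ("b", 2)]
print(merge_segments([new, old]))  # [('a', 9), ('b', 2), ('c', 3), ('d', 4)]
```
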
  • B+ trees store values only in leaf nodes (internal nodes hold separator keys only), whereas B-trees store key-value pairs in all nodes.
  • Latch crabbing/coupling on B-trees:
    - For search, unlatch the parent node on latching the child node.
    - For mutation, unlatch an internal node when it’s guaranteed to not be split or merged.
    - To reduce contention on the root, optimistically take the read-path latching approach until reaching the leaf, and restart with write-path latching only if the leaf turns out to require a split or merge.
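
A sketch of the search protocol with plain mutexes (node layout invented; real engines use reader-writer latches):

```python
import threading

class Node:
    def __init__(self, keys, children=None, values=None):
        self.latch = threading.Lock()
        self.keys = keys           # separator keys
        self.children = children   # None in leaf nodes
        self.values = values       # key -> value dict, leaves only

def search(root, key):
    node = root
    node.latch.acquire()                 # latch the root first
    while node.children is not None:
        i = sum(1 for k in node.keys if key >= k)
        child = node.children[i]
        child.latch.acquire()            # latch the child...
        node.latch.release()             # ...then unlatch the parent (crabbing)
        node = child
    try:
        return node.values.get(key)
    finally:
        node.latch.release()
```
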
  • As a rule of thumb, LSM-trees are typically faster for blind writes, whereas B-trees are thought to be faster for reads.
  • Hash-based indexes are good for equality selections, whereas B+ trees are good for range selections and sequential scans.
  • Prefix key compression increases the fanout of a B+ tree and reduces the height.
  • ISAM is a static tree structure that is typically used when inserts and deletes are rare.
  • Cuckoo hashing uses multiple hash tables with different hash functions; on a collision, the incumbent key is evicted and reinserted into its alternate table, possibly cascading.
  • Extendible hashing and linear hashing enable resizing the hash table on demand without rebuilding the entire table.
    - Extendible hashing splits a bucket when it’s full and doubles the slot array when the global depth increases.
    - Linear hashing splits the bucket at the split pointer and incrementally expands the slot array.
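
A minimal in-memory sketch of the extendible variant (bucket capacity is arbitrary; overflow of identical hash values is not handled):

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth                    # local depth
        self.items = {}

class ExtendibleHash:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 1
        self.slots = [Bucket(1), Bucket(1)]   # indexed by low bits of the hash

    def _bucket(self, key):
        return self.slots[hash(key) & ((1 << self.global_depth) - 1)]

    def put(self, key, value):
        b = self._bucket(key)
        b.items[key] = value
        while len(b.items) > self.capacity:   # split until nothing overflows
            if b.depth == self.global_depth:
                self.slots += self.slots      # double the slot array
                self.global_depth += 1
            b.depth += 1                      # split b on the next hash bit
            bit = 1 << (b.depth - 1)
            new = Bucket(b.depth)
            new.items = {k: v for k, v in b.items.items() if hash(k) & bit}
            for k in new.items:
                del b.items[k]
            for i, s in enumerate(self.slots):  # repoint half of b's slots
                if s is b and (i & bit):
                    self.slots[i] = new
            b = self._bucket(key)             # re-check the key's bucket
```
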
  • R-trees are balanced in the face of changing data, whereas kd-trees are typically bulk loaded.
  • Common memory-efficient probabilistic data structures:
    - A Bloom filter approximates set membership with zero false negatives.
    - A count-min sketch approximates occurrences of elements.
    - A HyperLogLog approximates the number of distinct elements.
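
A toy version of the first of these (the salted calls to Python's built-in hash stand in for independent hash functions):

```python
class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k bit positions by salting the hash with a seed.
        return [hash((seed, item)) % self.num_bits
                for seed in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        # False means definitely absent (zero false negatives);
        # True means possibly present (false positives allowed).
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("apple")
assert bf.might_contain("apple")
```
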
  • A Levenshtein automaton supports efficient search for words within a given edit distance.
  • The performance advantage of in-memory databases comes mainly from avoiding the overheads of serialization and deserialization, not from avoiding disk reads: operating systems already cache recently used disk blocks in memory anyway.
  • Transaction processing vs analytics:
    - The bottleneck of OLTP systems is usually disk seek time, whereas the bottleneck of OLAP systems is usually disk bandwidth.
    - OLTP databases are usually row-oriented, whereas OLAP databases are usually column-oriented.
  • Storage models:
    - N-ary storage model (row stores, typically for OLTP)
    - Decomposition storage model (column stores, typically for OLAP)
    - Partition attribute across (PAX, column-oriented within a row group)
  • In the n-ary storage model:
    - A file can either be a data file or an index file.
    - A file consists of fixed-sized pages, which are either data pages or directory pages.
    - A page is a collection of records, either of a fixed length or of variable lengths.
  • Column-oriented storage enables efficient compression techniques, such as bitmap encoding (for small value spaces) and run-length encoding (for consecutive duplicate values).
  • Dictionary compression enables storing fixed-length values for variable-length data in column stores.
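
A sketch of both encodings on a single string column (formats reduced to Python tuples):

```python
def run_length_encode(column):
    # Collapse consecutive duplicates into (value, run_length) pairs.
    runs = []
    for v in column:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

def dictionary_encode(column):
    # Map variable-length strings to fixed-width integer codes.
    codes, encoded = {}, []
    for v in column:
        encoded.append(codes.setdefault(v, len(codes)))
    return codes, encoded

col = ["us", "us", "us", "de", "de", "us"]
print(run_length_encode(col))   # [('us', 3), ('de', 2), ('us', 1)]
print(dictionary_encode(col))   # ({'us': 0, 'de': 1}, [0, 0, 0, 1, 1, 0])
```
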
  • Delta store: stage updates in a row store and asynchronously migrate them to a column store in batches.
  • Pointer swizzling translates on-disk page references into in-memory pointers once the referenced pages are loaded into the buffer pool.
  • Atomicity, isolation and durability are properties of the database, whereas consistency (in ACID) is a property of the application.
    - Atomicity has nothing to do with concurrency, but means “all or nothing”.
  • Anomalies of weak isolation levels:
    - Dirty read: a transaction reads uncommitted data of another transaction.
    - Dirty write: a transaction overwrites uncommitted data of another transaction.
    - Read skew (unrepeatable read): a transaction sees different parts of the database as of different points in time, which can be solved by MVCC.
    - Lost update: two transactions concurrently perform a read-modify-write cycle and one overwrites another’s write without incorporating its changes.
    - Write skew: a transaction makes a write based on a premise it read earlier, and by the time it commits the premise no longer holds. Snapshot isolation is susceptible to this anomaly.
    - Phantom read: the result of a search query in one transaction is changed by writes in another transaction. Predicate locking is the theoretical antidote to this.
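
The lost update anomaly, for instance, takes only two interleaved read-modify-write cycles to reproduce (a toy simulation on a dict, not a real database):

```python
store = {"counter": 0}

t1_read = store["counter"]       # T1 reads 0
t2_read = store["counter"]       # T2 reads 0 before T1 writes
store["counter"] = t1_read + 1   # T1 writes 1
store["counter"] = t2_read + 1   # T2 writes 1, clobbering T1's update

print(store["counter"])          # 1 rather than 2: T1's increment is lost
```
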
  • Multi-version concurrency control (MVCC) for snapshot isolation:
    - Readers never block writers and writers never block readers.
    - Keep multiple versions of an object written by different transactions.
    - Record the set of transactions active at each transaction’s start to determine which versions are visible to it.
    - Snapshot isolation is especially useful for long-running, read-only operations such as backups and analytic queries.
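
A sketch of the visibility rule, assuming transaction IDs are assigned in monotonically increasing order (a transaction's own writes and commit-time bookkeeping are glossed over):

```python
class Version:
    def __init__(self, value, created_by, deleted_by=None):
        self.value = value
        self.created_by = created_by   # txid that wrote this version
        self.deleted_by = deleted_by   # txid that overwrote/deleted it

def visible(version, txid, active_at_start):
    # A writer "counts" only if it committed before txid started.
    def committed_before(writer):
        return (writer is not None and writer < txid
                and writer not in active_at_start)
    return (committed_before(version.created_by)
            and not committed_before(version.deleted_by))

def read(versions, txid, active_at_start):
    # Return the newest version visible in this transaction's snapshot.
    for v in reversed(versions):       # ordered oldest -> newest
        if visible(v, txid, active_at_start):
            return v.value
    return None
```
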
  • Approaches for serializable isolation:
    - Two-phase locking (2PL) + predicate locks (practically index-range locks).
    - Serializable snapshot isolation (SSI).
  • Two-phase locking (2PL):
    - In the growing phase, only acquiring or upgrading locks is permitted.
    - In the shrinking phase, only releasing locks is permitted.
    - The operations on all the objects could be considered to happen at a moment between the two phases.
    - Cascading aborts are required under basic 2PL: otherwise dirty reads would occur if a transaction aborts after some of its exclusive locks have been released.
    - In comparison, strict 2PL avoids this by not allowing a transaction to release any lock until commit or abort.
    - To avoid starvation, shared locks shouldn’t be granted if there’s someone waiting for the exclusive lock.
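
The two-phase discipline itself fits in a few lines (lock manager reduced to a set; lock modes and deadlock handling omitted):

```python
class TwoPhaseLockingTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False    # flips on the first release

    def acquire(self, obj):       # growing phase only
        if self.shrinking:
            raise RuntimeError("2PL violation: lock acquired after a release")
        self.held.add(obj)

    def release(self, obj):       # enters the shrinking phase
        self.shrinking = True
        self.held.remove(obj)

# Under strict 2PL, release() would only ever be called at commit or abort.
```
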
  • Advantages of a buffer manager (vs OS memory management):
    - ability to pin a page in the buffer pool or force it back to the disk;
    - ability to adjust the replacement policy;
    - ability to pre-fetch pages based on access patterns.
  • Buffer management policies:
    - NO STEAL policy: a dirty page must not be evicted from the buffer pool while the transaction that modified it is still uncommitted.
    - FORCE policy: a transaction should write all dirty pages back to the disk on commit.
    - NO STEAL policy causes poor throughput but ensures atomicity with simple rollback.
    - FORCE policy causes poor response time but provides durability with simple recovery.
    - Write-ahead logging (WAL) guarantees atomicity and durability when STEAL and NO FORCE are allowed.
    - A WAL record contains (1) the log sequence number (LSN), (2) the transaction ID, (3) the object ID, (4) the before value, and (5) the after value.
    - WAL records could be flushed to disk in batches, as long as all the records for the updates on a dirty page are flushed to disk before the page itself is flushed to disk.
    - An application gets notified of a commit only after the COMMIT record gets flushed to disk, but may get notified of an abort even if the log records have not yet been flushed.
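
A sketch of that logging discipline (an in-memory list stands in for the log file, and flush() for fsync):

```python
class WriteAheadLog:
    def __init__(self):
        self.records = []        # the log "file"
        self.flushed_lsn = -1    # highest LSN known to be durable

    def append(self, txid, obj, before, after):
        lsn = len(self.records)
        self.records.append((lsn, txid, obj, before, after))
        return lsn

    def flush(self, up_to_lsn):
        # Stand-in for fsync: records up to this LSN are now durable.
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)

    def commit(self, txid):
        lsn = len(self.records)
        self.records.append((lsn, txid, "COMMIT", None, None))
        self.flush(lsn)          # notify the app only after this returns

    def can_flush_page(self, page_last_lsn):
        # The WAL rule: a dirty page may go to disk only after all
        # log records for its updates are durable.
        return page_last_lsn <= self.flushed_lsn
```
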
  • ARIES crash recovery algorithm:
    - Analysis phase: identify all the active transactions and dirty pages (at the time of the crash) by scanning forward from the most recent checkpoint.
    - Redo phase: scan forward to redo updates to the dirty pages, restoring the buffer pool to its state at the time of the crash.
    - Undo phase: scan backward to undo all updates of uncommitted transactions in the reverse order.
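
A drastically simplified rendering of the three passes (checkpoints, the dirty page table, and compensation log records are all omitted):

```python
def recover(log, pages):
    # log: list of (lsn, txid, op, obj, before, after), op in {"UPD", "COMMIT"}
    committed = {txid for _, txid, op, *_ in log if op == "COMMIT"}
    # 1. Analysis: transactions with no COMMIT record are losers.
    losers = {txid for _, txid, op, *_ in log if op == "UPD"} - committed
    # 2. Redo: repeat history forward, reapplying every update.
    for lsn, txid, op, obj, before, after in log:
        if op == "UPD":
            pages[obj] = after
    # 3. Undo: roll back the losers' updates in reverse log order.
    for lsn, txid, op, obj, before, after in reversed(log):
        if op == "UPD" and txid in losers:
            pages[obj] = before
    return pages
```
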
  • System catalogs are a collection of tables maintaining metadata and statistics of other tables for estimating costs and optimizing query execution.
  • Steps of query compilation:
    - The SQL query is parsed into an abstract syntax tree (AST).
    - The AST is algebraized into a logical query plan based on system catalogs.
    - The logical query plan may be optimized based on both system catalogs and cost models.
    - The logical query plan is transformed into a physical query plan.
  • Query processing models:
    - Iterator model (top-down)
    - Materialization model (bottom-up, suitable for OLTP)
    - Vectorization model (hybrid, typically for OLAP)
  • Index nested loop join: if there is an index on the join column of one relation, make that relation the inner one of the loop and use the index to look up matching tuples.
  • Hash join: build a hash table for the smaller table, and then run the inner loop on the larger table.
    - If the hash table doesn’t fit in the memory, use the same hash function to partition the two tables into buckets first.
    - Use a Bloom filter to speed up the probe.
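
An in-memory sketch of the build and probe phases (partitioning and the Bloom-filter probe are skipped; relation and column names invented):

```python
def hash_join(small, large, small_key, large_key):
    # Build phase: hash the smaller relation on its join column.
    table = {}
    for row in small:
        table.setdefault(row[small_key], []).append(row)
    # Probe phase: stream the larger relation and emit matches.
    for row in large:
        for match in table.get(row[large_key], []):
            yield {**match, **row}

dept = [{"dept_id": 1, "dept_name": "db"}, {"dept_id": 2, "dept_name": "os"}]
emp = [{"name": "ann", "dept_id": 1}, {"name": "bob", "dept_id": 2}]
print(list(hash_join(dept, emp, "dept_id", "dept_id")))
```
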
  • Sort-merge join: sort two relations on the join column and scan them to do a merge.
  • Index-only plans: retrieving data records from the disk file can be avoided for some queries given a suitable index.
  • Clustered index: the indexed row is stored directly within the index.
  • Zone maps: pre-compute columnar aggregates for each block to enable skipping a block on a query.
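
A sketch of block skipping with per-block min/max aggregates (block contents invented):

```python
def build_zone_map(blocks):
    # Pre-compute (min, max) per block for the indexed column.
    return [(min(b), max(b)) for b in blocks]

def scan_greater_than(blocks, zone_map, threshold):
    hits = []
    for block, (lo, hi) in zip(blocks, zone_map):
        if hi <= threshold:
            continue          # the whole block can be skipped unread
        hits.extend(v for v in block if v > threshold)
    return hits

blocks = [[1, 3, 5], [2, 4, 6], [100, 120, 130]]
zm = build_zone_map(blocks)
print(scan_greater_than(blocks, zm, 50))  # only the third block is scanned
```
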
  • Column sketches: build a bitmap index by mapping value ranges to codings in an order-preserving manner, taking account of frequency distributions.
  • To keep data systems in sync:
    - Change data capture (CDC) is done at the database level by extracting writes in the system-of-record database and replicating them in derived data systems (in contrast with dual writes, which may have inconsistent event ordering).
    - Event sourcing is done at the application level by keeping an immutable log of change events (for write throughput and auditability).
    - CQRS (command query responsibility segregation) further transforms write-optimized event logs into read-optimized views.
    - The derived dataset is the place where the read and write paths meet, representing a tradeoff between the two types of workloads.¹
  • Lambda architecture:
    - A stream processor consumes events and updates an approximate view.
    - A batch processor reprocesses the events and creates a more accurate view.
  • Complex event processing (CEP) reverses the relationship between queries and data: queries are stored long-term, and events continuously flow past these standing queries in search of patterns that match.
