Curated learning notes on computer science topics (2/4).
- Data independence: applications are insulated from how data is logically structured and physically stored.
- Impedance mismatch: the disconnect between the object model in the application code and the relational model in the database system.
- Driving forces behind NoSQL:
- Need for greater scalability.
- Widespread preference for free and open source software.
- Specialized query operations.
- More dynamic and expressive data models.
- Compared with the relational model:
- The document model is better for one-to-many relationships with more flexible schemas and performance gains due to storage locality.
- The graph model is targeted at many-to-many relationships.
- Motivation of normalization: human-readable information is prone to change, and duplicating it would incur write overheads and risk inconsistencies.
- Motivation of denormalization: creating redundant data to improve read performance at the expense of write overheads.
- Normal forms of the relational model:
- Second normal form (2NF): 1NF with no partial dependency.
- Third normal form (3NF): 2NF with no transitive dependency.
- Boyce-Codd normal form (BCNF): a non-trivial functional dependency must be a key constraint of the relation.
- For any relation, a lossless-join dependency-preserving decomposition into 3NF is always possible.
- Decomposition into BCNF is not always dependency-preserving.
- Schema-on-read vs schema-on-write:
- Schema-on-read burdens consumers with data interpretation, while schema-on-write forces producers to put data in a standardized format.
- Schema-on-read is similar to runtime type checking, whereas schema-on-write is similar to compile-time type checking.
- Advantages of declarative query languages (vs imperative query languages):
- Conciseness and ease of use.
- Potential for performance optimizations and parallel execution.
- Independent (primitive) operators of relational algebra:
- renaming (ρ)
- projection (π)
- selection (σ)
- product (×)
- union (∪)
- difference (-)
- Some intermediate SQL features:
- aggregates with HAVING clauses
- nested queries
- common table expressions (CTE)
- window functions (ex: LC 512)
- In SQL, a subquery is correlated if it references attributes from an enclosing query.
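Both features can be exercised with Python's built-in sqlite3 module (window functions need SQLite 3.25+, bundled with recent Pythons); the `sales` table and its rows are made up for illustration.

```python
import sqlite3

# Toy table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (dept TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("a", 10), ("a", 30), ("b", 20), ("b", 40)])

# Window function: rank rows within each department without collapsing them.
ranked = conn.execute("""
    SELECT dept, amount,
           RANK() OVER (PARTITION BY dept ORDER BY amount DESC) AS rnk
    FROM sales
""").fetchall()

# Correlated subquery: the inner query references s1.dept from the
# enclosing query, so it is logically re-evaluated for each outer row.
top = conn.execute("""
    SELECT dept, amount
    FROM sales AS s1
    WHERE amount = (SELECT MAX(amount) FROM sales AS s2
                    WHERE s2.dept = s1.dept)
""").fetchall()
```

Note how the window query keeps every row while still computing a per-group aggregate, which a plain GROUP BY cannot do.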
- Indexes speed up read queries but slow down writes.
- Two main schools of thought for OLTP systems:
- Page-oriented storage engines (e.g. B-trees) write blocks in place to achieve consistent performance (no merging or compaction) and strong support for transactional semantics.
- Log-structured storage engines (e.g. LSM-trees) leverage sequential writes to contain write amplification and disk fragmentation.
- Advantages of SSTables over hash indexes:
- No need to keep an index of all the keys in memory.
- Simple and efficient segment merge.
- Potential for compression of consecutive records.
- B+ trees only store keys and values in leaf nodes, whereas B-trees store them in all nodes.
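The simple and efficient SSTable segment merge can be sketched with `heapq.merge`; the (key, value) tuple format and the newest-segment-wins rule are assumptions of this toy version.

```python
import heapq

def merge_sstables(*segments):
    """Merge sorted (key, value) segments into one sorted, deduplicated segment.

    Later arguments are newer segments; on duplicate keys the newest value
    wins, mirroring how compaction discards shadowed entries.
    """
    # Tag entries with negated segment age so ties on a key sort newest-first.
    tagged = [[(key, -age, value) for key, value in seg]
              for age, seg in enumerate(segments)]
    merged, last_key = [], object()
    for key, _, value in heapq.merge(*tagged):
        if key != last_key:  # first occurrence of a key is its newest version
            merged.append((key, value))
            last_key = key
    return merged

older = [("a", 1), ("c", 1)]
newer = [("a", 2), ("b", 2)]
merged = merge_sstables(older, newer)
```

Because every input segment is already sorted, the merge is a single streaming pass and never needs all keys in memory at once.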
- Latch crabbing/coupling on B-trees:
- For search, unlatch the parent node on latching the child node.
- For mutation, unlatch an internal node when it’s guaranteed to not be split or merged.
- To reduce contention on the root, optimistically take the read-path latching approach down to the leaf, and restart with write-path latching only if the leaf does require a split or merge.
- As a rule of thumb, LSM-trees are typically faster for blind writes, whereas B-trees are thought to be faster for reads.
- Hash-based indexes are good for equality selections, whereas B+ trees are good for range selections and sequential scans.
- Prefix key compression increases the fanout of a B+ tree and reduces the height.
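A minimal sketch of the prefix compression idea, assuming string keys: a node stores the shared prefix once plus short suffixes, so more keys fit per page. The function name is made up; `os.path.commonprefix` does the prefix extraction.

```python
import os

def prefix_compress(keys):
    """Store a sorted node's keys as one shared prefix plus short suffixes.

    Because the keys are sorted, the common prefix of the first and last
    key is shared by every key in between, so one comparison suffices.
    Shorter entries mean more keys per page, hence higher fanout and a
    lower tree height.
    """
    prefix = os.path.commonprefix([keys[0], keys[-1]])
    return prefix, [k[len(prefix):] for k in keys]

prefix, suffixes = prefix_compress(["users:100", "users:123", "users:140"])
```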
- ISAM is a static tree structure that is typically used when inserts and deletes are rare.
- Cuckoo hashing uses multiple hash tables with different hash functions.
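A toy cuckoo hash set under stated assumptions (integer keys, two tables, illustrative hash functions): each key has one candidate slot per table, and an insert into a full slot evicts the resident key into its other table.

```python
class CuckooHash:
    """Toy cuckoo hash set with two tables and two hash functions.

    Every key has exactly one candidate slot per table; inserting into an
    occupied slot evicts the resident key, which is then re-homed in its
    other table, possibly evicting another key in turn.  Assumes integer keys.
    """

    def __init__(self, size=11):
        self.size = size
        self.tables = [[None] * size, [None] * size]

    def _slot(self, which, key):
        # Two simple, independent hash functions (illustrative only).
        if which == 0:
            return key % self.size
        return (key * 2654435761) % self.size  # Knuth multiplicative hashing

    def insert(self, key, max_kicks=50):
        for _ in range(max_kicks):
            for which in (0, 1):
                i = self._slot(which, key)
                if self.tables[which][i] in (None, key):
                    self.tables[which][i] = key
                    return True
                # Evict the resident key and try to re-home it next.
                self.tables[which][i], key = key, self.tables[which][i]
        return False  # likely a cycle: a real implementation would rehash

    def __contains__(self, key):
        return any(self.tables[w][self._slot(w, key)] == key for w in (0, 1))

ch = CuckooHash()
for k in (0, 1, 2, 3, 11):
    ch.insert(k)
```

Lookups are worst-case O(1): a key can only ever live in its two candidate slots.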
- Extendible hashing and linear hashing enable resizing the hash table on demand without rebuilding the entire table.
- Extendible hashing splits a bucket when it’s full and doubles the slot array when the global depth increases.
- Linear hashing splits the bucket at the split pointer and incrementally expands the slot array.
- R-trees are balanced in the face of changing data, whereas kd-trees are typically bulk loaded.
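The extendible hashing mechanics (split only the overflowing bucket, double the directory only when local depth catches up with global depth) can be sketched as follows; the class and field names are made up for this toy version.

```python
class ExtendibleHash:
    """Toy extendible hash table backed by a directory of bucket pointers.

    Overflow splits only the affected bucket; the directory (slot array)
    doubles only when the splitting bucket's local depth already equals
    the global depth.  Assumes integer keys (Python hashes small ints to
    themselves, so the behavior is deterministic).
    """

    def __init__(self, bucket_size=2):
        self.bucket_size = bucket_size
        self.global_depth = 1
        self.directory = [{"depth": 1, "items": {}}, {"depth": 1, "items": {}}]

    def _index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def put(self, key, value):
        self.directory[self._index(key)]["items"][key] = value
        while len(self.directory[self._index(key)]["items"]) > self.bucket_size:
            self._split(self.directory[self._index(key)])

    def _split(self, bucket):
        if bucket["depth"] == self.global_depth:
            self.directory = self.directory + self.directory  # double slot array
            self.global_depth += 1
        bucket["depth"] += 1
        new = {"depth": bucket["depth"], "items": {}}
        mask = 1 << (bucket["depth"] - 1)  # the newly significant hash bit
        for k in list(bucket["items"]):
            if hash(k) & mask:
                new["items"][k] = bucket["items"].pop(k)
        # Re-point the directory slots that now address the new bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & mask:
                self.directory[i] = new

    def get(self, key):
        return self.directory[self._index(key)]["items"].get(key)

eh = ExtendibleHash(bucket_size=2)
for k in range(8):
    eh.put(k, k * 10)
```

Doubling the directory only copies pointers; no records move except those in the one bucket being split, which is what makes resizing on demand cheap.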
- Common memory-efficient probabilistic data structures:
- A Bloom filter approximates set membership with zero false negatives.
- A count-min sketch approximates occurrences of elements.
- A HyperLogLog approximates the number of distinct elements.
- A Levenshtein automaton supports efficient search for words within a given edit distance.
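A minimal Bloom filter sketch illustrating the zero-false-negatives property; the sizes and the SHA-256-based probe derivation are arbitrary choices for this toy version.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k deterministic hash probes into a bit array.

    May report false positives; never reports false negatives.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _probes(self, item):
        # Derive k probe positions by salting a cryptographic hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._probes(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._probes(item))

bf = BloomFilter()
for word in ("alice", "bob"):
    bf.add(word)
```

Every added item sets all of its probe bits, so a membership query for it can never come back negative; only items that collide on all k bits yield false positives.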
- The performance advantage of in-memory databases comes mostly from avoiding the overhead of encoding in-memory data structures into a disk-friendly format, not from avoiding disk reads: operating systems actively cache recently used disk blocks in memory anyway.
- Transaction processing vs analytics:
- The bottleneck of OLTP systems is usually disk seek time, whereas the bottleneck of OLAP systems is usually disk bandwidth.
- OLTP databases are usually row-oriented, whereas OLAP databases are usually column-oriented.
- Storage models:
- N-ary storage model (row stores, typically for OLTP)
- Decomposition storage model (DSM, column stores, typically for OLAP)
- Partition attributes across (PAX, column-oriented within a row group)
- In the n-ary storage model:
- A file can either be a data file or an index file.
- A file consists of fixed-sized pages, which are either data pages or directory pages.
- A page is a collection of records, either of a fixed length or of variable lengths.
- Column-oriented storage enables efficient compression techniques, such as bitmap encoding (for small value spaces) and run-length encoding (for consecutive duplicate values).
- Dictionary compression enables storing fixed-length values for variable-length data in column stores.
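Run-length and dictionary encoding can both be sketched in a few lines; the column values are made up, and real column stores operate on packed binary layouts rather than Python lists.

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode consecutive duplicates as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]

def dict_encode(column):
    """Map variable-length values to fixed-width integer codes.

    Assigning codes in sorted value order keeps the encoding
    order-preserving, so range predicates can run on the codes directly.
    """
    codes = {v: i for i, v in enumerate(sorted(set(column)))}
    return codes, [codes[v] for v in column]

runs = rle_encode(["us", "us", "us", "de", "de"])
codes, encoded = dict_encode(["de", "us", "de"])
```

RLE pays off on sorted columns, where identical values cluster into long runs; dictionary codes give every row a fixed width regardless of string length.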
- Delta store: stage updates in a row store and asynchronously migrate them to a column store in batches.
- Pointer swizzling translates disk addresses into memory addresses.
- Atomicity, isolation and durability are properties of the database, whereas consistency (in ACID) is a property of the application.
- Atomicity has nothing to do with concurrency; it means “all or nothing”.
- Anomalies of weak isolation levels:
- Dirty read: a transaction reads uncommitted data of another transaction.
- Dirty write: a transaction overwrites uncommitted data of another transaction.
- Read skew (unrepeatable read): a transaction reads different objects at different points in time, which can be solved by MVCC.
- Lost update: two transactions concurrently perform a read-modify-write cycle and one overwrites another’s write without incorporating its changes.
- Write skew: a transaction makes a write based on a premise it read, and by the time it commits, that premise has changed. Snapshot isolation is susceptible to this anomaly.
- Phantom read: the result of a search query in one transaction is changed by writes in another transaction. Predicate locking is the theoretical antidote to this.
- Multi-version concurrency control (MVCC) for snapshot isolation:
- Readers never block writers and writers never block readers.
- Keep multiple versions of an object written by different transactions.
- List all active transactions at the start of each transaction to determine which versions are visible.
- Used for backups and analytic queries.
- Approaches for serializable isolation:
- Two-phase locking (2PL) + predicate locks (practically index-range locks).
- Serializable snapshot isolation (SSI).
- Two-phase locking (2PL):
- In the growing phase, only acquiring or upgrading locks is permitted.
- In the shrinking phase, only releasing locks is permitted.
- The operations on all the objects could be considered to happen at a moment between the two phases.
- Basic 2PL is subject to cascading aborts: if a transaction aborts after releasing some of its exclusive locks, every transaction that read its data must also abort, otherwise dirty reads would occur.
- In comparison, strict 2PL does not allow a transaction to release a lock until commit or abort.
- To avoid starvation, a shared lock shouldn’t be granted if another transaction is already waiting for the exclusive lock.
- Advantages of a buffer manager (vs OS memory management):
- ability to pin a page in the buffer pool or force it back to the disk;
- ability to adjust the replacement policy;
- ability to pre-fetch pages based on access patterns.
- Buffer management policies:
- NO STEAL policy: a dirty page must not be evicted from the buffer pool while it contains updates of an uncommitted transaction.
- FORCE policy: a transaction should write all dirty pages back to the disk on commit.
- NO STEAL policy causes poor throughput but ensures atomicity with simple rollback.
- FORCE policy causes poor response time but provides durability with simple recovery.
- Write-ahead logging (WAL) guarantees atomicity and durability when STEAL and NO FORCE are allowed.
- A WAL record contains (1) the log sequence number (LSN), (2) the transaction ID, (3) the object ID, (4) the before value, and (5) the after value.
- WAL records could be flushed to disk in batches, as long as all the records for the updates on a dirty page are flushed to disk before the page itself is flushed to disk.
- An application gets notified of a commit only after the COMMIT record gets flushed to disk, but may get notified of an abort even if log records have not yet been flushed.
- ARIES crash recovery algorithm:
- Analysis phase: identify all the active transactions and dirty pages (at the time of the crash) by scanning forward from the most recent checkpoint.
- Redo phase: scan forward to redo all updates on the dirty pages in the buffer pool.
- Undo phase: scan backward to undo all updates of uncommitted transactions in the reverse order.
- System catalogs are a collection of tables maintaining metadata and statistics of other tables for estimating costs and optimizing query execution.
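The three ARIES recovery phases can be sketched on a toy in-memory log; the record format is an assumption of this sketch, and real ARIES additionally tracks LSNs, a dirty page table, checkpoints, and compensation log records.

```python
def recover(log):
    """Toy ARIES-style recovery over an in-memory list of WAL records.

    Update records are dicts {"txid", "obj", "before", "after"}; commits
    are {"txid", "commit": True}.
    """
    # Analysis: scan forward to find loser transactions (no COMMIT record).
    committed = {r["txid"] for r in log if r.get("commit")}
    # Redo: scan forward and repeat history, replaying every update.
    state = {}
    for r in log:
        if "obj" in r:
            state[r["obj"]] = r["after"]
    # Undo: scan backward, rolling back losers' updates in reverse order.
    for r in reversed(log):
        if "obj" in r and r["txid"] not in committed:
            state[r["obj"]] = r["before"]
    return state

log = [
    {"txid": 1, "obj": "x", "before": 0, "after": 10},
    {"txid": 2, "obj": "y", "before": 0, "after": 5},
    {"txid": 1, "commit": True},
    {"txid": 2, "obj": "x", "before": 10, "after": 99},  # tx 2 never commits
]
state = recover(log)
```

Note that redo replays even the losers' updates ("repeating history") before undo rolls them back; that ordering is what lets recovery start from any crash point.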
- Steps of query compilation:
- The SQL query is parsed into an abstract syntax tree (AST).
- The AST is algebraized into a logical query plan based on system catalogs.
- The logical query plan may be optimized based on both system catalogs and cost models.
- The logical query plan is transformed into a physical query plan.
- Query processing models:
- Iterator model (top-down)
- Materialization model (bottom-up, suitable for OLTP)
- Vectorization model (hybrid, typically for OLAP)
- Index nested loop join: if there is an index on the join column of one relation, make that relation the inner loop and exploit the index.
- Hash join: build a hash table for the smaller table, and then run the inner loop on the larger table.
- If the hash table doesn’t fit in the memory, use the same hash function to partition the two tables into buckets first.
- Use a Bloom filter to speed up the probe.
- Sort-merge join: sort both relations on the join column and scan them in lockstep to merge matching rows.
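The build/probe structure of a hash join can be sketched in a few lines; the relation contents are made up, and the grace-style partitioning used when the build side exceeds memory is omitted.

```python
from collections import defaultdict

def hash_join(left, right, left_key, right_key):
    """Toy in-memory hash join: build on the smaller input, probe with the larger.

    Yields (left_row, right_row) pairs for every match.
    """
    swapped = len(right) < len(left)
    build, probe = (right, left) if swapped else (left, right)
    build_key, probe_key = (right_key, left_key) if swapped else (left_key, right_key)
    # Build phase: hash table from join key to matching build-side rows.
    table = defaultdict(list)
    for row in build:
        table[build_key(row)].append(row)
    # Probe phase: stream the larger input and emit matching pairs.
    for row in probe:
        for match in table.get(probe_key(row), []):
            yield (row, match) if swapped else (match, row)

users = [("u1", "ann"), ("u2", "bob")]
orders = [("o1", "u1"), ("o2", "u1"), ("o3", "u3")]
joined = list(hash_join(users, orders,
                        left_key=lambda u: u[0], right_key=lambda o: o[1]))
```

Building on the smaller side keeps the hash table compact; the larger side is only streamed, never materialized.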
- Index-only plans: retrieving data records from the disk file can be avoided for some queries given a suitable index.
- Clustered index: the indexed row is stored directly within the index.
- Zone maps: pre-compute columnar aggregates for each block to enable skipping a block on a query.
- Column sketches: build a bitmap index by mapping value ranges to codes in an order-preserving manner, taking frequency distributions into account.
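The zone map idea is small enough to sketch directly; the block size and values are arbitrary, and real systems store the (min, max) pairs alongside each on-disk block.

```python
def build_zone_maps(values, block_size=4):
    """Pre-compute a (min, max) zone map for each fixed-size block."""
    blocks = (values[i:i + block_size]
              for i in range(0, len(values), block_size))
    return [(min(b), max(b)) for b in blocks]

def blocks_to_scan(zone_maps, lo, hi):
    """Indices of blocks whose [min, max] overlaps the query range [lo, hi];
    all other blocks can be skipped without being read."""
    return [i for i, (bmin, bmax) in enumerate(zone_maps)
            if bmin <= hi and bmax >= lo]

values = [1, 2, 3, 4, 10, 11, 12, 13, 5, 6, 7, 8]
zmaps = build_zone_maps(values)
```

Zone maps help most when data is clustered or sorted, so each block covers a narrow value range.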
- To keep data systems in sync:
- Change data capture (CDC) is done at the database level by extracting writes in the system-of-record database and replicating them in derived data systems (in contrast with dual writes, which may have inconsistent event ordering).
- Event sourcing is done at the application level by keeping an immutable log of change events (for write throughput and auditability).
- CQRS (command query responsibility segregation) further transforms write-optimized event logs into read-optimized views.
- The derived dataset is the place where the read path and the write path meet, representing a tradeoff between the two types of workloads.
- Lambda architecture:
- A stream processor consumes events and updates an approximate view.
- A batch processor reprocesses the events and creates a more accurate view.
- Complex event processing (CEP) reverses the relationship between queries and data: events continuously flow past standing queries in search of queries that match them.
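The query/data reversal in CEP can be sketched as below: queries are registered up front as predicates, then events stream past them. Class and field names are made up; real CEP engines also support pattern matching across multiple events and time windows.

```python
class StandingQueries:
    """Toy CEP engine: queries are long-lived, events are transient."""

    def __init__(self):
        self.queries = {}   # query name -> predicate over one event
        self.matches = []   # (query name, event) pairs emitted so far

    def register(self, name, predicate):
        self.queries[name] = predicate

    def process(self, event):
        # Each incoming event flows past every standing query.
        for name, predicate in self.queries.items():
            if predicate(event):
                self.matches.append((name, event))

engine = StandingQueries()
engine.register("high_temp", lambda e: e["temp"] > 30)
for event in ({"temp": 20}, {"temp": 35}):
    engine.process(event)
```

This is the mirror image of a database: here the queries are stored persistently and the data flows through, rather than the other way around.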