Lecture 6 of CMU 15-721 Advanced Databases (Spring 2023).
The Video Lecture¹
- Andy Pavlo’s top 3 optimizations:
- Data parallelization (vectorization)
- Task parallelization (multi-threading)
- Code specialization (compilation)
- Optimization goals:
- Reduce instruction counts.
- Reduce cycles per instruction.
- Parallelize execution.
- The vectorization model is a middle ground between two extremes:
- The iterator/volcano model, in which a child operator emits one tuple at a time.
- The materialization model, in which a child operator emits all of its tuples at once.
- The materialization model is better for OLTP workloads because intermediate results are typically small.
- Most OLAP systems use the vectorization model because:
- It reduces invocations per operator.
- It allows for SIMD.
- Two plan processing directions:
- Top-down from root (the pull model).
- Bottom-up from leaves (the push model).
- Two approaches to parallel execution:
- Inter-query parallelism
- Intra-query parallelism
- Two forms of intra-query parallelism:
- Intra-operator parallelism (horizontal)
- Inter-operator/pipelined parallelism (vertical)
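The difference between the iterator/volcano model and the vectorization model above can be sketched as follows. This is a minimal illustration, not code from the lecture; the operator and table names are made up, and `VECTOR_SIZE` is set unrealistically small (real systems use on the order of a thousand tuples per vector).

```python
VECTOR_SIZE = 4  # toy value; production systems use ~1000 tuples per vector

def volcano_scan(table):
    """Iterator/volcano model: one next() invocation per tuple."""
    for tup in table:
        yield tup

def vectorized_scan(table):
    """Vectorization model: one invocation per fixed-size batch of tuples,
    amortizing the per-call interpretation overhead across the batch."""
    for i in range(0, len(table), VECTOR_SIZE):
        yield table[i:i + VECTOR_SIZE]

table = list(range(10))
print(sum(1 for _ in volcano_scan(table)))     # 10 operator invocations
print(sum(1 for _ in vectorized_scan(table)))  # 3 invocations (ceil(10/4))
```

The parent operator pulls batches instead of tuples, which is what "reduces invocations per operator" means in the notes above.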
The Monet/X100 Paper²
- Hindrances to CPU pipelining:
- Dependency between instructions
- If-branch mispredictions
- Super-scalar CPUs can execute multiple independent instructions in parallel.
- Compiler optimizations, such as loop pipelining, are critical for achieving good CPU utilization.
- Cache hit-ratio of the memory loads and stores also has a big influence on the processor throughput.
- Two ways that the Volcano model leads to low IPC (instructions-per-cycle) efficiency:
- It inhibits compiler optimizations that exploit CPU parallelism.
- The interpretation overhead cannot be amortized across multiple tuples.
- Drawbacks of MonetDB's full materialization:
- Materializing results of expressions consumes extra memory bandwidth.
- It introduces extra positional joins.
- X100 strikes a balance between the Volcano and MonetDB execution models:
- It uses a rather standard relational algebra as its query language.
- It uses vertically fragmented data layout to enable lightweight data compression, but reduces update costs by using append-only delta columns.
- Vectorized primitives not only keep intermediate vectors cache-resident, but also limit the degrees of freedom so that compilers can apply aggressive loop pipelining.
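The shape of an X100-style vectorized primitive can be sketched as below. The function name and signature are hypothetical (the paper's primitives follow a similar naming scheme, but this is not its exact code): a tight, branch-free loop over a vector, with no per-tuple function calls — exactly the kind of loop a compiler can pipeline and auto-vectorize with SIMD.

```python
def map_add_int_col_int_val(n, result, col, val):
    """Hypothetical vectorized primitive: result[i] = col[i] + val
    for the first n positions of the vector. One function call
    processes the whole vector, not one tuple."""
    for i in range(n):
        result[i] = col[i] + val
    return n

vec = [1, 2, 3, 4]
out = [0] * 4
map_add_int_col_int_val(4, out, vec, 10)
print(out)  # [11, 12, 13, 14]
```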
The Scan-vs-Probe Paper³
- Access path selection: when a query's predicate is on a column with a secondary index, a secondary index scan may or may not outperform a full sequential scan.
- Sequential scans are increasingly fast in modern optimized OLAP systems since:
- Column storage avoids reading unneeded attributes.
- Vectorized execution reduces tuple interpretation overhead.
- Scans can be shared across queries.
- Compressing individual columns independently is more efficient.
- Holding each attribute contiguously allows for SIMD processing.
- Secondary index scans are still useful when selectivity is low (few tuples qualify), but the break-even point decreases as more concurrent queries run on the same data.
- As the data size increases, the selectivity crossover point reaches a maximum and then drops gradually.
- The cost of a sequential scan grows linearly.
- For a secondary index scan, the tree height grows sub-linearly, but the final sort by row-id has super-linear complexity.
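The crossover argument above can be illustrated with a toy cost model. This is my own simplification, not the paper's formula: the sequential scan reads every value once (linear in N), while the index scan pays one sub-linear tree traversal plus, per qualifying tuple, a lookup and a share of the super-linear sort of row-ids.

```python
import math

def seq_scan_cost(n):
    # Read every value once: linear in table size.
    return n

def index_scan_cost(n, selectivity):
    k = selectivity * n                         # qualifying tuples
    traverse = math.log2(n + 1)                 # sub-linear tree height
    sort = k * math.log2(k + 1) if k > 1 else k # super-linear sort of row-ids
    return traverse + k + sort

def crossover_selectivity(n, step=0.001):
    """Smallest selectivity (in steps of `step`) at which the
    sequential scan becomes the cheaper access path."""
    s = step
    while s < 1.0 and index_scan_cost(n, s) < seq_scan_cost(n):
        s += step
    return s

print(crossover_selectivity(1_000_000))
```

Even this crude model reproduces the qualitative behavior: the index wins only below a small selectivity threshold, and that threshold shifts with table size.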