Lecture 6 of CMU 15-721 Advanced Databases (Spring 2023).
The Video Lecture¹
- Andy Pavlo’s top 3 optimizations:
- Data parallelization (vectorization)
- Task parallelization (multi-threading)
- Code specialization (compilation)
- Optimization goals:
- Reduce instruction counts.
- Reduce cycles per instruction.
- Parallelize execution.
- The vectorization model is a middle ground between two extremes:
- The iterator/volcano model, in which a child operator emits one tuple at a time.
- The materialization model, in which a child operator emits all of its tuples at once.
- The materialization model is better for OLTP workloads because intermediate results are typically small.
- Most OLAP systems use the vectorization model because:
- It reduces invocations per operator.
- It allows for SIMD.
- Two plan processing directions:
- Top-down from root (the pull model).
- Bottom-up from leaves (the push model).
- Two approaches to parallel execution:
- Inter-query parallelism
- Intra-query parallelism
- Two forms of intra-query parallelism:
- Intra-operator parallelism (horizontal)
- Inter-operator/pipelined parallelism (vertical)
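The difference between the iterator/volcano model and the vectorization model above can be sketched as follows. This is a minimal illustration, not code from the lecture; the operator and table names are made up, and `VECTOR_SIZE` is set unrealistically small (real systems use on the order of a thousand tuples per vector).

```python
VECTOR_SIZE = 4  # toy value; production systems use ~1000 tuples per vector

def volcano_scan(table):
    """Iterator/volcano model: one next() invocation per tuple."""
    for tup in table:
        yield tup

def vectorized_scan(table):
    """Vectorization model: one invocation per fixed-size batch of tuples,
    amortizing the per-call interpretation overhead across the batch."""
    for i in range(0, len(table), VECTOR_SIZE):
        yield table[i:i + VECTOR_SIZE]

table = list(range(10))
print(sum(1 for _ in volcano_scan(table)))     # 10 operator invocations
print(sum(1 for _ in vectorized_scan(table)))  # 3 invocations (ceil(10/4))
```

The parent operator pulls batches instead of tuples, which is what "reduces invocations per operator" means in the notes above.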
The Monet/X100 Paper²
- Hindrances to CPU pipelining:
- Dependency between instructions
- If-branch mispredictions
- Super-scalar CPUs can execute multiple independent instructions in parallel.
- Compiler optimizations, such as loop pipelining, are critical for achieving good CPU utilization.
- Cache hit-ratio of the memory loads and stores also has a big influence on the processor throughput.
- Two ways that the Volcano model leads to low IPC (instructions-per-cycle) efficiency:
- It inhibits compiler optimizations that exploit CPU parallelism.
- The interpretation overhead cannot be amortized across multiple tuples.
- Drawbacks of MonetDB's full materialization:
- Materializing results of expressions consumes extra memory bandwidth.
- It introduces extra positional joins.
- X100 strikes a balance between the Volcano and MonetDB execution models:
- It uses a rather standard relational algebra as its query language.
- It uses vertically fragmented data layout to enable lightweight data compression, but reduces update costs by using append-only delta columns.
- Vectorized primitives not only keep intermediate vectors cache-resident, but also limit the degrees of freedom so that compilers can apply aggressive loop pipelining.
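The shape of an X100-style vectorized primitive can be sketched as below. The function name and signature are hypothetical (the paper's primitives follow a similar naming scheme, but this is not its exact code): a tight, branch-free loop over a vector, with no per-tuple function calls — exactly the kind of loop a compiler can pipeline and auto-vectorize with SIMD.

```python
def map_add_int_col_int_val(n, result, col, val):
    """Hypothetical vectorized primitive: result[i] = col[i] + val
    for the first n positions of the vector. One function call
    processes the whole vector, not one tuple."""
    for i in range(n):
        result[i] = col[i] + val
    return n

vec = [1, 2, 3, 4]
out = [0] * 4
map_add_int_col_int_val(4, out, vec, 10)
print(out)  # [11, 12, 13, 14]
```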
The Scan-vs-Probe Paper³
- Access path selection: when a query's predicate is on a column with a secondary index, a secondary index scan may or may not outperform a full sequential scan.
- Sequential scans are increasingly fast in modern optimized OLAP systems since:
- Column storage avoids reading unneeded attributes.
- Vectorized execution reduces tuple interpretation overhead.
- Scans can be shared across queries.
- Compressing individual columns independently is more efficient.
- Holding each attribute contiguously allows for SIMD processing.
- Secondary index scans are still useful when selectivity is low (few tuples qualify), but the break-even point decreases as more concurrent queries run on the same data.
- As the data size increases, the selectivity crossover point reaches a maximum and then drops gradually.
- The cost of a sequential scan grows linearly.
- For a secondary index scan, the tree height grows sub-linearly, but the final sort by row-id has super-linear complexity.
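The crossover argument above can be illustrated with a toy cost model. This is my own simplification, not the paper's formula: the sequential scan reads every value once (linear in N), while the index scan pays one sub-linear tree traversal plus, per qualifying tuple, a lookup and a share of the super-linear sort of row-ids.

```python
import math

def seq_scan_cost(n):
    # Read every value once: linear in table size.
    return n

def index_scan_cost(n, selectivity):
    k = selectivity * n                         # qualifying tuples
    traverse = math.log2(n + 1)                 # sub-linear tree height
    sort = k * math.log2(k + 1) if k > 1 else k # super-linear sort of row-ids
    return traverse + k + sort

def crossover_selectivity(n, step=0.001):
    """Smallest selectivity (in steps of `step`) at which the
    sequential scan becomes the cheaper access path."""
    s = step
    while s < 1.0 and index_scan_cost(n, s) < seq_scan_cost(n):
        s += step
    return s

print(crossover_selectivity(1_000_000))
```

Even this crude model reproduces the qualitative behavior: the index wins only below a small selectivity threshold, and that threshold shifts with table size.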