# Arm Yourself with Data Structures

LeetCode 480, 803, 336 and 850.

Data structures are essential building blocks for composing efficient algorithms. In my previous posts, I’ve shown common uses of vectors (stacks), double-ended queues, priority queues (binary heaps) and hash maps in LeetCode problems. Today we’re going to see four more data structures that could be employed in coding questions.

## LC 480 Sliding Window Median

Given an array of n numbers and a sliding window of size k (k≤n) moving from the left to the right of the array, return the medians of the numbers in the window at each time.

A brute force solution could take O(n*k) time, since we need to iterate all the k numbers for each of the n-k+1 windows. However, batch processing the n-k+1 offline queries should lend us some opportunities to resolve them incrementally and thereby bring down the amortized cost of each query.

To achieve that, we’d like to separate the smaller ⌈k/2⌉ numbers and the larger ⌊k/2⌋ numbers in each window. Each time the window slides, we need to: (1) add a new number to one of the two groups to keep the variant that all those in the smaller group are not greater than any of those in the larger group, (2) rebalance the sizes of the two groups so that we could easily know the ⌊k/2⌋th and the⌈k/2⌉th smallest numbers, and (3) delete the leftmost number in the current window from the groups for the next iteration.

A binary heap is competent for doing (1) and (2) efficiently, but only a tree map could handle all the three in O(k) time. In the code snippet below, I used the B-tree map from the Rust standard library. In other languages you are more likely to use maps that are based on balanced binary search trees, such as red-black trees.

## LC 803 Bricks Falling When Hit

There is an m x n binary grid, in which 1 represents a brick and 0 represents an empty space. A brick is stable if it’s either at the first row or adjacent to another stable brick in its four directions. Every time we hit a cell in the grid, the brick in that cell (if there’s one) would disappear, and some other bricks could become unstable because of the disappeared brick and thus fall. Given the sequence of cells to hit, return the number of falling bricks after each hit in order.

Similar to the last problem, given that we have a schedule of offline queries, we should take advantage of that information and achieve a better overall performance than handling them in sequence. In this problem, we need to adjust the order of resolving the queries. We would make a time reversal from the final state after all the hits are done, add back each disappeared brick in the reverse order, and for each one check how many unstable bricks would become stable. Basically, we want to answer how many more bricks would get into the same connected component with the bricks on the first row if a new brick emerges at a hit location.

The data structure that could efficiently handle this is the disjoint set, which has a nuanced time complexity reasoning but is very easy to implement.

## LC 336 Palindrome Pairs

Given an array of unique words, return all the pairs of the distinct indices (i, j) such that words[i]+words[j] is a palindrome.

For each index i, we want to find all the indices j (j ≠ i) such that either (1) words[i] is the reverse of some suffix of words[j] and the remaining prefix of words[j] is palindromic, or (2) words[j] is the reverse of some prefix of words[i] and the remaining suffix of words[i] is palindromic.

This would be a good application of a trie, which helps us to find all those matching a prefix or suffix from a collection of strings. First, we build a trie to match all the suffices of all the words. Then we enumerate words[i], and for each of them we enumerate its prefixes and use the trie to locate candidates for words[j] satisfying condition (2). Once we reach the end of words[i], we do a DFS downwards from the current trie node to enumerate all the candidates for words[j] that are longer than words[i] and that satisfy condition (1). The entire process could be implemented as below.

## LC 850 Rectangle Area II

Calculate the total area in the plane covered by an array of axis-aligned rectangles. An area covered by multiple rectangles should be counted only once.

This is a geometry problem that could be solved by a plane sweep. Basically, we scan the plane from left to right with a line aligned with the y-axis, and maintain the total length on the line that is currently intersected with any rectangle. Essentially, we just need to enumerate the x-coordinates at which we enter or leave rectangles in the increasing order. For each such x-coordinate, we need to do some bookkeepings for all the active rectangles our sweep line is intersecting.

Now the problem becomes how we could efficiently maintain a list of line segments and determine the total length covered by any active segment at any time. The segment tree, which was introduced in a previous post, could handle both the update and the query in O(log n) time, where n is the total number of the line segments. As before, we need to first discretize and index all the y-coordinates of the rectangles so that each leaf node of the segment tree represents a y-coordinate. We define the span of a leaf node as the distance from the y-coordinate of itself to the y-coordinate of the next leaf node. Over the plane sweep, we need to count the number of active rectangles occupying the y-coordinate for each leaf node, and maintain the total span of all the occupied leaves under each internal node. This translates to the following code.    