Fundamental Data Structures

Algorithm Engineering — L04

Dr. Dominik Krupke

Welcome

Linear Structures

Start with std::vector. Use something else when the workload clearly asks for it.

The choice decides the layout

Same sixteen integers, two containers — two very different shapes in memory.

What a pointer hop actually costs

Level Latency Ratio
L1 cache ~1 ns
L2 cache ~4 ns
L3 cache ~12 ns 12×
Main memory ~100 ns 100×

Order-of-magnitude numbers; exact values are hardware-specific.

A std::vector traversal lets the prefetcher keep data in cache.

A std::list traversal risks main memory on every hop.

Asymptotically identical, empirically one hundred times slower.

When does std::list actually win?

Three conditions, all at once:

  • Many splices, few traversals. Moving elements between lists without copying.
  • Iterator stability matters. Pointers into the list must survive insertions and removals elsewhere.
  • Elements are expensive to move. Copying or moving a value is costly compared to the pointer overhead.

If you cannot name all three, reach for std::vector.

std::deque — the Swiss army knife nobody should open

A map of pointers to blocks. Every iterator increment risks crossing a block boundary; every implementation is slightly different and usually slower than its specification suggests.

Prefer vector-as-queue

Trade a little memory (the husks) for contiguous layout, a single allocation, and one cache line per hop.

Small-buffer optimization

Worth reaching for: absl::InlinedVector, boost::small_vector, llvm::SmallVector.

Summary — linear structures

  • Default: std::vector. Contiguous, predictable, cache-friendly.
  • Last resort: std::list. Only when splicing, stable iterators, or expensive moves demand it.
  • Avoid: std::deque. Prefer std::vector with a read index.
  • Optimize small cases: absl::InlinedVector and friends when most instances hold only a handful of elements.
  • Carpenter, Almost Always Vector, CppCon 2024 (Carpenter 2024)
  • Carruth, High Performance Code 201: Hybrid Data Structures, CppCon 2016 (Carruth 2016)

Hash Maps

A core workhorse — with performance that depends heavily on the implementation.

When do we actually reach for a hash map?

  • Small N (≲ 100): sorted std::vector + linear scan — cache beats asymptotics.
  • Dense integer keys: std::vector<Value> indexed directly — no hash needed.
  • Ordered queries (range, predecessor): sorted container, not a hash map.
  • Everything else: hash map territory. But not the one the standard ships.
// Graph: map node_id (0..N-1) → degree.
std::vector<int> degree(N, 0);
for (auto e : edges) degree[e.src]++;

// Tokenized text: count by token ID.
std::vector<int> count(vocab_size, 0);
for (auto tok : tokens) count[tok]++;

std::unordered_map is a linked list in disguise

The bucket array holds pointers to heap-allocated nodes. Every insert allocates; every lookup chases a pointer off the array.

Open addressing — keep collisions local

One allocation. Collisions probed in adjacent slots. A miss stays in the same cache line.

But: wide values → few slots per cache line. Every probe step = fresh cache line.

Swiss tables — split metadata from data

Tiny metadata array scanned before any key is touched — 16 slots per cache line, regardless of value width.

A single SIMD instruction checks all 16 tags at once; only matches pay to read the full key-value pair.

Tuning: reserve, load factor, hash quality

  • Reserve before a bulk insert — one call avoids ~log of N rehashes.
  • Load factor: leave it alone. The defaults are benchmarked.
  • Hash quality: std::hash<int> is typically the identity. Swap in xxHash or ankerl::unordered_dense::hash for hot string keys.

Summary — hash maps

  • Check the exceptions first: dense integer keys → std::vector. Ordered queries → sorted container. Tiny N → linear scan.
  • Avoid: std::unordered_map. Chained by standard mandate — 2× slower on hits, 3–4× on misses and unreserved build.
  • Default: ankerl::unordered_dense, absl::flat_hash_map, or boost::unordered_flat_map.
  • Tune: reserve before bulk inserts. Leave load factor alone. Hash quality matters for non-trivial keys.

Sorted Containers

When order itself is part of the query, not just membership.

When do we actually need order?

  • Range queries: “all log entries between 10:00 and 11:00”, “prices between 90 € and 110 €”
  • Predecessor / successor: “closest timestamp ≤ now”, “next larger key”
  • Ordered iteration: stream keys sorted, merge two indexes, prefix scans
  • Order statistics: k-th smallest, rank of a key
  • Database / filesystem indexes: every B-tree on disk exists for range scans

If none of these apply, you want a hash map, not a sorted container.

Lead with the simplest thing: sorted std::vector

Build once, binary-search many times.

std::vector<std::pair<Key, Value>> table;
// ... fill and sort by key ...

auto it = std::lower_bound(
    table.begin(), table.end(), key,
    [](auto const& p, Key const& k) { return p.first < k; });

if (it != table.end() && it->first == key) {
    use(it->second);
}

Read-mostly ordered map? Start here. Simple and fast.

std::map is a red-black tree

30 keys → 30 separate allocations, 6 levels deep. std::map is std::list with a sorting invariant — per-node malloc, plus rebalancing on every insert.

Height is the cost

For one million keys, a balanced binary tree is ~20 levels deep.

Each level is a pointer hop to somewhere else in memory. In the bad case, 20 cache misses per lookup — and the work inside each node is negligible.

Cache misses come from tree height, not from node work. Reduce height by widening the branching factor.

B-tree: same keys, one cache line per node

30 keys → 13 nodes, 3 levels (order 5). Every key still carries its own value pointer (gray stubs).

Node sized to one cache line ⇒ one cache miss per level ⇒ logarithm in the branching factor, not in two.

Choosing the order: match the cache line

  • In memory: order 16–64. Node fits in one or two cache lines.
  • On disk: order 128–512. Node is a disk block; minimize seeks.
  • Bigger nodes ⇒ more work inside the node, diminishing returns on height.

Same structure, different constant. The “latency you are minimizing” decides the node size.

B+ tree: routing on top, values in leaves

Internal nodes = pure routing. All values live in leaves. Leaves linked for range scans.

Range query = one tree descent + a linear walk over the leaf chain. This is why every database uses B+.

Summary — sorted containers

  • Ask first: do I need order, or just point lookup? If lookup only → hash map.
  • Default: sorted std::vector + lower_bound. Build once, search many — but mid-insert is O(N): on a million-key vector a single insert shifts megabytes.
  • Mutation-heavy ordered map: absl::btree_map. Drop-in for std::map, ~3× faster lookup, per-op cost stays bounded by log N.
  • Range scans dominate: B+ tree or sorted vector — sequential memory layout, no pointer-chasing.
  • Avoid: std::map. Red-black tree with per-node malloc — slower than absl::btree_map on every workload. No shape where it wins.

Priority Queues

The simple binary heap is the right default — with one useful tuning knob.

When do we actually need a priority queue?

  • Event simulation: “what happens next?” ordered by time stamp
  • Shortest paths / A*: expand the frontier vertex with the smallest tentative distance
  • Top-k selection: keep a bounded heap of the best k items in a stream
  • Scheduling: earliest-deadline-first, anytime-algorithm re-queues
  • Branch-and-bound: always explore the cheapest open node first

If you only need the ordering once at the end, just sort. A priority queue earns its place when you pop and keep inserting.

Binary heap: implicit tree in a flat array

No pointers, no per-node allocations. The tree is index arithmetic on a std::vector.

push: append, then bubble up

pop: swap root with last, then sift down

Height is the cost — widen the tree

A binary heap over one million keys is ~20 levels deep. Each level is a potential cache miss per push / pop. Widen the tree, shrink the height.

d-ary heap: each node has d children instead of 2. Same array, same index arithmetic — just d·i + k instead of 2i + 1. The d children sit in one cache line; height drops from log₂ N to log_d N.

Modifying the key? Push a new entry instead

Heaps support no efficient way to change a key already in the heap. Work around it:

// min-heap: smallest priority first
using Entry = std::pair<int, int>;           // (priority, id)
std::priority_queue<Entry, std::vector<Entry>, std::greater<>> pq;

pq.push({new_priority, id});                 // duplicate, not update

while (!pq.empty()) {
    auto [p, id] = pq.top(); pq.pop();
    if (p != priority[id]) continue;         // stale — skip
    process(id);
}

Lazy deletion. The heap grows by at most a constant factor per update — a cheap price for avoiding the reverse-index bookkeeping a real decrease-key needs.

Summary — priority queues

  • Ask first: push-and-pop in a loop? If it is “sort once at the end” → just std::sort.
  • Default: std::priority_queue — binary heap on a vector. Cache-friendly, no allocations.
  • Cheap knob: a d-ary heap (d = 4 or 8) if the heap is large.
  • Modifying a key? Lazy deletion.

Asymptotic ≠ fast — the Fibonacci heap. Better textbook bound for Dijkstra, but pointer-chasing and cache-hostile consolidation make it lose to a flat binary heap in practice. Lazy deletion removes the last reason to use it.

Union-Find

A case where the theory translates well into practice: a compact implementation with very low amortized cost.

Two operations

Maintain a partition of {0, …, 7} under:

  • find(x) — which set is x in?
  • unite(x, y) — merge the two sets.

 

Two ideas, near-constant time

Union by rank: attach the shorter tree under the taller. Tree height stays logarithmic.

Path compression: during find, re-parent every node on the path directly to the root.

Twenty lines

#include <numeric>
#include <vector>

struct UnionFind {
    std::vector<int> parent, rank_;

    UnionFind(int n) : parent(n), rank_(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);
    }

    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];   // path halving
            x = parent[x];
        }
        return x;
    }

    void unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;
        if (rank_[rx] < rank_[ry]) std::swap(rx, ry);
        parent[ry] = rx;
        if (rank_[rx] == rank_[ry]) ++rank_[rx];
    }
};

Inverse Ackermann — and why it does not matter

Amortized cost per operation: \[O(\alpha(n))\]

For any n a physical computer can represent: α(n) ≤ 4.

In practice, treat both operations as O(1). The constant factor is dominated by the single cache line that holds the parent array hot.

Summary — union-find

  • Shape: a parent array, nothing else. Twenty lines of C++.
  • Tricks: union by rank, path compression (or halving) — apply both.
  • Cost: α(n) amortized; constant on any real input.
  • Spot it: dynamic partitions under merges, anywhere offline connectivity or equivalence classes appear.

Graphs

For graphs, the representation often matters as much as the algorithm.

Why graphs matter

Graphs are arguably the dominant abstraction of combinatorial optimization.

They model concrete entities — people in a social network, intersections in a street network, machines on a factory floor.

…but just as naturally abstract ones — states of a system and the transitions between them, configurations of a puzzle, positions in a game.

Once a problem wears this shape, a huge library of algorithms becomes available. The rest of this chapter is about making those algorithms run fast on real hardware.

Compressed Sparse Row — two flat arrays

std::vector<std::vector<int>> places each vertex’s neighbors in a separate heap allocation: one cache miss per vertex before reading a single edge.

CSR is to graphs what std::vector is to linear structures — the default unless you have a specific reason otherwise.

Annotated graphs — where do weights live?

  • Per-vertex data (distance, parent, visited-flag): one parallel array per attribute, indexed by vertex id. Same pattern as std::vector<T>; no indirection.
  • Per-edge data on CSR: add a weights array of length E, aligned with targets. weights[i] is the weight of the edge whose target is targets[i]. Two parallel arrays become three; everything still streams.
  • AoS alternative: one struct Edge { int dst; float weight; } array. Better if the algorithm always touches both fields together; cache line carries what you need.

The annotation layout is the same AoS vs SoA question you already know. Fields the inner loop reads together should sit together; cold fields (labels, timestamps, provenance) go in their own array.

Adjacency matrix — when the graph is dense

Right for small V with a dense graph, when edge-existence queries dominate, or when the algorithm is expressed as linear algebra (SpMV, matrix powers, GraphBLAS).

Bonus: the matrix is an algorithm

When the graph mutates

  • Pure CSR: fast reads, painful writes. Adding one edge shifts the rest.
  • Nested vector: cheap writes, slow reads.
  • In practice: keep a “cold” CSR snapshot plus a small “hot” delta structure. Rebuild periodically — every k inserts, or when the delta grows past some threshold.
  • Bulk loads: build an edge list, sort by source, convert to CSR in one pass.

Summary — graphs

  • Default: CSR. Two flat arrays, streams at memory bandwidth.
  • Avoid: std::vector<std::vector<int>> as a long-lived representation — same pathology as std::list.
  • Edge-centric: an edge list, sorted as the algorithm needs.
  • Dynamic: CSR plus a small delta, rebuild periodically.

Spatial Data Structures

Queries in space, not by key. That usually calls for different structures.

When do we need them?

  • Too many geometric objects to scan: millions of points, rectangles, triangles, particles.
  • Queries are local in space: near-neighbor, range, ray/intersection, or collision — only a small region is relevant.
  • Most data is irrelevant per query: the whole win comes from pruning almost everything.
  • Everywhere: GIS/maps, game physics, robotics, CAD, ray tracing, nearest-neighbor search.

The structure is chosen by the query, not the data. Range → grid/R-tree. Nearest-neighbor → kd-tree. Ray → BVH. Collision → BVH/grid. The rest of this section is a menu.

Uniform grid — the boring, often-winning baseline

\(O(1)\) expected for near-neighbor queries when points are roughly uniform. One array of bucket vectors, one divide-and-floor to find a cell.

kd-tree — recursive median splits

Degrades to linear scan as dimensionality rises. By about twenty dimensions, brute-force is competitive. This is the “curse of dimensionality” for data structures.

BVH — the ray-tracing workhorse

R-tree — rectangles in a database

Delaunay triangulation — geometry → sparse graph

Not a query accelerator — a graph builder. Turns a point cloud into a sparse planar graph so downstream graph algorithms have something to walk.

Summary — spatial structures

  • Default for points: uniform grid. Simple, fast, often enough.
  • Low-D nearest neighbor: kd-tree. Degrades above ~10 dimensions.
  • Objects, rays, collisions: BVH. Build quality dominates traversal speed.
  • On-disk rectangles: R-tree.
  • Sparse proximity graph: Delaunay triangulation — bridge to graph algorithms.
  • High-D embeddings: approximate methods — a topic of its own.

Probabilistic Structures

A closing party trick: give up exactness for tiny memory — simple, surprising, rarely taught.

Bloom filter in action

Where does the false-positive rate come from?

\[\Pr[\text{a fixed bit stays 0 after one hash}] \;=\; 1 - \tfrac{1}{m}\]

\[\Pr[\text{it stays 0 after all } kn \text{ hashes}] \;=\; \left(1 - \tfrac{1}{m}\right)^{kn} \;\approx\; e^{-kn/m}\]

\[\Pr[\text{bit is 1}] \;\approx\; 1 - e^{-kn/m}\]

\(m\): size of bit array

\(k\): number of hashes

\(n\): inserted elements

A non-member query hits \(k\) bits independently — false positive iff all are 1: \[\Pr[\text{false positive}] \;\approx\; \left(1 - e^{-kn/m}\right)^{k}\]

Minimize over \(k\) for fixed \(m/n\): optimum at \(k^\star = \tfrac{m}{n}\ln 2\).

Sizing a Bloom filter in practice

bits / element optimal \(k\) FPR for \(n = 10{,}000{,}000\)
4 3 14.7 % 4.8 MB
6 4 5.6 % 7.2 MB
8 6 2.2 % 9.5 MB
10 7 0.82 % 12 MB
12 8 0.31 % 14 MB
16 11 0.046 % 19 MB
20 14 0.0068 % 24 MB

Rule of thumb: each ~5 bits per element buys roughly a 10× lower FPR. Ten bits (~1 %) is the default most libraries pick; go higher only when a false positive is genuinely expensive.

Count-Min sketch — approximate frequencies

d rows, w columns. Hash each item into one cell per row; increment. Query returns the min across rows.

Count-Min — the formal guarantee

\[w = \lceil e/\varepsilon \rceil, \quad d = \lceil \ln(1/\delta) \rceil \;\Rightarrow\; \Pr[\,\widehat{f}(x) - f(x) \le \varepsilon N\,] \ge 1-\delta\]

  • \(\varepsilon\) controls accuracy (error relative to total stream mass \(N\)).
  • \(\delta\) controls confidence (probability the bound holds).
  • Heavy hitters are estimated well; rare items can look noisier in relative terms.

The estimate is always an overcount — never an undercount. The error is bounded by total stream mass, not by the item’s true frequency.

HyperLogLog — distinct-count in kilobytes

Hash each item → count leading zeros → the more distinct items, the higher the maximum.

\(2^{14}\) registers = 12 KB \(\Rightarrow\) ~0.8 % error. Redis PFCOUNT, BigQuery APPROX_COUNT_DISTINCT.

Summary — probabilistic structures

Bloom Count-Min HyperLogLog
Question “seen it?” “how often?” “how many distinct?”
Answer bit vector min of d counters leading-zero stats
Error false positives (no FN) one-sided overcount \(\le \varepsilon N\) rel. std. err. \(\approx 1.04/\sqrt{m}\)
Classic use LSM read skip heavy hitters COUNT(DISTINCT …)

Same recipe, three questions. Reach for them when an exact structure would blow the memory budget — otherwise just use a hash set.

Wrap-up

The data structure is the performance story.

What we covered

Linear
vector first, the rest rarely
Hash maps
open addressing, not chaining
Sorted
B-tree over red-black
Priority queues
implicit tree in an array
Union-find
twenty lines, near-constant
Graphs
CSR: two flat arrays
Spatial
grids, kd-trees, BVH, R-trees
Probabilistic
accuracy for space

What we did not cover

  • Strings: suffix arrays, suffix trees, tries, FM-index. Their own world.
  • Persistent / functional: immutable structures, path copying, versioned data.
  • Concurrent / lock-free: hash maps, queues, trees under contention.
  • External-memory: real B-trees on disk, LSM trees (RocksDB, LevelDB).
  • Succinct / compressed: wavelet trees, RRR bit-vectors, rank/select.
  • More sketches: Cuckoo filter (deletable membership), t-digest (quantiles), MinHash (set similarity).
  • GPU-oriented: hash tables, BVHs, sort-based primitives on device memory.

Further reading

  • Mehlhorn & Sanders, Algorithms and Data Structures: The Basic Toolbox (Mehlhorn and Sanders 2008)
  • Wirth, Algorithms + Data Structures = Programs (Wirth 1976)
  • Jiang, Data Structures in Practice — A Hardware-Aware Approach (Jiang 2025)
  • Carruth, Efficiency with Algorithms, Performance with Data Structures, CppCon 2014 (Carruth 2014)

Appendix — Reference

For lookup after the lecture. Not covered in the recorded walkthrough.

C++ / Python cheat sheet

Topic C++ Python
Dynamic array std::vector list, numpy.ndarray
Small / inline vector absl::InlinedVector, boost::small_vector, llvm::SmallVector — (fall back to list)
Deque / FIFO std::vector with read index collections.deque
Hash map (default) absl::flat_hash_map, ankerl::unordered_dense, boost::unordered_flat_map dict (already flat, open-addressed)
Hash map (std) std::unordered_map — chained, avoid when possible
Sorted lookup (static) sorted std::vector + std::lower_bound bisect on a sorted list
Sorted (mutation-heavy) absl::btree_map, absl::btree_set sortedcontainers.SortedList / SortedDict
Sorted (std) std::map, std::set — red-black tree, avoid
Priority queue std::priority_queue, boost::d_ary_heap heapq on a list
Union-find 20-line inline implementation scipy.cluster.hierarchy.DisjointSet, networkx.utils.UnionFind
Graph (CSR) hand-rolled CSR; GraphBLAS, Galois, Ligra scipy.sparse.csgraph, igraph, rustworkx
Graph (dict-of-dicts) — (avoid) networkx — fine below ~10⁵ edges
Spatial (kd-tree, BVH, R-tree) CGAL, nanoflann, Intel Embree, boost::geometry scipy.spatial.KDTree, scipy.spatial.cKDTree, rtree
Bloom filter folly, RocksDB, Abseil pybloom-live, rbloom
Count-Min / HyperLogLog datasketches (Apache) datasketches, probables

Rule of thumb: in Python, reach for a native-code library (NumPy, SciPy, Abseil-via-pybind11) before a pure-Python container when the workload is the bottleneck. In C++, reach for Abseil/Boost flat containers before the std:: ordered or chained defaults.

See you next time

References

Carpenter, Kevin. 2024. Back to Basics: Almost Always Vector.
Carruth, Chandler. 2014. Efficiency with Algorithms, Performance with Data Structures.
Carruth, Chandler. 2016. High Performance Code 201: Hybrid Data Structures.
Jiang, Danny. 2025. Data Structures in Practice: A Hardware-Aware Approach for System Software Engineers.
Mehlhorn, Kurt, and Peter Sanders. 2008. Algorithms and Data Structures: The Basic Toolbox. Springer.
Wirth, Niklaus. 1976. Algorithms + Data Structures = Programs. Prentice-Hall.