Caches and Memory Hierarchy
From registers to DRAM — understanding latency, bandwidth, and why cache-friendly code wins.
Updated March 18, 2026
The Pyramid
| Level | Typical size | Latency (cycles) |
|---|---|---|
| Registers | ~few hundred B | 0–1 |
| L1 cache | 32–64 KB | ~4 |
| L2 cache | 256 KB – 1 MB | ~12 |
| L3 cache | 8–64 MB | ~40 |
| DRAM | 16–512 GB | ~200 |
| NVMe SSD | TB-scale | ~10 000+ |
Every level trades capacity for speed. A program whose working set fits in L1 pays about 4 cycles per access; one that misses all the way to DRAM on most accesses pays roughly 50 times that.
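To make the table concrete, here is a minimal sketch that maps a working-set size to the latency of the smallest level it fits in. The function name approx_latency_cycles is hypothetical, and the capacities (32 KB, 256 KB, 8 MB) are assumptions taken from the low end of each range in the table; real parts differ.

```c
#include <stddef.h>

/* Approximate access latency (cycles) for a working set of the given size.
 * Capacities are illustrative assumptions: 32 KB L1, 256 KB L2, 8 MB L3. */
static int approx_latency_cycles(size_t working_set_bytes) {
    if (working_set_bytes <= 32u * 1024)       return 4;   /* fits in L1 */
    if (working_set_bytes <= 256u * 1024)      return 12;  /* fits in L2 */
    if (working_set_bytes <= 8u * 1024 * 1024) return 40;  /* fits in L3 */
    return 200;                                            /* spills to DRAM */
}
```

Under these assumptions, an array of 4096 int32_t values (16 KB) lands in the ~4-cycle tier, while a 4 MB array lands in the ~40-cycle tier.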
Registers
Registers are the fastest level of the hierarchy. They sit inside each CPU core, directly feeding the execution units, and hold the operands and results of the instructions currently being executed.
Cache
Cache Lines and Spatial Locality
Caches operate on fixed-size lines, 64 bytes on most current x86-64 processors. Accessing one byte pulls in the entire line, so when you iterate a contiguous array, the remaining accesses to that line hit in cache after the first miss. This is spatial locality at work.
int32_t a[N];     // assume N is a constant and a[] has been filled
int64_t sum = 0;
// cache-friendly: sequential access
for (int i = 0; i < N; i++)
    sum += a[i];
// cache-hostile: strided access
for (int i = 0; i < N; i += STRIDE)
    sum += a[i];
Questions
- What is the smallest value of STRIDE in the example code at which every access touches a different cache line?
Answers
- A cache line is 64 bytes and each int32_t is 4 bytes, so STRIDE = 64 / 4 = 16. At this stride every access lands in a different cache line, eliminating all spatial locality benefit.
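The arithmetic in the answer can be checked with a small sketch. The helper names (stride_for_line, lines_touched, accesses) are hypothetical, and the code assumes the array starts on a cache-line boundary and that stride is at least 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest stride (in elements) at which each access falls in a new line. */
static size_t stride_for_line(size_t line_bytes, size_t elem_bytes) {
    return line_bytes / elem_bytes;
}

/* Number of loads issued when walking n elements with the given stride. */
static size_t accesses(size_t n, size_t stride) {
    return (n + stride - 1) / stride;
}

/* Number of distinct cache lines touched by that walk.
 * Assumes the array is line-aligned; indices only increase, so comparing
 * against the previous line index is enough to count distinct lines. */
static size_t lines_touched(size_t n, size_t elem_bytes,
                            size_t stride, size_t line_bytes) {
    size_t count = 0;
    size_t last_line = (size_t)-1;  /* sentinel: no line seen yet */
    for (size_t i = 0; i < n; i += stride) {
        size_t line = (i * elem_bytes) / line_bytes;
        if (line != last_line) {
            count++;
            last_line = line;
        }
    }
    return count;
}
```

For 1024 int32_t elements, the sequential walk issues 1024 loads over 64 lines (16 loads per line), while the stride-16 walk issues 64 loads over the same 64 lines (one load per line), so every load pays for a fresh line.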
Cache Types
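- Inclusive cache: every line held in a smaller, faster level (such as L1) is also present in the larger level below it (such as L2 or L3).
- Exclusive cache: a line lives in at most one level at a time, so the lower level holds only lines that are not in the level above it.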
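One practical consequence is how much distinct data a pair of levels can hold. Under the usual definitions (inclusive: the larger level duplicates everything in the smaller one; exclusive: a line lives in only one level), a rough sketch looks like this. The function name effective_capacity is hypothetical, and the model assumes L2 is at least as large as L1.

```c
#include <stddef.h>

/* Distinct bytes an L1 + L2 pair can hold at once.
 * Simplified model, assuming l2_bytes >= l1_bytes:
 *   inclusive: L2 duplicates L1's contents, so only L2's capacity counts;
 *   exclusive: no duplication, so the capacities add. */
static size_t effective_capacity(size_t l1_bytes, size_t l2_bytes,
                                 int inclusive) {
    return inclusive ? l2_bytes : l1_bytes + l2_bytes;
}
```

With a 32 KB L1 and a 256 KB L2, the inclusive pair holds 256 KB of distinct data and the exclusive pair holds 288 KB.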
Cache Size
There is a fundamental trade-off between the size of a cache and its speed. A cache holds its capacity divided by the line size in lines, so a 32 KB L1 with 64-byte lines holds 512 of them. Storing more lines lowers the miss rate but makes each lookup slower, which is why the hierarchy layers small, fast caches on top of larger, slower ones.
Prefetching
Hardware prefetchers detect sequential and strided patterns automatically. Software prefetch intrinsics (__builtin_prefetch, _mm_prefetch) give you explicit control when access patterns are irregular.
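As an illustration of software prefetching on an irregular pattern, here is a minimal sketch of an indexed gather that uses the GCC/Clang builtin __builtin_prefetch. The function name gather_sum and the PREFETCH_DISTANCE value are assumptions for illustration; a useful distance depends on the machine and should be found by measurement.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical look-ahead distance, in loop iterations. */
#define PREFETCH_DISTANCE 8

/* Sum data[idx[i]] for i in [0, n). The indices are arbitrary, so the
 * hardware prefetcher has no pattern to follow; we hint the line we will
 * need PREFETCH_DISTANCE iterations from now. Requires GCC or Clang. */
int64_t gather_sum(const int32_t *data, const size_t *idx, size_t n) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* second arg 0 = prefetch for read; third arg 1 = low temporal locality */
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 1);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

The prefetch is only a hint and never changes the result, so the function returns the same sum with or without it; whether it runs faster has to be measured on the target hardware.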
Rule of thumb: measure with perf stat; if your L1 miss rate is above ~5 %, there is likely a data layout opportunity.
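To see why a miss rate of a few percent matters, a simplified two-level model helps: average cost = hit latency + miss rate × miss penalty. The sketch below uses the latencies from the table at the top; the function name avg_access_cycles is hypothetical, and the model ignores overlap between outstanding misses, so treat the numbers as rough.

```c
/* Average cycles per access in a simplified two-level model:
 * every access pays hit_cycles, and a fraction miss_rate of them
 * additionally pays miss_penalty_cycles. */
static double avg_access_cycles(double hit_cycles, double miss_rate,
                                double miss_penalty_cycles) {
    return hit_cycles + miss_rate * miss_penalty_cycles;
}
```

With a 4-cycle L1 and a 5 % miss rate, misses served by a 12-cycle L2 give about 4.6 cycles per access, while misses that go to a 200-cycle DRAM give about 14 cycles per access, more than three times the L1 hit cost.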