Virtual Memory

Page tables, TLBs, and the elegant abstraction that gives every process its own address space.


Why Virtual Memory?

Every process believes it owns the entire address space. The Operating System and Memory Management Unit collaborate to map virtual pages to physical frames, providing:

  1. Isolation — process A can’t touch process B’s memory.
  2. Overcommit — total virtual memory can exceed physical RAM.
  3. Convenience — contiguous virtual ranges backed by scattered physical frames.

Virtual memory is both a hardware and software concept. The MMU and OS kernel each own distinct responsibilities.

All major general-purpose CPU architectures include an MMU:

  • x86-64
  • ARM64
  • PowerPC
  • MIPS
  • RISC-V
Controls

| MMU / CPU (hardware) | OS Kernel (software) |
| --- | --- |
| Address translation per access | Creating/destroying page tables |
| TLB lookup and caching | Physical frame allocation |
| Hardware page-table walk | Page fault handling (alloc, swap in) |
| Permission enforcement (R/W/X) | Setting permission bits |
| Raising page faults | Swap space and eviction policy |
| Page-table format and page sizes (ISA-fixed) | Address space layout (mmap, brk) |

Does not control

| MMU / CPU (hardware) | OS Kernel (software) |
| --- | --- |
| Which pages are mapped | Translation speed |
| Which physical frames to use | TLB replacement policy |
| Swap or eviction policy | Page-table walk implementation |
| Per-process address space layout | Supported page sizes (ISA-fixed) |

Page Tables

x86-64 Page Tables

A 4-level radix tree (on x86-64) translates 48-bit virtual addresses to physical addresses. Each level indexes 9 bits of the address, with the final level pointing to a 4 KB page frame.

Virtual Address (48-bit)
┌────────┬────────┬────────┬────────┬─────────────┐
│ PML4   │  PDPT  │   PD   │   PT   │   Offset    │
│ 9 bits │ 9 bits │ 9 bits │ 9 bits │  12 bits    │
└────────┴────────┴────────┴────────┴─────────────┘
  • PML4 - Page Map Level 4
  • PDPT - Page Directory Pointer Table
  • PD - Page Directory
  • PT - Page Table
  • Offset - byte offset within the 4 KB page (12 bits)

Example of x86-64 Page Table Walk

Translate virtual address 0x00007F4AB3C1D000 to a physical address:

Step 1: Split the 48-bit address into four 9-bit indices and a 12-bit offset:

Virtual address decomposition into PML4, PDPT, PD, PT indices and offset

Step 2: Walk the four levels, using each index to look up the next table base:

  1. Read CR3 → PML4 base address (0x1A3000).
  2. PML4[254] → read 8 bytes at 0x1A3000 + 254×8. Next base: 0x3F5000.
  3. PDPT[298] → read 8 bytes at 0x3F5000 + 298×8. Next base: 0x7A2000.
  4. PD[414] → read 8 bytes at 0x7A2000 + 414×8. Next base: 0x1B8000.
  5. PT[29] → read 8 bytes at 0x1B8000 + 29×8. Physical frame: 0xC40000.
  6. Result: 0xC40000 + 0x000 = 0xC40000.

Page table walk — CR3 through four levels to physical address

Each table is exactly one page (512 entries × 8 bytes = 4 KB). The full walk costs 4 memory accesses (~200 cycles) — which is why the TLB exists.
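The index extraction in step 1 is pure bit arithmetic: shift out the lower fields, then mask 9 bits (or 12 for the offset). A quick sketch in Python, using the example address:

```python
ADDR = 0x00007F4AB3C1D000

offset = ADDR & 0xFFF          # low 12 bits: byte within the 4 KB page
pt     = (ADDR >> 12) & 0x1FF  # next 9 bits: Page Table index
pd     = (ADDR >> 21) & 0x1FF  # Page Directory index
pdpt   = (ADDR >> 30) & 0x1FF  # Page Directory Pointer Table index
pml4   = (ADDR >> 39) & 0x1FF  # Page Map Level 4 index

print(pml4, pdpt, pd, pt, hex(offset))  # → 254 298 414 29 0x0
```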

TLB: Translation Lookaside Buffer

Walking four levels of page tables on every access would take too long. The Translation Lookaside Buffer caches recent translations — typically 64–1536 entries with a >99 % hit rate for well-behaved workloads.

TLB misses are expensive (~20–100 cycles). Huge pages (2 MB / 1 GB) reduce TLB pressure by covering more memory per entry.

Example of a TLB

  1. This simple TLB has 16 entries.
  2. Assume 2^10 = 1024 virtual pages are mapped.
    1. Page numbers are therefore 10 bits.
    2. Index into the 16-entry TLB with 4 of those bits (let's arbitrarily pick bits 8, 5, 3, and 0).

TLB indexing example — 16 entries indexed by bits 8, 5, 3, 0 of the virtual page number

Questions
  1. Why is the TLB not a simple array?
  2. How does the TLB know which process the page belongs to?
  3. How does the TLB know the physical address is not for a different process?
Answers
  1. The TLB is not a simple array because of the trade-off between size and lookup speed. An array indexed directly by virtual page number would need one entry per page of the virtual address space, which is far too large to build in fast hardware.
  2. The TLB stores a process identifier (an address-space ID, or ASID) along with each entry.
  3. Since each entry is tagged with the process ID, a hit requires both the virtual page number and the ASID to match, so a cached physical address is never returned to a different process.
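A toy model of this 16-entry TLB ties the pieces together. It is hypothetical and direct-mapped, indexed by bits 8, 5, 3, and 0 of the 10-bit page number, with each entry tagged by an ASID as in answer 2:

```python
def tlb_index(vpn):
    """Form a 4-bit TLB index from bits 8, 5, 3, 0 of the page number."""
    return (((vpn >> 8) & 1) << 3) | (((vpn >> 5) & 1) << 2) \
         | (((vpn >> 3) & 1) << 1) | (vpn & 1)

tlb = [None] * 16  # each slot holds (asid, vpn, frame) or None

def tlb_insert(asid, vpn, frame):
    tlb[tlb_index(vpn)] = (asid, vpn, frame)

def tlb_lookup(asid, vpn):
    entry = tlb[tlb_index(vpn)]
    # A hit requires both the page number and the process (ASID) to match.
    if entry is not None and entry[0] == asid and entry[1] == vpn:
        return entry[2]
    return None  # TLB miss: fall back to the page-table walk

tlb_insert(asid=1, vpn=0b1000101001, frame=0xC40)
hit  = tlb_lookup(1, 0b1000101001)  # same process: hit
miss = tlb_lookup(2, 0b1000101001)  # different process: miss
print(hit, miss)  # → 3136 None
```

Real TLBs are associative rather than direct-mapped, but the ASID tag check works the same way.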

Demand Paging & Page Faults

Pages start unmapped. On first access the CPU raises a page fault; the OS allocates a frame, zeroes it, updates the page table, and resumes execution. This lazy strategy avoids wasting RAM on pages that are never touched.

Practical tip: For latency-sensitive applications, use mlock() / MAP_POPULATE to fault pages in ahead of time.
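Lazy allocation is easy to observe with an anonymous mapping. A minimal Linux-flavored sketch using Python's mmap module (on Python 3.10+ on Linux, passing mmap.MAP_POPULATE would pre-fault the pages instead):

```python
import mmap

PAGE = mmap.PAGESIZE  # typically 4096

# Reserve 16 pages of anonymous virtual memory. The kernel sets up the
# mapping but allocates no physical frames yet.
buf = mmap.mmap(-1, 16 * PAGE)

# The first write to each page raises a minor page fault; the kernel
# allocates a zeroed frame, updates the page table, and resumes us.
for page in range(16):
    buf[page * PAGE] = 0xFF

# Untouched bytes read back as zero: anonymous pages start zero-filled.
print(buf[0], buf[1])  # → 255 0
buf.close()
```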

x86 Huge Pages

Huge pages (2 MB / 1 GB) reduce TLB pressure by covering more memory per entry, which is especially useful for latency-sensitive applications. When a region is accessed sequentially, a single 2 MB entry covers 512 times as much memory as a 4 KB entry, so far fewer accesses miss the TLB.
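The arithmetic behind the benefit is simple: TLB "reach" scales linearly with page size. The 64-entry data TLB below is an illustrative figure, not a specific CPU:

```python
ENTRIES = 64  # hypothetical number of data-TLB entries

for name, page in [("4 KB", 4 << 10), ("2 MB", 2 << 20), ("1 GB", 1 << 30)]:
    reach = ENTRIES * page  # total memory covered without a TLB miss
    print(f"{name} pages -> TLB covers {reach / (1 << 20):g} MiB")
```

With 4 KB pages the same TLB covers only 256 KiB; with 2 MB pages it covers 128 MiB.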

Huge pages