Process Scheduling
CFS, real-time priorities, and how the Linux kernel decides who runs next.
The Scheduling Problem
With N runnable threads and M cores (N >> M), the scheduler must decide:
- Who runs next?
- For how long before preemption?
- On which core (affinity, NUMA)?
Linux CFS (Completely Fair Scheduler)
CFS models an ideal multitasking CPU on which every runnable task gets a fair share of CPU time: an equal share at equal nice values, a weighted share otherwise. It tracks each task’s virtual runtime (vruntime), and the smaller a task’s vruntime, the more CPU time it is owed.
Tasks live in a red-black tree keyed by vruntime. The leftmost node (smallest vruntime) runs next. As a task runs, its vruntime grows by the elapsed time scaled inversely by the task’s weight (derived from its nice value), so higher-weight tasks accumulate vruntime more slowly and end up with more CPU time.
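The vruntime bookkeeping can be sketched as a toy simulation. This is a simplified model, not kernel code: a binary heap stands in for the red-black tree, every slice has a fixed length, and the hypothetical helper nice_to_weight approximates the kernel's weight table with the rule of thumb that each nice step changes the weight by about 1.25×.

```python
import heapq

NICE_0_WEIGHT = 1024  # weight of a nice-0 task


def nice_to_weight(nice):
    """Approximate weight (assumption: ~1.25x per nice step, not the exact table)."""
    return NICE_0_WEIGHT / (1.25 ** nice)


def simulate(tasks, slice_ns, steps):
    """Run `steps` fixed-length slices; return CPU time (ns) per task.

    tasks maps a task name to its nice value.
    """
    # Heap entries are (vruntime, name); the smallest vruntime is popped first,
    # playing the role of the leftmost node in the red-black tree.
    queue = [(0.0, name) for name in sorted(tasks)]
    heapq.heapify(queue)
    cpu_time = {name: 0 for name in tasks}
    for _ in range(steps):
        vruntime, name = heapq.heappop(queue)
        cpu_time[name] += slice_ns
        # vruntime advances more slowly for heavier (lower-nice) tasks.
        vruntime += slice_ns * NICE_0_WEIGHT / nice_to_weight(tasks[name])
        heapq.heappush(queue, (vruntime, name))
    return cpu_time
```

With two nice-0 tasks the slices alternate evenly; with one nice-0 and one nice-5 task, the nice-0 task should receive roughly three times as much CPU time in this model.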
Real-Time Scheduling
Linux offers two POSIX real-time policies that take precedence over CFS:
| Policy | Behavior |
|---|---|
| SCHED_FIFO | Run until yield/block; highest-priority FIFO wins |
| SCHED_RR | Same, but with round-robin within the same priority |
Real-time tasks always preempt CFS tasks.
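The difference between the two policies shows up only among tasks of equal priority. The following toy model illustrates it under stated assumptions: all tasks share one priority, none of them blocks or yields, and the function name, the quantum parameter, and the work units are hypothetical rather than kernel values.

```python
from collections import deque


def run_same_priority(policy, tasks, quantum):
    """Return the sequence of (name, units_run) bursts on a single CPU.

    tasks is a list of (name, work_units), all at the same real-time priority.
    """
    queue = deque(tasks)
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        if policy == "SCHED_FIFO":
            ran = remaining  # keeps the CPU until it finishes
        else:  # "SCHED_RR"
            ran = min(quantum, remaining)  # gives up the CPU after one quantum
        timeline.append((name, ran))
        if remaining > ran:
            # Unfinished RR task goes to the tail of its priority list.
            queue.append((name, remaining - ran))
    return timeline
```

For tasks a (3 units) and b (2 units), the FIFO run finishes a before b starts, while the RR run with a quantum of 1 alternates between them.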
Core Affinity & NUMA
taskset and sched_setaffinity() pin threads to specific cores. On NUMA systems, accessing memory on a remote node has roughly 1.5–2× the latency of a local access, so co-locating threads with their data matters.
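From Python, the same system call is exposed on Linux as os.sched_setaffinity. The sketch below assumes a Linux host; the context manager pinned_to is a hypothetical helper that pins the calling process to one CPU and restores the previous mask afterwards.

```python
import os
from contextlib import contextmanager


@contextmanager
def pinned_to(cpu, pid=0):
    """Temporarily restrict `pid` (0 = calling process) to a single CPU."""
    original = os.sched_getaffinity(pid)  # set of CPUs currently allowed
    os.sched_setaffinity(pid, {cpu})
    try:
        yield
    finally:
        os.sched_setaffinity(pid, original)  # restore the previous mask
```

Choosing the CPU from os.sched_getaffinity(0) rather than hard-coding a number avoids asking for a core that the process is not allowed to use, for example inside a container with a restricted CPU set.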
Quick check: Use schedstat or perf sched to see if your workload is suffering from excessive migrations or run-queue imbalance.
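One way to get a first number is the per-task file /proc/<pid>/schedstat, which on kernels built with scheduler statistics holds three fields: time spent on the CPU (ns), time spent waiting on a run queue (ns), and the number of timeslices run. The helper names below are hypothetical, and the sketch covers run-queue waiting only, not migration counts.

```python
from pathlib import Path


def parse_task_schedstat(text):
    """Parse the three whitespace-separated fields of /proc/<pid>/schedstat."""
    on_cpu_ns, runqueue_wait_ns, timeslices = (int(x) for x in text.split())
    return {
        "on_cpu_ns": on_cpu_ns,
        "runqueue_wait_ns": runqueue_wait_ns,
        "timeslices": timeslices,
    }


def wait_ratio(stats):
    """Fraction of runnable time the task spent waiting for a CPU."""
    total = stats["on_cpu_ns"] + stats["runqueue_wait_ns"]
    return stats["runqueue_wait_ns"] / total if total else 0.0


def read_task_schedstat(pid="self"):
    """Read the file for `pid`; assumes the kernel exposes it."""
    return parse_task_schedstat(Path(f"/proc/{pid}/schedstat").read_text())
```

A wait ratio that stays high while other cores sit idle is a hint to look closer with perf sched.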