Process Scheduling

CFS, real-time priorities, and how the Linux kernel decides who runs next.

scheduling, cfs, linux, real-time

The Scheduling Problem

With N runnable threads and M cores (N >> M), the scheduler must decide:

  • Who runs next?
  • For how long before preemption?
  • On which core (affinity, NUMA)?

Linux CFS (Completely Fair Scheduler)

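CFS models an ideal multitasking CPU on which every runnable task receives its fair share of CPU time (an equal share when all tasks have the same nice value). It tracks each task’s virtual runtime (vruntime): the smaller a task’s vruntime, the less weighted CPU time it has received and the stronger its claim to run next.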

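Runnable tasks live in a red-black tree keyed by vruntime. The leftmost node (smallest vruntime) runs next. As a task runs, its vruntime advances by the elapsed CPU time scaled inversely by the task’s weight (derived from its nice value), so a higher-weight (lower-nice) task accumulates vruntime more slowly and ends up with a larger share of the CPU.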
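
The pick-leftmost-then-charge loop described above can be sketched as a small simulation. This is only an illustration, not kernel code: a binary heap stands in for the red-black tree, the function names are hypothetical, the time slice is fixed, and the weight is approximated as 1024 / 1.25^nice (the kernel uses a precomputed table with similar values).

```python
import heapq

NICE_0_WEIGHT = 1024  # weight of a nice-0 task


def weight_for_nice(nice):
    """Approximate weight: each nice step changes the weight by about 1.25x.

    Assumption for this sketch; the kernel looks weights up in a fixed table.
    """
    return NICE_0_WEIGHT / (1.25 ** nice)


def simulate(nice_values, slice_ns=1_000_000, steps=10_000):
    """Run `steps` fixed slices and return the CPU time each task received.

    nice_values maps a task name to its nice value.
    """
    # Heap entries are (vruntime, name); the smallest vruntime is popped first,
    # which mirrors picking the leftmost node of the tree.
    heap = [(0.0, name) for name in sorted(nice_values)]
    heapq.heapify(heap)
    cpu_ns = {name: 0 for name in nice_values}

    for _ in range(steps):
        vruntime, name = heapq.heappop(heap)
        cpu_ns[name] += slice_ns
        # Charge the task: real time scaled inversely by its weight.
        vruntime += slice_ns * NICE_0_WEIGHT / weight_for_nice(nice_values[name])
        heapq.heappush(heap, (vruntime, name))

    return cpu_ns


if __name__ == "__main__":
    for name, ns in simulate({"nice0": 0, "nice5": 5}).items():
        print(f"{name}: {ns / 1e6:.0f} ms")
```

Because the nice-5 task is charged about three times as much vruntime per slice, it is picked about three times less often, which is how the weighting turns into a CPU share.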

Real-Time Scheduling

Linux offers two fixed-priority real-time policies (priorities 1–99) that take precedence over CFS:

Policy      Behavior
SCHED_FIFO  Runs until it yields, blocks, or is preempted by a higher-priority real-time task; no time slice
SCHED_RR    Same as SCHED_FIFO, but tasks at the same priority take turns in round-robin time slices

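A runnable real-time task preempts any CFS task on its CPU. By default, RT throttling (the sched_rt_runtime_us and sched_rt_period_us sysctls) limits real-time tasks to 950 ms of every second, so a runaway real-time task cannot starve CFS tasks completely.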
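
As a rough illustration of how a process asks for a real-time policy, the sketch below uses Python's os module wrappers around the Linux scheduling system calls. The helper names fifo_priority_range and try_set_fifo are hypothetical. Switching to SCHED_FIFO normally requires root or CAP_SYS_NICE, so the helper reports failure instead of raising.

```python
import os


def fifo_priority_range():
    """Return the (min, max) static priority the kernel allows for SCHED_FIFO."""
    return (
        os.sched_get_priority_min(os.SCHED_FIFO),
        os.sched_get_priority_max(os.SCHED_FIFO),
    )


def try_set_fifo(priority, pid=0):
    """Try to move `pid` (0 = calling process) to SCHED_FIFO at `priority`.

    Returns True on success and False if the kernel refuses the request,
    which is the usual outcome without CAP_SYS_NICE.
    """
    lowest, highest = fifo_priority_range()
    if not lowest <= priority <= highest:
        raise ValueError(f"priority must be in [{lowest}, {highest}]")
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False
    return True


if __name__ == "__main__":
    lowest, highest = fifo_priority_range()
    print(f"SCHED_FIFO priorities: {lowest}..{highest}")
    print("switched to SCHED_FIFO:", try_set_fifo(lowest))
```

Starting at the lowest real-time priority is a deliberately cautious choice: even that level outranks every CFS task, so a busy loop at any real-time priority can monopolize a core.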

Core Affinity & NUMA

The taskset command and the sched_setaffinity() system call pin threads to specific cores. On NUMA systems, accessing memory on a remote node has roughly 1.5–2× the latency of a local access, so co-locating threads with their data matters.
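
A minimal sketch of pinning from inside a program, using Python's os.sched_getaffinity and os.sched_setaffinity wrappers (Linux only). The helper name pin_to_cpu is hypothetical, and the sketch does not deal with NUMA memory placement at all; it only restricts where the thread may run.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict `pid` (0 = calling thread) to a single CPU.

    Returns the previous affinity set so the caller can restore it later.
    """
    previous = os.sched_getaffinity(pid)
    if cpu not in previous:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(pid, {cpu})
    return previous


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    target = min(allowed)
    previous = pin_to_cpu(target)
    print(f"pinned to CPU {target}; now allowed on {sorted(os.sched_getaffinity(0))}")
    os.sched_setaffinity(0, previous)  # undo the pin
```

Checking the requested CPU against the current affinity set first avoids a less readable error from the kernel when the process runs inside a restricted cpuset or container.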

Quick check: Use /proc/schedstat (or /proc/<pid>/schedstat for a single task) or perf sched to see whether your workload suffers from excessive migrations or run-queue imbalance.
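
For a first look at a single task, the per-task file /proc/<pid>/schedstat holds three numbers: time spent on the CPU (ns), time spent waiting on a run queue (ns), and the number of time slices run. The sketch below parses that file; the SchedStat type and the helper names are hypothetical, and the file is only present when the kernel was built with scheduler statistics enabled. It shows run-queue wait, not migrations, so treat it as a coarse indicator only.

```python
from typing import NamedTuple


class SchedStat(NamedTuple):
    run_ns: int      # time spent executing on a CPU
    wait_ns: int     # time spent runnable but waiting on a run queue
    timeslices: int  # number of time slices the task has run


def parse_schedstat(text):
    """Parse the three whitespace-separated fields of /proc/<pid>/schedstat."""
    run_ns, wait_ns, timeslices = (int(field) for field in text.split()[:3])
    return SchedStat(run_ns, wait_ns, timeslices)


def read_schedstat(pid="self"):
    """Read and parse the schedstat file for `pid` (default: this process)."""
    with open(f"/proc/{pid}/schedstat") as f:
        return parse_schedstat(f.read())


def wait_fraction(stat):
    """Share of run + wait time that the task spent waiting for a CPU."""
    total = stat.run_ns + stat.wait_ns
    return stat.wait_ns / total if total else 0.0


if __name__ == "__main__":
    stat = read_schedstat()
    print(stat, f"wait fraction: {wait_fraction(stat):.1%}")
```

A wait fraction that stays high while other cores sit idle is a hint to look more closely with perf sched.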