Chapter 9 The Goroutine Scheduler (Go: Under the Hood)

9.1 The Scheduling Problem and the GMP Model

Write go f() and a goroutine starts running. Behind that one line sits the most intricate machinery in the Go runtime: the scheduler. It must answer a deceptively hard question: how can tens of thousands of goroutines take turns on a handful of CPU cores, running efficiently while remaining nearly invisible to the user? This section first lays out the problem the scheduler must solve, its overall skeleton, and its place in the larger family of concurrent runtimes; later sections then examine each component in depth.
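To make the scale concrete, here is a small program, a sketch rather than anything from the runtime, that starts 10,000 goroutines while the number of Ps stays at GOMAXPROCS, which by default is typically the number of CPUs. runMany is a hypothetical helper; runtime.NumCPU and runtime.GOMAXPROCS(0) are standard calls, the latter querying the current setting without changing it.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runMany starts n goroutines that each add 1 to a shared counter
// and returns the final count once all of them have finished.
func runMany(n int) int64 {
	var wg sync.WaitGroup
	var count atomic.Int64 // requires Go 1.19 or later
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			count.Add(1)
		}()
	}
	wg.Wait()
	return count.Load()
}

func main() {
	// GOMAXPROCS(0) reports the current number of Ps without changing it.
	fmt.Println("CPUs:", runtime.NumCPU(), "Ps:", runtime.GOMAXPROCS(0))
	fmt.Println("completed:", runMany(10000))
}
```

All 10,000 goroutines complete even though only a handful of Ps, and therefore only a handful of goroutines, can be running at any instant.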

9.2 Work-Stealing Scheduling

9.1 left us with a question: each P has its own local queue, so work is bound to be distributed unevenly, with some Ps overwhelmed while others sit idle. Spreading the load without introducing a central bottleneck is the core difficulty of concurrent scheduling. Go’s answer is a design backed by some thirty years of theory that recurs throughout the industry: work stealing.

This section goes a bit deeper than the rest. We first make clear what Go does, then trace back to the scheduling theory behind it (why it is “provably good”), then look across its different incarnations in systems such as Cilk, Java, and Rust, and finally stop at the questions that remain open.
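The core move of work stealing fits in a few lines. This is a minimal sketch of the idea, not the runtime's code: each hypothetical localQueue is guarded by a mutex (the real per-P runq is a lock-free ring buffer), the owner pops from the head in FIFO order, and an idle queue steals half of a victim's tasks, rounded up, from the head, roughly the policy of the runtime's runqsteal. The task type stands in for a G.

```go
package main

import (
	"fmt"
	"sync"
)

// task stands in for a G; the name is illustrative only.
type task int

// localQueue is a hypothetical per-P run queue. A mutex keeps the
// sketch simple where the runtime uses a lock-free ring buffer.
type localQueue struct {
	mu    sync.Mutex
	items []task
}

// push appends a task at the tail (owner side).
func (q *localQueue) push(t task) {
	q.mu.Lock()
	q.items = append(q.items, t)
	q.mu.Unlock()
}

// pop removes the oldest task from the head, FIFO order.
func (q *localQueue) pop() (task, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return 0, false
	}
	t := q.items[0]
	q.items = q.items[1:]
	return t, true
}

// stealHalf moves half of victim's tasks (rounded up) from its head into q
// and returns how many were taken. The two locks are never held together,
// so two queues stealing from each other cannot deadlock.
func (q *localQueue) stealHalf(victim *localQueue) int {
	victim.mu.Lock()
	n := len(victim.items) - len(victim.items)/2
	stolen := append([]task(nil), victim.items[:n]...)
	victim.items = victim.items[n:]
	victim.mu.Unlock()

	q.mu.Lock()
	q.items = append(q.items, stolen...)
	q.mu.Unlock()
	return n
}

func main() {
	busy, idle := &localQueue{}, &localQueue{}
	for i := 1; i <= 7; i++ {
		busy.push(task(i))
	}
	n := idle.stealHalf(busy)
	fmt.Println("stole", n, "left", len(busy.items)) // stole 4 left 3
}
```

Taking half, rather than one task at a time, amortizes the cost of a steal: a thief that finds work usually has enough to stay busy for a while without coming back.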

9.3 The MPG Model and the Units of Concurrent Scheduling

The first question a scheduler must answer is not “how to schedule” but “what to schedule.” Go calls the thing being scheduled a goroutine, and carries it on a triple of M, P, and G. Before we get our hands on the scheduling algorithm (from 9.4 onward), this section settles these three scheduling units: what a goroutine really is within the lineage of computer science, how its running context is encoded, why scheduling itself has to happen on a special g0, what states a goroutine passes through over its life, and how the worker thread M that carries it is parked and unparked. Once these few things are clear, the scheduling algorithms that follow are just “moving G around among these units.”
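The states a goroutine passes through can be pictured as a small state machine. The sketch below is a simplified, hypothetical model: the state names echo the runtime's _Gidle, _Grunnable, _Grunning, _Gsyscall, _Gwaiting, and _Gdead, but the numeric values are not the runtime's, and states such as _Gpreempted and _Gcopystack are left out.

```go
package main

import "fmt"

// gStatus is a simplified model of goroutine states; the values are
// illustrative and do not match the runtime's constants.
type gStatus int

const (
	gIdle     gStatus = iota // just allocated
	gRunnable                // on a run queue, waiting for an M and P
	gRunning                 // executing user code on an M with a P
	gSyscall                 // inside a system call
	gWaiting                 // parked on a channel, lock, timer, I/O...
	gDead                    // finished, or not yet (re)used
)

// legal lists the transitions this simplified model allows.
var legal = map[gStatus][]gStatus{
	gIdle:     {gDead},                                // fresh Gs are marked dead before first use
	gDead:     {gRunnable},                            // go f() (re)initializes a dead G
	gRunnable: {gRunning},                             // picked by the scheduling loop
	gRunning:  {gRunnable, gWaiting, gSyscall, gDead}, // yield, park, syscall, exit
	gSyscall:  {gRunning, gRunnable},                  // return with or without a P
	gWaiting:  {gRunnable},                            // woken up by whoever it waited on
}

// canTransition reports whether from -> to is allowed in this model.
func canTransition(from, to gStatus) bool {
	for _, s := range legal[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(gRunning, gWaiting)) // true: a G parks itself
	fmt.Println(canTransition(gWaiting, gRunning)) // false: it must become runnable first
}
```

The second line captures a rule the scheduling loop relies on: a woken goroutine never jumps straight onto a CPU; it goes back into a run queue and waits its turn.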

9.4 The Scheduling Loop

The previous sections laid out the materials: we know what G, M, and P are (9.3), and we know how an M finds work (9.2). This section actually sets them spinning, watching how the scheduling loop ceaselessly picks and runs goroutines on a single thread, and how it strikes a balance between “letting a single goroutine run a little longer” (throughput and locality) and “not letting any goroutine starve” (fairness).
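A toy model of the fairness rule helps before the details. It assumes a single P with hypothetical local and global slices; the one borrowed fact is that the runtime's scheduler checks the global run queue once every 61 scheduler ticks (schedtick%61 == 0), so goroutines there cannot starve behind a perpetually busy local queue. Everything else here, including the names, is simplified.

```go
package main

import "fmt"

// scheduler is a hypothetical model of one P with a local run queue
// and the global run queue. Gs are just names here.
type scheduler struct {
	tick   uint32
	local  []string
	global []string
}

// next picks the next G to run. Every 61st tick it looks at the global
// queue first, echoing the runtime's schedtick%61 fairness check;
// otherwise it prefers the local queue for locality.
func (s *scheduler) next() (string, bool) {
	s.tick++
	if s.tick%61 == 0 && len(s.global) > 0 {
		return s.popGlobal(), true
	}
	if len(s.local) > 0 {
		g := s.local[0]
		s.local = s.local[1:]
		return g, true
	}
	if len(s.global) > 0 {
		return s.popGlobal(), true
	}
	return "", false
}

func (s *scheduler) popGlobal() string {
	g := s.global[0]
	s.global = s.global[1:]
	return g
}

// ticksUntilGlobal reports on which tick a lone global G first runs
// when localLen Gs are already waiting locally.
func ticksUntilGlobal(localLen int) int {
	s := &scheduler{global: []string{"G"}}
	for i := 0; i < localLen; i++ {
		s.local = append(s.local, "L")
	}
	for i := 1; ; i++ {
		if g, _ := s.next(); g == "G" {
			return i
		}
	}
}

func main() {
	fmt.Println(ticksUntilGlobal(10))  // 11: the local queue drains first
	fmt.Println(ticksUntilGlobal(100)) // 61: the fairness check kicks in
}
```

The constant is a trade-off: checking the global queue on every tick would cost locality and lock traffic, while never checking it would let a busy P starve global work indefinitely.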

9.5 Thread Management

9.1 laid down the three-layer GMP structure: G is the user-level unit of execution, P is the scheduling permit and the carrier of local resources, and M is the thread actually borrowed from the operating system. The previous sections dwelt mostly on G and P; this section turns to M and answers several questions we have kept deferring: what M actually is, where it comes from, why GOMAXPROCS caps the number of Ps while the number of threads is often larger, why a single blocking system call does not drag the other Gs down with it, and what price the runtime pays when a user pins a goroutine to a specific thread (LockOSThread).
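One common use of runtime.LockOSThread, sketched under the assumption that some library requires every call to come from the same OS thread (as many GUI and graphics APIs do): a hypothetical lockedWorker owns one locked goroutine and runs submitted functions on it. The price mentioned above is visible in the design: while the goroutine holds the lock, its M runs no other G, so whenever that goroutine blocks, the runtime must hand the P to a different M to keep the rest of the program moving.

```go
package main

import (
	"fmt"
	"runtime"
)

// lockedWorker is a hypothetical helper that runs every submitted
// function on a single goroutine wired to its OS thread.
type lockedWorker struct {
	jobs chan func()
}

// newLockedWorker starts the worker goroutine and locks it to its thread.
func newLockedWorker() *lockedWorker {
	w := &lockedWorker{jobs: make(chan func())}
	go func() {
		// Until UnlockOSThread, this G runs only on this M,
		// and this M runs no other G.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		for f := range w.jobs {
			f()
		}
	}()
	return w
}

// do runs f on the locked thread and waits for it to finish.
func (w *lockedWorker) do(f func()) {
	done := make(chan struct{})
	w.jobs <- func() {
		f()
		close(done)
	}
	<-done
}

// stop ends the worker; its goroutine then unlocks and exits.
func (w *lockedWorker) stop() { close(w.jobs) }

func main() {
	w := newLockedWorker()
	defer w.stop()

	sum := 0
	for i := 1; i <= 10; i++ {
		w.do(func() { sum += i }) // runs on the locked thread
	}
	fmt.Println(sum) // 55
}
```

Because do waits on the done channel, every write to sum on the worker happens before main reads it, so the example is free of data races.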

9.6 Signal Handling

Operating system signals are asynchronous and low-level: a signal may interrupt any thread at any moment, and what the signal handler is allowed to do is severely constrained. What Go users want, on the other hand, is usually to wire a channel to SIGINT with signal.Notify and shut down gracefully when it arrives. What the runtime has to do is build a bridge between these two: turn a treacherous asynchronous signal into an event a goroutine can consume in peace. Every design choice on this bridge is governed by one hard constraint: what can be done inside a signal context. Once you understand this constraint, the rest of this section’s mechanics are merely its corollaries.

9.7 Cooperation and Preemption

In 9.4 The Scheduling Loop we left an open question: if some G runs for too long, how can other Gs still get scheduled? The answer involves an old pair of concepts from scheduling theory: cooperative versus preemptive. Cooperative scheduling relies on the scheduled party voluntarily yielding; preemptive scheduling relies on the scheduler interrupting it from the outside.

The Go runtime has nothing like the hardware interrupt capability of an operating system kernel, and the work-stealing scheduler (9.2) is essentially first-come-first-served cooperative scheduling. This section explains how the runtime can nonetheless forcibly interrupt a G that refuses to yield without abandoning that premise. The discussion starts from a theoretical question: why can the runtime not simply stop a goroutine at an arbitrary instruction?
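A small experiment shows the problem and its modern solution, assuming Go 1.14 or later on a platform that supports asynchronous preemption: with a single P, one goroutine spins in a loop that makes no function calls and therefore never reaches a cooperative check, yet the goroutine that started it still gets to run again. canResumeAfterSpin is a hypothetical name. Run with GODEBUG=asyncpreemptoff=1, the same code can hang, which is a quick way to feel the difference between the two mechanisms.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// canResumeAfterSpin pins the program to one P, starts a goroutine that
// spins without making function calls, and returns once the calling
// goroutine has run again. It returns only if the spinner is preempted.
func canResumeAfterSpin() bool {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var stop atomic.Bool
	done := make(chan struct{})
	go func() {
		for !stop.Load() { // no call in the loop body, so no cooperative check
		}
		close(done)
	}()

	time.Sleep(20 * time.Millisecond) // hands the only P to the spinner
	stop.Store(true)                  // reached only if the spinner was preempted
	<-done
	return true
}

func main() {
	if canResumeAfterSpin() {
		fmt.Println("main got the P back")
	}
}
```

Before asynchronous preemption, the sleeping goroutine would become runnable when its timer expired but could never obtain the only P, because the spinner never reached a point where the runtime could safely stop it.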

9.8 System Monitoring

The scheduler’s ordinary path was laid out in 9.4 The Scheduling Loop: one M binds to one P, takes a goroutine off the queue, runs it, then takes the next. This path rests on one premise: that it gets a chance to run at all. Yet once every P is tied up in a long system call, or some goroutine spins forever and holds its P hostage, ordinary scheduling seizes up. No one takes the P back, and no one polls the network. Put differently, cooperative logic that runs on a P cannot handle the situation where “the P itself cannot move.”
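The runtime's way out is sysmon, a monitor that runs on its own M without holding a P, which is exactly what lets it act when every P is stuck. The sketch below models only its wake-up back-off, with values as they appear in runtime/proc.go in recent releases (treat them as version-dependent): it sleeps 20 µs after a cycle that found work, and after 50 consecutive idle cycles (about 1 ms) it doubles the sleep each cycle up to a 10 ms cap. sysmonDelay and delaySchedule are hypothetical names.

```go
package main

import "fmt"

// sysmonDelay returns the sleep, in microseconds, before the next sysmon
// cycle. idle counts consecutive cycles in which sysmon found nothing to
// do; delay is the previous sleep.
func sysmonDelay(idle, delay int) int {
	if idle == 0 {
		delay = 20 // start with a 20 µs sleep
	} else if idle > 50 {
		delay *= 2 // after ~1 ms of idleness, start doubling
	}
	if delay > 10_000 {
		delay = 10_000 // never sleep longer than 10 ms
	}
	return delay
}

// delaySchedule lists the sleeps for n consecutive idle cycles, starting
// from a cycle that did find work (idle == 0).
func delaySchedule(n int) []int {
	out := make([]int, 0, n)
	delay := 0
	for idle := 0; idle < n; idle++ {
		delay = sysmonDelay(idle, delay)
		out = append(out, delay)
	}
	return out
}

func main() {
	d := delaySchedule(61)
	fmt.Println(d[0], d[50], d[51], d[58], d[59], d[60]) // 20 20 40 5120 10000 10000
}
```

The back-off keeps sysmon responsive while the program is busy and nearly free when it is idle.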

9.9 The Network Poller

Source facts verified against src/runtime/netpoll.go and its per-platform implementations (netpoll_epoll.go, netpoll_kqueue.go, and so on) and src/internal/poll/fd_unix.go.

Go’s network code looks blocking: conn.Read simply “sits” there waiting for data. But if it truly blocked the operating system thread it runs on, then ten thousand goroutines waiting on the network would tie up ten thousand threads, and the M:N model that 9.1 worked so hard to build would collapse in an instant. What lets the blocking style still scale is the network poller (netpoller). Behind it lies a long history about “how to tend a vast number of connections with a handful of threads.” This section first lays out that history and the design axes behind it, then looks at how Go hides a mature event mechanism inside the runtime so that you write synchronous code and run event-driven I/O.
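A minimal sketch of the effect, assuming a Unix-like system with loopback TCP available: 200 connections, each with a server-side goroutine sitting in what reads like a blocking read. The helper name echoMany is hypothetical; pprof.Lookup("threadcreate") is the standard profile of OS threads the process has created, and on a typical machine its count should stay near GOMAXPROCS plus a few runtime threads rather than growing with the number of connections.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"runtime/pprof"
	"sync"
)

// echoMany opens n loopback TCP connections, each served by its own
// goroutine that waits in a blocking-style read, then echoes one byte per
// connection. It returns how many echoes succeeded and how many OS threads
// the process has created so far.
func echoMany(n int) (echoed, threads int, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, 0, err
	}
	defer ln.Close()

	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func(c net.Conn) {
				defer c.Close()
				buf := make([]byte, 1)
				// Reads like blocking code; the G parks on the netpoller
				// instead of holding an OS thread while it waits.
				if _, err := io.ReadFull(c, buf); err == nil {
					c.Write(buf)
				}
			}(c)
		}
	}()

	conns := make([]net.Conn, 0, n)
	for i := 0; i < n; i++ {
		c, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, 0, err
		}
		defer c.Close()
		conns = append(conns, c)
	}

	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, c := range conns {
		wg.Add(1)
		go func(c net.Conn) {
			defer wg.Done()
			buf := []byte{'x'}
			if _, err := c.Write(buf); err != nil {
				return
			}
			if _, err := io.ReadFull(c, buf); err == nil && buf[0] == 'x' {
				mu.Lock()
				echoed++
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	return echoed, pprof.Lookup("threadcreate").Count(), nil
}

func main() {
	echoed, threads, err := echoMany(200)
	if err != nil {
		panic(err)
	}
	fmt.Println("echoed:", echoed, "threads created:", threads)
}
```

The code is written entirely in the blocking style; the event-driven part lives in the runtime, which parks each waiting goroutine and wakes it when the poller reports the socket ready.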

9.10 Timers

time.Sleep, time.After, time.Timer, time.Ticker, and even SetDeadline on network reads and writes all rest on the same timer machinery. It must answer a data-structure question that looks simple but is subtle: when thousands of timers exist at once, how do we efficiently know “who should be woken next, and when,” without burning a thread to do it? This section starts from that abstract problem, lays out the trade-offs of the various solutions, and then turns to Go’s choice and how it evolved.

9.11 NUMA Awareness and the Future of the Scheduler

The scheduler described in the preceding sections rests on an assumption that was never spelled out: every M reaches memory equally fast, and moving a G between any two Ps costs the same. On a laptop or a single-socket server, that assumption is nearly true. But once you put the program on a large multi-socket server it begins to break, and the more cores there are, the wider the gap. This section is about that crack: where it comes from, why the Go scheduler has looked past it for so long, a NUMA-aware design that was carefully thought through yet never shipped, and how users today work around it.