Go: Under the Hood

Chapter 14 Execution Stack Management

Goroutines are cheap enough that millions can coexist (9.3), and one key reason lies in their stacks. An operating system thread typically reserves a fixed stack of several megabytes up front; a million such threads would reserve terabytes, which is simply infeasible. A goroutine’s stack starts at only 2 KB and grows on demand. This “small but growable” stack is the physical foundation of goroutine cheapness. This chapter explains how the stack is designed, how it is allocated, how it grows and shrinks, and how the design evolved from segmented stacks to contiguous stacks.
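The arithmetic behind that comparison can be checked with a short sketch. The 8 MiB thread stack is an assumption (a common default on Linux; other systems differ), and the helper name totalStackBytes is hypothetical. The thread figure is reserved address space, not necessarily memory that is actually touched.

```go
package main

import "fmt"

const (
	kib = 1 << 10
	mib = 1 << 20
	gib = 1 << 30
)

// totalStackBytes returns the bytes reserved when each of n execution
// contexts sets aside perStack bytes of stack up front.
func totalStackBytes(n, perStack uint64) uint64 {
	return n * perStack
}

func main() {
	const n = 1_000_000

	threadStack := uint64(8 * mib)    // assumption: typical default thread stack size
	goroutineStack := uint64(2 * kib) // initial goroutine stack size stated in the text

	threads := totalStackBytes(n, threadStack)
	goroutines := totalStackBytes(n, goroutineStack)

	fmt.Printf("1M threads:    %.2f TiB\n", float64(threads)/float64(gib)/1024)
	fmt.Printf("1M goroutines: %.2f GiB\n", float64(goroutines)/float64(gib))
}
```

Under these assumptions the thread stacks come to roughly 7.6 TiB, while the initial goroutine stacks total about 1.9 GiB, a difference of more than three orders of magnitude.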

To make the stacks small, Go's run-time uses resizable, bounded stacks. A newly minted goroutine is given a few kilobytes, which is almost always enough. When it isn't, the run-time grows (and shrinks) the memory for storing the stack automatically, allowing many goroutines to live in a modest amount of memory.
-- The Go Authors, "Go FAQ: Why goroutines instead of threads?"

Make the stack small, then let it resize on demand: behind that single sentence lies a whole set of design choices. The stack is managed by the runtime on the heap rather than tied to a thread; the overflow check and the preemption check are folded into a single comparison in the function prologue; growth works by copying the entire stack into a larger block; and shrinking is handled along the way by the garbage collector. This chapter follows that thread through execution stack management: the design trade-offs of contiguous stacks; allocation, growth, copying, and pointer adjustment; shrinking; and how the design’s evolution compares with other systems.
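Growth on demand can be observed indirectly from ordinary Go code. The sketch below is an illustration only, not runtime code; the names depthSum and runInGoroutine are hypothetical. It starts a goroutine, which begins with a stack of only a few kilobytes, and recurses deeply enough to need several megabytes. The program finishes normally, which shows that the runtime enlarged the stack along the way.

```go
package main

import "fmt"

// depthSum recurses n levels deep and returns n+1. Each frame carries a
// small local array, so the total stack required grows linearly with n.
func depthSum(n int) int {
	var pad [64]byte // per-frame local data
	pad[n%len(pad)] = 1
	if n == 0 {
		return int(pad[0])
	}
	return depthSum(n-1) + int(pad[n%len(pad)])
}

// runInGoroutine runs depthSum on a fresh goroutine, whose stack starts
// small, and returns the result through a channel.
func runInGoroutine(n int) int {
	done := make(chan int)
	go func() { done <- depthSum(n) }()
	return <-done
}

func main() {
	// 100,000 frames need far more than the initial stack; the runtime
	// grows the stack transparently while the recursion runs.
	fmt.Println(runInGoroutine(100_000)) // prints 100001
}
```

Nothing in the program asks for a larger stack. The check that triggers growth sits in the prologue the compiler emits for depthSum, so the source code never mentions it.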