12.3 Initialization
12.2 broke the allocator down into a few parts: mcache, mcentral, mheap, and arena. But that was a
static picture. These parts do not fall into place on their own; they must be set up once, at the very start of the
program. This section answers the question: when main has not yet run and the first new has not yet happened, what
infrastructure does the runtime lay down for the allocator.
Understanding initialization is not a matter of paraphrasing mallocinit line by line. It is a matter of seeing two
facts that it establishes once and never changes again. First, how the address space is organized into arenas, which
decides how the runtime, given any heap address, looks up “which span it belongs to, whether it holds a pointer, and
whether it is still alive”; this is the bedrock on which garbage collection (13) works. Second, that
“reserving an address” and “committing physical memory” are two different things, which explains why the virtual memory
(VIRT) of a Go process is often far larger than its resident memory (RES), and why this is not a leak.
The allocator is one of the first subsystems to finish initializing, alongside the execution stack, called as
mallocinit from the scheduler’s bootstrap phase (3.5).
12.3.1 mallocinit: laying the foundation at startup
mallocinit does three kinds of work: a batch of self-checks on constants (arena sizes, bitmap word counts, and the
physical page size, all computed at compile time, must be self-consistent), initialization of the global heap mheap_,
and seeding a set of “arena growth hints” for the 64-bit address space. A trimmed sketch follows:
| |
That string of self-checks is not ceremony. Many of the allocator’s fast paths address by bit manipulation, on the
premise that these sizes are exactly powers of 2 and divide one another evenly. If some port or size change breaks the
premise, rather than let the program produce a baffling out-of-bounds error at runtime, it is better to throw on the
spot at startup. This is a stylistic habit of runtime code: write invariants that “should always hold” as startup-time
assertions, and catch errors at the earliest possible point.
The “growth hints” arenaHints in the third step deserve a few more words. Go does not ask the operating system for the
whole heap at startup. Instead it records a string of addresses meaning “in the future I would like to start growing the
heap from here.” By default it starts from 0x00c0... (on little-endian machines this shows up in memory as c0 00,
c1 00, …, recognizable at a glance when debugging), spreading out from the middle of the address space, which makes
it easy to extend the heap into a single contiguous region without colliding with other mappings. Actually asking the
operating system for an address waits until the first heap growth (12.7).
mheap_.init then initializes, one by one, the fixalloc (12.2) for the various fixed-size metadata
in the heap: mspan, mcache, and the various kinds of special records each have their own fixed-size allocator. One detail
here is worth pointing out:
| |
h.spanalloc.zero = false is one piece of evidence, surfacing as early as initialization, of the symbiosis between the
allocator and the GC (12.2): when an mspan’s memory is reclaimed and reallocated, it is not zeroed, so
that the sweepgen sweep-generation field survives across “release, reallocate” and the background sweeper does not
mistake a span that is in fact being reused for one that can be reclaimed. A switch that looks like it merely “saves a
memclr” is backed by a concurrency contract between the sweeper and the allocator.
12.3.2 arena: organizing the address space
The heap is not one solid block; it is assembled from many fixed-size arenas. On 64-bit Linux each arena is 64MB
(heapArenaBytes = 1 << logHeapArenaBytes, logHeapArenaBytes = 26), and the heap always requests addresses from the
operating system at arena granularity, aligned to arena boundaries. Why slice it into arenas, rather than treat the heap
as one unstructured stretch of contiguous memory? Because the runtime needs the ability to look up metadata from any
address in , and aligned, fixed-size chunking degrades that lookup into a few shifts and table reads.
Given a heap address p, the runtime first converts it into an arena number, then uses that number to index a two-level
table arenas:
| |
Why two levels rather than one? If a flat array covered the entire 64-bit addressable heap range
(heapAddrBits = 48), the pointer table alone would take up a substantial amount of memory, while the vast majority of
entries would stay empty forever. Two levels (arenas[L1][L2]) let the L2 subtables be allocated on demand: only
when an arena is actually mapped does its corresponding L2 subtable get created. On 64-bit non-Windows platforms,
arenaL1Bits = 0, the L1 dimension degrades to a single entry, and the index is effectively single-level; Windows
enables L1 only because its address space layout differs. This, again, is an engineering choice to “save memory for the
uncommon case.”
Each arena has one heapArena of metadata, stored outside the heap. It is where the answers to “look up” live:
| |
The spans array is the first link in the lookup chain: given an address p, first obtain the arena via arenaIndex,
then obtain the page number via (p / pageSize) % pagesPerArena, and look it up in spans to get the *mspan. Once the
span is in hand, “is p a pointer” and “is p alive” can be asked further from the span’s metadata (gcmarkBits in
12.2, and the heap bitmap for small objects). This chain, “address -> arena -> span -> object
metadata”, is the foundation that lets garbage collection make a scanning decision about a pointer in
(13). Drawn as an index diagram, the split into two segments is plain to see: the high bits select the
arena, the low bits select the page.
flowchart LR
P["heap address p"] -->|"arenaIndex(p)"| IDX["arena number"]
P -->|"(p/pageSize) % pagesPerArena"| PG["page number"]
subgraph T["arenas two-level index (L2 allocated on demand)"]
direction TB
L1["L1 dimension<br/>single entry on non-Windows"] --> L2["L2 subtable"]
end
subgraph HA["heapArena metadata (off-heap)"]
direction TB
SPANS["spans[pagesPerArena]<br/>page number -> *mspan"]
MARKS["pageInUse / pageMarks"]
end
IDX --> L1
L2 --> SPANS
PG --> SPANS
SPANS --> S["*mspan"]
S --> META["object metadata<br/>gcmarkBits, heap bitmap<br/>is it a pointer, is it alive"]On where pointer bitmaps are stored, Go made an important change starting in Go 1.22. Earlier each arena had a single bitmap covering the whole arena; now the pointer/scalar information for most small objects is instead stored inline with the object or in the span header (
heapBitsInSpan, the span’s malloc header), falling back to a separate bitmap only when necessary. This moved metadata from “spread evenly across the address space” to “stored close to the actual object”, improving locality. This book returns to this thread in 13 when it covers scanning. The reader need only remember: what the arena layer provides is a fast lookup from address to span, while the exact landing place of pointer information evolves across versions.
12.3.3 Reserve and commit: why VIRT far exceeds RES
Having carved out the concept of an arena, there is still a practical question to answer: how does the runtime obtain from the operating system the memory backing those arenas. There is a set of actions here that are easy to confuse but must be kept apart. Reserving address space (reserve) and committing physical memory (commit) are two different things. Reserving merely declares to the operating system that “this stretch of virtual address is mine”; it consumes no physical pages and is nearly free. Only just before some stretch of address is actually written to for the first time does the runtime commit it, letting the operating system allocate physical pages for it.
The runtime abstracts a memory region into four states, and all OS-level operations transition between these four states
(see runtime/mem.go):
stateDiagram-v2
[*] --> None
None --> Reserved: sysReserve (claim address, no physical memory)
Reserved --> Prepared: sysMap (move into committable region, account for it)
Prepared --> Ready: sysUsed (ensure safe to access)
Ready --> Prepared: sysUnused (madvise to return physical pages)
Ready --> None: sysFree (munmap, fully return)
Ready --> Reserved: sysFault (debugging the runtime only)
None --> Ready: sysAlloc (in one step, a whole block of zero memory)- None: the default state, neither reserved nor mapped; accessing it faults.
- Reserved: the address belongs to the runtime, but accessing it triggers a page fault, and it does not count toward the process’s memory footprint.
- Prepared: reserved, intended not to be backed by physical memory (the operating system may implement this lazily), and can transition into Ready efficiently; the result of accessing it at this point is undefined.
- Ready: safe to access.
In theory the three states None, Reserved, and Ready suffice to cover all platforms; Prepared is an extra state added for
performance. Its value is clearest on POSIX systems: Reserved is typically a private anonymous mmap with PROT_NONE,
and transitioning into Ready would normally require changing the permission to PROT_READ|PROT_WRITE. Prepared’s
“undefined” gives the runtime room: with MADV_FREE it can drop Ready back to Prepared. So the permission bits are set
just once early on, and from then on the runtime can tell the operating system at any time “feel free to take these
pages”, without repeatedly changing permissions.
This abstraction, brought down to Linux, is a few thin wrappers over mmap/madvise (runtime/mem_linux.go):
| |
The correspondence between the seven cross-platform primitives and the four states is: sysReserve claims the address
(None to Reserved), sysMap moves it into the committable region (Reserved to Prepared), sysUsed ensures it is usable
(Prepared to Ready), sysUnused returns physical pages (Ready to Prepared), sysFree fully munmaps (Ready to None),
sysFault sets it to fault unconditionally (Ready to Reserved, for debugging only), and sysAlloc is the shortcut that
gets a whole block of zero memory in one step (None to Ready). The cross-platform layer (mem.go) only does memory
accounting; the actual system calls land in each OS’s *OS implementation.
It should be noted that where “commit” lands differs across platforms. On systems that support overcommit, such as Linux,
after sysMap changes the permission to read-write, the physical pages are filled in by the kernel through lazy page
faulting at first access, and sysUsed (Prepared to Ready) is nearly a no-op. On systems with explicit commit and hard
quotas, such as Windows, sysUsed is the step that actually commits physical memory to the operating system. The four-
state abstraction adds a Prepared state precisely so that both kinds of platform can share the same upper-layer logic.
Now the practical question from the opening can be answered. A Go process’s VIRT is often several GB or even hundreds of
GB, far exceeding the RES it actually occupies, because the runtime reserves large stretches of arena address space
(Reserved/Prepared) but commits physical memory (Ready) only for pages that have actually been written. Reserving
does not count toward the physical footprint, so this is not a leak, but the normal expression of the “claim addresses
cheaply, pay for physical pages on demand” strategy described above. Likewise, when the heap shrinks, the runtime uses
sysUnused (MADV_FREE) to return physical pages to the kernel, while the address stays reserved. This is also why a
drop in RES sometimes lags behind the heap’s actual shrinkage: MADV_FREE lets the kernel “reclaim when it gets around
to it”, and the pages are not actually returned until memory grows tight.
12.3.4 After initialization: the infrastructure in place
When mallocinit returns, the skeleton of the allocator has taken shape, even though at this moment not a single arena
in the heap has actually been committed. The following are in place:
- The page allocator
mheap_.pages(12.7) is initialized, ready to answer “which stretch of contiguous pages is free” at the first heap growth. - One mcentral per size class (the
h.centralarray) is in place, each serving onespanClass, waiting for an mcache to come exchange a span (12.2). - The arena two-level index
arenashas its skeleton built (the L2 subtables not yet allocated), andarenaHintshas seeded a string of growth hints. - The first mcache (
mcache0) has been carved byallocmcachefrom non-GC memory viafixalloc, hanging for now on the bootstrap thread. It will be handed off to some P inprocresize(9.3), and from then on the mcache’s lifetime is bound to a P, not to an M.
In other words, initialization establishes the “containers and indexes”, not the “contents”. The commit of the first arena and the carving of the first span are both deferred until the first allocation actually happens (12.4–12.6). This arrangement of “build the lookup structures first, load the contents lazily” is the same idea reused at different levels: the arena two-level table allocating L2 on demand, and the address being reserved while physical memory is committed on demand. Put off the expensive action as much as possible, and pay the cost only when it is known to be necessary.
Further reading
- The Go Authors. runtime/malloc.go (
mallocinit, arena constants,arenaHintsandarenaIndex). https://github.com/golang/go/blob/master/src/runtime/malloc.go - The Go Authors. runtime/mheap.go (
mheap.init,heapArena, thearenastwo-level index,spanslookup). https://github.com/golang/go/blob/master/src/runtime/mheap.go - The Go Authors. runtime/mem.go (the None/Reserved/Prepared/Ready four-state abstraction and the seven
sys*primitives). https://github.com/golang/go/blob/master/src/runtime/mem.go - The Go Authors. runtime/mem_linux.go (the implementation of
sysReserve/sysMap/sysUnusedand others overmmap/madvise). https://github.com/golang/go/blob/master/src/runtime/mem_linux.go - The Go Authors. runtime/mbitmap.go (
heapBitsInSpan,writeHeapBitsSmall,gc.MinSizeForMallocHeader,span.heapBits(), the implementation that, starting in Go 1.22, moved the pointer bitmap from per-arena to inline/span-header storage). https://github.com/golang/go/blob/master/src/runtime/mbitmap.go - This book: 3.5 Scheduler Initialization, 12.2 Components, 12.7 Page Allocator, 13 Garbage Collection.