13.5 Sweeping and Bitmaps
Once marking (13.4) finishes, any object still white is unreachable garbage. Reclaiming the memory it occupies and handing it back to the allocator is the last step of collection: sweeping. If we followed the textbook recipe of “walk the heap and free dead objects one by one,” the cost of sweeping would be proportional to the number of dead objects, and on a heap with millions of objects that is no small bill. Go’s sweep sidesteps that bill. Three of its design points are worth making clear: sweeping is done by flipping a bitmap rather than processing objects one at a time; it runs concurrently with the user program and lazily spreads its cost across the allocation path; and it does not move objects, so it accepts fragmentation and forgoes compaction. None of these three is accidental. Each corresponds to a definite engineering trade-off, and this section unpacks them one by one.
13.5.1 Sweeping Is a Bitmap Flip
The efficiency of reclamation has its roots in the allocator’s data structures. Section 12.2 explained that every span is sliced into equal-sized slots of one size class, and is paired with two bitmaps:
allocBits: which slots are allocated, used by the allocation path to find free slots.gcmarkBits: which slots were set during this round of marking, that is, which are live, written by the mark phase.
To sweep a span, the core action is surprisingly short: treat the mark bitmap directly as the new allocation bitmap.
| |
After the flip, a marked slot is still 1 in the new allocBits and continues to count as “in use”; an
unmarked slot is 0 in the new bitmap and becomes “free,” to be overwritten directly by the next allocation.
Dead objects neither have to be found nor cleared one by one. The source comment puts this state precisely: a
slot that is unmarked and not yet reallocated is “analogous to being on a freelist.” That is what the
“sweep-free” in this section’s framing means: dead objects need no explicit “cleanup” action; their slots are
simply reinterpreted as allocatable in the instant the bitmap flips.
Working out the cost shows where the saving comes from. Freeing object by object is ; the
bitmap flip compresses the sweep of one span into a constant number of pointer assignments plus a single
countAlloc, independent of how many objects are dead. Sweeping one span costs pointer operations
plus bit counting (countAlloc counts the 1 bits per 64-bit word). Once again the
regularity of size classes degrades reclamation into a bit operation, the same technique seen from the other
side when the allocation path uses allocCache to bit-scan for free slots
(12.2). The allocator and the collector share this pair of bitmaps, which is the
most concrete instance of what 12.1 called “allocation and GC living in symbiosis.”
There is a detail here that is easy to overlook and worth one sentence. The new allocation bitmap comes from
the mark bitmap, so what about slots that were already allocated but went unmarked this round? The source
comment explains it clearly: if a slot’s allocBits index is greater than or equal to freeindex and it is
unmarked, then it “has not been allocated since the previous GC,” so it was empty to begin with, and remains
empty after the flip, requiring no special handling. In other words, the bitmap flip correctly handles both
“objects that died this round” and “slots that were already empty last round”; in the new bitmap both fall
under “free,” treated alike.
Sweeping is not only the bitmap flip. Before the flip, sweep must first handle the special records
attached to dead objects: finalizers (runtime.SetFinalizer), weak handles, and heap profile samples. If an
object about to be freed has registered a finalizer, the sweep re-marks it as live
(setMarkedNonAtomic) and queues the finalizer for execution. This is the origin of “an object with a
finalizer is reclaimed one round later”: the finalizer must run when the object is “about to die,” and for it
to run, the object itself has to live one more round, only truly freed at the next sweep. Weak handles are
cleared before finalization, aligning with the semantics of the weak package. These are the semantic
details, beyond the bitmap flip, that sweeping must look after, but they do not change the trunk: the body of
sweeping is still an change of ownership of the bitmap.
13.5.2 Concurrent and Lazy
Sweeping runs concurrently with the user program. After mark termination, the GC enters the _GCoff
phase, the write barrier is turned off, and a background sweeper (bgsweep) is woken to advance slowly at
ordinary Goroutine priority, taking no STW time at all (for the cadence see 13.3). But relying
on the background sweeper alone would set up a race between sweeping and the next round of allocation: what if
allocation runs too fast and claims spans that have not been swept yet?
Go’s answer is to make sweeping lazy and on demand. It does not sweep every span the moment marking
ends; instead it breaks the sweep apart and spreads it across the allocation path: when some P’s mcache needs
to refill a span for a given size class, if the span it takes from the mcentral is not yet swept
(distinguished by sweepgen sweep generation, see 12.2), it sweeps it in
passing and then uses it. “Allocate and sweep a little”: whoever is about to use a span sweeps it clean
first. The background sweeper is only a backstop, advancing the remaining spans that the allocation path did
not happen to sweep, ensuring they are all swept before the next GC round begins.
“On demand” alone is not enough; we also need “on time.” Imagine the heap holds a large number of spans
waiting to be swept while the program allocates only sporadically, and the background sweeper runs slowly:
then when the next GC round triggers, sweeping is not finished, and two GC rounds overlap. To prevent this the
runtime introduces proportional sweeping: based on the size of the live heap from the previous round, it
estimates “for every byte allocated, how many pages should be swept,” storing it in sweepPagesPerByte;
before allocating a large object or refilling a span, deductSweepCredit “repays the debt” according to this
ratio, and if sweep progress lags behind allocation progress, it sweeps a few more spans on the spot to catch
up.
| |
This mechanism spreads the total cost of sweeping, in proportion, onto “the rate of producing garbage”: the harder the program allocates and the more there is to sweep, the more the allocation path takes on from the background. It reflects the Go runtime’s consistent approach of “spreading concentrated cost thin over the everyday path,” much like mark assist (mutator assist, where a Goroutine allocating quickly helps with marking, 13.4) and the write barrier spreading reachability maintenance over each pointer write (13.2). All three face the same problem: how to keep the progress of collection always catching up with the rate at which the user program creates work, without leaning on long pauses.
flowchart TD
MT["Enter _GCoff after mark termination"] --> WAKE["Wake the background sweeper (bgsweep)"]
MT --> ALLOC["User program keeps allocating"]
WAKE --> BG["Low-priority sweepone() one at a time"]
ALLOC --> NEED{"mcache needs to refill a span of this size class?"}
NEED -->|Take an unswept span from mcentral| ONDEMAND["Sweep this span on demand, then use it"]
NEED -->|Before allocating| CREDIT["deductSweepCredit: repay sweep debt proportionally first"]
BG --> DONE{"All spans swept?"}
ONDEMAND --> DONE
CREDIT --> DONE
DONE -->|Yes| READY["Next GC round can start safely"]
DONE -->|No| MORE["Keep sweeping the remaining spans"]The correctness of concurrent sweeping rests on sweepgen (the sweep generation), a global counter. It
encodes “what sweep state this span is in right now” as the difference between the span’s sweepgen and the
heap’s mheap_.sweepgen. Each GC round increments the heap’s sweep generation by 2, so:
| |
The background sweeper and some P’s allocation path may eye the same unswept span at once, both wanting to
sweep it. The runtime uses one atomic CAS to change sg-2 into sg-1 to claim the span
(sweepLocker.tryAcquire): whoever wins the CAS is responsible for sweeping it, and the other, seeing
sg-1, knows someone is sweeping and simply skips it; once done it atomically sets the span to sg. This
guarantees a span is swept only once per round, and that a span obtained by the allocation path is
necessarily in the “swept” state. Encoding a small three-state machine into the parity and difference of a
counter is a technique that recurs throughout concurrent runtimes, and it pairs with mcentral’s use of
[2]spanSet bucketed by sweep generation (12.2): bucketing uses the sweep
generation to decide which set a span goes into, and claiming uses the sweep generation to decide who sweeps.
13.5.3 Where a Span Goes and “Why Not Compact Fragmentation”
Once a span is swept, where it goes depends on how many live objects remain in it, which is exactly the part
omitted at the end of the previous section’s mspan.sweep:
| |
If countAlloc comes out as zero, the whole span has not a single live object, so it is returned to the
layer above as a unit: the pages go back to mheap’s page allocator (12.7),
counted into reclaimCredit, and those pages can then be re-sliced into spans of any size class, or even
returned to the operating system by the scavenger. This is the only chance a span ever has to step out of its
original size class: an empty span goes back to the heap, and its size-class membership is dissolved along
with it. If live objects remain, the span can only stay in its original size class and hang back onto
mcentral’s swept set depending on whether it still has free slots (note it is the “swept” spanSet, echoing
the sweep-generation bucketing of 13.5.2), for new objects of the same size class to fill the slots that have
opened up.
Here we touch the cost that mark-sweep GC cannot avoid: fragmentation. Go’s GC does not move objects (13.1). A free slot vacated by a dead object inside a span can only be filled by a new object of the same size class; if the whole span is not entirely empty, it still cannot be returned as a unit, and those free slots are “dead weight occupying the span.” In theory, compaction, which moves live objects together to free up contiguous space, could eliminate this fragmentation, but Go deliberately chooses not to. There are two reasons.
First, compaction has to move objects, and moving means finding and updating every pointer that points at
the moved object. In a language with unsafe and cgo, this is almost a dead end: external C code may hold
raw pointers into the Go heap, which the runtime cannot enumerate, let alone rewrite, and moving would
invalidate those pointers. What non-moving collection buys is precisely the guarantee that a pointer’s
address stays constant over the object’s lifetime, something cgo, unsafe.Pointer, and code that uses a Go
pointer as a map key all depend on.
Second, the benefit of compaction is held down inherently by size-class allocation. tcmalloc-style size-class allocation confines fragmentation to the controllable range of “free slots within the same size class” (worst internal fragmentation about 12.5%, 12.2), unlike general-purpose malloc, which can produce holes of arbitrary size. Since fragmentation is already bounded, the marginal benefit of compaction is not enough to offset its complexity and its damage to pointer stability.
So this is a clear-cut trade-off: forgo compaction in exchange for pointer stability and a simpler implementation, and use size classes to keep the cost of fragmentation within an acceptable range. Performance never comes for free: Go buys “coexistence with cgo” and “concurrent collection without moving objects” with “tolerating bounded fragmentation.” Laying the two paths side by side makes the contour of the trade-off clear:
| Dimension | Non-moving (mark-sweep, Go) | Moving (mark-compact / copying, most JVM collectors) |
|---|---|---|
| Allocation speed | Take a free slot (bit scan), fairly fast | Pointer bump (bump pointer), fastest |
| Fragmentation | Present, but confined by size classes to “same-class free slots” | Almost none (contiguous after compaction) |
| Pointer stability | Address constant, cgo / unsafe safe | Address changes, all pointers must be updated precisely |
| Implementation complexity | Lower, no moving or forwarding needed | Higher, needs object forwarding, read barriers, or STW relocation |
| With cgo raw pointers | Naturally compatible | Naturally in conflict |
Others have given different answers: generational copying GC (such as most JVM collectors) trades moving and compaction for the extreme speed of bump-pointer allocation and near-zero fragmentation, at the cost of having to precisely enumerate and update all pointers, and being naturally at odds with cgo-style raw pointers. Standing in the position of “a systems programming language that must interoperate with C,” Go can hardly give up pointer stability as a constraint, so non-moving is almost a choice forced by requirements rather than a performance oversight. The two paths have no rank order; they merely place the complexity in different places.
The matter of fragmentation is not yet closed. The Green Tea GC (13.11), experimental since go1.25/1.26, explores exactly the direction of “organizing scanning and reclamation at span/page granularity to improve memory locality”: it is not about introducing object movement, but about rearranging the granularity of scanning and reclamation so that cache and TLB behavior become friendlier. The design of sweeping and reclamation is still evolving; this section describes the stable form of go1.26, not the end of the road.
Further Reading
- The Go Authors. runtime/mgcsweep.go (the
gcmarkBits->allocBitsflip inmspan.sweep,sweepone,bgsweep, anddeductSweepCreditproportional sweeping). https://github.com/golang/go/blob/master/src/runtime/mgcsweep.go The original design of bitmap allocation and sweep-free allocation appears in Austin Clements. Proposal: Dense mark bits and sweep-free allocation. Go issue #12800, 2015 (go1.6 replaced the free list with direct bitmap allocation). https://github.com/golang/go/issues/12800 - The Go Authors. runtime/mcentral.go (the
[2]spanSetbucketed bysweepgenand on-demand sweeping). https://github.com/golang/go/blob/master/src/runtime/mcentral.go - Rick Hudson. Getting to Go: The Journey of Go’s Garbage Collector. ISMM 2018 Keynote. https://go.dev/blog/ismmkeynote
- Austin Clements, Rick Hudson. Proposal: Concurrent garbage collection (the design of concurrent sweeping and sweepgen). https://go.googlesource.com/proposal/+/master/design/17503-eliminate-rescan
- Richard Jones, Antony Hosking, Eliot Moss. The Garbage Collection Handbook: The Art of Automatic Memory Management. 2nd ed., CRC Press, 2023 (Chapter 2 “Mark-Sweep” and Chapter 3 on the trade-off of “mark-compact / moving vs non-moving”).
- Sanjay Ghemawat, Paul Menage. TCMalloc: Thread-Caching Malloc. https://google.github.io/tcmalloc/design.html (how size classes confine fragmentation to a controllable range).
- This book: 12.1 Design Principles, 12.2 Components, 13.1 Basic Ideas, 13.4 Marking, 13.11 Past, Present, and Future.