9.6 Signal Handling
Operating system signals are asynchronous and low-level: a signal may interrupt any thread at any
moment, and what the signal handler is allowed to do is severely constrained. What Go users want,
on the other hand, is usually to wire a channel to SIGINT with signal.Notify and shut down
gracefully when it arrives. What the runtime has to do is build a bridge between these two: turn a
treacherous asynchronous signal into an event a goroutine can consume in peace. Every design choice
on this bridge is governed by one hard constraint: what can be done inside a signal context. Once
you understand this constraint, the rest of this section’s mechanics are merely its corollaries.
9.6.1 Async-signal safety: almost nothing can be done in a handler
A signal handler runs in an interrupted, indeterminate context. At that very moment the interrupted
thread might be holding malloc’s lock, be halfway through updating some data structure, or even be
rewriting allocator metadata. If the handler then calls malloc again or acquires the same lock, it
either deadlocks against itself or reads half-finished state and crashes.
POSIX therefore mandates that a signal handler may only call async-signal-safe functions, a
very short whitelist (see signal-safety(7)): system calls like write, _exit, and sigaction,
which touch neither locks nor the allocator, are on it, while malloc, printf, most pthread
locking, and indeed most of the C standard library are not.
This constraint yields an iron rule shared by every runtime that supports signals:
Do only the minimum that absolutely must be done in the handler, and defer the real processing to somewhere safe.
This is the origin of the classic technique, the self-pipe trick (the paradigm W. R. Stevens
gives in APUE): the handler does nothing but write a single byte to a pre-built pipe; the main
event loop (select/poll) watches the other end of the pipe, so “a signal arrived” is demoted to
an ordinary “the pipe is readable” event, and the subsequent processing returns to the unconstrained
normal context. Linux later folded this technique into the kernel with signalfd(2), letting
signals be consumed directly by epoll in the form of a file descriptor.
Go takes a variant of the same idea, only it replaces that “pipe” with a lock-free queue inside
the runtime (9.6.4). The reasoning is also very Go: each delivery through a self-pipe costs a write system call, while Go’s
signal handling is deeply coupled with the scheduler and garbage collector, so using an in-process
atomic state machine to “write a bit and wake a waiter” is cheaper than repeatedly entering and
leaving the kernel, and more controllable too. The remaining subsections follow this iron rule and
see how Go implements it across the handler, the alternate stack, the queue, and the dedicated
goroutine.
9.6.2 gsignal: a safe stack the handler carries with it
The first place the iron rule lands is giving the handler a dedicated stack. A signal may arrive
just as a user goroutine’s stack is about to run out; if the handler still ran on this tight stack,
it could easily trigger a stack overflow. Worse, Go’s stacks are growable (14.6), and stack growth
itself has to allocate and to lock, exactly
the operations a handler must not touch. The solution is POSIX’s sigaltstack(2): register an
alternate signal stack for the thread in advance, and the kernel automatically switches to this
stack when dispatching a signal to run the handler.
Go gives each M a special goroutine named gsignal, whose stack is this M’s alternate signal
stack. gsignal is created during the mcommoninit phase via mpreinit, and apart from g0
(9.3) it is the first g each M owns. It does not participate in scheduling, has no goid
in the sense of user code, and exists for the sole purpose of “carrying signal handling”.
After the M enters mstart, minit calls minitSignalStack, which uses sigaltstack to register
gsignal.stack as this thread’s alternate signal stack. There is a careful touch here prepared for
cgo: if the thread already had an alternate signal stack set by non-Go C code (the case where a
non-Go thread calls back into Go), the runtime does not crudely overwrite it but instead adopts the
existing stack as gsignal’s stack, and restores it as-is in unminit. Letting each M manage its
own copy of “which stack signals are handled on” is exactly the precondition for keeping signal
handling and goroutine scheduling out of each other’s way.
9.6.3 The handler only enqueues; a goroutine dispatches
With the stack ready, the next step is to install the handler and split the flow when a signal
arrives. During the initsig phase, Go installs a unified entry point for each signal it cares about.
Note that the kernel does not call sighandler directly; it calls an assembly trampoline, sigtramp.
When a signal arrives, the kernel jumps into sigtramp using the C calling convention; sigtramp
saves the context, switches into the Go runtime world, and calls sigtrampgo, which switches the
current g to this M’s gsignal and finally enters the real sighandler. The trampoline exists
because the kernel knows nothing of Go’s g/m/p abstraction, so something has to “translate” the
execution environment first.
sigfwdgo, which sigtrampgo calls first, is the key to coexisting with native libraries: not every
signal should be handled by Go. If the user’s C library had already registered a handler for a
signal, Go forwards the signal to that handler rather than claiming it for itself. We will return
to this in 9.6.5.
The real splitting happens inside sighandler. Once it quickly determines the kind of signal, it
handles it along three destinations:
flowchart TD
OS[Operating system delivers a signal] --> TR["sigtramp (assembly trampoline)<br/>translate calling convention, switch to gsignal stack"]
TR --> SH["sighandler<br/>split on the gsignal alternate stack"]
SH -->|"synchronous signal<br/>SIGSEGV / SIGFPE / SIGBUS"| PANIC["preparePanic: forge a call to sigpanic<br/>turn into a recoverable panic"]
SH -->|SIGURG| PREE["doSigPreempt: trigger asynchronous preemption (see 9.7)"]
SH -->|SIGPROF| PROF["sigprof: sample the PC for pprof"]
SH -->|"needs user handling<br/>SIGINT / SIGTERM ..."| Q["sigsend: write to the lock-free queue and return immediately"]
Q --> SG["dedicated goroutine blocked in signal_recv pulls it out"]
SG --> CH["delivered to the channel registered via signal.Notify"]
CH --> USER["user consumes it gracefully with select"]
First, synchronous signals. SIGSEGV (null pointer dereference), SIGFPE (division by zero),
and SIGBUS are raised right on the spot by the current thread’s own illegal operation, with a
direct causal relation to the interrupted code. The runtime does not let the process crash silently
but turns them into Go panics: preparePanic rewrites the stack and PC at the interruption point to
make it look as if “the point that errored called sigpanic”, and once the handler returns and
control flow returns to the user g, it throws from sigpanic a runtime error that recover can
catch. The decision relies on the _SigPanic flag in sigtable, and is taken only when the signal
really comes from the kernel (rather than a user kill) and the interrupted thing really is a user
g; otherwise the runtime can only throw, ending the process with a fatal error.
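The user-visible end of this path can be observed from ordinary Go code. The following is a minimal sketch, assuming a Unix-like platform where a nil dereference is reported through SIGSEGV; the helper names lookup and derefNil are made up for illustration. It shows that the fault surfaces as a value satisfying runtime.Error, which recover can catch.

```go
package main

import (
	"fmt"
	"runtime"
)

// lookup is a hypothetical helper that always returns a nil pointer.
func lookup() *int { return nil }

// derefNil deliberately dereferences a nil pointer and converts the
// resulting panic into an ordinary error value.
func derefNil() (err error) {
	defer func() {
		if r := recover(); r != nil {
			// Faults converted by the runtime satisfy runtime.Error.
			if re, ok := r.(runtime.Error); ok {
				err = re
				return
			}
			panic(r) // not a runtime error: propagate it unchanged
		}
	}()
	v := *lookup() // faults here; the runtime turns the fault into a panic
	fmt.Println("unreachable:", v)
	return nil
}

func main() {
	fmt.Println("recovered:", derefNil())
}
```

Recovering like this is suitable for demonstrating the mechanism; production code normally treats a nil dereference as a bug to fix rather than an error to handle.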
Second, signals for the runtime’s own use, handled on the spot. SIGPROF is the source of
pprof’s timer sampling (16 Tooling and Observability): the handler
grabs a PC at the interruption point, hands it to sigprof to record, and returns. SIGURG is the
carrier of asynchronous preemption (9.7): the handler calls doSigPreempt to
“inject a preemption request” onto the interrupted g. Unlike the SIGPROF path, it does not return
at that point: even when the signal is clearly a preemption request, the handler continues through
the rest of the split, because that SIGURG may have been coalesced with one meant for the application.
Third, signals that need to be handed to the user, the ones that take that bridge. For SIGINT,
SIGTERM, SIGHUP, and the like, which carry the _SigNotify flag or were sent by a user kill, the
handler only calls sigsend to put them into the lock-free queue and returns immediately; it never
touches a channel, a lock, or the allocator. The actual delivery is left to the dedicated
goroutine at the other end of the queue.
The decision among these three is largely encoded in sigtable’s flag bits, one row per signal
(SIGPROF and SIGURG are recognized by signal number before the table is consulted).
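The flag-driven part of the split can be modeled in a few lines. This is a simplified sketch, not the runtime’s code: the flag names echo the runtime’s _SigNotify, _SigKill, _SigThrow, and _SigPanic, but the types, the dispatch function, and its boolean parameters (fromUser, onUserG, wanted) are assumptions made for illustration, and the SIGPROF and SIGURG special cases are left out.

```go
package main

import "fmt"

// sigFlags is a hypothetical, reduced version of the per-signal flag bits.
type sigFlags uint8

const (
	sigNotify sigFlags = 1 << iota // hand to os/signal if someone asked for it
	sigKill                        // otherwise exit quietly
	sigThrow                       // otherwise crash loudly
	sigPanic                       // kernel-raised fault: convert to a Go panic
)

type route string

const (
	toPanic  route = "panic"
	toQueue  route = "queue"
	toKill   route = "kill"
	toThrow  route = "throw"
	toIgnore route = "ignore"
)

// dispatch models the decision order described in the text.
// fromUser: the signal was sent with kill rather than raised by a fault.
// onUserG:  the interrupted code was running on a user goroutine.
// wanted:   some channel is registered for this signal.
func dispatch(flags sigFlags, fromUser, onUserG, wanted bool) route {
	if !fromUser && flags&sigPanic != 0 {
		if onUserG {
			return toPanic // recoverable panic in the faulting goroutine
		}
		return toThrow // fault inside the runtime itself: fatal
	}
	if (fromUser || flags&sigNotify != 0) && wanted {
		return toQueue // enqueue and return immediately
	}
	if flags&sigKill != 0 {
		return toKill
	}
	if flags&(sigThrow|sigPanic) == 0 {
		return toIgnore
	}
	return toThrow
}

func main() {
	fmt.Println(dispatch(sigPanic, false, true, false))         // panic
	fmt.Println(dispatch(sigNotify|sigKill, false, true, true)) // queue
	fmt.Println(dispatch(sigNotify|sigKill, false, true, false)) // kill
}
```

Keeping the policy in a table of flags rather than in a per-signal switch means the handler’s hot path is a handful of bit tests, which suits code that must stay minimal.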
9.6.4 sigsend: a lock-free queue and a dedicated receiver goroutine
The two ends of the bridge are sigsend (the producer, running inside the handler) and
signal_recv (the consumer, running inside a dedicated goroutine), with a process-global sig
struct in between. Because the producer end lives in the cage of async-signal safety, this queue
must be lock-free and non-allocating: the whole table is a fixed-size bitmap, and the state
machine is driven by atomic CAS.
What sigsend does is restrained: it sets the signal’s bit in sig.mask with CAS, then uses a
three-state state machine to decide whether to wake the receiver. The three values of sig.state
(sigIdle, sigReceiving, and sigSending) are the core of the whole synchronization; they let
“write a bit” and “wake a sleeper” cooperate correctly without either side taking a lock.
At the other end of the queue, signal_recv runs in an ordinary goroutine free of the
async-signal-safety constraint, free to sleep and be woken. It swaps sig.mask as a whole into its
own local copy recv and then returns the pending signals one bit at a time; when the local copy is
empty, it switches the state to sigReceiving and sleeps on sig.note, waiting for the next sigsend
to wake it. The two sides mirror each other exactly.
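The handshake between the two ends can be approximated in user space. The sketch below is an illustration under stated assumptions, not the runtime’s implementation: sigQueue, send, and receive are hypothetical names, the bitmap covers only signals 0 to 31, and a buffered channel stands in for the runtime’s note (a channel send would not be allowed in a real handler). The state names follow the ones used in the text.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const (
	sigIdle      uint32 = iota // nobody waiting, nothing announced
	sigReceiving               // receiver is asleep on the note
	sigSending                 // sender has announced new bits
)

type sigQueue struct {
	mask  atomic.Uint32 // pending signals, one bit each
	recv  uint32        // receiver's private copy
	state atomic.Uint32
	note  chan struct{} // stand-in for the runtime's note
}

func newSigQueue() *sigQueue { return &sigQueue{note: make(chan struct{}, 1)} }

// send plays the role of the producer: set a bit, then wake if needed.
func (q *sigQueue) send(s uint32) bool {
	if s >= 32 {
		return false
	}
	bit := uint32(1) << s
	for {
		old := q.mask.Load()
		if old&bit != 0 {
			return true // already pending: repeated signals coalesce
		}
		if q.mask.CompareAndSwap(old, old|bit) {
			break
		}
	}
	for {
		switch q.state.Load() {
		case sigIdle:
			if q.state.CompareAndSwap(sigIdle, sigSending) {
				return true
			}
		case sigSending:
			return true // a notification is already pending
		case sigReceiving:
			if q.state.CompareAndSwap(sigReceiving, sigIdle) {
				q.note <- struct{}{} // wake the sleeping receiver
				return true
			}
		}
	}
}

// receive plays the role of the consumer: drain the local copy, else wait.
func (q *sigQueue) receive() uint32 {
	for {
		for i := uint32(0); i < 32; i++ {
			if q.recv&(1<<i) != 0 {
				q.recv &^= 1 << i
				return i
			}
		}
	wait:
		for {
			switch q.state.Load() {
			case sigIdle:
				if q.state.CompareAndSwap(sigIdle, sigReceiving) {
					<-q.note // sleep until a sender wakes us
					break wait
				}
			case sigSending:
				if q.state.CompareAndSwap(sigSending, sigIdle) {
					break wait
				}
			}
		}
		q.recv = q.mask.Swap(0) // take every pending bit at once
	}
}

func main() {
	q := newSigQueue()
	done := make(chan uint32)
	go func() { done <- q.receive() }() // may or may not sleep before send runs
	q.send(2)
	fmt.Println("received", <-done)
}
```

Because the pending set is a bitmap, two deliveries of the same signal before the receiver runs collapse into one, which is the usual behavior of standard (non-real-time) signals anyway.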
The last stage is in user space, taken over by the os/signal package. On the first Notify it
lazily starts a goroutine running loop, which repeatedly calls signal_recv to pull signals and
then process to dispatch them to the user’s registered channels.
At this point, the signal that was born in an asynchronous, constrained context has become an
ordinary channel receive. The user only needs signal.Notify(ch, os.Interrupt) and then <-ch to
handle it gracefully with the most familiar select. Note that process’s delivery is
non-blocking: when the channel is full, the signal is dropped. This is a deliberate trade-off:
better to miss a signal than to let the dispatch loop be dragged to death by a sluggish consumer.
This is also why the channel for signal.Notify should usually be buffered.
9.6.5 Trade-offs against other runtimes, and a practical side effect
The relationship between signals and managed runtimes has always been delicate, and the root is
this: the runtime wants to use signals, and so do the user and native libraries, yet within a
process there is only one handler for any given signal. The JVM likewise has to take over SIGSEGV
(for null checks and the page-protection trap of safepoint polling), SIGBUS, and so on, and for
this it provides signal chaining (libjsig): it records the handler that existed before it took
over, and forwards to it when it encounters a case that is not its own. Go’s sigfwdgo
(9.6.3) solves the same problem, only in the opposite direction: before Go installs its own handler it stores the old one in fwdSig, and
forwards when needed. The two mechanisms arrive at the same place by different routes, both so that
the runtime and native libraries can coexist peacefully on the same signal.
Precisely because signals are a scarce shared resource, Go is especially careful when picking “which
signal” for asynchronous preemption. The runtime source lists a string of heuristic conditions: it
must be passed through by debuggers by default, not be used internally by libc in mixed binaries, be
able to fire spuriously without harm, and be available on platforms without real-time signals (such
as macOS). SIGUSR1/SIGUSR2 are out because applications often give them real meaning; SIGALRM
is out because it is impossible to tell whether a real timer fired. The final choice is SIGURG.
Nominally it reports out-of-band data on a socket, but almost no one uses out-of-band data, and the
signal does not even say which socket it concerns. It is nearly obsolete to begin with, and any
application that does use it must already tolerate spurious arrivals. Choosing a signal “harmless
enough to send at will” is the finishing touch of this design.
This set of choices brings a practical side effect that is often overlooked. Since Go 1.14
introduced asynchronous preemption (9.7), a busy program receives SIGURG
frequently, and any of those signals can interrupt a slow system call in progress, making it
return EINTR. POSIX’s SA_RESTART flag makes some interrupted system calls restart automatically,
and Go does set it when installing the handler, but not all system calls are restartable (poll,
for example, and read in certain situations). Correct Go code, and the syscall wrappers it depends
on, must therefore recognize and retry EINTR. This is a visible cost paid “for the sake of
preemptibility”, and it reminds us that every choice in the signal mechanism ripples all the way
along the system call to user-observable behavior. Gains in performance and capability never come
for free; they always leave a corner elsewhere that needs tending.
9.6.6 Summary
To string this section together: the hard constraint of async-signal safety (9.6.1) forces out
the iron rule of “do only the minimum in the handler”; applied to each component, that rule gives
each M its own gsignal alternate stack (9.6.2), sighandler’s three-way split (9.6.3), and the
lock-free queue with a dedicated goroutine that “demotes” user signals into channel events
(9.6.4). This two-stage pattern is exactly the same as the asynchronous preemption of 9.7: inject only the
minimal step in the constrained signal context, and push the real work back to safe ground to
finish. Once you understand this, the next section’s asynchronous preemption is just the same
technique applied once more, to scheduling.
Further Reading
- W. Richard Stevens, Stephen A. Rago. Advanced Programming in the UNIX Environment, 3rd ed. Addison-Wesley, 2013. (Async-signal safety, the self-pipe trick, the authoritative treatment of signal handling.)
- The Linux man-pages project. signal-safety(7) (the whitelist of async-signal-safe functions); sigaltstack(2); signalfd(2). https://man7.org/linux/man-pages/man7/signal-safety.7.html
- The Go Authors. runtime/signal_unix.go, runtime/sigqueue.go (sighandler, sigtramp/sigtrampgo, sigsend/signal_recv, the selection comment for sigPreempt = _SIGURG). https://github.com/golang/go/blob/master/src/runtime/signal_unix.go
- The Go Authors. os/signal package documentation and src/os/signal/signal_unix.go (Notify/loop/process). https://pkg.go.dev/os/signal
- The Go Authors. Proposal: Non-cooperative goroutine preemption (#24543, the design motivation for SIGURG and asynchronous preemption). https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption
- Oracle / OpenJDK. Signal Chaining (libjsig) (industrial practice for a managed runtime sharing signals with native libraries). https://docs.oracle.com/en/java/javase/21/docs/specs/man/java.html
- This book: 9.3 Scheduler Components, 9.7 Cooperation and Preemption, 14.6 Stack Management.