16.6 Runtime Metrics
Profiling (16.5) and tracing (16.3) belong to the family of tools you bring in to diagnose something that has already gone wrong: they are relatively expensive, produce a lot of output, and suit the case where you already suspect a problem and want to capture a window of time or a single request to examine closely. In production, though, the more common question is not “profile this for me right now”; it is “has this service been healthy over the past week, when did it start degrading, and should we page someone at midnight?” Answering questions like these does not depend on a one-off deep dive. It depends on numbers that are collected continuously, cost little to read, and can be retained for a long time: how big the heap is, how often GC runs, whether the goroutine count is climbing, whether the tail of scheduling latency is rising. These numbers are runtime metrics.
The difference between metrics and profiles is at heart “aggregate vs. detail” and “always-on vs. on-demand.” A profile attributes every allocation, every slice of CPU, to a specific call stack; it carries a lot of information and is expensive to collect. A metric keeps only an aggregated scalar or distribution; a single read is nearly free, so you can sample it every few seconds and keep sampling for months. This section explains the two interfaces Go exposes for metrics, how they evolved, and how metrics plug into a complete observability stack.
16.6.1 From MemStats to runtime/metrics
Historically, the only entry point for a program to read runtime memory metrics was runtime.ReadMemStats, which fills a MemStats struct (12.8):
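A minimal sketch of that call, assuming nothing beyond the standard library. The helper name heapSnapshot is hypothetical; runtime.ReadMemStats and the fields shown are part of the runtime package.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapSnapshot (a hypothetical helper) fills a MemStats and returns two of
// its fixed fields. Every call briefly stops the world.
func heapSnapshot() (heapAlloc uint64, numGC uint32) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc, m.NumGC
}

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	fmt.Println("HeapAlloc (bytes):", m.HeapAlloc)
	fmt.Println("NumGC:", m.NumGC)
	fmt.Println("PauseTotalNs:", m.PauseTotalNs)
	// PauseNs is a 256-entry ring; the most recent pause sits at this index.
	fmt.Println("last pause (ns):", m.PauseNs[(m.NumGC+255)%256])

	heapAlloc, numGC := heapSnapshot()
	fmt.Println("via helper:", heapAlloc, numGC)
}
```

Every value here is a named struct field, which is why exposing anything new means changing the struct itself.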
This interface has three hard limitations. First, the fields are hard-wired into the struct: for the runtime to expose a new metric, it has to add a field to MemStats and thereby change a public API. That API is bound by Go’s compatibility promise, so such changes are made cautiously, and much internal state never gets exposed at all. Second, to obtain a self-consistent snapshot, ReadMemStats briefly stops the world on every read, a cost that is not negligible under high-frequency collection. Third, the pause information amounts to a single PauseNs ring array plus cumulative totals; to know “what is the P99 of the pauses” you have to compute it yourself from the raw array, and because that array holds only the most recent 256 entries, the long-term distribution is already gone.
Go 1.16 introduced the runtime/metrics package to address these problems systematically. It changes metrics from “struct fields” into “name + value” key-value pairs: each metric is identified by a string key of the form /gc/heap/allocs:bytes, where the key is a path and a unit separated by a colon (encoding the unit into the key is deliberate: if the unit changes, the semantics most likely change too, and that should be a new key). Adding a metric is just adding a row to the runtime’s metric table, touching no existing API, which lets the metric set evolve freely along with the runtime, and even allows different Go implementations to expose different metric sets.
The reading interface is built around three types. Sample is a “name + value” slot for one sampling; Value is a type-tagged union; ValueKind records whether the value is a uint64, a float64, or a histogram:
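To make the tagged union concrete, here is a small sketch that reads one metric at a time and switches on its Kind. The helper kindOf and the made-up key /no/such/metric:things are illustrative assumptions; the Kind constants and accessors are those of runtime/metrics.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// kindOf (a hypothetical helper) reads a single metric and names the Kind
// carried by its Value.
func kindOf(name string) string {
	s := []metrics.Sample{{Name: name}}
	metrics.Read(s)
	switch s[0].Value.Kind() {
	case metrics.KindUint64:
		return "uint64"
	case metrics.KindFloat64:
		return "float64"
	case metrics.KindFloat64Histogram:
		return "histogram"
	default: // metrics.KindBad: the name is unknown to this Go version
		return "bad"
	}
}

func main() {
	for _, name := range []string{
		"/sched/goroutines:goroutines",
		"/sched/latencies:seconds",
		"/no/such/metric:things", // made up, to show KindBad
	} {
		fmt.Printf("%-32s %s\n", name, kindOf(name))
	}
}
```

Generic code that walks an unknown metric set needs a switch like this; code that reads a fixed, known key can skip it.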
Reading means first filling in the names you want, then handing the slice to metrics.Read, which fills in the values in one batch. The runtime guarantees that the Kind of a given metric never changes, so the caller can safely call the accessor matching a known metric's type directly; only when a name is unknown to the running Go version will the corresponding Value have the Kind KindBad:
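The batch read can be sketched as follows. The helper readKnown and the bogus third key are assumptions made for illustration; the two real keys are uint64 metrics, so Uint64 is called on them without a Kind check.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readKnown (a hypothetical helper) reads two metrics whose Kind is known in
// advance, plus one deliberately nonexistent name.
func readKnown() (allocBytes, goroutines uint64, unknownIsBad bool) {
	samples := []metrics.Sample{
		{Name: "/gc/heap/allocs:bytes"},
		{Name: "/sched/goroutines:goroutines"},
		{Name: "/no/such/metric:things"}, // made up for illustration
	}
	metrics.Read(samples) // fills every Value in a single call

	allocBytes = samples[0].Value.Uint64()
	goroutines = samples[1].Value.Uint64()
	unknownIsBad = samples[2].Value.Kind() == metrics.KindBad
	return allocBytes, goroutines, unknownIsBad
}

func main() {
	allocBytes, goroutines, unknownIsBad := readKnown()
	fmt.Println("cumulative heap allocation (bytes):", allocBytes)
	fmt.Println("live goroutines:", goroutines)
	fmt.Println("unknown name reported as KindBad:", unknownIsBad)
}
```

Calling an accessor that does not match the Kind panics, which is why the Kind check matters for any key that might be missing on older Go versions.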
Unlike MemStats, Read does not need a global stop-the-world to fetch scalars; a single collection is light enough to put inside a goroutine that fires every second and runs indefinitely.
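A periodic collector along these lines might look like the sketch below. The helper collect, its parameters, and the 100 ms interval are assumptions for illustration; a real collector would loop until shutdown and forward each reading to a metrics backend instead of returning a slice.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"time"
)

// collect (a hypothetical helper) samples the goroutine count n times, once
// per tick, and returns the readings.
func collect(n int, everyMillis int) []uint64 {
	// The same sample slice is reused for every Read call.
	sample := []metrics.Sample{{Name: "/sched/goroutines:goroutines"}}

	ticker := time.NewTicker(time.Duration(everyMillis) * time.Millisecond)
	defer ticker.Stop()

	out := make([]uint64, 0, n)
	for len(out) < n {
		<-ticker.C
		metrics.Read(sample)
		out = append(out, sample[0].Value.Uint64())
	}
	return out
}

func main() {
	done := make(chan []uint64)
	go func() { done <- collect(3, 100) }() // collection runs in its own goroutine
	fmt.Println("goroutine counts:", <-done)
}
```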
16.6.2 Distribution Over Average: Histograms and Tail Latency
The most valuable step runtime/metrics takes over MemStats is turning a class of metrics into distributions rather than scalars. Scheduling latency /sched/latencies:seconds, the various STW pauses /sched/pauses/total/gc:seconds, and heap allocation sizes /gc/heap/allocs-by-size:bytes all have values that are a Float64Histogram:
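As a sketch of what such a value looks like, the program below reads the scheduling latency histogram and prints its non-empty buckets. The helpers readHistogram and histogramShape are hypothetical; Counts and Buckets are the two exported fields of metrics.Float64Histogram, with Buckets always one element longer than Counts.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readHistogram (a hypothetical helper) reads one metric and returns its
// histogram, or nil if the name is unknown or not a histogram.
func readHistogram(name string) *metrics.Float64Histogram {
	s := []metrics.Sample{{Name: name}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64Histogram {
		return nil
	}
	return s[0].Value.Float64Histogram()
}

// histogramShape reports len(Counts) and len(Buckets) for a histogram metric.
func histogramShape(name string) (counts, boundaries int) {
	h := readHistogram(name)
	if h == nil {
		return 0, 0
	}
	return len(h.Counts), len(h.Buckets)
}

func main() {
	const key = "/sched/latencies:seconds"
	h := readHistogram(key)
	if h == nil {
		fmt.Println("metric not available in this Go version")
		return
	}
	// Counts[i] is the number of observations in [Buckets[i], Buckets[i+1]).
	for i, c := range h.Counts {
		if c != 0 {
			fmt.Printf("[%g, %g) seconds: %d\n", h.Buckets[i], h.Buckets[i+1], c)
		}
	}
	counts, boundaries := histogramShape(key)
	fmt.Println("counts:", counts, "boundaries:", boundaries)
}
```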
Why a distribution? Because for metrics like latency, the average lies. A service with an average scheduling latency of 50 microseconds sounds fine, but if 1% of goroutines wait 10 milliseconds before being scheduled onto a CPU, the average flattens this long tail completely, and it is precisely that 1% that determines the stutter the user feels. Monitoring latency means watching quantiles (P50, P99, P999), and a quantile can only be estimated from a distribution, never derived back from an average. The histogram exists for exactly this: it preserves the shape of the entire distribution at bucket granularity.
Estimating a quantile from a histogram means accumulating counts along the buckets and finding the bucket where the cumulative fraction first crosses the target quantile:
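One way to write that accumulation is sketched below. The function quantile is a hypothetical helper, and returning the bucket's upper bound (a conservative estimate) is a design choice of this sketch rather than anything the runtime prescribes; interpolating within the bucket is an equally valid alternative.

```go
package main

import (
	"fmt"
	"math"
	"runtime/metrics"
)

// quantile estimates the q-quantile (0 < q <= 1) from counts and boundaries
// laid out as in metrics.Float64Histogram: counts[i] covers
// [buckets[i], buckets[i+1]). It returns the upper bound of the bucket where
// the cumulative count first reaches q of the total, or NaN when empty.
func quantile(counts []uint64, buckets []float64, q float64) float64 {
	var total uint64
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return math.NaN()
	}
	target := q * float64(total)
	var cum uint64
	for i, c := range counts {
		cum += c
		if float64(cum) >= target {
			upper := buckets[i+1]
			if math.IsInf(upper, 1) {
				return buckets[i] // open-ended last bucket: use its lower bound
			}
			return upper
		}
	}
	return buckets[len(buckets)-1]
}

func main() {
	s := []metrics.Sample{{Name: "/sched/latencies:seconds"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64Histogram {
		fmt.Println("metric not available in this Go version")
		return
	}
	h := s[0].Value.Float64Histogram()
	for _, q := range []float64{0.50, 0.99, 0.999} {
		fmt.Printf("q=%g: scheduling latency <= %g s\n", q, quantile(h.Counts, h.Buckets, q))
	}
}
```

The estimate can never be finer than the bucket it lands in, which is why the bucket layout matters.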
Bucket boundary granularity determines estimation accuracy. The runtime chose a log-linear bucket layout for latency metrics: values are first split by exponent into “super-buckets” of different magnitudes, and each super-bucket is then linearly subdivided into several sub-buckets, so that resolution stays roughly constant in relative terms across latencies spanning several orders of magnitude. Contrast this with MemStats.PauseNs, the ring array that stores only the most recent 256 raw values: the histogram counts every observation since the process started, has no limit on how many it can absorb, and naturally supports merging across processes and across time windows (histograms with identical bucket boundaries are merged by adding their counts), which is exactly the form a monitoring system is happy to consume.
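Merging and windowing reduce to element-wise arithmetic on the count slices, as the sketch below shows. The helpers addCounts and subCounts are hypothetical and assume both inputs share the same bucket boundaries; the sample numbers are invented.

```go
package main

import "fmt"

// addCounts merges two histograms that share bucket boundaries by summing
// their per-bucket counts. It returns nil if the layouts differ in length.
func addCounts(a, b []uint64) []uint64 {
	if len(a) != len(b) {
		return nil
	}
	out := make([]uint64, len(a))
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out
}

// subCounts turns two cumulative snapshots into the histogram of the window
// between them: now minus prev, bucket by bucket.
func subCounts(now, prev []uint64) []uint64 {
	if len(now) != len(prev) {
		return nil
	}
	out := make([]uint64, len(now))
	for i := range now {
		out[i] = now[i] - prev[i]
	}
	return out
}

func main() {
	procA := []uint64{120, 30, 2} // invented counts from one process
	procB := []uint64{80, 45, 5}  // invented counts from another
	fmt.Println("merged:", addCounts(procA, procB))

	earlier := []uint64{100, 20, 1}
	fmt.Println("window:", subCounts(procA, earlier))
}
```

Quantiles computed on the window slice describe only that interval, which is usually what a dashboard wants.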
16.6.3 Metrics Are Each Subsystem’s Window to the Outside
The metrics runtime/metrics exposes cover nearly every subsystem this book has dissected, and reading these keys is reading the runtime’s situation at this very moment. By path prefix they fall roughly into a few families:
- /gc/* and /memory/classes/*: the panorama of garbage collection and memory (12, 13). /gc/heap/live:bytes is the live heap marked by the previous GC, /gc/heap/goal:bytes is the target heap size for the current cycle, and their ratio is exactly the quantity the GC pacer (13.4) is regulating; differencing /gc/cycles/total:gc-cycles over time gives the GC frequency; /memory/classes/total:bytes is all the memory the runtime has mapped from the system, and subtracting /memory/classes/heap/released:bytes from it gives the number the GOMEMLIMIT soft cap actually watches.
- /sched/*: the scheduler’s state (9). /sched/goroutines:goroutines is the number of live goroutines, and a continuous rise is often a sign of a goroutine leak; /sched/latencies:seconds is the scheduling latency distribution discussed above; /sched/gomaxprocs:threads is the current GOMAXPROCS.
- /sched/pauses/* and /cpu/classes/*: quantifying GC’s interference with the application. /sched/pauses/total/gc:seconds is the distribution of STW pauses caused by GC (the old key /gc/pauses:seconds is deprecated and points to it); /cpu/classes/gc/total:cpu-seconds estimates the CPU consumed by GC, and comparing it with /cpu/classes/total:cpu-seconds tells you GC’s CPU tax rate.
- /sync/mutex/wait/total:seconds: the cumulative time goroutines have spent blocked on sync.Mutex/sync.RWMutex and the runtime’s internal locks; taking its rate gives a rough view of whether global lock contention is worsening, and for a closer look you switch to mutex profiling (16.5).
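Two of the derived quantities mentioned in this list can be computed as in the sketch below. The helpers ratio, gcCPUFraction, and heapLiveOverGoal are hypothetical. Because some of these keys exist only in newer Go releases, each helper checks the Kind and reports ok=false when a key is missing. The CPU figure is a since-start average; a rate over a window would difference two readings.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// ratio returns num/den, or 0 when den is 0 (for example before any CPU time
// has been accounted).
func ratio(num, den float64) float64 {
	if den == 0 {
		return 0
	}
	return num / den
}

// gcCPUFraction returns GC's share of estimated CPU time since process start.
func gcCPUFraction() (frac float64, ok bool) {
	s := []metrics.Sample{
		{Name: "/cpu/classes/gc/total:cpu-seconds"},
		{Name: "/cpu/classes/total:cpu-seconds"},
	}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64 || s[1].Value.Kind() != metrics.KindFloat64 {
		return 0, false
	}
	return ratio(s[0].Value.Float64(), s[1].Value.Float64()), true
}

// heapLiveOverGoal returns the live heap divided by the current heap goal.
func heapLiveOverGoal() (r float64, ok bool) {
	s := []metrics.Sample{
		{Name: "/gc/heap/live:bytes"},
		{Name: "/gc/heap/goal:bytes"},
	}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindUint64 || s[1].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return ratio(float64(s[0].Value.Uint64()), float64(s[1].Value.Uint64())), true
}

func main() {
	if f, ok := gcCPUFraction(); ok {
		fmt.Printf("GC CPU share since start: %.4f\n", f)
	}
	if r, ok := heapLiveOverGoal(); ok {
		fmt.Printf("live heap / heap goal: %.4f\n", r)
	}
}
```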
None of these metrics is a side channel tailored for some particular tool; they are the standard outlets through which the runtime opens up its internal counters. metrics.All() returns at any time the complete list of Description entries supported by the current version (with name, English description, Kind, and whether the metric is cumulative), from which you can discover the metric set dynamically at runtime rather than hard-coding it, which is exactly how an interface designed for version compatibility ought to be used.
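Dynamic discovery can be sketched as follows. The helpers readAll and badCount are hypothetical; metrics.All, metrics.Description, and metrics.Read are the package's own API.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readAll (a hypothetical helper) discovers every metric the running Go
// version supports and reads them all in one batch.
func readAll() []metrics.Sample {
	descs := metrics.All()
	samples := make([]metrics.Sample, len(descs))
	for i, d := range descs {
		samples[i].Name = d.Name
	}
	metrics.Read(samples)
	return samples
}

// badCount reports how many discovered metrics came back as KindBad.
func badCount() int {
	n := 0
	for _, s := range readAll() {
		if s.Value.Kind() == metrics.KindBad {
			n++
		}
	}
	return n
}

func main() {
	for _, d := range metrics.All() {
		fmt.Printf("%-55s kind=%d cumulative=%t\n", d.Name, d.Kind, d.Cumulative)
	}
	fmt.Println("metrics read:", len(readAll()), "bad:", badCount())
}
```

Because the names come from the runtime itself, the same binary picks up new metrics when rebuilt with a newer Go release, with no code change.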
16.6.4 Plugging Into the Observability Stack
Reading the metrics out is not enough on its own; you have to collect them continuously, store them remotely, visualize them, and alert on them to form an operable pipeline. Go provides two layers of entry points along this pipeline.
The lightest layer is the standard library’s expvar. It exposes variables as JSON on an HTTP endpoint (/debug/vars by default), and at package initialization it has already published memstats (a JSON rendering of a runtime.MemStats) and cmdline. Simply importing it anonymously in a program that serves http.DefaultServeMux gets you a memory metrics endpoint for free:
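The sketch below shows the anonymous import and, so that it can run without binding a port, queries the endpoint in-process through net/http/httptest. The helper publishedVars is hypothetical; a real service would simply serve the default mux, as the trailing comment indicates.

```go
package main

import (
	"encoding/json"
	_ "expvar" // side effect: registers /debug/vars on http.DefaultServeMux
	"fmt"
	"net/http"
	"net/http/httptest"
)

// publishedVars (a hypothetical helper) issues an in-process GET for
// /debug/vars and returns the set of published variable names.
func publishedVars() map[string]bool {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/debug/vars", nil)
	http.DefaultServeMux.ServeHTTP(rec, req)

	var vars map[string]json.RawMessage
	if err := json.Unmarshal(rec.Body.Bytes(), &vars); err != nil {
		return nil
	}
	names := make(map[string]bool, len(vars))
	for name := range vars {
		names[name] = true
	}
	return names
}

func main() {
	names := publishedVars()
	fmt.Println("memstats published:", names["memstats"])
	fmt.Println("cmdline published:", names["cmdline"])
	// A real service would serve the default mux instead, for example:
	//   log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```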
expvar wins on zero dependencies and being available at a moment’s notice, suiting debugging and lightweight introspection; but its JSON format is not the common language of monitoring systems, and it does not directly support histograms and labels. The layer more commonly used in production is the Prometheus client library: the official client_golang includes a built-in collector that translates runtime/metrics metrics (histograms included) into Prometheus’s text format, hung on the conventional /metrics endpoint:
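To illustrate what that translation produces, here is a deliberately hand-rolled, standard-library-only sketch that renders a few scalar runtime metrics as Prometheus text exposition lines. It is not client_golang's code or API: the helpers promName and renderScalars, the name-mangling scheme, and the choice of three keys are all assumptions for illustration, and histograms are left out. A production service would register the library's collector instead.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"strings"
)

// promName maps a runtime/metrics key to a Prometheus-style metric name.
// The mangling scheme is an assumption made for this sketch.
func promName(key string) string {
	return "go" + strings.NewReplacer("/", "_", ":", "_", "-", "_").Replace(key)
}

// exported lists the keys this sketch renders and whether each is cumulative
// (a Prometheus counter) or a point-in-time value (a gauge).
var exported = []struct {
	key     string
	counter bool
}{
	{"/sched/goroutines:goroutines", false},
	{"/gc/cycles/total:gc-cycles", true},
	{"/memory/classes/total:bytes", false},
}

// renderScalars reads the exported keys and formats them as text exposition
// lines: a "# TYPE" comment followed by "name value".
func renderScalars() string {
	samples := make([]metrics.Sample, len(exported))
	for i, e := range exported {
		samples[i].Name = e.key
	}
	metrics.Read(samples)

	var b strings.Builder
	for i, e := range exported {
		if samples[i].Value.Kind() != metrics.KindUint64 {
			continue // unknown key or unexpected kind: skip it
		}
		typ := "gauge"
		if e.counter {
			typ = "counter"
		}
		name := promName(e.key)
		fmt.Fprintf(&b, "# TYPE %s %s\n%s %d\n", name, typ, name, samples[i].Value.Uint64())
	}
	return b.String()
}

func main() {
	fmt.Print(renderScalars())
}
```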
Downstream the pipeline is standardized: Prometheus scrapes /metrics on a schedule and stores the time series in its own database; Grafana connects to Prometheus to draw dashboards; alerting rules (such as “/sched/goroutines doubles within five minutes” or “GC pause P99 exceeds 10ms”) are evaluated by Prometheus, which hands firing alerts to Alertmanager for routing and notification. In the cloud-native era, “the Go service exposes /metrics, Prometheus scrapes, Grafana displays” is almost the default assembly. Go’s position in it is clear: provide a lightweight, extensible metrics source that covers the whole runtime, and leave storage, querying, and alerting, which are language-agnostic, to mature external systems.
16.6.5 The Three Pillars of Observability, and Logs
Sorting this chapter’s diagnostic tools by the industry’s “three pillars of observability” makes the whole diagnostic map clear:
| Pillar | Question answered | Form | Implementation in Go |
|---|---|---|---|
| Metrics | how is the system overall, what is the trend | continuous aggregated numbers / distributions | runtime/metrics, expvar (this section) |
| Traces | what exactly happened this one time, where is it slow | a timeline of events | execution tracing (16.3), distributed tracing |
| Profiles | which code the resources went into | aggregation of where resources go | pprof (16.5) |
Each does its own job, yet they connect to one another: metrics handle continuous monitoring and raise an alert when a trend goes abnormal; traces answer where one specific request is slow; profiles attribute resource consumption to code. A typical “from surface to point” diagnostic path runs like this: a monitoring alert shows the heap is growing, you switch to an allocation profile to locate the code that is allocating, and then use a trace to see where GC stalls one specific request.
Beyond the three pillars there is usually a fourth category, logs. Since Go 1.21 the standard library’s log/slog (7.3) provides structured logging, turning a log from a single string line into a key-value record that can be filtered, aggregated, and alerted on by field, filling in exactly the piece of “recording the context of discrete events.” Worth pointing out: Go provides all four of these infrastructures either built in or via official packages: metrics in runtime/metrics and expvar, tracing in runtime/trace, profiling in runtime/pprof, and logging in log/slog. A Go service has fairly deep self-observation ability out of the box, with no heavy dependence on external APM probes. This also explains why Go is especially suited to writing server programs that need to run for a long time and need to be watched continuously by operators: observability is not bolted on after the fact, but a capability the runtime and standard library prepared from the start.
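As a brief illustration of the “key-value record” point, the sketch below logs one event through slog's JSON handler into a buffer and decodes it again. The helper logRecord, the message text, and the field names are made up for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
	"runtime"
)

// logRecord (a hypothetical helper) emits one structured record into a
// buffer and decodes it back into a map, to show that every attribute is an
// addressable field rather than part of a flat string.
func logRecord() map[string]any {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.Info("runtime snapshot",
		"goroutines", runtime.NumGoroutine(),
		"component", "metrics-demo",
	)

	var rec map[string]any
	if err := json.Unmarshal(buf.Bytes(), &rec); err != nil {
		return nil
	}
	return rec
}

func main() {
	rec := logRecord()
	fmt.Println("msg:", rec["msg"])
	fmt.Println("goroutines:", rec["goroutines"])
	fmt.Println("component:", rec["component"])
}
```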
Further Reading
- The Go Authors. Package runtime/metrics. https://pkg.go.dev/runtime/metrics (the key-value metric interface, Sample/Value/Float64Histogram, All and the complete metric list)
- Michael Knyszek. Proposal: API for unstable runtime metrics (#37112). 2020. https://github.com/golang/go/issues/37112 (the design motivation for runtime/metrics and the case for replacing MemStats)
- The Go Authors. Package expvar. https://pkg.go.dev/expvar (the /debug/vars JSON endpoint, publishing memstats and cmdline by default)
- The Go Authors. Package runtime, type MemStats. https://pkg.go.dev/runtime#MemStats (the old fixed-field memory statistics and its stop-the-world read semantics)
- Prometheus Authors. Instrumenting a Go application / client_golang. https://prometheus.io/docs/guides/go-application/ , https://github.com/prometheus/client_golang (the /metrics endpoint and the runtime metrics collector)
- The Go Authors. Package log/slog. https://pkg.go.dev/log/slog (structured logging)
- This book: 12.8 Memory Statistics, 13.4 GC Pacing, 16.3 Performance Tracing, 16.5 Benchmarking and Profiling, 7.3 Error Formatting and Context.