(Possibly) Naïve thoughts regarding Go.

# A Concurrent-safe Centralized Pointer Managing Facility

#Go #Cgo #Handle #Non-Moving GC #Escaping

Author(s): Changkun Ou

In the Go 1.17 release, we contributed a new cgo facility, runtime/cgo.Handle, to make it easier for future cgo applications to stay concurrent-safe while passing pointers between Go and C. This article looks at the feature by asking what it offers, why we need such a facility, and how exactly we arrived at the final implementation.

## Starting from Cgo and X Window Clipboard

Cgo is the de facto approach to interacting with C code from Go. Nevertheless, how often do we need to interact with C in Go? The answer depends on how much we work at the system level or have to use a legacy C library, such as one for image processing. Whenever a Go application needs to use legacy C code, it imports the special pseudo-package C as follows:

/*
#include <stdio.h>

void myprint() {
	printf("Hello %s", "World");
}
*/
import "C"


Then on the Go side, one can simply call the myprint function through the imported C symbol:

func main() {
	C.myprint() // Hello World
}


A few months ago, while building a new package, golang.design/x/clipboard, we found that Go lacked a facility we needed, and that the various workarounds in the wild suffered from soundness and performance issues.

In the golang.design/x/clipboard package, we had to use cgo to access system-level APIs (well, technically, an API from a legacy but widely used C system), but we had no facility for learning about the execution progress on the C side. For instance, on the Go side, we have to call the C code in a goroutine and then do something else in parallel:

go func() {
	C.doWork() // do stuff on C side
}()

// .. do stuff on Go side ..


However, under certain circumstances, we need a mechanism to learn about the execution progress on the C side, which requires communication and synchronization between Go and C. For instance, if our Go code must wait until the C code finishes some initialization work before proceeding, we need precisely this type of communication.

A real example that we encountered was the need to interact with the clipboard facility. In Linux's X Window environment, clipboards are decentralized and can only be owned by individual applications. Anyone who needs access to clipboard information is required to create their own clipboard instance. Say an application A wants to paste something into the clipboard: it has to send a request to the X server and then become the clipboard owner, sending the information to other applications whenever they issue a copy request.

This design was considered natural and often requires applications to cooperate: if an application B requests to become the next owner of the clipboard, then A loses its ownership, and the copy requests from applications C, D, etc., are forwarded to application B. This is similar to a shared region of memory being overwritten by somebody else.

With the above context, one can understand that before an application starts to “paste” (serve) the clipboard information, it must first obtain the clipboard ownership. Until it has the ownership, the clipboard information is not available. In other words, if a clipboard API is designed in the following way:

clipboard.Write("some information")


We have to guarantee internally that when the function returns, the information is available to be accessed.
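The guarantee can be illustrated in pure Go, with a goroutine standing in for the C side. This is only a minimal sketch: the board type and its write and read methods are hypothetical names made up for illustration and are not part of golang.design/x/clipboard.

```go
package main

import (
	"fmt"
	"sync"
)

// board is a hypothetical stand-in for the clipboard state that the
// owner serves; it is not part of any real package.
type board struct {
	mu   sync.Mutex
	data string
}

// write starts a background worker (standing in for the C side) and
// blocks until the worker reports that the data is being served.
func (b *board) write(s string) {
	ready := make(chan struct{})
	go func() {
		// Pretend to acquire ownership, then publish the data.
		b.mu.Lock()
		b.data = s
		b.mu.Unlock()
		close(ready) // notify the caller: the data is now available
		// ... a real owner would keep serving requests here ...
	}()
	<-ready // do not return before the data can be read
}

// read returns the currently served data.
func (b *board) read() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.data
}

func main() {
	b := &board{}
	b.write("some information")
	fmt.Println(b.read()) // some information
}
```

The difficulty discussed next is that the notification has to originate on the C side, where a Go channel is not directly usable.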

Back then, our first idea for dealing with the problem was to pass a channel from Go to C, then send a value through the channel from C to Go. After some quick research, we realized that this is impossible: channels cannot be passed as values between C and Go, and even if there were a way to pass the entire channel value to C, there would be no method for sending values through the channel on the C side.

The next idea was to pass a function callback and have it called on the C side. When executed, the function uses the desired channel to send a notification back to the waiting goroutine.

After a few attempts, we found that the only possible way is to attach a global function pointer and have it called through a function wrapper:

/*
int myfunc(void* go_value);
*/
import "C"

import (
	"sync"
	"unsafe"
)

// funcCallback is kept as a global to avoid a runtime panic when the
// callback is passed to cgo directly, which violates the pointer
// passing rules:
//
//   panic: runtime error: cgo argument has Go pointer to Go pointer
var (
	funcCallback   func()
	funcCallbackMu sync.Mutex
)

type gocallback struct{ f func() }

func main() {
	go func() {
		_ = C.myfunc(unsafe.Pointer(&gocallback{func() {
			funcCallbackMu.Lock()
			f := funcCallback // must use a global function variable.
			funcCallbackMu.Unlock()
			f()
		}}))
		// ... do work ...
	}()
	// ... do work ...
}


Above, the gocallback pointer on the Go side is passed to the C function myfunc. On the C side, the exported Go function go_func_callback is then called with the gocallback struct as its parameter:

// myfunc will trigger a callback, c_func, whenever it is needed and pass
// the gocallback data through the void* parameter.
void c_func(void *data) {
	void *gocallback = data;
	// the gocallback is received as a pointer, we pass it as an argument
	// to the go_func_callback
	go_func_callback(gocallback);
}


go_func_callback knows that its parameter is a pointer to gocallback, so a type cast is safe before making the call:

//export go_func_callback
func go_func_callback(c unsafe.Pointer) {
	(*gocallback)(c).call()
}

func (c *gocallback) call() { c.f() }


The function f in the gocallback is exactly what we would like to call:

func() {
	funcCallbackMu.Lock()
	f := funcCallback // must use a global function variable.
	funcCallbackMu.Unlock()
	f()               // get called
}


Note that funcCallback must be a global function variable. Otherwise, it violates the cgo pointer passing rules. An immediate reaction to the above code is that it is too complicated. Moreover, the demonstrated approach can only hold one function at a time, which goes against the concurrent nature of Go. Any application that needs a dedicated callback per goroutine will not benefit from this approach, because there is only a single global callback. Back then, we wondered whether there was a better and more elegant approach.
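The one-function-at-a-time limitation can be seen without any C code. The following pure-Go sketch reuses the funcCallback and funcCallbackMu names from above; register and fire are hypothetical helpers added only for illustration, and the callback returns a string so that the effect is observable.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	funcCallback   func() string
	funcCallbackMu sync.Mutex
)

// register overwrites the single global callback slot.
func register(f func() string) {
	funcCallbackMu.Lock()
	funcCallback = f
	funcCallbackMu.Unlock()
}

// fire plays the role of the C side calling back into Go: it can only
// reach whatever callback is currently stored in the global slot.
func fire() string {
	funcCallbackMu.Lock()
	f := funcCallback
	funcCallbackMu.Unlock()
	return f()
}

func main() {
	register(func() string { return "callback of goroutine A" })
	register(func() string { return "callback of goroutine B" })
	fmt.Println(fire()) // callback of goroutine B
}
```

The first registration is silently lost, so two goroutines cannot each wait for their own notification.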

This need occurs quite often, and such a facility had already been proposed in issue 37033. Luckily, it is ready in Go 1.17 :)

## What is runtime/cgo.Handle?

The new runtime/cgo.Handle provides a way to pass values that contain Go pointers (pointers to memory allocated by Go) between Go and C without breaking the cgo pointer passing rules. A Handle is an integer value that can represent any Go value. A Handle can be passed through C and back to Go, and the Go code can use the Handle to retrieve the original Go value. The final API design looks like this:

package cgo

type Handle uintptr

// NewHandle returns a handle for a given value.
//
// The handle is valid until the program calls Delete on it. The handle
// uses resources, and this package assumes that C code may hold on to
// the handle, so a program must explicitly call Delete when the handle
// is no longer needed.
//
// The intended use is to pass the returned handle to C code, which
// passes it back to Go, which calls Value.
func NewHandle(v interface{}) Handle

// Value returns the associated Go value for a valid handle.
//
// The method panics if the handle is invalid.
func (h Handle) Value() interface{}

// Delete invalidates a handle. This method should only be called once
// the program no longer needs to pass the handle to C and the C code
// no longer has a copy of the handle value.
//
// The method panics if the handle is invalid.
func (h Handle) Delete()


As we can see: cgo.NewHandle returns a handle for any given value; the method Handle.Value returns the value that corresponds to the handle; and whenever we no longer need the handle, we can call Handle.Delete.

The most straightforward example is to pass a string between Go and C using Handle. On the Go side:

package main

/*
#include <stdint.h> // for uintptr_t
extern void MyGoPrint(uintptr_t handle);
void myprint(uintptr_t handle);
*/
import "C"
import "runtime/cgo"

func main() {
	s := "Hello golang.design Initiative"
	C.myprint(C.uintptr_t(cgo.NewHandle(s)))
	// Output: Hello golang.design Initiative
}


The string s is passed through a created handle to the C function myprint, and on the C side:

#include <stdint.h> // for uintptr_t

// A Go function
extern void MyGoPrint(uintptr_t handle);
// A C function
void myprint(uintptr_t handle) {
	MyGoPrint(handle);
}


The myprint passes the handle back to a Go function MyGoPrint:

//export MyGoPrint
func MyGoPrint(handle C.uintptr_t) {
	h := cgo.Handle(handle)
	s := h.Value().(string)
	println(s)
	h.Delete()
}


MyGoPrint retrieves the value using Handle.Value() and prints it out, then deletes the handle using Handle.Delete().

With this new facility, we can greatly simplify the previously mentioned function callback pattern:

/*
#include <stdint.h>

int myfunc(uintptr_t handle);
*/
import "C"
import "runtime/cgo"

func main() {
	ch := make(chan struct{})
	handle := cgo.NewHandle(ch)
	go func() {
		C.myfunc(C.uintptr_t(handle)) // myfunc will call goCallback when needed.
		...
	}()

	<-ch            // we got notified from myfunc.
	handle.Delete() // no longer needed, thus delete the handle.
	...
}

//export goCallback
func goCallback(h C.uintptr_t) {
	v := cgo.Handle(h).Value().(chan struct{})
	v <- struct{}{}
}


More importantly, cgo.NewHandle is a concurrent-safe mechanism, which means that whenever we have the handle number, we can fetch the value (if still available) from anywhere without suffering from data races.

Next question: How to implement it?

## First Attempt

The first attempt was rather complicated. Since we need a centralized way to manage all pointers in a concurrent-safe way, the first thing that came to mind was a sync.Map that maps a unique number to the desired value. Thus we can simply use a global sync.Map:

package cgo

import "sync"

var m = &sync.Map{}


However, we have to think about the core challenge in the problem: How to allocate a runtime-level unique ID? Since passing an integer between Go and C is easy, what could represent unique information for a given value?

The first idea is the memory address. Because every pointer or value is stored somewhere in memory, if we can obtain that address, it would be very easy to use as the ID of the value.

To complete this idea, we need to be a little cautious: will the memory address of a living value change at some point? This leads to two more questions:

1. What if a value is on the goroutine stack? If so, the value will be released when the goroutine is dead.
2. Go is a garbage-collected language. What if the garbage collector moves and compacts the value to a different place? Then the memory address of the value will be changed, too.

Based on our years of experience with and understanding of the runtime, we know that Go's garbage collector, up to Go 1.17, is non-moving. That means that if a value lives on the heap, it will not be moved elsewhere. With this fact, we are fine regarding the second question. The first question is a little trickier: a value written as a local variable of a goroutine is likely to live on the stack. The more intractable part is that compiler optimization may move values between stacks, and the runtime may move the stack when it runs out of space.

Naturally, we might ask: is it possible to make sure a value is always allocated on the heap instead of the stack? The answer is: yes, if we turn it into an interface{}. Up to Go 1.17, the compiler's escape analysis always marks a value as escaping to the heap if it is converted to an interface{}.

With all the knowledge above, we can write the following part of the implementation that utilizes the memory address of an escaped value:

// wrap wraps a Go value.
type wrap struct{ v interface{} }

func NewHandle(v interface{}) Handle {
	var k uintptr

	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Ptr, reflect.UnsafePointer, reflect.Slice,
		reflect.Map, reflect.Chan, reflect.Func:
		if rv.IsNil() {
			panic("cgo: cannot use Handle for nil value")
		}

		k = rv.Pointer()
	default:
		// Wrap and turn a value parameter into a pointer. This enables
		// us to always store the passing object as a pointer, and helps
		// to identify which of them were initially pointers or values
		// when Value is called.
		v = &wrap{v}
		k = reflect.ValueOf(v).Pointer()
	}

	...
}


Note that the implementation above treats values differently depending on their kind. Values of the kinds reflect.Ptr, reflect.UnsafePointer, reflect.Slice, reflect.Map, reflect.Chan, and reflect.Func are already pointers that escaped to the heap, so we can safely take their address. Values of other kinds need to be turned into a pointer, and we must also make sure that they always escape to the heap. That is this part:

		// Wrap and turn a value parameter into a pointer. This enables
		// us to always store the passing object as a pointer, and helps
		// to identify which of them were initially pointers or values
		// when Value is called.
		v = &wrap{v}
		k = reflect.ValueOf(v).Pointer()


Now we have turned everything into an escaped value on the heap. The next thing we have to ask is: what if two values are the same? That is, what if the v passed to cgo.NewHandle(v) is the same object as before? Then we will get the same memory address in k.

The easy case is, of course, when the address is not yet in the global map: we do not have to think and simply return the address as the handle of the value:

func NewHandle(v interface{}) Handle {
	...

	// v was escaped to the heap because of reflection. As Go does not
	// have a moving GC (and this possibly stays true for a long time),
	// it is safe to use its pointer address as the key of the global
	// map at this moment. The implementation must be reconsidered if a
	// moving GC is introduced internally in the runtime.
	actual, loaded := m.LoadOrStore(k, v)
	if !loaded {
		return Handle(k)
	}

	...
}


Otherwise, we have to check the old value in the global map. If it is the same value, we return the same address as expected:

func NewHandle(v interface{}) Handle {
	...

	arv := reflect.ValueOf(actual)
	switch arv.Kind() {
	case reflect.Ptr, reflect.UnsafePointer, reflect.Slice,
		reflect.Map, reflect.Chan, reflect.Func:
		// The underlying object of the given Go value already has
		// an existing handle.
		if arv.Pointer() == k {
			return Handle(k)
		}

		// If the loaded pointer is inconsistent with the new pointer,
		// it means the address has been used for different objects
		// because of GC and its address is reused for a new Go object,
		// meaning that Delete was not called explicitly on the Handle
		// when the old Go value was no longer needed. Consider this a
		// misuse of a handle, and panic.
		panic("cgo: misuse of a Handle")
	default:
		panic("cgo: Handle implementation has an internal bug")
	}
}


If the existing entry at that address turns out to be a different object than the newly requested value, the address has been reused after garbage collection without the old handle being deleted, which must be a misuse of the Handle.

Since the wrap struct turns every non-pointer value into a reflect.Ptr, no other kinds of values can be fetched from the global map. If that happens, it is an internal bug in the handle implementation.

When implementing the Value() method, we see why the wrap struct is beneficial:

func (h Handle) Value() interface{} {
	v, ok := m.Load(uintptr(h))
	if !ok {
		panic("cgo: misuse of an invalid Handle")
	}
	if wv, ok := v.(*wrap); ok {
		return wv.v
	}
	return v
}


We can check whether the stored object is a *wrap pointer, which means the original was a value rather than a pointer. In that case, we return the wrapped value instead of the stored object.

Lastly, the Delete method becomes trivial:

func (h Handle) Delete() {
	_, ok := m.LoadAndDelete(uintptr(h))
	if !ok {
		panic("cgo: misuse of an invalid Handle")
	}
}


See a full implementation in golang.design/x/clipboard/internal/cgo.

## The Accepted Approach

As one can see, the previous approach is more complicated than expected: it relies on the assumptions that the runtime garbage collector is non-moving and that an argument passed through an interface escapes to the heap.

Although several other places in the internal runtime implementation rely on these facts, such as the channel implementation, the approach is still more complicated than we would like.

Notably, the previous NewHandle returns the same, unique handle whenever the provided Go value refers to the same object. This is the core source of complexity in the implementation. However, there is another possibility: NewHandle always returns a different handle, and a Go value can have multiple handles.

Do we really need a Handle to be unique and NewHandle to be idempotent? After a short discussion with the Go team, we reached the consensus that, for the purpose of a Handle, it seems unnecessary to keep it unique, for the following reasons:

1. The semantics of NewHandle is to return a new handle, not a unique handle;
2. A handle is nothing more than an integer; guaranteeing its uniqueness may prevent some misuse of the handle, but it cannot always catch misuse before it is too late;
3. The complexity of the implementation.

Therefore, we need to rethink the original question: How to allocate a runtime-level unique ID?

In reality, the approach is simpler: we only need to increment a number and never stop. This is the most commonly used approach for unique ID generation. For instance, in database applications, the unique ID of a table row is typically incremental; a Unix timestamp always increases; and so on.

If we use the same approach, what would be a possible concurrent-safe implementation? With sync.Map and atomic, we can produce code like this:

import (
	"sync"
	"sync/atomic"
)

func NewHandle(v interface{}) Handle {
	h := atomic.AddUintptr(&handleIdx, 1)
	if h == 0 {
		panic("runtime/cgo: ran out of handle space")
	}

	handles.Store(h, v)
	return Handle(h)
}

var (
	handles   = sync.Map{} // map[Handle]interface{}
	handleIdx uintptr      // atomic
)


Whenever we want to allocate a new ID (NewHandle), we increment the handle number handleIdx atomically, so the next allocation is always guaranteed to get a larger number. With that allocated number, we can easily store the value in a global map that holds all the Go values.

The remaining work becomes trivial. When we want to use the handle to retrieve the corresponding Go value, we look up the value map with the handle number:
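That an atomically incremented counter never hands out the same number twice, even under contention, can be checked with a small standalone program. In this sketch, idx, nextID, and allocate are hypothetical names used only for the demonstration; idx plays the role of handleIdx.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var idx uintptr // atomic counter, plays the role of handleIdx

// nextID returns a number that no other call has received.
func nextID() uintptr { return atomic.AddUintptr(&idx, 1) }

// allocate runs the given number of goroutines, each taking perWorker
// IDs, and returns how many distinct IDs were handed out in total.
func allocate(workers, perWorker int) int {
	var (
		wg   sync.WaitGroup
		seen sync.Map
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				seen.Store(nextID(), struct{}{})
			}
		}()
	}
	wg.Wait()

	n := 0
	seen.Range(func(_, _ interface{}) bool { n++; return true })
	return n
}

func main() {
	fmt.Println(allocate(8, 1000)) // 8000: no ID was handed out twice
}
```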

func (h Handle) Value() interface{} {
	v, ok := handles.Load(uintptr(h))
	if !ok {
		panic("runtime/cgo: misuse of an invalid Handle")
	}
	return v
}


Further, when we are done with the handle, we can delete it from the value map:

func (h Handle) Delete() {
	_, ok := handles.LoadAndDelete(uintptr(h))
	if !ok {
		panic("runtime/cgo: misuse of an invalid Handle")
	}
}


In this implementation, we do not have to make assumptions about the runtime mechanism, only about the language. As long as the Go 1 compatibility promise keeps sync.Map working, there will be no need to rework the whole Handle design. Because of its simplicity, this is the approach accepted by the Go team (see CL 295369).

In addition, if a future re-implementation of sync.Map improves parallelism, Handle will automatically benefit from it. Let us do a final benchmark that compares the previous method and the current approach:
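Put together as a standalone program, the counter-based design also makes the relaxed semantics visible: one Go value can have several independent handles. The following is a sketch assembled from the snippets above with shortened panic messages, not the runtime/cgo source itself.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Handle mirrors the accepted design in a standalone form.
type Handle uintptr

var (
	handles   sync.Map // maps uintptr handle numbers to Go values
	handleIdx uintptr  // incremented atomically
)

// NewHandle stores v under a freshly allocated number.
func NewHandle(v interface{}) Handle {
	h := atomic.AddUintptr(&handleIdx, 1)
	if h == 0 {
		panic("ran out of handle space")
	}
	handles.Store(h, v)
	return Handle(h)
}

// Value returns the Go value stored under h, or panics if h is invalid.
func (h Handle) Value() interface{} {
	v, ok := handles.Load(uintptr(h))
	if !ok {
		panic("misuse of an invalid Handle")
	}
	return v
}

// Delete invalidates h, or panics if h is already invalid.
func (h Handle) Delete() {
	if _, ok := handles.LoadAndDelete(uintptr(h)); !ok {
		panic("misuse of an invalid Handle")
	}
}

func main() {
	ch := make(chan int, 1)
	h1, h2 := NewHandle(ch), NewHandle(ch)
	fmt.Println(h1 != h2) // true: each call yields a fresh handle

	h1.Delete()                // invalidates h1 only
	h2.Value().(chan int) <- 7 // h2 still resolves to the same channel
	fmt.Println(<-ch)          // 7
	h2.Delete()
}
```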

func BenchmarkHandle(b *testing.B) {
	b.Run("non-concurrent", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			h := cgo.NewHandle(i)
			_ = h.Value()
			h.Delete()
		}
	})
	b.Run("concurrent", func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			var v int
			for pb.Next() {
				h := cgo.NewHandle(v)
				_ = h.Value()
				h.Delete()
			}
		})
	})
}

name                     old time/op  new time/op  delta
Handle/non-concurrent-8  407ns ±1%    393ns ±2%   -3.51%  (p=0.000 n=8+9)
Handle/concurrent-8      768ns ±0%    759ns ±1%   -1.21%  (p=0.003 n=9+9)


Simpler, faster, why not?

## Conclusion

This article discussed the newly introduced runtime/cgo.Handle facility that we contributed to the Go 1.17 release. The Handle facility enables us to pass Go values back and forth between Go and C without breaking the cgo pointer passing rules. After a short introduction to the usage of the feature, we first discussed a first-attempt implementation based on the facts that the runtime garbage collector is non-moving and that interface{} arguments escape to the heap. After discussing the ambiguity of the Handle semantics and the drawbacks of that implementation, we introduced a more straightforward and better-performing approach and demonstrated its performance.

As a real-world demonstration, we have been using the two approaches in the internal/cgo packages of two of our released packages for quite a long time: golang.design/x/clipboard and golang.design/x/hotkey. We are looking forward to switching to the officially released runtime/cgo package in the Go 1.17 release.

For future work, one can foresee a possible limitation in the accepted implementation: the handle number may run out of handle space on 32-bit or narrower platforms (similar to the Year 2038 problem). If we allocate 100 handles per second, a 32-bit handle space runs out in 0xFFFFFFFF / (24 * 60 * 60 * 100) ≈ 497 days.
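The exhaustion estimate can be checked with a small calculation. The sketch below uses a hypothetical helper, daysUntilExhausted, which is not part of any package. Note that the constant matters: the full 32-bit space 0xFFFFFFFF lasts about 497 days at 100 handles per second, while the seven-digit constant 0xFFFFFFF (28 bits) lasts about 31 days.

```go
package main

import "fmt"

// daysUntilExhausted returns how many whole days pass before `space`
// handle numbers are used up at `perSecond` allocations per second.
func daysUntilExhausted(space, perSecond uint64) uint64 {
	const secondsPerDay = 24 * 60 * 60
	return space / (secondsPerDay * perSecond)
}

func main() {
	fmt.Println(daysUntilExhausted(0xFFFFFFFF, 100)) // 497 (32 bits)
	fmt.Println(daysUntilExhausted(0xFFFFFFF, 100))  // 31 (28 bits)
}
```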

*If you are interested and think this is a serious issue, feel free to CC us when you send a CL; it would also be interesting for us to read your excellent approach.