Allocgate is coming in Zig 0.9, and you will have to change your code
Version 0.9 of the Zig programming language is planned to release in about a
week. One of the major changes coming in this release is a restructuring of how allocators work,
which has been termed Allocgate. This is a breaking
change to the Allocator
API provided by the standard library, so pretty much any code that uses
allocators will have to be updated in response. I hope to use this article to explain the
justification for this change and what steps you need to take to adjust your code.
The TL;DR is:
- Where you previously passed around
*Allocator
to functions that need to allocate, you now pass aroundAllocator
directly.Allocator
is now a struct, which is the size of two pointers. - When you construct an allocator (such as
var gpa: std.heap.GeneralPurposeAllocator(.{});
), you callgpa.allocator()
to get the type-erased allocator implementation rather than writing&gpa.allocator
.
Unless you are writing your own allocator, this should be all you need to change.
Two ways to approach polymorphism
Zig’s allocators rely on dynamic dispatch. The choice of allocation and deallocation functions is
not known until runtime, so if you want to write code that is agnostic to the user’s choice of
allocator (and you generally do when writing library code), you need to accept these functions as a
parameter in some way. Most languages use a structure called a virtual function table (vtable)
which stores the addresses of each dynamically-known function. Zig’s standard library provides a
variety of allocators (such as ArenaAllocator
and GeneralPurposeAllocator
), each of which has
some associated state along with a set of functions (alloc
, resize
, and free
) which operate on
that state to work as a memory allocator. The addresses of these three functions are collected into
an “allocator vtable”.
Any polymorphic object’s state needs to be passed along with its vtable somehow. One way to do it is
to represent an object as a pair of pointers: one to the object’s fields, and one to a vtable which
knows how to operate on those fields. I’ll follow Rust’s terminology and call this (impl, vtable)
pair a “fat pointer”.
Dynamic Object
---------- ----------
| impl | -----> | fields |
|--------| | ... |
| vtable | ---. ----------
---------- |
| Vtable
| ----------
`-> | alloc |
| resize |
| free |
----------
Another way would be to store a separate copy of the vtable in every object. By knowing its address,
the vtable’s functions would be able to deduce the address of the struct containing them, an
operation sometimes called container_of
in C and provided by Zig under the name @fieldParentPtr
.
Indeed, this is such a common use of this builtin that the technique is commonly described as just
“the @fieldParentPtr
idiom.”
Object
--------------
| fields |
Dynamic | ... |
---------- | ---------- |
| vtable | -------> | alloc | |
---------- | | resize | |
| | free | |
| ---------- |
| ... |
--------------
(We could also imagine a third solution in which the object contains a pointer to the vtable in its fields, and we pass around a pointer to that pointer. This has the advantage of sharing vtables, while still requiring only a single pointer to be passed around. However, this also means we would need a double indirection to access the vtable, which would be more expensive.)
Polymorphic allocators in Zig
Without stooping so low as to actually benchmark, what might the respective advantages of these two approaches be?
- Using fat pointers, we get to make fewer copies of the vtable. Every polymorphic value can point to the same one, which saves memory and allows dynamic objects to be smaller.
- With the
@fieldParentPtr
approach, we’re able to pass a dynamic object around as just a single pointer. This improves performance and saves memory for any struct that needs to hold a dynamic allocator (notably, moststd
containers).
In the case of allocators, it might seem like the @fieldParentPtr
approach would win out. After
all, the average program passes relatively many pointers to relatively few different allocators, so
making the latter large at the expense of keeping the former small is a sensible trade-off.
However, as it turns out, this approach has a performance problem with has to do with a compiler
optimization known as devirtualization. Virtual function calls are more expensive than calls to
known functions for a number of reasons, so LLVM would like to rewrite them to static calls anywhere
it can prove that a function pointer always has a particular value. This becomes more difficult when
the function pointer lives inside a structure like GeneralPurposeAllocator
. Zig’s struct
s don’t
even have compile-time encapsulation, much less runtime encapsulation. There is nothing stopping you
from reaching into your GeneralPurposeAllocator
and changing the vtable functions to do something
else. It would definitely be against std
‘s contract, but it wouldn’t immediately cause undefined
behavior. This means any code which reads vtables out of the embedded Allocator
has to be
defensive against the possibility that the vtable was modified, making devirtualization impossible.
In contrast, when using fat pointers, the vtables are shared constant objects, and a new fat pointer
is constructed on every call to gpa.allocator()
. So while we incur the cost of passing around a
larger fat pointer, the allocation and deallocation functions potentially become much faster to
call.
Allocgate
As far as I can tell, the performance issues of the @fieldParentPtr
approach were first noticed
about two months ago in the std.rand
API, which
uses a similar approach to allow code to be polymorphic over the caller’s choice of RNG algorithm
(perhaps the performance issues were more apparent here because RNG algorithms are designed to have
high throughput, so the relative overhead of indirect calls is larger). It was noted in that issue
that any other code using this approach would suffer the same problem.
With Allocgate, the std.mem.Allocator
API was also changed from embedded vtables to fat pointers.
This is the reason behind the two API changes I mentioned at the beginning of the article:
Allocator
was changed from holding the vtable itself to being a fat pointer struct, and you call
gpa.allocator()
instead of &gpa.allocator
because the vtable is no longer a field of the
allocator and you instead need to construct a fat pointer.
This also happens to address a specific failure mode I have run into before, where instead of writing
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
const alloc = &gpa.allocator;
the user writes:
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
var alloc = gpa.allocator;
This copies the allocator out of its parent struct and then attempts to call methods like
alloc.alloc()
. This will generally compile correctly (since Zig will automatically take the
address of an object when calling a method that takes self
by pointer), but causes undefined
behavior at runtime because alloc
attempts to use @fieldParentPtr
despite no longer being
contained in the struct it expects to be in. Post-Allocgate, this code instead becomes:
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
const alloc = gpa.allocator();
… which is pretty much impossible to silently get wrong.
What’s next for vtable-based polymorphism?
Allocgate has been merged into Zig’s master branch and should ship in Zig 0.9. This involved an impressive amount of work, with much of the standard library needing to be changed. It should hopefully yield better performance for code that uses any of the standard library’s allocator abstractions. Zig’s contributors have gone back and forth on the specifics of the change, and looked at a number of benchmarks, but I’m sure they would welcome additional data on this.
I think this is a great example of the kind of ecosystem-wide improvements Zig can make as a relatively young language that has yet to tie itself to any stability guarantees. It sucks for language users who will have to change their code, of course, but hopefully they knew what they were getting into. One of Zig’s principles is “avoid local maxima,” which, at this point in the language’s evolution, means to be willing to move to a better solution even if it means breaking backwards compatibility.
A number of topics are also being discussed that have the potential to further improve the performance and ergonomics of runtime polymorphism. For example, it could be that improvements to Zig or LLVM’s aliasing model will make it easier to optimize virtual function calls (for example, by making it possible to mark a particular field of a struct as immutable). Zig’s developers have also expressed some interest in having first-class support for interfaces in the standard library or even the language itself, which would make it easier for the user and reduce the burden of having to modify every API in the standard library when changes like this are made.
Credit for the change
I had no personal involvement in Allocgate. I am simply an outsider who is interested in Zig and thought others might benefit from an explanation of this change. Instead, I ought to credit:
- Andrew Kelley (@andrewrk), creator of Zig and initial designer of the allocator API;
- @ominitay, who first diagnosed the problem in
std.rand
and provided benchmarks; - Martin Wickham (@SpexGuy), who discovered the optimization problems
with LLVM and the need to move away from
@fieldParentPtr
-based APIs; - Lee Cannon (@leecannon, who implemented the bulk of the change to
Allocator
, including updating the entire standard library; - and everyone else who contributed feedback, bug fixes, and benchmarks on the PR.