How much can you pack into bytes?
I like Zig because it encourages me to pay attention to details when I use it. I have recently come up with an interesting pattern in two separate use cases, and I think it warrants a quick blog post. The pattern is a curious mix of abstraction levels. On the one hand, it’s high-level because it prefers Zig’s std.ArrayList over the lower-level std.mem.Allocator. On the other, it’s low-level because it works best when you take memory alignment into account.
The TL;DR version is that I use std.ArrayList as a special-purpose std.heap.ArenaAllocator. This opens up a memory-saving opportunity by converting pointers to indices.
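In sketch form (placeholder names, not code from either project): instead of creating objects one at a time from an ArenaAllocator and keeping 8-byte pointers, the objects live contiguously in an ArrayList and are referred to by 4-byte indices.

const std = @import("std");

// With an arena: each object is a separate allocation, referenced by pointer,
// and everything is freed at once by arena.deinit().
fn withArena(arena: *std.heap.ArenaAllocator) !*i64 {
    const ptr = try arena.allocator().create(i64);
    ptr.* = 42;
    return ptr;
}

// With an ArrayList: each object is appended to one contiguous buffer,
// referenced by a u32 index, and everything is freed at once by list.deinit().
fn withList(list: *std.ArrayList(i64)) !u32 {
    const index: u32 = @intCast(list.items.len);
    try list.append(42);
    return index;
}

Appending may reallocate and move the items, which is exactly why the stable handle is an index rather than a pointer into items.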
Flattening ASTs
The first use case happened because of a confluence of three events:
- I struggled with a memory leak in a hand-written recursive descent parser
- I read Adrian Sampson’s Flattening ASTs (and Other Compiler Data Structures)
- I recalled watching Andrew Kelley’s Practical Data Oriented Design (DoD)
Although I didn’t realize it at the time, one sentence in Adrian’s post summarizes the idea: “Arenas or regions mean many different things to different people, so I’m going to call the specific flavor I’m interested in here data structure flattening.” My initial motivation was that I didn’t like how a seemingly simple Node data structure required such a complex .destroy method:
const std = @import("std");
const Allocator = std.mem.Allocator;

const Node = struct {
    token: Token, // Token comes from the tokenizer
    args: []const *const Node,

    pub fn destroy(self: *const Node, allocator: Allocator) void {
        // Free the whole subtree: children first, then the args array, then the node itself.
        for (self.args) |arg| arg.destroy(allocator);
        allocator.free(self.args);
        allocator.destroy(self);
    }
};
Cloudef suggested an arena but I was reluctant because I feared it would create too much coupling between a type and the allocator that creates its instances. Teralux also warned that “[f]reeing graphs is generally a slow operation.” Furthermore, echoes of Andrew’s “Data Oriented Design” talk were still brewing in my left hippocampus. I put them all together into
pub const Node = packed union {
    head: packed struct(u32) { token: u24, count: u8 }, // index into []const Token + number of args
    node: u32, // index into []const Node (one entry per arg, following the head)
};
This definition replaces a copy of each token with an index into []const Token (weighing in at a mere 3 bytes instead of 24). It also breaks up []const *const Node into a 1-byte count squeezed in with the token, followed by indices into []const Node. The memory footprint of the old Node was 40 + 8n bytes, where n = node.args.len:
- .token.tag: 8 bytes
- .token.src.ptr: 8 bytes
- .token.src.len: 8 bytes
- .args.ptr: 8 bytes
- .args.len: 8 bytes
- each additional arg (*const Node): 8 bytes
The new Node, by contrast, only occupies 4 bytes. That’s a 10-fold size reduction for a node with zero arguments and a 4⅔-fold reduction for a binary node (56 bytes down to 12)!
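If you want to double-check those numbers, here is a minimal comptime sketch, assuming a Token made of a 1-byte tag and a []const u8 slice (stand-ins for the real definitions, not my actual types):

const std = @import("std");

// Hypothetical stand-ins, only to verify the sizes quoted above.
const Token = struct { tag: enum(u8) { number, plus, minus }, src: []const u8 };
const OldNode = struct { token: Token, args: []const *const OldNode };

comptime {
    std.debug.assert(@sizeOf(Token) == 24); // 1-byte tag padded out + 16-byte slice
    std.debug.assert(@sizeOf(OldNode) == 40); // 24-byte Token + 16-byte slice header
    std.debug.assert(@sizeOf(Node) == 4); // the new packed union is backed by a single u32
}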
Of course, the main attraction is that nodes can be freed using either nodes.deinit(), if you’re still holding a reference to the std.ArrayList, or allocator.free(nodes), if you have invoked .toOwnedSlice() instead.
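To make the “ArrayList as arena” angle concrete, here is a minimal sketch of how an expression like 1 + 2 could be appended and freed; appendNode and the token indices are made up for illustration, not taken from my parser:

const std = @import("std");

// Hypothetical helper: append a head entry followed by its argument indices,
// and return the index of the head so a parent can refer to this node.
fn appendNode(nodes: *std.ArrayList(Node), token: u24, args: []const u32) !u32 {
    const index: u32 = @intCast(nodes.items.len);
    try nodes.append(.{ .head = .{ .token = token, .count = @intCast(args.len) } });
    for (args) |arg| try nodes.append(.{ .node = arg });
    return index;
}

test "flattened 1 + 2" {
    var nodes = std.ArrayList(Node).init(std.testing.allocator);
    defer nodes.deinit(); // frees the whole tree in one call

    // Token indices 0, 1, and 2 are assumed to point at the tokens 1, + and 2.
    const lhs = try appendNode(&nodes, 0, &[_]u32{});
    const rhs = try appendNode(&nodes, 2, &[_]u32{});
    _ = try appendNode(&nodes, 1, &[_]u32{ lhs, rhs });

    try std.testing.expectEqual(@as(u8, 2), nodes.items[2].head.count);
}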
Hybrid bump allocator
The second use case for this idea goes back to my first Zig project: I initially failed to grok ArrayList. That first project was translating a C implementation of FORTH (which in turn was translating an x86 implementation). As a consequence, the focus was more on how than why. Re-reading my Discord question, I realize now that I lacked the vocabulary to precisely articulate what I was trying to achieve. The original x86 implementation used brk(2) and raw pointers to implement a bump allocator which supports individual bytes (for C! and C@), words (for ! and @), and code addresses (for ,).
I didn’t make the effort to understand std.ArrayList to see which parts lent themselves to my purpose. The trick turned out to be:
fn InterpAligned(comptime alignment: u29) type {
    return struct {
        state: isize,
        latest: *Word,
        s0: [*]const isize,
        base: isize,
        r0: [*]const [*]const Instr,
        buffer: [32]u8,
        memory: std.ArrayListAligned(u8, alignment), // ⇐ here
        here: [*]u8,
        ...
    };
}

const Interp = InterpAligned(@alignOf(Instr));
(I had to make it generic to avoid the dreaded struct Instr depends on itself error.) Interp.memory.items is a slice of bytes for maximum flexibility, but it is Instr-aligned, which obviates the manual alignment math in CREATE.
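A minimal sketch of the resulting bump allocator, assuming usize cells; Dictionary, commaByte, commaCell, and here are made-up names, not the actual Interp methods:

const std = @import("std");

// Sketch of a FORTH-style dictionary on top of ArrayListAligned: every
// allocation appends to one contiguous, suitably aligned byte buffer.
fn Dictionary(comptime alignment: u29) type {
    return struct {
        memory: std.ArrayListAligned(u8, alignment),

        const Self = @This();

        pub fn init(allocator: std.mem.Allocator) Self {
            return .{ .memory = std.ArrayListAligned(u8, alignment).init(allocator) };
        }

        pub fn deinit(self: *Self) void {
            self.memory.deinit();
        }

        // C, : append a single byte.
        pub fn commaByte(self: *Self, byte: u8) !void {
            try self.memory.append(byte);
        }

        // , : pad up to cell alignment, then append one cell.
        pub fn commaCell(self: *Self, cell: usize) !void {
            const len = self.memory.items.len;
            const pad = (@sizeOf(usize) - len % @sizeOf(usize)) % @sizeOf(usize);
            try self.memory.appendNTimes(0, pad);
            try self.memory.appendSlice(std.mem.asBytes(&cell));
        }

        // HERE : the next free slot, as an offset into the contiguous buffer.
        pub fn here(self: *const Self) usize {
            return self.memory.items.len;
        }
    };
}

Because the backing buffer itself is Instr-aligned, any offset padded to a multiple of that alignment turns into a properly aligned pointer, which is what removes the manual alignment math from CREATE.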
Artificial constraints
Mitchell Hashimoto said something on the Changelog podcast that resonated with me: embrace constraints when programming. Programmers often have a tendency to go for the most generic solution, which leads to unnecessary complexity and anti-patterns like softcoding or Greenspun’s tenth rule. In this case, the constraint was making sure that consecutive memory allocations remain contiguous to facilitate FORTH’s poor man’s reflection. After a few detours, the constraint took me to a reasonably useful idea, and my code remained legible and efficient in the process.