Why are some functions foreign?

In real-world software, layers accumulate like archeological stratigraphy. I recently cooked up a little utility which ended up touching layers spanning several decades. It all started when I found the official recommendation from AWS to create a python layer underwhelming.

Per the AWS documentation, a layer is a .zip archive which includes a python top-level directory. That root directory contains the platlib path of the relevant virtual environment. The AWS Lambda Developer Guide addresses this in 3 steps:

1. create a new directory named python (mkdir)
2. copy <venv>/lib recursively to the newly created python (cp -r)
3. zip the python directory recursively (zip -r)

This is objectively silly. You shouldn’t need to copy an entire folder just so you can rename it. 7-Zip lets you rename files (even folders) inside a zip archive in place, but that still requires two separate commands (oh, the humanity)! But hang on, the python standard library supports the ZIP format! And if you’re worried about performance, fear not: python’s zlib module is a wrapper around the zlib C library.

So we can write a simple python program that looks roughly like

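import sysconfig
from pathlib import Path
from zipfile import ZipFile

# site-packages of the running environment, and the environment's root
platlib = Path(sysconfig.get_path("platlib"))
data = Path(sysconfig.get_path("data"))

# args is assumed to come from argparse (parsing not shown)
with ZipFile(args.zipfile, "w") as zf:
    for root, _, files in platlib.walk():  # Path.walk() needs Python 3.12+
        # re-root every path under the top-level "python" directory
        arcroot = Path("python") / root.relative_to(data)
        for file in files:
            zf.write(filename=root / file, arcname=arcroot / file)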

(It is slightly circular to package your own virtual environment, but very reasonable if you think about it.) This is great, but it runs for a while and gives you no sense of what it’s doing. Gotta fix that.
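If you want to poke at the renaming trick without zipping a real virtual environment, here is a minimal, self-contained sketch. The function name zip_under_prefix and its parameters are mine, invented for illustration. It differs from the snippet above in three ways: it takes an arbitrary source directory instead of reading sysconfig, it uses rglob so it also runs on Python older than 3.12, and it turns on ZIP_DEFLATED so zlib actually gets involved. It also assumes the archive is written outside the source directory.

```python
from pathlib import Path
from zipfile import ZIP_DEFLATED, ZipFile


def zip_under_prefix(source: Path, archive: Path, prefix: str = "python") -> list[str]:
    """Zip every file below source, storing each one under prefix/ in the archive.

    Hypothetical helper for illustration; returns the names stored in the archive.
    """
    with ZipFile(archive, "w", compression=ZIP_DEFLATED) as zf:
        # rglob("*") walks recursively; sorting keeps the archive order stable.
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # The "rename" happens here: no copy on disk, just a different arcname.
                arcname = Path(prefix) / path.relative_to(source)
                zf.write(path, arcname=arcname.as_posix())
        return zf.namelist()
```

Calling zip_under_prefix(Path("some/dir"), Path("layer.zip")) and inspecting the returned names is a quick way to confirm that everything landed under python/ before uploading anything.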

And this is where the story gets interesting because I decided to get cute. I didn’t want my utility to just “vomit” a wall of text to its standard output. That’s not particularly helpful. It would be much more helpful to update one or a few lines of output as we progress. I knew that was possible but the details were hazy. So I read up on Control Sequence Introducer commands. Unfortunately, python does not understand "\e" the way Bash does. People often use "\033" or "\x1b" instead but that’s not super readable. Fear not, python accepts "\N{escape}" which is rather readable. With that,

print(f"\N{escape}[{n}F\N{escape}[J", end="")

F (or Cursor Previous Line) “moves the cursor to beginning of the line n (default 1) lines up” then J (or Erase in Display) “clears part of the screen. If n is 0 (or missing), clear from cursor to end of screen.” Printing this rather cryptic string lets you print a few lines (for example, the last n files added to the archive), erase them, and print a few more. The refresh rate on your terminal is likely high enough that your output looks animated. Furthermore, you can use collections.deque’s maxlen parameter to easily keep track of those last n files. That’s quite nice but is it nice enough?
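To make that concrete, here is a minimal sketch of the erase-and-reprint loop. The names redraw and demo are made up for this example and are not the utility’s actual code. It assumes a terminal that honors CSI sequences and that no printed line is wide enough to wrap.

```python
import sys
from collections import deque

ESC = "\N{escape}"


def redraw(lines, previous_count, out=sys.stdout):
    """Erase the block printed by the previous call, then print lines.

    Returns how many lines were printed so the next call knows how far up to go.
    """
    if previous_count:
        # CSI n F moves to the start of the line n lines up; CSI J erases from
        # there to the end of the screen. Skipped when nothing was printed yet,
        # since a parameter of 0 would typically be treated as the default of 1.
        out.write(f"{ESC}[{previous_count}F{ESC}[J")
    for line in lines:
        out.write(line + "\n")
    out.flush()
    return len(lines)


def demo(names, keep=3, out=sys.stdout):
    """Show only the last `keep` names while iterating over all of them."""
    recent = deque(maxlen=keep)  # the oldest entry is dropped once the deque is full
    printed = 0
    for name in names:
        recent.append(name)
        printed = redraw(recent, printed, out)
    return list(recent)
```

Running demo on a long list of file names (with a short time.sleep in the loop if you want to actually see it) leaves only the last three on screen.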

Actually, one can use box-drawing characters to make those few lines look like the output of tree. That lets you shrink the width of the output, which matters because virtual environments nest deeply and produce long paths. (Funny side note: my initial implementation did not always clear the screen properly because it didn’t account for wrapped lines, which take up more rows than the number of lines printed.) At that point, I remembered that building zig generated precisely that kind of “scrolling tree” output. A quick web search took me to Zig’s New CLI Progress Bar Explained. Yikes, Andrew really went nuts on that “infallible and non-heap-allocating” implementation! For once, my laziness beat out my hubris and I decided not to reimplement Progress.zig in python. Instead, I decided to expose it to python.
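As an aside, the line-wrap bug from the side note boils down to arithmetic: a line wider than the terminal occupies several rows, so the count passed to F has to be in rows, not lines. Below is a rough sketch of the two obvious fixes, counting rows or truncating. The helper names rows_needed, truncate and fit_to_terminal are hypothetical, and len() is used as a stand-in for display width, which is only right for text without wide or combining characters.

```python
import shutil


def rows_needed(lines, columns):
    """Count the terminal rows that lines occupy once auto-wrap kicks in."""
    total = 0
    for line in lines:
        # Ceiling division; an empty line still takes up one row.
        total += max(1, -(-len(line) // columns))
    return total


def truncate(line, columns, ellipsis="…"):
    """Shorten line so that it fits on a single row of the given width."""
    if len(line) <= columns:
        return line
    return line[: columns - len(ellipsis)] + ellipsis


def fit_to_terminal(lines):
    """Truncate every line to the current terminal width (80 if unknown)."""
    columns = shutil.get_terminal_size().columns
    return [truncate(line, columns) for line in lines]
```

Either pass rows_needed(...) to the cursor-up sequence or print fit_to_terminal(...) so that one line is always one row. Note that the width is sampled at call time, so a resize between printing and erasing can still throw the count off.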

As luck would have it, I recently explored How do you call Zig from python? With that knowledge, I quickly whipped up

const std = @import("std");
const py = @import("pydust");

const root = @This();

pub const Progress = py.class(struct {
    const Self = @This();
    index: std.Progress.Node.OptionalIndex,

    pub fn __init__(self: *Self) !void {
        self.index = std.Progress.start(.{}).index;
    }

    pub fn start(
        self: *const Self,
        args: struct { name: py.PyString, estimated_total_items: usize },
    ) !*const Self {
        const parent: std.Progress.Node = .{ .index = self.index };
        const node = parent.start(try args.name.asSlice(), args.estimated_total_items);
        return py.init(root, Self, .{ .index = node.index });
    }

    pub fn end(self: *const Self) void {
        const node: std.Progress.Node = .{ .index = self.index };
        node.end();
    }

    pub fn complete_one(self: *const Self) void {
        const node: std.Progress.Node = .{ .index = self.index };
        node.completeOne();
    }
});

comptime {
    py.rootmodule(root);
}

based on zp.zig (with thanks to @bridgeQiao for removing root where it was not strictly needed).

Writing the code was trivial but getting meson-python to build and install it was a real trip (see mesonbuild/meson#14763). And it made CI almost 30s slower! I started wondering how much Ziggy Pydust’s comptime magic contributed to that slowdown so I ripped it out! That led me to discover and report a bug with the Zig toolchain. And the savings were barely noticeable. But, as the poet laureate of Bed-Stuy put it: “And if you don’t know, now you know”.

Ok, cool, so what did we learn from all this? To me, the real amusing part was the relative ages of the various bits of software involved in the making of this silly utility. In chronological order:

Component	Year	Role played
Unix signals	1972	Handle SIGWINCH to truncate lines and erase correctly
Control Sequence Introducers	1976	Manipulate standard output beyond appending to it
ZIP file format¹	1989	Because that’s what AWS chose in 2014
Python	1989	High-level and general-purpose
Meson	2013	Best build backend for compiled extension
Zig	2016	“programming language designed for making perfect software”
Ziggy Pydust²	2023	“Framework for building native Python extension in Zig”
uv	2024	“extremely fast Python package installer and resolver”

If anything, this proves that, in the world of software, maybe you can teach old dogs new tricks! Obviously, all of this goes back to the von Neumann architecture from 1945 but that’s true of every software project so I chose to leave it out. The selection above enumerates the choices that are particularly salient to the tool being discussed.

¹ Truncating is one way to avoid line wraps, but the VT100 has escape sequences to control auto wrap ("\N{escape}[?7l").
² “It’s not the destination, it’s the journey.” ― Ralph Waldo Emerson