Why are some functions foreign?
In real-world software, layers accumulate like archeological stratigraphy. I recently cooked up a little utility which ended up touching layers spanning several decades. It all started when I found the official recommendation from AWS to create a python layer underwhelming.
Per the AWS documentation, a layer is a .zip archive which includes a python top-level directory. That root directory contains the platlib path of the relevant virtual environment.
The AWS Lambda Developer Guide addresses this in 3 steps:
- create a new directory named python (mkdir)
- copy <venv>/lib recursively to the newly created python (cp -r)
- zip the python directory recursively (zip -r)
This is objectively silly. You shouldn’t need to copy an entire folder just so you can rename it.
7-Zip lets you rename files (even folders) inside a zip archive in-place.
But that still requires two separate commands (oh, the humanity)! But hang on, the python standard library supports the ZIP format! And if you’re worried about performance, fear not, python’s zlib is a C wrapper around the zlib C library.
So we can write a simple python program that looks roughly like
import sysconfig
from pathlib import Path
from zipfile import ZipFile

platlib = Path(sysconfig.get_path("platlib"))
data = Path(sysconfig.get_path("data"))
with ZipFile(args.zipfile, "w") as zf:
    # Path.walk requires Python 3.12+
    for root, _, files in platlib.walk():
        arcroot = Path("python") / root.relative_to(data)
        for file in files:
            zf.write(filename=root / file, arcname=arcroot / file)
(slightly circular to package your own virtual environment but very reasonable if you think about it.) This is great but it runs for a while and gives you no sense of what it’s doing. Gotta fix that.
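For reference, here is a self-contained variant of the snippet above, written as a sketch rather than as the utility's actual code. The function name `zip_site_packages` and its parameters are made up for illustration. Two choices differ from the snippet: it uses `os.walk` so it also runs on Python older than 3.12, and it passes `compression=ZIP_DEFLATED`, because `ZipFile` stores entries uncompressed by default and zlib only comes into play once you ask for deflate.

```python
import os
import sysconfig
from pathlib import Path
from zipfile import ZIP_DEFLATED, ZipFile


def zip_site_packages(zip_path, platlib=None, data=None):
    """Write every file under platlib into zip_path, rooted at python/.

    Hypothetical helper: platlib and data default to the running
    interpreter's sysconfig paths, as in the snippet above.
    """
    platlib = Path(platlib or sysconfig.get_path("platlib"))
    data = Path(data or sysconfig.get_path("data"))
    count = 0
    # ZipFile defaults to ZIP_STORED; ZIP_DEFLATED opts into zlib compression.
    with ZipFile(zip_path, "w", compression=ZIP_DEFLATED) as zf:
        # os.walk instead of Path.walk so this also runs before Python 3.12.
        for root, _, files in os.walk(platlib):
            arcroot = Path("python") / Path(root).relative_to(data)
            for file in files:
                zf.write(Path(root) / file, arcname=str(arcroot / file))
                count += 1
    return count
```

Calling `zip_site_packages("layer.zip")` from inside the virtual environment would package that environment's own site-packages directory.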
And this is where the story gets interesting because I decided to get cute. I didn’t want my utility
to just “vomit” a wall of text to its standard output. That’s not particularly helpful. It would be
much more helpful to update one or a few lines of output as we progress. I knew that was possible but
the details were hazy. So I read up on Control Sequence Introducer commands. Unfortunately, python does not understand "\e" like Bash. People often use "\033" or "\x1b" instead but that’s not super readable. Fear not, python accepts "\N{escape}" which is rather readable. With that,
print(f"\N{escape}[{n}F\N{escape}[J", end="")
F (or Cursor Previous Line) “moves the cursor to beginning of the line n (default 1) lines up” then
J (or Erase in Display) “clears part of the screen. If n is 0 (or missing), clear from cursor to end of screen.”
Printing this rather cryptic string lets you print a few lines (like, for example, the last n files
added to the archive), erase them, and print some more. The refresh rate on your terminal is likely high
enough that your output looks animated. Furthermore, you can use collections.deque’s maxlen
parameter to easily keep track of those last n files. That’s quite nice but is it nice enough?
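Put together, the erase-and-redraw loop might look like the sketch below. The names `rows_used`, `frame` and `show_progress` are invented for illustration, the terminal width is passed in rather than detected, and every character is assumed to occupy one column. The sketch counts terminal rows rather than printed lines, since a line wider than the terminal wraps and takes up more than one row.

```python
import math
import sys
from collections import deque

ESC = "\N{ESCAPE}"  # same character as "\x1b"


def rows_used(line, width):
    """Terminal rows a line occupies once it wraps at `width` columns."""
    return max(1, math.ceil(len(line) / width))


def frame(recent, previous_rows, width=80):
    """Return (text, rows): text erases the previous frame, then draws the new one."""
    # CSI n F moves the cursor up n lines; CSI J erases from the cursor to end of screen.
    erase = f"{ESC}[{previous_rows}F{ESC}[J" if previous_rows else ""
    body = "".join(f"{line}\n" for line in recent)
    return erase + body, sum(rows_used(line, width) for line in recent)


def show_progress(names, keep=3, width=80, out=sys.stdout):
    """Print only the last `keep` names, redrawing them in place."""
    recent = deque(maxlen=keep)  # drops the oldest entry once full
    rows = 0
    for name in names:
        recent.append(name)
        text, rows = frame(recent, rows, width)
        out.write(text)
        out.flush()
```

Keeping `frame` free of I/O makes the escape sequences easy to inspect without a real terminal.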
Actually, one can use box-drawing characters to make those few lines look like the output of tree.
That lets you shrink the width of the output since virtual environments nest, leading
to long paths. (Funny side note, my initial implementation was not always clearing the screen
properly because it didn’t account for line wraps requiring clearing more lines than printed). At
that point, I remembered that building zig generated precisely that kind of “scrolling tree” output.
A quick web search took me to
Zig’s New CLI Progress Bar Explained.
Yikes, Andrew really went nuts on that “infallible and non-heap-allocating” implementation! For once,
my laziness beat out my hubris and I decided to not reimplement Progress.zig in python.
Instead, I decided to expose it to python.
As luck would have it, I recently explored How do you call Zig from python? With that knowledge, I quickly whipped up
const std = @import("std");
const py = @import("pydust");

const root = @This();

pub const Progress = py.class(struct {
    const Self = @This();

    index: std.Progress.Node.OptionalIndex,

    pub fn __init__(self: *Self) !void {
        self.index = std.Progress.start(.{}).index;
    }

    pub fn start(
        self: *const Self,
        args: struct { name: py.PyString, estimated_total_items: usize },
    ) !*const Self {
        const parent: std.Progress.Node = .{ .index = self.index };
        const node = parent.start(try args.name.asSlice(), args.estimated_total_items);
        return py.init(root, Self, .{ .index = node.index });
    }

    pub fn end(self: *const Self) void {
        const node: std.Progress.Node = .{ .index = self.index };
        node.end();
    }

    pub fn complete_one(self: *const Self) void {
        const node: std.Progress.Node = .{ .index = self.index };
        node.completeOne();
    }
});

comptime {
    py.rootmodule(root);
}
based on zp.zig (with thanks to @bridgeQiao for removing root where it was not strictly needed).
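On the Python side, driving that class could look something like the sketch below. It is an illustration, not the utility's code: the `Node` protocol only mirrors the three methods defined above, `track` is a made-up helper, and passing `name` and `estimated_total_items` positionally is an assumption about how Pydust exposes the `args` struct. The extension module's name is not given here, so the sketch takes the root node as a parameter instead of importing it.

```python
from typing import Iterator, Protocol, Sequence


class Node(Protocol):
    """Assumed Python-visible shape of the Progress class above."""

    def start(self, name: str, estimated_total_items: int) -> "Node": ...
    def complete_one(self) -> None: ...
    def end(self) -> None: ...


def track(parent: Node, name: str, items: Sequence) -> Iterator:
    """Yield each item, ticking a child progress node after it is processed."""
    node = parent.start(name, len(items))
    try:
        for item in items:
            yield item
            node.complete_one()
    finally:
        node.end()  # end the node even if the consumer's loop body raises
```

With a root node in hand, the zip loop would become `for file in track(progress, "site-packages", files): ...`, leaving the rendering to std.Progress.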
Writing the code was trivial but getting meson-python to build and install it was a real trip, see mesonbuild/meson#14763. And it made CI almost 30s slower! I started wondering how much Ziggy Pydust’s comptime magic contributed to that slowdown so I ripped it out! That led me to discover and report a bug with the Zig toolchain. And the savings were barely noticeable. But, as the poet laureate of Bed-Stuy put it best: “And if you don’t know, now you know”.
Ok, cool, so what did we learn from all this? To me, the real amusing part was the relative ages of the various bits of software involved in the making of this silly utility. In chronological order:
Component | Year | Role played |
---|---|---|
Unix Signals | 1972 | Handle SIGWINCH to truncate lines and erase correctly |
Control Sequence Introducers | 1976 | Manipulate standard output beyond appending to it |
ZIP file format¹ | 1989 | Because that’s what AWS chose in 2014 |
Python | 1989 | High-level and general-purpose |
Meson | 2013 | Best build backend for compiled extension |
Zig | 2016 | “programming language designed for making perfect software” |
Ziggy Pydust² | 2023 | “Framework for building native Python extension in Zig” |
uv | 2024 | “extremely fast Python package installer and resolver” |
If anything, this proves that, in the world of software, maybe you can teach old dogs new tricks! Obviously, all of this goes back to the von Neumann architecture from 1945 but that’s true of every software project so I chose to leave it out. The selection above tries to enumerate choices that are particularly salient to the tool being discussed.
1. Truncating is one way to avoid line wraps but the VT100 has escape sequences to control auto wrap ("\N{escape}[?7l").
2. “It’s not the destination, it’s the journey.” ― Ralph Waldo Emerson