What makes Julia delightful?
Or How I Learned to Stop Worrying and Love the JIT
As my previous post shows, my early programming had to do with numerical problems. So when I left graduate school to enter the industry, I read books and articles to pick up new skills. K&R, TAOCP, and Bentley's Programming Pearls made big impressions. Then I heard about an article in which Ken Thompson explained how he implemented regular expression search (the ancestor of the original grep command) as a JIT compiler on the IBM 7094, and I had to know more.
I found the PDF online (let me not digress into a rant about paywalls and academic journals) and slowly worked my way through it. I even reached out to Ken when I couldn't figure out some assembly syntax that seemed central to the paper. Ever the gentleman, Ken took time out of his busy MiG-29 flying schedule to answer esoteric questions from a nobody. Specifically, I couldn't find any explanation for the AXC **,7 syntax in the IBM 7094 instruction manual. Ken was so kind and patient, and I'm truly sad that I have lost access to the email address I was using at the time. Russ Cox has a great IBM 7094 cheat sheet on his website which explains that "** is just 0 to the assembler, but by convention denotes a value that is manipulated at run-time (i.e., in self-modifying code) by instructions such as PAC, PCA, and SCA." It is part of his set of pages about regular expressions.
I was intrigued by the fact that Ken's "implementation of this algorithm is a compiler that translates a regular expression into IBM 7094 code. […] In the compiled code, the lists mentioned in the algorithm are not characters, but transfer instructions into the compiled code." Pretty wild, huh? Search algorithms like Boyer-Moore typically keep track of offsets into the pattern being searched for instead. Translating the original IBM 7094 code into x86 taught me assembly conventions like instruction encoding, immediate operands, etc. The result is a buggy little hack which you can find on Russ Cox's website. Russ did report that "it runs for a very long time, making me think it is stuck in an infinite loop somewhere. I haven't looked at it closely." To be honest, I haven't looked at it closely either, because I started my first Wall Street job around that time and hobby programming took a back seat.
Fast forward almost twenty years. I probably have the 10,000 hours Malcolm Gladwell conjectures are required to master programming. I wish I could say computers look to me like the Matrix looked to Neo in that final fight scene, but I'd be lying. I have, however, reached the point where I can read code and focus on intent rather than language specifics. Jumping around between languages instead of becoming an expert in any one of them will do that. I picked up Julia recently because of the (well deserved) hype around it and I'm a big fan. Julia is implemented as a JIT compiler which, in theory, makes it a very suitable environment to play around with Thompson's article once more. Let us see how far we get!
The original article explains that the "compiler consists of three concurrently running stages. The first stage is a syntax sieve that allows only syntactically correct regular expressions to pass. This stage also inserts the operator '·' for juxtaposition of regular expressions. The second stage converts the regular expression to reverse Polish form. The third stage is the object code producer. The first two stages are straightforward and not discussed." That description is entirely fair. An immense body of literature describes lexical analysis and parsing, and most of it feels like a chore, leaving one "feeling that it's all a lot of oysters, but no pearls". So how do we turn this mundane task into a learning opportunity? Well, by using it as an excuse to explore Julia's foreign function interface, of course!
Julia's batteries already include regular expressions, which are implemented by ccall-ing PCRE (https://www.pcre.org). Surely PCRE first converts the regular expression into an intermediate representation. Let us find and understand that instead of rolling our own! Julia gives us two ways to access the underlying PCRE struct: either via the regex field of Regex objects or by invoking Base.PCRE.compile directly. The latter approach guarantees we're calling pcre2_compile instead of pcre2_jit_compile, so let's try that:
julia> Base.PCRE.compile("a(b|c)*d", 0)
Ptr{Nothing} @0x0000000002950e00
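For the record, the first route works too, and we can sanity-check the pointer with Base.PCRE.info. Here is a minimal sketch, assuming the regex field of Regex objects and the INFO_CAPTURECOUNT query behave the way Base uses them internally; both are implementation details rather than documented API and may differ across Julia versions:

re = r"a(b|c)*d"
re.regex                                    # same kind of pointer; Base.compile(re) populates it if it is C_NULL

using Base.PCRE: info, INFO_CAPTURECOUNT
info(re.regex, INFO_CAPTURECOUNT, UInt32)   # 0x00000001: one capturing group, (b|c)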
Great, we have a raw C pointer to a pcre2_code struct; now what? We could read the PCRE source and guess field offsets. That sounds tedious and not particularly fun. Fortunately, I noticed two things when I started doing exactly that:
/* Find start of compiled code */
code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) +
       re->name_entry_size * re->name_count;
in pcre2_study.c and
case PCRE2_INFO_NAMETABLE:
  *((PCRE2_SPTR *)where) = (PCRE2_SPTR)((char *)re + sizeof(pcre2_real_code));
  break;
in pcre2_pattern_info.c.
pcre2_pattern_info is conveniently exposed as Base.PCRE.info, so we can "find start of compiled code" in Julia with:
using Base.PCRE: INFO_NAMECOUNT, INFO_NAMEENTRYSIZE, INFO_NAMETABLE, info

# ptr is the pointer returned by Base.PCRE.compile above.
name_count = info(ptr, INFO_NAMECOUNT, UInt32)
name_entry_size = info(ptr, INFO_NAMEENTRYSIZE, UInt32)

# Base pointer of the name table, paired with the 1-based index of the first
# code unit past it, i.e. the start of the compiled bytecode.
info(ptr, INFO_NAMETABLE, Ptr{UInt8}) => (name_count * name_entry_size + 1)
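As a quick sanity check (the destructured names and the dump_units helper below are mine, not the post's code), we can split that pair apart and peek at the first few code units; for this pattern the very first byte should be an opening bracket opcode, which lines up with the decoding below.

table, start = info(ptr, INFO_NAMETABLE, Ptr{UInt8}), name_count * name_entry_size + 1

# Read n raw code units starting at the beginning of the compiled bytecode.
dump_units(n) = [unsafe_load(table, start + k) for k in 0:n-1]

dump_units(4)   # first element is the opcode of the outermost bracket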
At this point, we have the absolute address of the beginning of the PCRE bytecode stream. We need a little more logic to decode its contents. Fortunately, Philip Hazel included a handy text document named HACKING. This document begins with some historical context, then goes on to describe PCRE's internals. We quickly learn that the "compiled form of a pattern is a vector of unsigned code units (bytes in 8-bit mode, shorts in 16-bit mode, 32-bit words in 32-bit mode), containing items of variable length. The first code unit in an item contains an opcode, and the length of the item is either implicit in the opcode or contained in the data that follows it." [emphasis added]
Cool, so PCRE compiled patterns are variable length, a little like UTF-8. What's the most idiomatic way to decode them in Julia? It took me a couple of attempts to realize that Julia's multiple dispatch and "value types" lend themselves particularly well to this problem:
using Base: Fix1

const OpChar = Val{0x1d}   # OP_CHAR
const OpAlt  = Val{0x78}   # OP_ALT
const OpKet  = Val{0x79}   # OP_KET
const OpBra  = Val{0x86}   # OP_BRA

# A link is two code units, most significant byte first in 8-bit mode.
link(ptr::Ptr{UInt8}, i::Int) =
    (UInt16(unsafe_load(ptr, i)) << 8) | UInt16(unsafe_load(ptr, i + 1))

opcode(::OpChar, ptr::Ptr{UInt8}, i::Int) = Fix1(char!, Char(unsafe_load(ptr, i)))
opcode(::OpAlt,  ptr::Ptr{UInt8}, i::Int) = Fix1(alt!, link(ptr, i) << 2)
opcode(::OpKet,  ptr::Ptr{UInt8}, i::Int) = Fix1(ket!, link(ptr, i) << 2)
opcode(::OpBra,  ptr::Ptr{UInt8}, i::Int) = Fix1(bra!, link(ptr, i) << 2)
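The piece that ties these definitions together (my own illustration, not code from the post, and it assumes the char!/alt!/ket!/bra! handlers from the linked repository are defined): wrapping the raw opcode byte in Val turns a run-time value into a type, so multiple dispatch can pick the matching method.

# table and start as computed in the sanity check above.
op = unsafe_load(table, start)        # raw opcode byte of the first item
opcode(Val(op), table, start + 1)     # e.g. Val(0x86) selects the OpBra method above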
Julia's iteration interface can also be used to produce opcodes lazily. The OpCodes iterator returns Int => Fix1 pairs, similar to Base.Dict, because link arguments represent relative jumps. You're probably wondering what makes the left shift by 2 necessary. We will get to it in a minute, I promise.
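Here is a rough sketch of what such an iterator could look like. The oplength helper, the item lengths, and the stop-on-OP_END test are my guesses based on PCRE2's HACKING notes, and the sketch again assumes the handlers referenced above are defined; the post's actual OpCodes may well differ.

struct OpCodes
    ptr::Ptr{UInt8}   # base pointer (the name table, as computed earlier)
    start::Int        # 1-based index of the first code unit of compiled bytecode
end

Base.IteratorSize(::Type{OpCodes}) = Base.SizeUnknown()
Base.eltype(::Type{OpCodes}) = Pair{Int,Function}

# Item lengths in 8-bit mode: the opcode byte plus either one character or a 2-byte link.
# Opcodes not covered above (cbra, brazero, ketrmax, …) would need their own methods.
oplength(::OpChar) = 2
oplength(::Union{OpAlt,OpKet,OpBra}) = 3

function Base.iterate(ops::OpCodes, i::Int = ops.start)
    op = unsafe_load(ops.ptr, i)
    op == 0x00 && return nothing                 # OP_END terminates the stream
    v = Val(op)                                  # multiple dispatch does the decoding
    return (i - ops.start) => opcode(v, ops.ptr, i + 1), i + oplength(v)
end

opcodes = Dict(OpCodes(table, start))   # the Int => Fix1 table the matcher consumes

With opcodes in hand, let us turn our attention to the inner core of the matching engine: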
function match(opcodes::AbstractDict{Int,Function}, string::String)
    curr = [2]                      # offset 0, flagged as an ε-move (see below)
    next = empty(curr)
    for char ∈ string * "\0"        # append a NUL sentinel
        for i ∈ curr
            opcodes[i>>2](curr, next, char, i) && return true
        end
        copy!(curr, next)
        empty!(next)
    end
    false
end
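One consequence of this loop worth spelling out (my note, not the post's): because Fix1 prepends its fixed argument, every handler ends up being called as handler(fixed_arg, curr, next, char, i) and returns a Bool that signals a completed match. A do-nothing handler respecting that contract would be:

# Illustrative only; the real char!, alt!, bra!, … live in the repository linked below.
noop!(arg, curr, next, char, i) = false   # never extends a path, never reports a match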
The dispatch loop above is an example of what is sometimes called call-threaded code. Following Thompson's article, two lists are maintained (named curr and next instead of CLIST and NLIST). curr and next contain integers, not "TSX **,2 instructions", and, because of PCRE2's internal representation, these integers encode both the next instruction (think program counter) and information about the match so far, packed into their two least significant bits. The least significant bit tells us whether this is a matching path. The next bit tells us to treat the next instruction as an ε-move. PCRE2 compiles the Kleene star in "(ab)*" as
OrderedDict{Int64, Function} with 7 entries:
   0 => Fix1(bra!, 16 << 2)
   3 => brazero!
   4 => Fix1(cbra!, 9 << 2)      # jump to 13
   9 => Fix1(char!, 'a')
  11 => Fix1(char!, 'b')
  13 => Fix1(ketrmax!, 9 << 2)   # jump to 4
  16 => Fix1(ket!, 16 << 2)
The key difference between this and the simpler "(ab)" is the brazero! opcode at index 3. It indicates that the following group can be leap-frogged entirely, since the Kleene star matches zero or more times. All of the clever bits can now be packed into the definitions of alt!, char!, bra!, and friends; see here.
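To make the two-bit packing concrete before summarizing (my own illustration of the convention described above, not code from the engine):

# A hypothetical worklist entry: offset 13 (the ketrmax! above), on a matching path.
i = 13 << 2 | 0b01

i >> 2       # 13 → index into the opcode table, i.e. the program counter
i & 0b01     # 1  → the least significant bit marks a matching path
i & 0b10     # 0  → the next bit would flag the entry as an ε-move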
To summarize, Julia's foreign function interface and multiple dispatch let us explore PCRE2's internal representation and implement a bare-bones call-threaded matching engine. Julia's metaprogramming support should let us transform that call-threaded code into direct-threaded code, essentially unrolling the "thread" of opcodes to get another step closer to Ken's original article.