What makes Julia delightful, cont'd?
Or how I Finally Dipped my Toes into Macros
I explored the unreasonable effectiveness of Julia’s code dispatch in my previous article by using it to interpret PCRE’s internal representation. Dispatching on “value types” is a great fit for binary encoding and the whole affair was entirely pleasant.
I also started using multiple dispatch to match regular expressions against text but realized that
Base.Fix2 was just as readable. With that out of the way, I
could finally focus on my real target: JIT compilation! For some reason, my first thought was to interact
with LLVM directly. The llvmcall
examples in test/llvmcall.jl
looked frankly daunting and I quickly hoped for an easier way. The chapter on
metaprogramming provided an answer.
I am trying to generate a specialized matching function from the dictionary of opcodes which represent the
regular expression.
Once I got my head around quoting and interpolation, the function which generates the specialized matcher is only thrice the length of the generic matcher. And the most satisfyingly impressive part of it is that section spotted in the output of @code_llvm:
; @ /home/jburgy/blog/fun/regexp.jl:175 within `#42'
; ┌ @ int.jl:462 within `>>' @ int.jl:455
%47 = ashr i64 %value_phi7, 2
; â””
; @ /home/jburgy/blog/fun/regexp.jl:161 within `#42'
switch i64 %47, label %L405 [
i64 0, label %L101
i64 5, label %L150
i64 6, label %L174
i64 13, label %L223
i64 11, label %L230
i64 16, label %L258
i64 18, label %L286
i64 21, label %L341
i64 3, label %L369
i64 23, label %L397
]
Even though the generated code is a concatenation of conditional expressions, LLVM is “smart enough to lower the branch into a switch” as explained in this discourse answer. Using @code_native to dig even deeper, LLVM compiles to
; │ @ regexp.jl:175 within `#42'
; │┌ @ int.jl:462 within `>>' @ int.jl:455
movq %r13, %rax
sarq $2, %rax
; │└
; │ @ regexp.jl:161 within `#42'
cmpq $23, %rax
ja L1360
; │ @ regexp.jl within `#42'
movabsq $.rodata, %rcx
jmpq *(%rcx,%rax,8)
Does this last line ring a bell? Well, if it isn’t the familiar NEXT macro of threaded code frame! I’ll be darned!
52b6b1d is the commit that made this all work. The first half of it might not even be necessary but I simply couldn’t figure out how to recover the Expr from a Function so I grabbed those expressions before turning them into functions. Easy enough but I’d welcome a solution which didn’t require generating those functions as well.
You’ve got to admit, that’s pretty cute, isn’t it.
9/10/23 Update: Pretty cute but not cute enough. The way 52b6b1d dissected the matchers into bodies and argument lists bothered me because it hurts readability. Granted, this code was never particularly readable to begin with. All the more reason to mitigate noise. This bothered me so much that I eventually asked a question on https://discourse.julialang.org/. As is often the case, asking the question forced me to pinpoint what really bugged me so I could address it with 0e00457. As a result, 0e00457/fun/regexp.jl looks much more like e3e5bc7/fun/regexp.jl with the extra
matchers = quote
# all function definitions
end
exprs = Dict{Symbol,Tuple{Expr,Vararg{Expr}}}(
expr.args[1].args[1] => (expr.args[2], expr.args[1].args[2:end-4]...)
for expr in matchers.args if is_expr(expr, :function)
)
eval(matchers)
In other words, define the matchers inside a single
quote
instead of dissecting them.
This makes it possible
to eval
them and
access their names, parameters, and bodies.