Exploring JIT execution of luajit
#
I’ve always been impressed with the LuaJIT runtime. For
years, Mike Pall has been praised (and rightfully so) for creating it. It is the
fastest runtime for a dynamic language.
I’ve taken an interest in experimenting with it.
It can optimize dynamic code to efficient assembly instructions.
For example, we are calculating the sum of 1 to 100.
local sum = 0
for i = 1, 100 do
sum = sum + i
end
It generates efficient assembly. The assembly using luajit -jdump
. This
outputs a lot, but our interest lies in the LOOP
scope.
->LOOP:
xorps xmm6, xmm6
cvtsi2sd xmm6, ebp
addsd xmm7, xmm6 # sum = sum + i
add ebp, +0x01 # i = i + 1
cmp ebp, +0x64 # if i <= 100
jle ->LOOP # goto LOOP
jmp ->EXIT
The numbers assigned to sum
and i
are used as typed information. These types
don’t change during the execution of the script, and this allows the JIT to make
efficient assembly instructions.
LuaJIT (like Lua) assumes all numbers are floating-point, though. The
instruction cvtsi2sd
converts integers to floating-point. Though this may be
fast, it is an extraneous instruction.
I wanted to see if LuaJIT could just do integer-specific instructions by
providing hints to the JIT. Lua doesn’t have a static type system, though.
local ffi = require("ffi")
ffi.cdef [[
typedef struct { int v; } int_t;
]]
local int = ffi.metatype("int_t", {})
local sum = int(0)
for i = 1, 100 do
sum.v = sum.v + i
end
->LOOP:
add ebp, ebx # sum = sum + i
mov [rax+0x10], ebp # not sure what this is for
add ebx, +0x01 # i = i + 1
cmp ebx, +0x64 # if i <= 100
jle ->LOOP # goto LOOP
jmp ->EXIT
We have some extraneous instructions, which don’t seem to be helpful here. Can
we get this loop to be tighter.
With type information from ffi
struct forces the instruction integer addition
(add
). The example above is simple and raw, but it shows how easy adding types
was. LuaJIT has ways of
abstracting the struct field,
so the implementation does bleed through.
Using the metamethods, we can have the struct use the +
operator.
local ffi = require("ffi")
ffi.cdef [[
typedef struct { int v; } int_t;
]]
local int
int = ffi.metatype("int_t", {
__add = function(a, b) return int(a.v + b) end
})
local sum = int(0)
for i = 1, 100 do
sum = sum + i
end
->LOOP: add ebp, ebx # sum = sum + i
add ebx, +0x01 # i = i + 1 cmp
ebx, +0x64 # if i <= 100
jle ->LOOP # goto LOOP
jmp ->EXIT
Here we’ve managed to take standard lua code and optimize it using the ffi
functionality. The JIT is using the type information for more optimized
assembly.
The loop cannot get any tighter unless we remove the loop entirely. I have not
been able to get LuaJIT to perform this optimization.