# Exploring JIT execution in LuaJIT

I’ve always been impressed with the LuaJIT runtime. For years, Mike Pall has been praised (and rightfully so) for creating it. It is widely regarded as one of the fastest runtimes for a dynamic language.

I’ve taken an interest in experimenting with it.

It can optimize dynamic code to efficient assembly instructions.

For example, consider calculating the sum of the integers from 1 to 100.

local sum = 0
for i = 1, 100 do
	sum = sum + i
end

LuaJIT generates efficient assembly for this loop. We can view the assembly by running the script with luajit -jdump. This outputs a lot, but our interest lies in the LOOP section of the trace.

->LOOP:
xorps xmm6, xmm6
cvtsi2sd xmm6, ebp
addsd xmm7, xmm6           # sum = sum + i
add ebp, +0x01             # i = i + 1
cmp ebp, +0x64             # if i <= 100
jle ->LOOP                 #   goto LOOP
jmp ->EXIT
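
The same dump can also be switched on from inside a script instead of from the command line. The sketch below assumes the jit.dump module that ships with LuaJIT is on the package path and that its on and off functions take the same option letters as -jdump. The pcall guard is my own addition, so the script still runs where that module is missing.

```lua
-- Try to load LuaJIT's bundled dump module; this fails harmlessly under plain Lua.
local ok, dump = pcall(require, "jit.dump")
local dumping = ok and type(dump) == "table" and type(dump.on) == "function"

if dumping then
	-- Assumed option letters: t = trace, b = bytecode, i = IR, m = machine code.
	dump.on("tbim")
end

local sum = 0
for i = 1, 100 do
	sum = sum + i
end

if dumping then
	dump.off()
end

print(sum)
```

With dumping active, the LOOP section should appear in the output before the final sum is printed.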

The JIT uses the numbers assigned to sum and i as type information. These types don’t change during the execution of the script, which allows the JIT to emit efficient assembly instructions.

LuaJIT (like Lua 5.1) treats all numbers as double-precision floating-point, though. The instruction cvtsi2sd converts the integer loop counter to floating-point before the addition. Though this may be fast, it is an extraneous instruction.
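
A quick way to see the floating-point behaviour from Lua itself is to look at what happens near 2^53, the point past which a double can no longer represent every integer. This is only an illustrative sketch; the variable name big is mine.

```lua
-- 2^53 is the point past which doubles can no longer represent every integer.
local big = 2^53

-- Adding 1 is lost to rounding, which would not happen with 64-bit integers.
print(big + 1 == big)

-- Dividing two whole numbers yields a fractional result, not a truncated one.
print(7 / 2)
```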

I wanted to see if LuaJIT could emit integer-specific instructions if I gave the JIT some hints. Lua doesn’t have a static type system, though, so the hints have to come from somewhere else: LuaJIT’s FFI library, which lets us declare C types.

local ffi = require("ffi")
ffi.cdef [[
    typedef struct { int v; } int_t;
]]

local int = ffi.metatype("int_t", {})
local sum = int(0)
for i = 1, 100 do
	sum.v = sum.v + i
end

The LOOP section of the dump now looks like this:

->LOOP:
add ebp, ebx        # sum = sum + i
mov [rax+0x10], ebp # appears to store the new sum back into sum.v
add ebx, +0x01      # i = i + 1
cmp ebx, +0x64      # if i <= 100
jle ->LOOP          #   goto LOOP
jmp ->EXIT

We still have an extraneous instruction, which doesn’t seem to be helpful here. Can we get this loop to be tighter?

The type information from the FFI struct forces the JIT to emit an integer addition (add). The example above is simple and raw, but it shows how easy adding types was. LuaJIT has ways of abstracting the struct field, so the implementation doesn’t bleed through.

Using metamethods, we can have the struct use the + operator.

local ffi = require("ffi")
ffi.cdef [[
    typedef struct { int v; } int_t;
]]

local int
int = ffi.metatype("int_t", {
	__add = function(a, b) return int(a.v + b) end
})
local sum = int(0)
for i = 1, 100 do
	sum = sum + i
end

->LOOP:
add ebp, ebx         # sum = sum + i
add ebx, +0x01       # i = i + 1
cmp ebx, +0x64       # if i <= 100
jle ->LOOP           #   goto LOOP
jmp ->EXIT

Here we’ve managed to take standard Lua code and optimize it using the FFI functionality. The JIT is using the type information to generate more optimized assembly.
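
It’s worth checking that the typed version still computes the same result as the plain one. Here is a minimal sketch of that check. The function names sum_plain and sum_ffi are mine, and I’ve used an anonymous struct via ffi.typeof instead of the typedef so the function can be called more than once without redefining a named type. Under plain Lua, where the ffi library is missing, sum_ffi just returns nil.

```lua
-- Plain Lua version: sum is an ordinary (floating-point) number.
local function sum_plain(n)
	local sum = 0
	for i = 1, n do
		sum = sum + i
	end
	return sum
end

-- FFI version; returns nil when the ffi library is unavailable (plain Lua).
local function sum_ffi(n)
	local ok, ffi = pcall(require, "ffi")
	if not ok then return nil end

	local int
	-- Anonymous struct (an assumption of this sketch) with the same
	-- single int field as int_t.
	int = ffi.metatype(ffi.typeof("struct { int v; }"), {
		__add = function(a, b) return int(a.v + b) end
	})

	local sum = int(0)
	for i = 1, n do
		sum = sum + i
	end
	return sum.v
end

print(sum_plain(100), sum_ffi(100))
```

Bear in mind that int wraps around on overflow where a double would not, so the two versions only agree while the sum fits in a 32-bit integer.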

The loop cannot get any tighter unless we remove the loop entirely. I have not been able to get LuaJIT to perform this optimization.
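
Removing the loop entirely would mean replacing it with the closed form n(n+1)/2, which we can at least do by hand. This is a small sketch of what that would compute, not something LuaJIT produced; sum_to and sum_loop are names I made up for the comparison.

```lua
-- Closed form for 1 + 2 + ... + n.
local function sum_to(n)
	return n * (n + 1) / 2
end

-- Reference loop, matching the first example in this post.
local function sum_loop(n)
	local sum = 0
	for i = 1, n do
		sum = sum + i
	end
	return sum
end

print(sum_to(100), sum_loop(100))
```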