Ensure JIT execution of luajit

I've always been impressed with the LuaJIT runtime. Mike Pall has been praised (and rightfully so) for years for creating it. It is the fastest runtime for a dynamic language.

I've taken an interest in experimenting with it.

It has the ability to optimize dynamic code to efficient assembly instructions.

For example, calculating the sum of 1 to 100.

local sum = 0
for i = 1, 100 do
    sum = sum + i
end

It will generate assembly that is efficient. We can get a print out of the assembly using luajit -jdump. This outputs a lot, but our interest lies in the LOOP scope.

->LOOP:
xorps xmm6, xmm6
cvtsi2sd xmm6, ebp
addsd xmm7, xmm6           # sum = sum + i
add ebp, +0x01             # i = i + 1
cmp ebp, +0x64             # if i <= 100
jle ->LOOP                 #   goto LOOP
jmp ->EXIT

The numbers assigned to sum and i are used as typed information. These types don't change during the execution of the script. This allows the JIT to make efficient assembly instructions.

LuaJIT (like Lua) assumes all numbers are floating point, though. The instruction cvtsi2sd converts integers to floating point. Though this may be fast, it is an extraneous instruction.

I wanted to see if LuaJIT could just do integer specific instructions. This could probably done if we could provide hints to the JIT. Lua doesn't have a static type system, though.

local ffi = require("ffi")
ffi.cdef [[
    typedef struct { int v; } int_t;
]]

local int = ffi.metatype("int_t", {})
local sum = int(0)
for i = 1, 100 do
    sum.v = sum.v + i
end
->LOOP:
add ebp, ebx        # sum = sum + i
mov [rax+0x10], ebp # not sure what this is for
add ebx, +0x01      # i = i + 1
cmp ebx, +0x64      # if i <= 100
jle ->LOOP          #   goto LOOP
jmp ->EXIT

We have some extraneous instructions, which don't seem to be helpful here. Can we get this loop to be tighter.

With type information from ffi struct forces the instruction integer addition (add). The example above is simple and raw, but shows how easy it was to add types. LuaJIT has ways of abstracting the struct field, so the implementation does bleed through.

Using the metamethods, we can have the struct use the + operator.

local ffi = require("ffi")
ffi.cdef [[
    typedef struct { int v; } int_t;
]]

local int
int = ffi.metatype("int_t", {
    __add = function(a, b) return int(a.v + b) end
})
local sum = int(0)
for i = 1, 100 do
    sum = sum + i
end
->LOOP:
add ebp, ebx    # sum = sum + i
add ebx, +0x01  # i = i + 1
cmp ebx, +0x64  # if i <= 100
jle ->LOOP      #   goto LOOP
jmp ->EXIT

The loop can literally not get tighter here. Unless we remove the loop entirely. LuaJIT can perform loop unrolling, which means the loop flattens, so that sum = sum + 1 is repeated 100 times. I have not been able to get LuaJIT to perofmr this optimization.

Here we've managed to take standard lua code and using he ffi functionality optimize it. The JIT is using the type information for more optimized assembly.