There recently was a discussion on http://lisper.ru/ about inline assembly in SBCL, and here's a guide on including inline assembly in SBCL.
SBCL (as well as in other compilers) contains primitives - language constructs that are not lowerable to other constructs.
SBCL implements these intrinsics both in the IR1 (which represents the code as a control/data flow graph) and in the IR2 (virtual machine).
At the IR1 level, constructs such as let
and lambda
are implemented, while simpler “intrinsics” (such as arithmetics, memory allocation, etc.) are implemented as virtual machine instructions that have associated code generation procedures.
These instructions are called VOPs in the SBCL parlance.
SBCL is architected in such a way that the set of VOPs is extensible rather than fixed in stone.
For example, the sb-cga
project (a computer graphics algebra library) implements fast matrix and vector multiplication using the SSE2 instruction set by adding a couple of VOPs.
Let's dissect a simple example - let's add an inline assembly code that XORs two numbers together.
Abstractly, the assembly for this looks like the following (using the Intel assembly syntax):
mov result, arg1
xor result, arg2
result
is the registry which will contain the result;arg1
and arg2
are the registers that contain arguments.In order to invoke this sequence of assembly instructions from Lisp code, we have to do several things:
my-xor
)my-xor
By doing the first two operations, SBCL will be able to compile expressions like (my-xor a b)
.
The macro SB-C:DEFKNOWN
is responsible for defining a primitive. It accepts types of arguments and types of returned value(s).
In this example, we will use only short integers (the FIXNUM
type).
(sb-c:defknown my-xor (fixnum fixnum) fixnum)
The macro SB-C:DEFINE-VOP
registers a new VOP. This macro has a lot of parameters related to codegen (such as register selection, argument types, clobbered registers, and lifetimes of arguments), optimization (the “cost” of an operation), and some others. For this example, the VOP definition looks like this:
(sb-c:define-vop (my-xor)
(:translate my-xor)
(:args (a :scs (sb-vm::any-reg))
(b :scs (sb-vm::any-reg)))
(:arg-types fixnum fixnum)
(:results (c :scs (sb-vm::any-reg)))
(:result-types fixnum)
(:policy :fast-safe)
(:generator
0
(sb-c::inst sb-vm::mov c a)
(sb-c::inst sb-vm::xor c b)))
Let's dissect parts of this definition
(sb-c:define-vop (my-xor)
defines the VOP name. In general, the VOP name might be different than the function name. It is often the case that one function has several corresponding VOPs for different argument types (such as multiple variants of +
)
(:translate my-xor)
specifies that the VOP being defined implements the primitive MY-XOR
. This part of the definition specifies that this VOP should be used to translate the (my-xor a b)
expression
(:args ...)
and (:results ...)
defines the names of the arguments and result values, and the argument passing ABI.
:scs
specified the set of allowed storage classes. We may consider that a storage class is either a set of registers, the CPU stack, or the FPU stack. The sb-vm::any-reg
storage class corresponds to any of the general-purpose registers.(:arg-types ..)
и (:result-types ...)
specifies the types of arguments and return values.
(:policy :fast-safe)
specifies the set of circumstances in which to invoke this VOP (when there is ambiguity). Allowed values are:
:small
- when a smaller code size is desired:fast
- when faster (but maybe unsafe) code is desired:fast-safe
- when it is desired to produce code that is both safe and fast(:generator ...)
defines the meat of the VOP - the code generation procedure.
This procedure may be an arbitrary Lisp code that should invoke some special macros to emit instructions.
Inside this procedure, we use the SB-C::INST
macro that implements an assembly syntax similar to the Intel assembly language.
Arguments for instructions may be:
SB-VM::EAX-TN
is a reference to EAX
registereax + edi * 2 + 3
: (sb-vm::make-ea :dword :base sb-vm::eax-tn :index sb-vm::edi-tn :scale 2 :disp 3)
gen-label
and emit-label
:
(let ((l (gen-label)) (inst jmp l) (emit-label l))
(sb-vm::make-fixup "strlen" :foreign)
is the reference to the external function strlen
So, having defined a VOP, it remains for us to do just a small thing - define a wrapper function MY-XOR
around the primitive MY-XOR
:
(defun my-xor (a b)
(my-xor a b))
This would allow us to not just compile expressions such as (my-xor a b)
but also take a reference to normal Lisp function #'MY-XOR
.
At a first glance, this definition of my-xor
is nonsensical: function is defined as a plain recursion without the base non-recursive case.
But there is a catch: the function MY-XOR
calls the primitive MY-XOR
. That's why the compiler generates not a function call but rather generates the code for the VOP corresponding to the primitive MY-XOR
.
Let's make sure that this understanding of this code is correct:
(disassemble 'my-xor)
=>
; disassembly for MY-XOR
; 02DED843: 488BD1 MOV RDX, RCX ; no-arg-parsing entry point
; 46: 4831FA XOR RDX, RDI ; <-- beginning of our code
; 49: 488BE5 MOV RSP, RBP
; 4C: F8 CLC
; 4D: 5D POP RBP
; 4E: C3 RET
; 4F: CC0A BREAK 10 ; error trap
; 51: 02 BYTE #X02
; 52: 18 BYTE #X18 ; INVALID-ARG-COUNT-ERROR
; 53: 54 BYTE #X54 ; RCX
; 54: CC0A BREAK 10 ; error trap
; 56: 02 BYTE #X02
; 57: 08 BYTE #X08 ; OBJECT-NOT-FIXNUM-ERROR
; 58: 95 BYTE #X95 ; RDX
; 59: CC0A BREAK 10 ; error trap
; 5B: 04 BYTE #X04
; 5C: 08 BYTE #X08 ; OBJECT-NOT-FIXNUM-ERROR
; 5D: FED501 BYTE #XFE, #XD5, #X01 ; RDI
This listing has 2 instructions starting at offset 0x02DED843 that are exactly the output of just defined VOP. And the rest is the standard prolog and epilogue, and the code to check arguments’ types.
Let's also try to use the newly defined primitive inside a normal function:
(defun foo (a b)
(declare (type fixnum a b)
(optimize (speed 3) (safety 0)))
(my-xor (1+ a) (1+ b)))
And let's have a look at its disassembly:
; disassembly for FOO
; 0325A727: 48FFC2 INC RDX ; no-arg-parsing entry point
; 2A: 48FFC7 INC RDI
; 2D: 48C1E203 SHL RDX, 3
; 31: 48C1E703 SHL RDI, 3
; 35: 488BD2 MOV RDX, RDX ; <-- the code generated for MY-XOR
; 38: 4831FA XOR RDX, RDI ; <--
; 3B: 488BE5 MOV RSP, RBP
; 3E: F8 CLC
; 3F: 5D POP RBP
; 40: C3 RET
In this listing, we can see that the SBCL generated exactly the code that we wanted.
The code to generate such a simple inline assembly got quite verbose. This is because we utilize the full capabilities of the compiler, not just a simpler inline assembly.