On inline assembly in SBCL

Introduction

There recently was a discussion on http://lisper.ru/ about inline assembly in SBCL, and here's a guide on including inline assembly in SBCL.

SBCL (as well as in other compilers) contains primitives - language constructs that are not lowerable to other constructs. SBCL implements these intrinsics both in the IR1 (which represents the code as a control/data flow graph) and in the IR2 (virtual machine). At the IR1 level, constructs such as let and lambda are implemented, while simpler “intrinsics” (such as arithmetics, memory allocation, etc.) are implemented as virtual machine instructions that have associated code generation procedures.

These instructions are called VOPs in the SBCL parlance. SBCL is architected in such a way that the set of VOPs is extensible rather than fixed in stone. For example, the sb-cga project (a computer graphics algebra library) implements fast matrix and vector multiplication using the SSE2 instruction set by adding a couple of VOPs.

Example

Let's dissect a simple example - let's add an inline assembly code that XORs two numbers together.

Abstractly, the assembly for this looks like the following (using the Intel assembly syntax):

mov result, arg1
xor result, arg2
  • result is the registry which will contain the result;
  • arg1 and arg2 are the registers that contain arguments.

In order to invoke this sequence of assembly instructions from Lisp code, we have to do several things:

  • Define a new primitive function (we'll call it my-xor)
  • Define a new VOP corresponding to this primitive
  • Define a normal function my-xor

Defining a new primitive function

By doing the first two operations, SBCL will be able to compile expressions like (my-xor a b).

The macro SB-C:DEFKNOWN is responsible for defining a primitive. It accepts types of arguments and types of returned value(s). In this example, we will use only short integers (the FIXNUM type).

(sb-c:defknown my-xor (fixnum fixnum) fixnum)

Defining a new VOP

The macro SB-C:DEFINE-VOP registers a new VOP. This macro has a lot of parameters related to codegen (such as register selection, argument types, clobbered registers, and lifetimes of arguments), optimization (the “cost” of an operation), and some others. For this example, the VOP definition looks like this:

(sb-c:define-vop (my-xor)
  (:translate my-xor)
  (:args (a :scs (sb-vm::any-reg))
         (b :scs (sb-vm::any-reg)))
  (:arg-types fixnum fixnum)
  (:results (c :scs (sb-vm::any-reg)))
  (:result-types fixnum)
  (:policy :fast-safe)
  (:generator
   0
   (sb-c::inst sb-vm::mov c a)
   (sb-c::inst sb-vm::xor c b)))

Let's dissect parts of this definition

  • (sb-c:define-vop (my-xor) defines the VOP name. In general, the VOP name might be different than the function name. It is often the case that one function has several corresponding VOPs for different argument types (such as multiple variants of +)

  • (:translate my-xor) specifies that the VOP being defined implements the primitive MY-XOR. This part of the definition specifies that this VOP should be used to translate the (my-xor a b) expression

  • (:args ...) and (:results ...) defines the names of the arguments and result values, and the argument passing ABI.

    • :scs specified the set of allowed storage classes. We may consider that a storage class is either a set of registers, the CPU stack, or the FPU stack. The sb-vm::any-reg storage class corresponds to any of the general-purpose registers.
  • (:arg-types ..) и (:result-types ...) specifies the types of arguments and return values.

  • (:policy :fast-safe) specifies the set of circumstances in which to invoke this VOP (when there is ambiguity). Allowed values are:

    • :small - when a smaller code size is desired
    • :fast - when faster (but maybe unsafe) code is desired
    • :fast-safe - when it is desired to produce code that is both safe and fast
  • (:generator ...) defines the meat of the VOP - the code generation procedure. This procedure may be an arbitrary Lisp code that should invoke some special macros to emit instructions.

    Inside this procedure, we use the SB-C::INST macro that implements an assembly syntax similar to the Intel assembly language. Arguments for instructions may be:

    • numeric constants
    • symbolic references to arguments, results, or temporaries
    • references to specific registers. For example, SB-VM::EAX-TN is a reference to EAX register
    • memory address references. For example, the following references a 4-byte word at an address eax + edi * 2 + 3: (sb-vm::make-ea :dword :base sb-vm::eax-tn :index sb-vm::edi-tn :scale 2 :disp 3)
    • references to labels. The labels are created with gen-label and emit-label:
      (let ((l (gen-label)) (inst jmp l) (emit-label l))
      
    • references to various objects. For example, (sb-vm::make-fixup "strlen" :foreign) is the reference to the external function strlen

Defining a wrapper function

So, having defined a VOP, it remains for us to do just a small thing - define a wrapper function MY-XOR around the primitive MY-XOR:

(defun my-xor (a b)
  (my-xor a b))

This would allow us to not just compile expressions such as (my-xor a b) but also take a reference to normal Lisp function #'MY-XOR.

At a first glance, this definition of my-xor is nonsensical: function is defined as a plain recursion without the base non-recursive case. But there is a catch: the function MY-XOR calls the primitive MY-XOR. That's why the compiler generates not a function call but rather generates the code for the VOP corresponding to the primitive MY-XOR.

Let's make sure that this understanding of this code is correct:

(disassemble 'my-xor)
=>
; disassembly for MY-XOR
; 02DED843:       488BD1           MOV RDX, RCX               ; no-arg-parsing entry point
;       46:       4831FA           XOR RDX, RDI               ; <-- beginning of our code
;       49:       488BE5           MOV RSP, RBP
;       4C:       F8               CLC
;       4D:       5D               POP RBP
;       4E:       C3               RET
;       4F:       CC0A             BREAK 10                   ; error trap
;       51:       02               BYTE #X02
;       52:       18               BYTE #X18                  ; INVALID-ARG-COUNT-ERROR
;       53:       54               BYTE #X54                  ; RCX
;       54:       CC0A             BREAK 10                   ; error trap
;       56:       02               BYTE #X02
;       57:       08               BYTE #X08                  ; OBJECT-NOT-FIXNUM-ERROR
;       58:       95               BYTE #X95                  ; RDX
;       59:       CC0A             BREAK 10                   ; error trap
;       5B:       04               BYTE #X04
;       5C:       08               BYTE #X08                  ; OBJECT-NOT-FIXNUM-ERROR
;       5D:       FED501           BYTE #XFE, #XD5, #X01      ; RDI

This listing has 2 instructions starting at offset 0x02DED843 that are exactly the output of just defined VOP. And the rest is the standard prolog and epilogue, and the code to check arguments’ types.

Conclusion

Let's also try to use the newly defined primitive inside a normal function:

(defun foo (a b)
  (declare (type fixnum a b)
           (optimize (speed 3) (safety 0)))
  (my-xor (1+ a) (1+ b)))

And let's have a look at its disassembly:

; disassembly for FOO
; 0325A727:       48FFC2           INC RDX                    ; no-arg-parsing entry point
;       2A:       48FFC7           INC RDI
;       2D:       48C1E203         SHL RDX, 3
;       31:       48C1E703         SHL RDI, 3
;       35:       488BD2           MOV RDX, RDX               ; <-- the code generated for MY-XOR
;       38:       4831FA           XOR RDX, RDI               ; <--
;       3B:       488BE5           MOV RSP, RBP
;       3E:       F8               CLC
;       3F:       5D               POP RBP
;       40:       C3               RET

In this listing, we can see that the SBCL generated exactly the code that we wanted.

The code to generate such a simple inline assembly got quite verbose. This is because we utilize the full capabilities of the compiler, not just a simpler inline assembly.