Since I've Windows threads patches to SBCL for review, I'm publishing my implementation notes.
For each thread, Windows allocates a Thread Information Block
(ref. 1, ref. 2, ref. 3). Thread can access its own Thread Information Block through the FS register (e.g., %FS:0
is the first field in TIB). TIB contains frequently used thread-specific data such as Last Error Number, pointer to SEH (structured exception handling) frame, thread id, and TLS backing store. This structure is documented in ref. 2 and ref. 4. TIB has an ‘Arbitrary’ field which is described as:
The
14h DWORD pvArbitrary
field is theoretically available for applications to use however they want. It's almost like an extra thread-local storage slot for you to use, although I've never seen an application use it.
This sound just like what is needed for implementing thread-local storage - a memory location that we can freely use and that is easily accessible. Unfortunately, there is a possibility that some libraries would also use this field for their purposes (and there are examples of such libraries). And when some other conflicting code will run, it will be hard to detect this.
So clearly, a better way of storing the TLS pointer is needed. TIB also contains the first 64 slots for Windows thread-local storage. There are several options for implementing TLS with that:
Currently, TLS is implemented as option 2. On initialization, we take slot 63 (or fail if we couldn't - but I can't imagine the situation where this can happen). When the windows-threads
branch will be merged to SBCL, we should consider option 3 because if implemented now, the change would be more than necessary for Windows threading support.
TLS notes:
This only applies to Win32. Win64 is very different
Clozure CL, on the other hand, uses ES segment register and undocumented Win32 functions to allocate a segment and store it in a segment register. This is problematic because WOW64 (the Windows subsystem to run Win32 applications in Windows 64) does not preserve the value of ES register during context switches (including thread preemption). This is the reason why the 32-bit version of Clozure CL does not work on Windows 64.
Lisp code in SBCL runs in a managed environment – SBCL needs to be able to safely suspend threads (because it uses the stop-the-world garbage collector) and interrupt them (i.e., call some function in a specific thread).
On Unix-like systems, suspending and interrupting threads is simple (but correctly synchronizing threads and writing code that is tolerable to asynchronous interruptions is not simple). This is achieved by using POSIX signals.
Windows API, on the other hand, does not provide equivalent asynchronous interruptions. They have to be emulated. I'll list several ways of emulating them:
‘Thread hijacking’. Windows lets stop a thread, examine its context, modify it, and restart a thread. This way, another thread may modify the stack and registers in such a way that thread will execute some other function and then return to where it was.
This might seem to be the way to emulate signals, but it is not. This is racy with Windows internals in several ways (ref. 5).
QueueUserAPCEx. Windows has support for Asynchronous Procedure Calls (APCs), but only ‘kernel APCs’ can interrupt a thread when it is in user mode. ‘kernel APCs’ are available from drivers and there is the QueueUserAPCEx project (ref. 6) that provides support for them.
This has a very big drawback in that it requires a kernel-mode driver to be installed to function correctly. Because of that, I haven't tried it. By the way, the pthreads_win32 (ref. 7) project uses QueueUserAPCEx if it's available.
Thread polling. Thread periodically checks if it should suspend or interrupt. This does not require any special support from an operating system. But it requires injecting polling into the code and to tolerate potentially infinite delays (they may occur when a thread executes a blocking operation).
Among the noted approaches, the only one that really works is the last one. So it's clear that we have no other choice but to implement polling in SBCL threads. Luckily, I didn't even need to do this myself – Paul Khuong implemented ‘gc safepoints’ for SBCL (ref. 8, ref. 9). The safepoints are implemented by reading a specific memory location and discarding the result (e.g., with test %eax, GC_POLL_PAGE_ADDR
instruction). In normal conditions, this instruction will not affect the running code. But if the memory page is read-protected or unmapped, this will cause an exception and we will land in the exception handler where we can analyze what has happened and what to do next.
In the next post, I'll write about how threads are synchronized for timely suspension and interruption.