For now, my work on threading support in SBCL is stalled, since I still cannot implement thread suspension for garbage collection.
The main problem is that the Windows API does not offer asynchronous signals (which are available on all other platforms and are surfaced as the pthread_kill function in the POSIX Threads API).
I've tried several options:
A thread in a pseudo-atomic section suspends on its own, and in all other cases it is suspended with SuspendThread. But this was fraught with problems in many places: threads sometimes just hung, there were errors inside the garbage collector, and I was getting spurious page faults. I later understood that this was due to stopping the threads in the wrong place: SBCL uses not just pseudo-atomic sections for non-interruptible code, but also the thread signal mask.
Then I tried adding a signal mask to each thread and taking this mask into account when suspending the thread: first, we suspend the thread using SuspendThread, and if the thread has masked the SIG_STOP_FOR_GC signal, the thread is resumed using ResumeThread, we sleep for a while, and then repeat. With this change the threads stopped hanging, but now I got different behavior: as soon as garbage collection completes and the threads resume, some threads discover invalid virtual memory protection flags. I attribute this to the thread being suspended inside the exception handler, so that it observes the exception handler context instead of the proper thread context.
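The suspend, check mask, resume, sleep, repeat loop described above can be sketched roughly as follows. This is only an illustration, not the SBCL runtime code: the OS primitives are passed in as callbacks (on Windows they would wrap SuspendThread, ResumeThread and Sleep), the gc_signal_masked callback and the max_attempts limit are assumptions made for the sketch, and the fake thread exists only so the loop can be exercised without a real thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* OS primitives as callbacks, so the loop itself stays portable.
   On Windows these would wrap SuspendThread, ResumeThread and Sleep. */
typedef struct {
    void (*suspend)(void *thread);
    void (*resume)(void *thread);
    bool (*gc_signal_masked)(void *thread); /* hypothetical: reads the emulated signal mask */
    void (*sleep_ms)(unsigned ms);
} suspend_ops;

/* Try to stop a thread at a point where GC is allowed.
   Returns the number of attempts it took, or 0 if the thread was
   still masking SIG_STOP_FOR_GC after max_attempts tries
   (the limit is an assumption; the text just says "repeat"). */
unsigned suspend_for_gc(const suspend_ops *ops, void *thread, unsigned max_attempts)
{
    assert(ops != NULL);
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        ops->suspend(thread);
        if (!ops->gc_signal_masked(thread))
            return attempt;      /* stopped outside a non-interruptible region */
        ops->resume(thread);     /* still masked: let it run a bit further */
        ops->sleep_ms(1);
    }
    return 0;
}

/* A fake thread for demonstration: reports "masked" a fixed number of times. */
typedef struct {
    int suspended;          /* suspend count, like the OS keeps per thread */
    int masked_polls_left;  /* how many more checks will see the mask set */
} fake_thread;

static void fake_suspend(void *t) { ((fake_thread *)t)->suspended++; }
static void fake_resume(void *t)  { ((fake_thread *)t)->suspended--; }
static void fake_sleep(unsigned ms) { (void)ms; }

static bool fake_masked(void *t)
{
    fake_thread *ft = t;
    if (ft->masked_polls_left > 0) {
        ft->masked_polls_left--;
        return true;
    }
    return false;
}

const suspend_ops fake_ops = { fake_suspend, fake_resume, fake_masked, fake_sleep };
```

Note that the sketch has the same weakness as the approach itself: nothing in it knows whether the suspended thread happens to be inside an exception handler.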
With all of that, I conclude that I won't be able to achieve asynchronous non-cooperative thread suspension.
One of the things I researched was the way the .NET garbage collector handles thread suspension (http://msdn.microsoft.com/en-us/library/678ysw69.aspx). The .NET CLR tracks suspension requests for each thread, and the compiler inserts special instructions, GC safepoints, so that the thread checks from time to time whether it should suspend. The specific CPU instructions used for the safepoints are not that important. They could be, for example, a read from a special region of memory that is unmapped when garbage collection starts, causing a page fault and an exception in the reading thread and giving it a chance to react to the garbage collection. When a thread leaves CLR-managed code (that is, performs a blocking operation or calls out to foreign code), it sets a flag signalling that this thread will not touch the GC heap. When the thread returns to CLR-managed code, it checks whether it should stop and wait for the GC to complete. And there is a weird quirk: if a thread does not stop within 250 milliseconds, it is forcefully stopped using SuspendThread. A thread might not stop in time if it does not reach a safepoint for a long while. In the case of .NET, this might happen if the thread is running a long loop that does not call other methods or allocate memory. In this regard, the .NET garbage collector is much less careful than the JVM, which inserts safepoints not just at method calls but also at backward jumps (any loop is implemented as a backward jump), which guarantees that a thread will eventually reach a safepoint.
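The bookkeeping behind such cooperative suspension can be modelled in a few lines. The sketch below is not how the CLR or the JVM actually implement it; all names are made up for illustration, and it only tracks state transitions: where a real runtime would block the thread, the functions here just record that the thread parked and return true. It also ignores the race between checking the request flag and updating the state, which a real runtime would have to close with a lock or a compare-and-swap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread states; the names are illustrative only. */
enum mutator_state {
    IN_MANAGED,   /* running managed code, may touch the GC heap */
    IN_FOREIGN,   /* in foreign or blocking code, will not touch the heap */
    AT_SAFEPOINT  /* parked, waiting for the GC to finish */
};

typedef struct {
    atomic_int state;
} mutator;

static atomic_bool gc_requested; /* static storage, so it starts out false */

void mutator_init(mutator *m) { atomic_init(&m->state, IN_MANAGED); }

/* The check a compiler would emit at calls and backward jumps.
   Returns true if the thread parked (a real runtime would block here). */
bool safepoint_poll(mutator *m)
{
    assert(m != NULL);
    if (!atomic_load(&gc_requested))
        return false;
    atomic_store(&m->state, AT_SAFEPOINT);
    return true;
}

/* Before a foreign call or blocking operation: promise not to touch the heap. */
void enter_foreign(mutator *m) { atomic_store(&m->state, IN_FOREIGN); }

/* On the way back into managed code: park if a GC is in progress. */
bool leave_foreign(mutator *m)
{
    if (atomic_load(&gc_requested)) {
        atomic_store(&m->state, AT_SAFEPOINT);
        return true;
    }
    atomic_store(&m->state, IN_MANAGED);
    return false;
}

void gc_request_stop(void) { atomic_store(&gc_requested, true); }

/* The GC may proceed once no thread is still running managed code. */
bool gc_world_stopped(mutator *ms, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (atomic_load(&ms[i].state) == IN_MANAGED)
            return false;
    return true;
}

/* After collection: clear the request and release the parked threads. */
void gc_finish(mutator *ms, size_t n)
{
    atomic_store(&gc_requested, false);
    for (size_t i = 0; i < n; i++)
        if (atomic_load(&ms[i].state) == AT_SAFEPOINT)
            atomic_store(&ms[i].state, IN_MANAGED);
}
```

A thread that is in foreign code when the GC starts never has to be stopped at all; it only has to be kept from re-entering managed code, which is what leave_foreign does in this model.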
I shall try to employ a similar technique.
Safepoints will be placed in several kinds of places:
- [...] pseudo-atomic section
- When entering foreign code (such as a C function call) or invoking a blocking operation, a thread will set a flag to signify that it will not do anything with the GC-managed heap, so this thread may be ignored for the purpose of suspending all mutator threads during a GC cycle
As I don't want to go too deep into the compiler, I will try to implement GC as follows:
- [...] SIG_STOP_FOR_GC, then it is resumed and the GC waits until the thread leaves this section
- [...] jmp or call instruction by replacing it with a trap instruction and resume the thread. Soon enough the thread will reach the breakpoint and enter the exception handler. The exception handler will restore the original instruction and suspend the thread.
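The trap-instruction step can be illustrated on a plain byte buffer. This is a sketch under several assumptions: the target is x86, where the one-byte int3 opcode (0xCC) serves as the trap; the GC has already located the instruction to patch (finding it needs a disassembler and is not shown); and the gc_breakpoint structure and function names are invented for the example. A real version would also have to make the code page writable (VirtualProtect on Windows) and flush the instruction cache (FlushInstructionCache) after patching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRAP_OPCODE 0xCC /* x86 int3, a one-byte breakpoint instruction */

/* Hypothetical record of one planted trap. */
typedef struct {
    uint8_t *address; /* where the trap was planted, NULL if none */
    uint8_t  saved;   /* the original first byte of the instruction */
} gc_breakpoint;

/* GC side: overwrite the first byte of the chosen jmp/call with a trap.
   The suspended thread is resumed afterwards and runs into it. */
void plant_trap(gc_breakpoint *bp, uint8_t *insn)
{
    assert(bp != NULL && bp->address == NULL);
    bp->address = insn;
    bp->saved = *insn;
    *insn = TRAP_OPCODE;
}

/* Exception-handler side: put the original byte back and return the
   address of the restored instruction, which the thread should execute
   once the GC lets it continue. */
uint8_t *remove_trap(gc_breakpoint *bp)
{
    assert(bp != NULL && bp->address != NULL);
    assert(*bp->address == TRAP_OPCODE);
    uint8_t *insn = bp->address;
    *insn = bp->saved;
    bp->address = NULL;
    return insn;
}
```

Only one byte is saved because int3 is one byte long, so the rest of the original instruction stays in place and is valid again as soon as the first byte is restored.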