Windows threads progress for SBCL

I am continuing to hack on SBCL to get thread support working on Windows.

A pointer to the Thread-Local Storage (TLS) area is stored in the Arbitrary Data Slot of the TIB (Thread Information Block), which every thread sees at address FS:0x14. As a result, some VOPs now take more instructions, use more registers and make more memory accesses: in other ports SBCL can address the TLS area directly, but on Windows it first has to load the pointer to the TLS area into a register. So far it is too hard to quantify the performance impact of this.
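To make the extra indirection concrete, here is a minimal C sketch of the access pattern, not SBCL's actual code. The `struct tls_area` layout and the helpers `set_tls_area`, `get_tls_area` and `tls_load` are hypothetical. On Windows it reads and writes the TIB's `ArbitraryUserPointer` field through `NtCurrentTeb()` (on 32-bit x86 that is the field at FS:0x14); on other systems a C11 `_Thread_local` pointer stands in for the slot, purely so the sketch compiles and can be exercised there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread TLS area: a handful of value slots. */
#define TLS_SLOTS 8
struct tls_area {
    void *slots[TLS_SLOTS];
};

#ifdef _WIN32
#include <windows.h>
/* The TIB's ArbitraryUserPointer field; on 32-bit x86 it lives at FS:0x14. */
static void set_tls_area(struct tls_area *area)
{
    ((NT_TIB *)NtCurrentTeb())->ArbitraryUserPointer = area;
}
static struct tls_area *get_tls_area(void)
{
    return (struct tls_area *)((NT_TIB *)NtCurrentTeb())->ArbitraryUserPointer;
}
#else
/* Stand-in for the TIB slot on non-Windows systems (assumption, for testing only). */
static _Thread_local struct tls_area *arbitrary_slot_stand_in;
static void set_tls_area(struct tls_area *area) { arbitrary_slot_stand_in = area; }
static struct tls_area *get_tls_area(void) { return arbitrary_slot_stand_in; }
#endif

/* Read one TLS slot. Where the TLS area is directly addressable, this is
   roughly a single instruction:
       mov eax, fs:[offset]
   With only a pointer to the area parked in the TIB, it becomes two
   dependent loads and ties up a register:
       mov eax, fs:[0x14]
       mov eax, [eax + offset] */
static void *tls_load(size_t index)
{
    struct tls_area *area = get_tls_area(); /* load 1: pointer from the slot */
    assert(area != NULL && index < TLS_SLOTS);
    return area->slots[index];               /* load 2: the value itself */
}
```

Each thread would call `set_tls_area` once at startup with its own area; after that, `tls_load(i)` returns that thread's `slots[i]`. The real VOPs emit the equivalent instructions directly, so the C version only shows the shape of the access and where the extra load comes from.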

I'm still rather slow at this low-level stuff: I spent a week debugging an issue caused by my own inattentiveness. On the bright side, it made me get to know GDB better. GDB is a brilliant debugger, though that can be hard to see behind its plain command-line UI. One of its nice features is macros, which help implement application-specific debugging functionality. Before this I had mostly used the Visual Studio debugger and didn't expect GDB to be so good.

As for SBCL, today I got the first build that actually runs (previously it wouldn't get past cold init). TLS also appears to be working.

Now I need to work on the runtime to teach it to create threads and suspend them for garbage collection purposes.

To suspend threads, I'm planning to inject a stack frame for a fake call to a function that waits for the resume signal (as in http://github.com/dmitryvk/testw32/blob/master/plant_stack.c). This should correctly suspend both threads that are performing blocking syscalls and threads that are running userspace code.
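The core of stack planting is emulating a CALL instruction on a stopped thread's saved registers. The sketch below models only that step on a fake register set, so it runs anywhere; `struct fake_context`, `plant_call` and `simulate_ret` are hypothetical names, and the linked plant_stack.c may well do it differently. In a real Win32 version the two fields would be the `Eip` and `Esp` members of a `CONTEXT` obtained with `GetThreadContext` after `SuspendThread`, written back with `SetThreadContext` before `ResumeThread`. Because the interrupted code never expected a call, the planted wait function must itself save and restore all registers and flags before returning.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of the registers involved (assumption: 32-bit x86 style,
   one machine word per stack slot, stack growing downward). */
struct fake_context {
    uintptr_t  eip; /* address the thread will execute next */
    uintptr_t *esp; /* current top of the thread's stack */
};

/* Emulate "call target" on a stopped thread: push the interrupted eip as
   the return address, then redirect execution to target (the function
   that waits for the resume signal). */
static void plant_call(struct fake_context *ctx, uintptr_t target)
{
    ctx->esp -= 1;        /* reserve one stack slot */
    *ctx->esp = ctx->eip; /* fake return address = where the thread stopped */
    ctx->eip = target;    /* thread resumes inside the wait function */
}

/* Emulate the "ret" that ends the wait function: pop the return address,
   so the thread continues exactly where it was interrupted. */
static void simulate_ret(struct fake_context *ctx)
{
    ctx->eip = *ctx->esp;
    ctx->esp += 1;
}
```

After the wait function returns, `eip` and `esp` are back to their values from before the planting, which is what keeps the fake frame invisible to the interrupted code.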