Recently there has been a lot of talk along the lines of “threads and blocking code are bad, and callback-based asynchrony is good”.
Both synchronous blocking computations and asynchronous non-blocking ones have some state associated with them. For an HTTP server, for example, that state might include the client socket, the request received and parsed so far, and the position within the response being sent.
This state is always present, independent of the style the code is written in: whether it uses callbacks, is represented as a state machine, or is sequential blocking code running in its own thread.
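To make this concrete, here is a sketch in C of such per-connection state; the struct and field names are hypothetical, but some equivalent of each field must exist in any implementation:

```c
/* Hypothetical per-connection state of an HTTP server. In callback
 * style this struct is allocated explicitly and threaded through every
 * callback; in blocking style the same data lives in local variables
 * and in the thread's instruction pointer. */
struct http_conn {
    int    fd;                 /* client socket */
    char   buf[8192];          /* request bytes received so far */
    size_t parsed;             /* how far the parser has advanced */
    enum { READ_REQUEST, WRITE_RESPONSE } phase; /* where we are */
    size_t sent;               /* response bytes already written */
};
```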
Proponents of some fashionable and hyped technologies sometimes present the idea that the only way to write “web scale” apps is to ditch threads and blocking calls and switch to a style I call “callback spaghetti”. A more formal name for it is continuation-passing style (CPS). Transforming code to CPS is a mechanical task, but doing it by hand is tedious and error-prone, and it balloons the size of the code.
Continuation-passing style works by passing into each function a callback, which the inner function calls once the result is available. Composing functions then means composing callbacks. The resulting callback chain is equivalent to a call stack, albeit represented as a linked list instead of a contiguous array.
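As a minimal illustration in C (the function names here are made up), compare a direct-style function with its CPS counterpart:

```c
#include <stdio.h>

/* Direct style: the result is returned to the caller,
 * and the call stack remembers where to continue. */
static int add(int a, int b) { return a + b; }

/* CPS: instead of returning, the function invokes a callback
 * (the "continuation") once the result is available. */
static void add_cps(int a, int b, void (*k)(int)) { k(a + b); }

static void print_result(int r) { printf("result: %d\n", r); }

int main(void) {
    printf("result: %d\n", add(1, 2)); /* direct style */
    add_cps(1, 2, print_result);       /* continuation-passing style */
    return 0;
}
```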
My argument for sequential code running in separate threads is as follows: such code is easier to read, and control-flow structures are expressed naturally. It is also more reliable, since there are simple, well-understood mechanisms for error handling: exceptions work, stack traces contain useful information, and it is easier to ensure the timely release of resources.
The main argument against thread-based concurrency is that threads consume a lot of resources, so we cannot create very many of them. For OS-native threads this is largely true: they have relatively high creation overhead, they occupy a significant amount of memory for the stack (usually several megabytes) and thread-local storage, and context switches between them are expensive. But OS-native threads are just one way to implement threads: threads can be implemented with far lower overhead.
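The stack-size claim is easy to verify; the following sketch uses the standard pthread attribute API to print the default stack size that pthread_create would use (build with cc -pthread; on typical glibc/Linux systems it reports 8 MiB):

```c
#include <pthread.h>
#include <stdio.h>

int main(void) {
    pthread_attr_t attr;
    size_t stack_size = 0;

    /* A freshly initialized attribute object holds the defaults
     * that pthread_create would use. */
    pthread_attr_init(&attr);
    pthread_attr_getstacksize(&attr, &stack_size);
    printf("default thread stack: %zu KiB\n", stack_size / 1024);
    pthread_attr_destroy(&attr);
    return 0;
}
```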
Let's consider Linux glibc (NPTL) threads as an example. Such a thread consists of a kernel scheduling entity (created with the clone syscall), a stack with a guard page at its end, a thread-local storage block, and the thread descriptor that glibc keeps alongside them.
The main areas of overhead for such threads are the memory reserved for the stack and thread-local storage, the syscalls involved in creating and destroying them, and the kernel-mediated context switches.
POSIX defines several means to manage contexts and switch between them: makecontext, getcontext, setcontext, and swapcontext. A context in POSIX contains the entire thread state except for kernel objects. With the help of contexts, we can make our own threads. Threads created in userspace like this are called green threads. When using green threads, we have to provide the stack memory ourselves. This gives us control over, and lets us minimize, the memory used by threads.
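Here is a minimal sketch of a green thread built on these functions: we allocate the stack ourselves (64 KiB here, far below the native default) and cooperatively switch between the main context and the green one:

```c
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)  /* we choose the stack size ourselves */

static ucontext_t main_ctx, green_ctx;

static void green_entry(void) {
    puts("green: started");
    swapcontext(&green_ctx, &main_ctx);   /* yield back to main */
    puts("green: resumed");
    /* falling off the end resumes uc_link, i.e. main_ctx */
}

int main(void) {
    char *stack = malloc(STACK_SIZE);

    getcontext(&green_ctx);                 /* initialize the context */
    green_ctx.uc_stack.ss_sp   = stack;     /* our own stack memory */
    green_ctx.uc_stack.ss_size = STACK_SIZE;
    green_ctx.uc_link          = &main_ctx; /* where to go on return */
    makecontext(&green_ctx, green_entry, 0);

    swapcontext(&main_ctx, &green_ctx);   /* run until the first yield */
    puts("main: green thread yielded");
    swapcontext(&main_ctx, &green_ctx);   /* resume it to completion */
    puts("main: green thread finished");

    free(stack);
    return 0;
}
```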
The largest downside of POSIX contexts is that a context also contains the signal mask, which entails a syscall on every switch in order to maintain the context-switch semantics defined by POSIX. Often we have no particular need to manage the signal mask. If we implement swapcontext manually (which takes about a dozen assembly instructions per CPU architecture), the switching overhead drops significantly.
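For illustration, here is what such a hand-rolled switch can look like on x86-64 (System V ABI), written as GNU toplevel assembly inside a C file; ctx_switch is a made-up name, and the sketch assumes purely cooperative switching, so only callee-saved registers need to survive. Note that nothing here touches the signal mask:

```c
/* void ctx_switch(void **old_sp, void *new_sp);
 * Saves the callee-saved registers of the current thread on its own
 * stack, stores its stack pointer into *old_sp, switches to the other
 * thread's saved stack, and restores its registers. About a dozen
 * instructions, and no sigprocmask syscall, unlike swapcontext.
 * A newly created thread's stack must first be seeded with an entry
 * address and dummy register slots in the same layout. */
__asm__(
    ".globl ctx_switch\n"
    "ctx_switch:\n"
    "    pushq %rbp\n"
    "    pushq %rbx\n"
    "    pushq %r12\n"
    "    pushq %r13\n"
    "    pushq %r14\n"
    "    pushq %r15\n"
    "    movq  %rsp, (%rdi)\n"   /* *old_sp = current stack pointer */
    "    movq  %rsi, %rsp\n"     /* switch to the other stack */
    "    popq  %r15\n"
    "    popq  %r14\n"
    "    popq  %r13\n"
    "    popq  %r12\n"
    "    popq  %rbx\n"
    "    popq  %rbp\n"
    "    ret\n"                  /* "return" into the other thread */
);

void ctx_switch(void **old_sp, void *new_sp);
```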
As a simple optimization, we can skip saving CPU registers that are unused at the switch point. As a further optimization, we could employ liveness analysis and avoid saving local variables that will not be read after the thread resumes.
In conclusion, the usage of green threads allows us to write the same good old straightforward sequential code and at the same time have a lot of threads running. For this reason, I much prefer using green threads for network IO-related tasks and native OS threads for everything else.
PS[0]. In the node.js ecosystem there is the node-fibers package. It uses libcoroutine under the hood to implement lightweight threads on the V8 engine, which somewhat alleviates the callback-spaghetti problem.
PS[1]. Erlang is a great example of what green threads are capable of. Its green threading allows it to run hundreds of thousands of lightweight processes on a single machine.