Recently there has been a lot of talk along the lines of “threads and blocking code are bad, and callback-based asynchrony is good”.
Both synchronous blocking computations and asynchronous non-blocking ones have some state associated with them. For an HTTP server, for example, that state might include the client socket, the request received and parsed so far, and the position within the response being sent.
This state is always present, independent of the style the code is written in: whether it uses callbacks, is represented as a state machine, or is sequential blocking code running in its own thread.
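To make this concrete, here is a sketch in C of such per-connection state; the struct and field names are hypothetical, but some equivalent of each field must exist in any implementation:

```c
/* Hypothetical per-connection state of an HTTP server. In callback
 * style this struct is allocated explicitly and threaded through every
 * callback; in blocking style the same data lives in local variables
 * and in the thread's instruction pointer. */
struct http_conn {
    int    fd;                 /* client socket */
    char   buf[8192];          /* request bytes received so far */
    size_t parsed;             /* how far the parser has advanced */
    enum { READ_REQUEST, WRITE_RESPONSE } phase; /* where we are */
    size_t sent;               /* response bytes already written */
};
```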
Proponents of some fashionable and hyped technologies sometimes present the idea that the only way to write “web scale” apps is to ditch threads and blocking calls and switch to a style I call “callback spaghetti”. A more formal name for it is continuation-passing style (CPS). Transforming code to CPS is a mechanical task, but doing it by hand is tedious and error-prone, and it balloons the size of the code.
Continuation-passing style works by passing into each function a callback, which the inner function calls once the result is available. Composing functions then means composing callbacks. The resulting callback chain is equivalent to a call stack, albeit represented as a linked list instead of a contiguous array.
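As a minimal illustration in C (the function names here are made up), compare a direct-style function with its CPS counterpart:

```c
#include <stdio.h>

/* Direct style: the result is returned to the caller,
 * and the call stack remembers where to continue. */
static int add(int a, int b) { return a + b; }

/* CPS: instead of returning, the function invokes a callback
 * (the "continuation") once the result is available. */
static void add_cps(int a, int b, void (*k)(int)) { k(a + b); }

static void print_result(int r) { printf("result: %d\n", r); }

int main(void) {
    printf("result: %d\n", add(1, 2)); /* direct style */
    add_cps(1, 2, print_result);       /* continuation-passing style */
    return 0;
}
```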
My argument for sequential code running in separate threads is as follows: such code is easier to read, and control-flow structures are expressed naturally. It is also more reliable, since there are simple, well-understood mechanisms for error handling: exceptions work, stack traces contain useful information, and it is easier to ensure the timely release of resources.
The main argument against thread-based concurrency is that threads consume a lot of resources, so we cannot create very many of them. For OS-native threads this is largely true: they have relatively high creation overhead, they occupy a significant amount of memory for the stack (usually several megabytes) and thread-local storage, and context switches between them are expensive. But OS-native threads are just one way to implement threads: threads can be implemented with far lower overhead.
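The stack-size claim is easy to verify; the following sketch uses the standard pthread attribute API to print the default stack size that pthread_create would use (build with cc -pthread; on typical glibc/Linux systems it reports 8 MiB):

```c
#include <pthread.h>
#include <stdio.h>

int main(void) {
    pthread_attr_t attr;
    size_t stack_size = 0;

    /* A freshly initialized attribute object holds the defaults
     * that pthread_create would use. */
    pthread_attr_init(&attr);
    pthread_attr_getstacksize(&attr, &stack_size);
    printf("default thread stack: %zu KiB\n", stack_size / 1024);
    pthread_attr_destroy(&attr);
    return 0;
}
```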
Let's consider Linux glibc (NPTL) threads as an example. Such a thread consists of a kernel scheduling entity (created with the clone syscall), a stack with a guard page at its end, a thread-local storage block, and the thread descriptor that glibc keeps alongside them.
The main areas of overhead for such threads are the memory reserved for the stack and thread-local storage, the syscalls involved in creating and destroying them, and the kernel-mediated context switches.
POSIX defines several means to manage contexts and switch between them: makecontext, getcontext, setcontext, and swapcontext. A context in POSIX contains the entire thread state except for kernel objects. With the help of contexts, we can make our own threads. Threads created in userspace like this are called green threads. When using green threads, we have to provide the stack memory ourselves. This gives us control over, and lets us minimize, the memory used by threads.
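Here is a minimal sketch of a green thread built on these functions: we allocate the stack ourselves (64 KiB here, far below the native default) and cooperatively switch between the main context and the green one:

```c
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)  /* we choose the stack size ourselves */

static ucontext_t main_ctx, green_ctx;

static void green_entry(void) {
    puts("green: started");
    swapcontext(&green_ctx, &main_ctx);   /* yield back to main */
    puts("green: resumed");
    /* falling off the end resumes uc_link, i.e. main_ctx */
}

int main(void) {
    char *stack = malloc(STACK_SIZE);

    getcontext(&green_ctx);                 /* initialize the context */
    green_ctx.uc_stack.ss_sp   = stack;     /* our own stack memory */
    green_ctx.uc_stack.ss_size = STACK_SIZE;
    green_ctx.uc_link          = &main_ctx; /* where to go on return */
    makecontext(&green_ctx, green_entry, 0);

    swapcontext(&main_ctx, &green_ctx);   /* run until the first yield */
    puts("main: green thread yielded");
    swapcontext(&main_ctx, &green_ctx);   /* resume it to completion */
    puts("main: green thread finished");

    free(stack);
    return 0;
}
```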
The largest downside of POSIX contexts is that a context also contains the signal mask, which entails a syscall on every switch in order to maintain the context-switch semantics defined by POSIX. Often we have no particular need to manage the signal mask. If we implement swapcontext manually (which takes about a dozen assembly instructions per CPU architecture), the switching overhead drops significantly.
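For illustration, here is what such a hand-rolled switch can look like on x86-64 (System V ABI), written as GNU toplevel assembly inside a C file; ctx_switch is a made-up name, and the sketch assumes purely cooperative switching, so only callee-saved registers need to survive. Note that nothing here touches the signal mask:

```c
/* void ctx_switch(void **old_sp, void *new_sp);
 * Saves the callee-saved registers of the current thread on its own
 * stack, stores its stack pointer into *old_sp, switches to the other
 * thread's saved stack, and restores its registers. About a dozen
 * instructions, and no sigprocmask syscall, unlike swapcontext.
 * A newly created thread's stack must first be seeded with an entry
 * address and dummy register slots in the same layout. */
__asm__(
    ".globl ctx_switch\n"
    "ctx_switch:\n"
    "    pushq %rbp\n"
    "    pushq %rbx\n"
    "    pushq %r12\n"
    "    pushq %r13\n"
    "    pushq %r14\n"
    "    pushq %r15\n"
    "    movq  %rsp, (%rdi)\n"   /* *old_sp = current stack pointer */
    "    movq  %rsi, %rsp\n"     /* switch to the other stack */
    "    popq  %r15\n"
    "    popq  %r14\n"
    "    popq  %r13\n"
    "    popq  %r12\n"
    "    popq  %rbx\n"
    "    popq  %rbp\n"
    "    ret\n"                  /* "return" into the other thread */
);

void ctx_switch(void **old_sp, void *new_sp);
```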
As a simple optimization, we can skip saving CPU registers that are unused at the switch point. As a further optimization, we could employ liveness analysis and avoid saving local variables that will not be read after the thread resumes.
In conclusion, the usage of green threads allows us to write the same good old straightforward sequential code and at the same time have a lot of threads running. For this reason, I much prefer using green threads for network IO-related tasks and native OS threads for everything else.
PS[0]. In the node.js ecosystem there is the node-fibers package. It uses libcoroutine under the hood to implement lightweight threads on the V8 engine, which somewhat alleviates the callback-spaghetti problem.
PS[1]. Erlang is a great example of what green threads are capable of. Its green threading allows it to run hundreds of thousands of lightweight processes on a single machine.