There is a popular misconception that introducing batching sacrifices latency to improve throughput.
I want to share a technique that enables batching without incurring any additional latency. I call it “non-blocking batching”, though the name hasn't really caught on.
I first read about it in an article by Lorenzo Alberton, but I haven't seen much use of this technique in the wild.
Batching is a common technique to improve the efficiency of processing a queue of tasks. It works by gathering multiple tasks into one batch and processing them together instead of one by one.
When processing tasks in a queue, there are two metrics that we care about: latency (how long a task waits before it is processed) and throughput (how many tasks are processed per unit of time).
Batching improves throughput by amortizing the overhead of expensive operations that can be done once per batch instead of once per task.
Examples of expensive operations:
Usually, batch processing is implemented like this:
1. Wait for a new task in the queue (blocking indefinitely)
2. Add the task to a new batch
3. Wait until a specified timeout has passed, adding newly arriving tasks to the batch
4. Process all tasks in the batch
5. Loop back to the beginning
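Here is a minimal sketch of this loop in Python, assuming tasks arrive on a standard `queue.Queue`; the names `MAXIMUM_BATCH_SIZE`, `BATCH_WAIT_TIME`, and `process_batch` are placeholders, not part of any particular library:

```python
import queue
import time

# Placeholder parameters and processing function, for illustration only.
MAXIMUM_BATCH_SIZE = 64
BATCH_WAIT_TIME = 0.1  # seconds

def process_batch(batch):
    """Do the expensive per-batch work here."""
    ...

def blocking_batch_loop(tasks: queue.Queue):
    while True:
        # Steps 1-2: wait indefinitely for a task and start a new batch with it.
        batch = [tasks.get()]
        # Step 3: keep adding tasks until the waiting time has passed
        # (or the batch is full).
        deadline = time.monotonic() + BATCH_WAIT_TIME
        while len(batch) < MAXIMUM_BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(tasks.get(timeout=remaining))
            except queue.Empty:
                break
        # Step 4: process the whole batch at once.
        process_batch(batch)
        # Step 5: loop back to the beginning (the enclosing `while True`).
```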
For a random task distribution (governed by a Poisson random process), the batch processing looks as follows:
On this diagram, there are two timelines: the arrival of new tasks and the processing of batches.
We have two parameters for batching:

- Maximum batch size. Batch processing time is non-linear in the batch size, and past some threshold batching stops being effective. Limiting the batch size prevents this from happening.
- Batch waiting time. Increasing the batch waiting time allows more tasks to accumulate into a single batch. But this waiting is wasted time that increases the latency of every task by approximately half of the batch waiting time: with a 100 ms window, for example, a task arriving at a uniformly random moment within it waits an extra 50 ms on average. Usually, the batch waiting time is set to around half of the acceptable latency, or just by eyeballing it.
But, counterintuitively, it is best to set this time to zero. That is, we have the following batching algorithm:
1. Wait for a new task in the queue (blocking indefinitely)
2. Get up to `MaximumBatchSize` tasks from the queue without waiting
3. Process the tasks
4. Loop back to the beginning
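The same loop with the waiting time removed, again just a sketch using the placeholder names from above; the only change is that after the first blocking `get`, the tasks already in the queue are drained with `get_nowait`:

```python
def nonblocking_batch_loop(tasks: queue.Queue):
    while True:
        # Step 1: wait indefinitely for the first task.
        batch = [tasks.get()]
        # Step 2: grab whatever is already queued, up to MAXIMUM_BATCH_SIZE,
        # without waiting for more tasks to arrive.
        while len(batch) < MAXIMUM_BATCH_SIZE:
            try:
                batch.append(tasks.get_nowait())
            except queue.Empty:
                break
        # Step 3: process the whole batch at once.
        process_batch(batch)
        # Step 4: loop back to the beginning.
```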
Let's analyze the behavior of this batching in several scenarios.
Scenario 1. New tasks arrive slowly (the interval between arrivals is much longer than the processing time). In this case, every task is processed immediately with no additional latency. Additional waiting time would bring no benefit and would only increase the latency.
Scenario 2. Tasks arrive at regular intervals, but the average interval is shorter than the processing time of one task.
We get full utilization of processing resources with no additional latency. The batch size automatically adapts to the arrival rate of new tasks and to the processing time.
Scenario 3. New tasks arrive irregularly, in bursts. When the processor is idle, the first task of a burst is processed immediately and the rest are gathered into one batch.
This might be the only case where waiting makes sense, since it gives the entire burst a chance to be processed as a single batch.
I ran a simulation of the different cases and found that non-blocking batching beats conventional batching every time.
Here's how the two batching algorithms compare in different scenarios:
| Scenario | Average latency with non-blocking batching | Average latency with blocking batching |
|---|---|---|
| Slow rate of new tasks | 1.55s | 1.94s |
| Fast rate of new tasks | 5.13s | 7.1s |
| Bursty new tasks | 2s | 2.06s |
| New tasks arrive at the same rate as they can be processed | 5.14s | 7.12s |
We can see that in all scenarios it is better to do non-blocking batching compared to conventional blocking batching.
In all cases, the following parameters were used:
The code for the benchmark is available at https://github.com/dmitryvk/nowait-batch-modeling.
Non-blocking batching is the superior batching algorithm: it gives better latency, has fewer edge cases, and does not require complicated tuning.
It should be the default when developing systems that have a queue somewhere.
Although non-blocking batching improves latency and throughput, in many cases it also increases utilization (that is, the fraction of time when the processor is active). If new tasks arrive at a low rate and we want to minimize the utilization of the processor, then it is beneficial to introduce artificial delays.
This batching scheme easily generalizes to multiple workers: each worker simply does batching independently, as in the sketch below.
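For example, reusing the `nonblocking_batch_loop` sketch from above and a thread-safe `queue.Queue`, a multi-worker version could look like this (the function name and worker count are illustrative):

```python
import threading

def start_workers(tasks: queue.Queue, num_workers: int = 4):
    # Each worker runs the same non-blocking batching loop on the shared queue;
    # queue.Queue is thread-safe, so no extra coordination is needed.
    for _ in range(num_workers):
        threading.Thread(
            target=nonblocking_batch_loop, args=(tasks,), daemon=True
        ).start()
```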