Object Pools in C#: Examples, Internals and Performance Benchmarks
#csharp #objectpool #performance #benchmark #algorithmsObject Pool is a design pattern that allows reusing objects instead of creating new ones. This can be very useful in scenarios where object initialization is expensive. It is widely used, especially in game development and applications where low memory usage is critically important. In this article, we will look at how this pattern is implemented in C# and how it can improve performance.
This article is presented as a part of C# Advent 2024.
Table Of Contents
Disclaimer
The results of the benchmarks in this article are very conditional. I admit, that the benchmark may show different results on a different computer, with a different CPU, with a different compiler or in a different scenario. Always check your code in your specific conditions and don’t trust to the articles from the internet.
The source code and raw results are located in this repo.
What is Object Pool?
Object Pool is a design pattern that allows reusing objects instead of creating new ones. This can be very useful in scenarios where object initialization is expensive. The typical usage of an object pool consists of these steps:
- Rent an object from the pool.
- Use the object to perform some work.
- Return the object to the pool.
- Optionally, the object pool can reset the object’s state when it is returned.
The pseudocode for using an object pool looks like this:
var obj = objectPool.Get();
try
{
// do some work with obj
}
finally
{
objectPool.Return(obj, reset: true);
}
The Object Pool pattern is widely used, especially in game development and applications where low memory usage is critically important.
Example of searching Object Pool in GitHub
.NET provides several classes that implement the Object Pool pattern:
- ObjectPool: A general-purpose object pool.
- ArrayPool: A class designed specifically for pooling arrays.
These classes may look similar, but their implementation is different. We will consider them separately.
ObjectPool class
The ObjectPool
class is available by default only in ASP.NET Core applications. You can find Its source code here. For other types of C# applications, you need to install the Microsoft.Extensions.ObjectPool package.
To use a pool, call the Create<T>
method from the static ObjectPool class:
var pool = ObjectPool.Create<SomeType>();
var obj = pool.Get();
You can also define a custom pooling policy and pass it to the Create<T>
method. A policy lets you control how objects are created and cleaned up. For example, to reuse a list of integers, you can define the following policy:
public class ListPolicy : IPooledObjectPolicy<List<int>>
{
public List<int> Create() => [];
public bool Return(List<int> obj)
{
obj.Clear();
return true;
}
}
Now let’s take a look at how the ObjectPool
class works internally.
How it works
When retrieving an object from the pool, the ObjectPool
works as follows:
- It checks if
_fastItem
is not null and can be taken by the current thread usingInterlocked.CompareExchange
. - If
_fastItem
isnull
or already taken by another thread, it tries to dequeue an object from theConcurrentQueue _items
. - If both
_fastItem
and the queue are empty, a new object is created using the factory function.
ObjectPool<T> internals
When returning an object to the pool, the ObjectPool
works in an opposite way:
- It checks if the object passes the
_returnFunc
validation. If not, it means that the object should be discarded by policy. - If
_fastItem
isnull
, the object is stored there usingInterlocked.CompareExchange
. - If
_fastItem
is already in use, the object is added to theConcurrentQueue
if the total number of items is within the maximum capacity. - If the pool is full, the object is discarded, and the item count is adjusted.
Performance
To test how ObjectPool<T>
affects performance, I created two benchmarks:
- without pooling (creates a new list for each operation);
- with the object pool.
Each benchmark does the following in a loop:
- Creates a new list or rents from the pool.
- Adds the values in the list.
- Returns the list to the pool (if pooling is used).
The benchmarks repeat this process 100 times for each thread. The threads count varies from 1 to 32. The list size varies from 10 to 1,000,000.
The results are shown in the diagram below. The x-axis is a logarithmic scale, and the y-axis shows the percentage difference compared to the baseline without pooling.
ObjectPool<T> benchmark results. Percentage difference compared to the baseline without pooling.
From the results, we can see that using ObjectPool
in a single-thread scenario is 10% – 50% faster, compared to creating a new list for each iteration. However, in a multithreaded scenario, ObjectPool
performs worse for relatively small objects. This is most probably due to thread synchronization latency when accessing to the _fastItem
and ConcurrentQueue
.
ObjectPool<T> benchmark results. Absolute values.
ArrayPool
ArrayPool<T>
is a class which is available from any C# application. It locates in System.Buffers
namespace. You can find its source code here. The ArrayPool
class is an abstract and has 2 implemenations: SharedArrayPool and ConfigurableArrayPool.
The usage of ArrayPool<T>
follows the typical object pool pattern and is quite simple. Here’s an example that uses the shared pool internally.
var pool = ArrayPool<int>.Shared;
var buffer = pool.Rent(10);
try
{
// do some work with array
}
finally
{
pool.Return(buffer, clear: true);
}
You can also configure the pool. Static method Create returns a ConfigurableArrayPool
instance.
var pool = ArrayPool<int>.Create(
maxArrayLength: 1000,
maxArraysPerBucket: 20);
This method lets you specify the maximum array length and the maximum number of arrays per bucket (we’ll learn about buckets later). By default, these values are 2^20
and 50
respectively.
It’s important to note that the size of the array returned will always meet the requested size, but it may be larger:
using System.Buffers;
var (pow, cnt) = (4, 0);
while (pow <= 30)
{
var x = (1 << pow) - 1;
var arr = ArrayPool<int>.Shared.Rent(x);
Console.WriteLine(
"Renting #{0}. Requested size: {1}. Actual size: {2}.",
++cnt, x, arr.Length);
pow++;
}
// Renting #1. Requested size: 15. Actual size: 16.
// Renting #2. Requested size: 31. Actual size: 32.
// Renting #3. Requested size: 63. Actual size: 64.
// ...
// Renting #26. Requested size: 536870911. Actual size: 536870912.
// Renting #27. Requested size: 1073741823. Actual size: 1073741824.
How it works
As said earlier, ArrayPool<T>
has 2 implementations. We will consider them separately.
SharedArrayPool
SharedArrayPool has 2 tiers of cache: per-thread and shared caches.
The per-thread cache is implemented as a private static field named t_tlsBuckets
, which is essentially an array of arrays. Each thread gets its own instance of this cache due to Thread Local Storage, achieved by applying the ThreadStaticAttribute
to the t_tlsBuckets
field.
This allows each thread to maintain a small cache for various array sizes, ranging from 2^4
to 2^30
(27 buckets in total).
When we’re trying to get an array from a pool, the algorithm tries to get it from t_tlsBuckets
field. If an array of the needed size is not found in t_tlsBuckets
, the algorithm checks the shared cache, stored in _buckets
. This shared cache is an array of Partitions
objects, one for each allowed bucket size (27 buckets in total). Each Partitions
object contains an array of N Partition
objects, where N is the number of processors. Each Partition
works like a stack that can hold up to 32 arrays. Yeah, sounds complicated, so see the diagram below.
SharedArrayPool<T> internals
When we’re returning the array to the pool, the algorithm first tries to store it in the per-thread cache. If t_tlsBuckets
already contains an array for the same size, the existing array from t_tlsBuckets
is pushed into the shared cache and the new array is saved in t_tlsBuckets
for better performance (CPU cache locality). If the current core’s stack is full, it searches for space in the stacks of other cores. If all stacks are full, the array is dropped.
ConfigurableArrayPool
ConfigurableArrayPool
is simpler compared to SharedArrayPool
. It has only one private field for storing pooled arrays, called _buckets
. This field is an array of Bucket
instances, where each Bucket
represents a collection of arrays (see the diagram below). Since _buckets
field is shared across all threads, each Bucket
uses a SpinLock for thread-safe access.
ConfigurableArrayPool<T> internals
Performance
The ArrayPool<T>
benchmarks are similar to the ObjectPool<T>
benchmarks:
- without pooling (creates a new array for each operation);
- with the shared pool;
- with the configurable pool.
ArrayPool<T> benchmark results. Percentage difference compared to the baseline without pooling.
As we can see from the results, SharedArrayPool
is faster almost in all cases, especially with a multiple threads scenario. The only exception is when the array size is 10.
The opposite situation with a ConfiguratbleArrayPool
. This class has worse performance in multithreading scenario for relatively small arrays. I believe the reason is the same as in ObjectPool<T>
: thread synchronization latency when accessing arrays inside Bucket instances.
ArrayPool<T> benchmark results. Absolute values.
Conclusion
ObjectPool
and ArrayPool
can improve performance in scenarios where objects are expensive to create and reuse is possible. However, in multithreaded scenarios, the benefits of pooling are less clear. For small objects, the overhead of synchronization mechanisms can outweigh the performance gains. Developers should carefully benchmark and evaluate pooling in their specific use cases before integrating it into production systems.