
Object Pools in C#: Examples, Internals and Performance Benchmarks

This article is presented as a part of C# Advent 2024.


Disclaimer

The results of the benchmarks in this article are very conditional. I admit that the benchmarks may show different results on a different computer, with a different CPU, with a different compiler, or in a different scenario. Always check your code under your specific conditions and don’t blindly trust articles from the internet.

The source code and raw results are located in this repo.

What is Object Pool?

Object Pool is a design pattern that allows reusing objects instead of creating new ones. This can be very useful in scenarios where object initialization is expensive. The typical usage of an object pool consists of these steps:

  1. Rent an object from the pool.
  2. Use the object to perform some work.
  3. Return the object to the pool.
  4. Optionally, the object pool can reset the object’s state when it is returned.

The pseudocode for using an object pool looks like this:

var obj = objectPool.Get();

try  
{  
    // do some work with obj  
}  
finally  
{  
    objectPool.Return(obj, reset: true);  
}

The Object Pool pattern is widely used, especially in game development and applications where low memory usage is critically important.

Example of searching Object Pool in GitHub

.NET provides several classes that implement the Object Pool pattern:

  • ObjectPool: A general-purpose object pool.
  • ArrayPool: A class designed specifically for pooling arrays.

These classes may look similar, but their implementation is different. We will consider them separately.

ObjectPool class

The ObjectPool class is available by default only in ASP.NET Core applications. You can find its source code here. For other types of C# applications, you need to install the Microsoft.Extensions.ObjectPool package.

To use a pool, call the Create<T> method from the static ObjectPool class:

var pool = ObjectPool.Create<SomeType>();  
var obj = pool.Get();

You can also define a custom pooling policy and pass it to the Create<T> method. A policy lets you control how objects are created and cleaned up. For example, to reuse a list of integers, you can define the following policy:

public class ListPolicy : IPooledObjectPolicy<List<int>>  
{  
    public List<int> Create() => [];

    public bool Return(List<int> obj)  
    {  
        obj.Clear();  
        return true;  
    }  
}
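With the policy in place, you can pass it to Create and the pool will clear every list on return. The snippet below is a minimal sketch of that round trip; it assumes the ListPolicy above and the Microsoft.Extensions.ObjectPool package.

```csharp
using Microsoft.Extensions.ObjectPool;

// Create a pool that uses the ListPolicy defined above.
var pool = ObjectPool.Create(new ListPolicy());

var list = pool.Get();           // rent
list.Add(42);                    // use
pool.Return(list);               // return; ListPolicy.Return clears the list

var reused = pool.Get();         // typically the same, now-empty instance
Console.WriteLine(reused.Count); // 0
```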

Now let’s take a look at how the ObjectPool class works internally.

How it works

When retrieving an object from the pool, the ObjectPool works as follows:

  1. It checks if _fastItem is not null and can be taken by the current thread using Interlocked.CompareExchange.
  2. If _fastItem is null or already taken by another thread, it tries to dequeue an object from the ConcurrentQueue _items.
  3. If both _fastItem and the queue are empty, a new object is created using the factory function.

ObjectPool<T> internals
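The three steps above can be sketched with a minimal, self-contained pool. This is an illustration of the idea, not the exact BCL source:

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Minimal sketch of the Get path (illustrative, not the exact BCL code).
public class GetSketchPool<T> where T : class, new()
{
    private T? _fastItem;
    private readonly ConcurrentQueue<T> _items = new();

    public T Get()
    {
        // 1. Try to take the single "fast" slot atomically.
        var item = _fastItem;
        if (item != null &&
            Interlocked.CompareExchange(ref _fastItem, null, item) == item)
        {
            return item;
        }

        // 2. Fast slot empty or taken: try to dequeue from the shared queue.
        if (_items.TryDequeue(out var queued))
        {
            return queued;
        }

        // 3. Both empty: create a new object.
        return new T();
    }
}
```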

When returning an object to the pool, the ObjectPool works in the opposite way:

  1. It checks if the object passes the _returnFunc validation. If not, it means that the object should be discarded by policy.
  2. If _fastItem is null, the object is stored there using Interlocked.CompareExchange.
  3. If _fastItem is already in use, the object is added to the ConcurrentQueue if the total number of items is within the maximum capacity.
  4. If the pool is full, the object is discarded, and the item count is adjusted.
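A matching sketch of the return path looks like this. It is simplified, and the QueuedCount property is added purely for illustration; it does not exist in the BCL:

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Minimal sketch of the Return path (illustrative, not the exact BCL code).
public class ReturnSketchPool<T> where T : class
{
    private readonly Func<T, bool> _returnFunc; // policy validation
    private readonly int _maxCapacity;
    private int _numItems;                      // items currently in the queue
    private T? _fastItem;
    private readonly ConcurrentQueue<T> _items = new();

    public ReturnSketchPool(Func<T, bool> returnFunc, int maxCapacity)
        => (_returnFunc, _maxCapacity) = (returnFunc, maxCapacity);

    public int QueuedCount => _items.Count;     // not in the BCL; for illustration

    public void Return(T obj)
    {
        // 1. The policy decides whether the object may be pooled at all.
        if (!_returnFunc(obj))
            return;                             // discarded by policy

        // 2. Try to park the object in the fast slot.
        if (_fastItem == null &&
            Interlocked.CompareExchange(ref _fastItem, obj, null) == null)
            return;

        // 3. Fast slot busy: enqueue while within capacity.
        if (Interlocked.Increment(ref _numItems) <= _maxCapacity)
        {
            _items.Enqueue(obj);
        }
        else
        {
            // 4. Pool full: undo the count and drop the object.
            Interlocked.Decrement(ref _numItems);
        }
    }
}
```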

Performance

To test how ObjectPool<T> affects performance, I created two benchmarks:

  • without pooling (creates a new list for each operation);
  • with the object pool.

Each benchmark does the following in a loop:

  1. Creates a new list or rents from the pool.
  2. Adds the values to the list.
  3. Returns the list to the pool (if pooling is used).

The benchmarks repeat this process 100 times for each thread. The thread count varies from 1 to 32, and the list size varies from 10 to 1,000,000.
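The pooled variant of the benchmark body has roughly this shape. This is an illustrative reconstruction, not the actual benchmark code (which lives in the linked repo); it assumes the ListPolicy above and the Microsoft.Extensions.ObjectPool package, and listSize is a stand-in for the varied parameter.

```csharp
using Microsoft.Extensions.ObjectPool;

// Rough shape of one benchmark run for a single thread (illustrative).
var pool = ObjectPool.Create(new ListPolicy());
const int iterations = 100;
var listSize = 1000; // varied from 10 to 1,000,000 in the real benchmarks

for (var i = 0; i < iterations; i++)
{
    var list = pool.Get();           // 1. rent (the baseline does new List<int>())
    for (var j = 0; j < listSize; j++)
        list.Add(j);                 // 2. add values
    pool.Return(list);               // 3. return; ListPolicy clears the list
}
```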

The results are shown in the diagram below. The x-axis is a logarithmic scale, and the y-axis shows the percentage difference compared to the baseline without pooling.

ObjectPool<T> benchmark results. Percentage difference compared to the baseline without pooling.

From the results, we can see that using ObjectPool in a single-threaded scenario is 10% – 50% faster compared to creating a new list for each iteration. However, in multithreaded scenarios, ObjectPool performs worse for relatively small objects. This is most likely due to thread synchronization overhead when accessing _fastItem and the ConcurrentQueue.

ObjectPool<T> benchmark results. Absolute values.

ArrayPool

ArrayPool<T> is a class that is available in any C# application. It lives in the System.Buffers namespace. You can find its source code here. The ArrayPool class is abstract and has two implementations: SharedArrayPool and ConfigurableArrayPool.

The usage of ArrayPool<T> follows the typical object pool pattern and is quite simple. Here’s an example that uses the shared pool (a SharedArrayPool instance internally):

var pool = ArrayPool<int>.Shared;  
var buffer = pool.Rent(10);  
try  
{  
    // do some work with array  
}  
finally  
{  
    pool.Return(buffer, clearArray: true);  
}

You can also configure the pool. The static Create method returns a ConfigurableArrayPool instance:

var pool = ArrayPool<int>.Create(
    maxArrayLength: 1000, 
    maxArraysPerBucket: 20);

This method lets you specify the maximum array length and the maximum number of arrays per bucket (we’ll learn about buckets later). By default, these values are 2^20 and 50 respectively.

It’s important to note that the returned array always satisfies the requested size, but it may be larger:

using System.Buffers;

var (pow, cnt) = (4, 0);  
while (pow <= 30)  
{  
    var x = (1 << pow) - 1;  
    var arr = ArrayPool<int>.Shared.Rent(x);  
    Console.WriteLine(  
        "Renting #{0}. Requested size: {1}. Actual size: {2}.",   
        ++cnt, x, arr.Length);  
    pow++;  
}

// Renting #1. Requested size: 15. Actual size: 16.  
// Renting #2. Requested size: 31. Actual size: 32.  
// Renting #3. Requested size: 63. Actual size: 64.  
// ...  
// Renting #26. Requested size: 536870911. Actual size: 536870912.  
// Renting #27. Requested size: 1073741823. Actual size: 1073741824.

How it works

As mentioned earlier, ArrayPool<T> has two implementations. We will consider them separately.

SharedArrayPool

SharedArrayPool has two tiers of cache: a per-thread cache and a shared cache.

The per-thread cache is implemented as a private static field named t_tlsBuckets, which is essentially an array of arrays. Each thread gets its own instance of this cache thanks to Thread Local Storage, achieved by applying the ThreadStaticAttribute to the t_tlsBuckets field. This allows each thread to maintain a small cache for various array sizes, ranging from 2^4 to 2^30 elements (27 buckets in total).

When we try to get an array from the pool, the algorithm first looks in the t_tlsBuckets field. If an array of the needed size is not found there, the algorithm checks the shared cache, stored in _buckets. This shared cache is an array of Partitions objects, one for each allowed bucket size (27 buckets in total). Each Partitions object contains an array of N Partition objects, where N is the number of processors. Each Partition works like a stack that can hold up to 32 arrays. Yeah, it sounds complicated, so see the diagram below.

SharedArrayPool<T> internals

When we return an array to the pool, the algorithm first tries to store it in the per-thread cache. If t_tlsBuckets already contains an array of the same size, the existing array from t_tlsBuckets is pushed into the shared cache and the new array is saved in t_tlsBuckets for better performance (CPU cache locality). When pushing into the shared cache, if the current core’s stack is full, the algorithm searches for space in the stacks of other cores. If all stacks are full, the array is dropped.
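As a side note, the bucket a request falls into can be computed directly from the requested length. The helper below mirrors the idea behind the BCL’s internal SelectBucketIndex helper (simplified sketch):

```csharp
using System.Numerics;

// Maps a requested length to one of the 27 power-of-two buckets:
// bucket 0 holds 2^4 = 16-element arrays, bucket 1 holds 2^5 = 32, and so on.
// Mirrors the idea of the BCL's internal SelectBucketIndex helper.
static int SelectBucketIndex(int length)
    => BitOperations.Log2(((uint)length - 1) | 15) - 3;

Console.WriteLine(SelectBucketIndex(10));      // 0  -> 16-element arrays
Console.WriteLine(SelectBucketIndex(17));      // 1  -> 32-element arrays
Console.WriteLine(SelectBucketIndex(1 << 20)); // 16 -> 2^20-element arrays
```

This is why Rent(15) in the earlier example came back with a 16-element array: the length is rounded up to the bucket’s power of two.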

ConfigurableArrayPool

ConfigurableArrayPool is simpler compared to SharedArrayPool. It has only one private field for storing pooled arrays, called _buckets. This field is an array of Bucket instances, where each Bucket represents a collection of arrays (see the diagram below). Since the _buckets field is shared across all threads, each Bucket uses a SpinLock for thread-safe access.

ConfigurableArrayPool<T> internals
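A trimmed-down sketch of such a bucket shows how the SpinLock guards a simple array-backed stack. This is an illustration of the idea, not the exact BCL code:

```csharp
using System.Threading;

// Simplified sketch of ConfigurableArrayPool's Bucket (illustrative).
internal sealed class BucketSketch<T>
{
    private readonly T[]?[] _arrays;   // pooled arrays of one size class
    private SpinLock _lock = new(enableThreadOwnerTracking: false);
    private int _index;                // next free slot in _arrays

    public BucketSketch(int maxArraysPerBucket)
        => _arrays = new T[maxArraysPerBucket][];

    public T[]? TryRent()
    {
        T[]? buffer = null;
        var lockTaken = false;
        try
        {
            // Every thread contends on the same SpinLock.
            _lock.Enter(ref lockTaken);
            if (_index > 0)
            {
                buffer = _arrays[--_index];
                _arrays[_index] = null;
            }
        }
        finally
        {
            if (lockTaken) _lock.Exit();
        }
        return buffer;                 // null means: caller allocates a new array
    }

    public void Return(T[] array)
    {
        var lockTaken = false;
        try
        {
            _lock.Enter(ref lockTaken);
            if (_index < _arrays.Length)
                _arrays[_index++] = array; // silently dropped when the bucket is full
        }
        finally
        {
            if (lockTaken) _lock.Exit();
        }
    }
}
```

This per-bucket lock is the likely source of the contention visible in the multithreaded benchmark results below.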

Performance

The ArrayPool<T> benchmarks are similar to the ObjectPool<T> benchmarks:

  • without pooling (creates a new array for each operation);
  • with the shared pool;
  • with the configurable pool.

ArrayPool<T> benchmark results. Percentage difference compared to the baseline without pooling.

As we can see from the results, SharedArrayPool is faster in almost all cases, especially in multithreaded scenarios. The only exception is when the array size is 10.

The situation is the opposite with ConfigurableArrayPool. This class performs worse in multithreaded scenarios for relatively small arrays. I believe the reason is the same as with ObjectPool<T>: thread synchronization latency when accessing arrays inside Bucket instances.

ArrayPool<T> benchmark results. Absolute values.

Conclusion

ObjectPool and ArrayPool can improve performance in scenarios where objects are expensive to create and reuse is possible. However, in multithreaded scenarios, the benefits of pooling are less clear. For small objects, the overhead of synchronization mechanisms can outweigh the performance gains. Developers should carefully benchmark and evaluate pooling in their specific use cases before integrating it into production systems.