Fastest way to extract a substring in C#

Today we return to microbenchmarking with a short article about performance in C#. The focus is on strings and the most efficient way to extract a substring from a string.

Benchmark

In this benchmark, we’ll consider the following ways of extracting a substring:

Substring method.
Range operator.
Split method.
ReadOnlySpan<T> struct.
Regex class.
SkipWhile method.

For benchmarking, I used the BenchmarkDotNet library. The whole code of the benchmark class can be found here.
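As a rough illustration of what each of the six approaches looks like in source form, here is a minimal sketch. The input string "key:value" and the Regex pattern are assumptions made for this example; the separator ':' matches the one visible in the decompiled Split code further down, but the benchmark's actual data and _pattern are not shown in this article.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input and separator, chosen only for illustration.
const char Symbol = ':';
string text = "key:value";
int start = text.IndexOf(Symbol);

// 1. Substring method: everything from the separator to the end.
string viaSubstring = text.Substring(start);

// 2. Range operator: the same slice written with range syntax.
string viaRange = text[start..];

// 3. ReadOnlySpan<char>: slice without copying, then materialize a string.
string viaSpan = new string(text.AsSpan(start));

// 4. SkipWhile: drop characters until the separator is reached.
string viaSkipWhile = new string(text.SkipWhile(c => c != Symbol).ToArray());

// 5. Split: returns the segment after the separator, without the separator.
string viaSplit = text.Split(Symbol)[1];

// 6. Regex: this pattern is an assumption, not the benchmark's _pattern.
string viaRegex = Regex.Match(text, ":(.*)").Groups[1].Value;

Console.WriteLine($"{viaSubstring} {viaRange} {viaSpan} {viaSkipWhile}"); // :value :value :value :value
Console.WriteLine($"{viaSplit} {viaRegex}");                             // value value
```

Note that, written this way, the first four variants keep the separator in the result while Split and this particular Regex pattern drop it.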

Results

As usual, I ran the benchmark on both .NET 6 and .NET 7. The results show minimal variation between the two.

Execution time

Figure: The benchmark results

We observe that ReadOnlySpan<T>, Substring and the Range operator show fairly similar performance. Split, Regex and SkipWhile are notably slower, taking about 2.5, 8.5 and 23.6 times as long as ReadOnlySpan<T>, respectively.

Method	Mean, ns	Percent
`ReadOnlySpan<T>`	687.6	100
`Substring`	698.5	102
`Range`	710.5	103
`Split`	1696.3	247
`Regex`	5830.4	848
`SkipWhile`	16211.7	2358
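The Percent column appears to be each mean divided by the fastest mean, rounded to a whole percent. A small sketch that recomputes it from the numbers in the table (the variable names are mine, not from the benchmark):

```csharp
using System;
using System.Linq;

// Mean execution times in nanoseconds, copied from the table above.
double[] meansNs = { 687.6, 698.5, 710.5, 1696.3, 5830.4, 16211.7 };

// The fastest method (ReadOnlySpan<T>) serves as the 100% baseline.
double baseline = meansNs.Min();

// Express every mean as a rounded percentage of the baseline.
int[] percents = meansNs
    .Select(m => (int)Math.Round(m / baseline * 100))
    .ToArray();

Console.WriteLine(string.Join(", ", percents)); // 100, 102, 103, 247, 848, 2358
```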

If we look at the decompiled C# code, it becomes apparent that the Range operator’s implementation is very similar to that of Substring.

// Range Operator after decompiling
string text = data[num];
int num2 = text.IndexOf(_symbol);
string text2 = text;
int num3 = num2;
list.Add(text2.Substring(num3, text2.Length - num3));
num++;

The only difference is that the Substring implementation has fewer local variables.

// Substring after decompiling
string text = data[num];
int startIndex = text.IndexOf(_symbol);
list.Add(text.Substring(startIndex));
num++;

ReadOnlySpan<T> shows slightly better results. It looks like taking a span over the string’s memory and creating a new string from it is marginally faster than calling string.Substring. My assumption is that the difference comes from the index bounds checks inside the Substring implementation.

// ReadOnlySpan<T> after decompiling
string obj = data[num];
int start = obj.IndexOf(_symbol);
ReadOnlySpan<char> value = MemoryExtensions.AsSpan(obj, start);
list.Add(new string(value));
num++;
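In the snippet above, the only allocation is the final new string(value); the span itself is just a view over the original string. If a string is not actually needed, the slice can be consumed directly. A minimal sketch, assuming a hypothetical input "id:42" whose tail is a number:

```csharp
using System;

string text = "id:42";   // hypothetical input
int start = text.IndexOf(':');

// Slice after the separator; no new string is allocated here.
ReadOnlySpan<char> digits = text.AsSpan(start + 1);

// int.Parse has an overload that accepts a ReadOnlySpan<char> directly.
int id = int.Parse(digits);

// Spans can also be compared without creating an intermediate string.
bool isFortyTwo = digits.SequenceEqual("42".AsSpan());

Console.WriteLine($"{id} {isFortyTwo}"); // 42 True
```

Whether this helps depends on the caller: the benchmark in this article always materializes a string, so it measures the slice plus the copy.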

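Split is slower because of its internal implementation: it allocates an array and a separate string for every segment, even though only one of them is used. It is simply the wrong tool for obtaining a single substring.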

// Split after decompiling
string text = data[num];
list.Add(text.Split(':')[1]);
num++;
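Besides the extra allocations, Split(':')[1] does not return the same text as Substring when the input contains more than one separator. A small sketch with a hypothetical input "a:b:c" shows the difference, along with the count overload that stops after the first separator:

```csharp
using System;

string text = "a:b:c";   // hypothetical input with two separators

// Split allocates an array plus one string per segment.
string[] parts = text.Split(':');      // "a", "b", "c"
string second = parts[1];              // "b"

// Substring from the first separator keeps the rest of the string intact.
string tail = text.Substring(text.IndexOf(':'));   // ":b:c"

// Limiting the count to 2 stops splitting after the first separator.
string rest = text.Split(':', 2)[1];   // "b:c"

Console.WriteLine($"{parts.Length} {second} {tail} {rest}"); // 3 b :b:c b:c
```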

Regex is a good option when the substring is defined by a more complex pattern than a single char. In this particular case, though, it’s like breaking a butterfly on a wheel.

// Regex after decompiling
string input = data[num];
list.Add(Regex.Match(input, _pattern).Groups[1].Value);
num++;
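For contrast, here is a sketch of a case where a pattern arguably earns its cost: a multi-character delimiter followed by optional whitespace. Both the input and the pattern are hypothetical and unrelated to the benchmark’s _pattern.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "key => value";   // hypothetical input

// Match the "=>" delimiter, skip any whitespace, capture the remainder.
Match match = Regex.Match(text, @"=>\s*(.*)");

// Groups[1] holds the first capture; fall back to empty when nothing matched.
string value = match.Success ? match.Groups[1].Value : string.Empty;

Console.WriteLine(value); // value
```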

SkipWhile is super slow because:

It creates a new delegate Func<char, bool>.
Enumerable.SkipWhile calls this delegate for each char until the symbol is found, and then yields the remaining chars one by one through an iterator.
Enumerable.ToArray converts IEnumerable<char> to char[].

// SkipWhile after decompiling
string source = data[num];
list.Add(new string(
    Enumerable.ToArray(
        Enumerable.SkipWhile(
            source,
            new Func<char, bool>(<SkipWhile>b__5_0)))));
num++;
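The steps listed above can be made visible by writing the chain out one call at a time and counting how often the delegate runs. This is a sketch with a hypothetical input; the counter exists only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string text = "key:value";   // hypothetical input
int predicateCalls = 0;

// Step 1: the delegate that SkipWhile invokes while it is skipping.
Func<char, bool> notSeparator = c => { predicateCalls++; return c != ':'; };

// Step 2: a lazy sequence; nothing is executed until it is enumerated.
IEnumerable<char> tailChars = text.SkipWhile(notSeparator);

// Step 3: ToArray enumerates char by char and copies into a char[].
char[] buffer = tailChars.ToArray();

// Step 4: the string constructor copies the chars once more.
string result = new string(buffer);

Console.WriteLine($"{result} {predicateCalls}"); // :value 4
```

The delegate runs four times here ('k', 'e', 'y' and ':'), after which the remaining chars still pass through the iterator one at a time.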

Memory

In terms of memory allocations, ReadOnlySpan<T>, Substring and Range show the same results. The other implementations require more memory.

Method	Gen0	Gen1	Allocated	Percent
`ReadOnlySpan<T>`	0.3901	0.0057	4.79 KB	100
`Substring`	0.3901	0.0057	4.79 KB	100
`Range`	0.3901	0.0057	4.79 KB	100
`Split`	0.7362	0.0114	9.03 KB	188
`Regex`	1.9150	0.0305	23.5 KB	490
`SkipWhile`	2.2888	0.0305	28.23 KB	589
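The figures above come from the benchmark run. For a quick sanity check outside BenchmarkDotNet, allocations can be approximated with GC.GetAllocatedBytesForCurrentThread. The helper MeasureAllocated and the input below are hypothetical, and the absolute numbers will differ from the table; only the ordering is expected to match.

```csharp
using System;

string text = "key:value";   // hypothetical input
int sink = 0;                // keeps the results observable

// Runs the action repeatedly and returns the bytes allocated on this thread.
long MeasureAllocated(Action action, int iterations)
{
    action(); // warm-up call, excluded from the measurement
    long before = GC.GetAllocatedBytesForCurrentThread();
    for (int i = 0; i < iterations; i++)
    {
        action();
    }
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

// Substring allocates one string per call.
long substringBytes = MeasureAllocated(
    () => sink += text.Substring(text.IndexOf(':')).Length, 1_000);

// Split allocates an array and one string per segment on every call.
long splitBytes = MeasureAllocated(
    () => sink += text.Split(':')[1].Length, 1_000);

Console.WriteLine($"Substring: {substringBytes} B, Split: {splitBytes} B");
```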

Conclusion

The most efficient methods for extracting a substring in C# are ReadOnlySpan<T>, Substring and Range. I favor the Range operator because it reads more cleanly than the other implementations. However, it is worth noting that in this benchmark it was roughly 2% slower than Substring and 3% slower than ReadOnlySpan<T>.