Today, we’ll return to microbenchmarking with a short article about performance in C#. Our focus is on strings and the most efficient way to extract a substring from an original string.

Benchmark

In this benchmark, we’ll consider the following ways of extracting a substring:

  1. Substring method.
  2. Range operator.
  3. Split method.
  4. ReadOnlySpan<T> struct.
  5. Regex class.
  6. SkipWhile method.
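
To make the comparison concrete, here is a minimal sketch of what each approach might look like in source form. It is reconstructed from the decompiled snippets shown later, so the class name, the method names, the ':' separator and the regex pattern are all assumptions rather than the actual benchmark code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; names, separator and pattern are assumptions.
public static class SubstringApproaches
{
    private const char Symbol = ':';

    // Assumed pattern: capture the separator and everything after it.
    private const string Pattern = "(:.*)";

    public static string ViaSubstring(string text) =>
        text.Substring(text.IndexOf(Symbol));

    public static string ViaRange(string text) =>
        text[text.IndexOf(Symbol)..];

    public static string ViaSpan(string text) =>
        new string(text.AsSpan(text.IndexOf(Symbol)));

    // Split drops the separator itself, so the result has no leading ':'.
    public static string ViaSplit(string text) =>
        text.Split(Symbol)[1];

    public static string ViaRegex(string text) =>
        Regex.Match(text, Pattern).Groups[1].Value;

    public static string ViaSkipWhile(string text) =>
        new string(text.SkipWhile(c => c != Symbol).ToArray());
}
```

For the input "key:value", every method in this sketch returns ":value" except ViaSplit, which returns "value".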

For benchmarking, I used the BenchmarkDotNet library. The whole code of the benchmark class can be found here.

Results

As usual, I ran the benchmark on both .NET 6 and .NET 7. The results show minimal variation between the two.

Execution time

Figure: The benchmark results

We observe that ReadOnlySpan<T>, Substring and the Range operator show fairly similar performance. Split, Regex and SkipWhile are notably slower, taking roughly 2.5, 8.5 and 23.6 times as long as ReadOnlySpan<T>, respectively.

| Method | Mean (ns) | Relative to ReadOnlySpan<T> (%) |
|---|---|---|
| ReadOnlySpan<T> | 687.6 | 100 |
| Substring | 698.5 | 102 |
| Range | 710.5 | 103 |
| Split | 1696.3 | 247 |
| Regex | 5830.4 | 848 |
| SkipWhile | 16211.7 | 2358 |

If we look at the decompiled C# code, it becomes apparent that the Range operator’s implementation is very similar to that of Substring.

// Range Operator after decompiling
string text = data[num];
int num2 = text.IndexOf(_symbol);
string text2 = text;
int num3 = num2;
list.Add(text2.Substring(num3, text2.Length - num3));
num++;

The only difference is that the Substring implementation has fewer local variables.

// Substring after decompiling
string text = data[num];
int startIndex = text.IndexOf(_symbol);
list.Add(text.Substring(startIndex));
num++;

ReadOnlySpan<T> shows slightly better results. It looks like taking a span over the string and creating a new string from it is marginally faster than calling string.Substring. I assume the reason is the index bounds checks inside the internal implementation of Substring.

// ReadOnlySpan<T> after decompiling
string obj = data[num];
int start = obj.IndexOf(_symbol);
ReadOnlySpan<char> value = MemoryExtensions.AsSpan(obj, start);
list.Add(new string(value));
num++;

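Split is slower because of its internal implementation: it scans the entire string and allocates an array plus a new string for every segment, even though only one segment is needed. Using this method just to obtain a single substring is a misuse.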

// Split after decompiling
string text = data[num];
list.Add(text.Split(':')[1]);
num++;
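
The extra work Split performs can be seen in a small sketch. The class name, method names and input strings are made up for illustration; the point is only that Split materializes every segment, whereas IndexOf plus Substring stops at the first separator.

```csharp
using System;

// Hypothetical demo class; inputs and names are illustrative assumptions.
public static class SplitDemo
{
    // Split scans the whole string and allocates an array plus one string per segment.
    public static string[] AllSegments(string text) => text.Split(':');

    // What the Split variant keeps: only the second segment, the rest is discarded.
    public static string SecondSegment(string text) => text.Split(':')[1];

    // IndexOf stops at the first ':' and Substring allocates a single string.
    public static string FromFirstSeparator(string text) =>
        text.Substring(text.IndexOf(':'));
}
```

For "a:b:c", AllSegments returns three strings, SecondSegment returns "b", and FromFirstSeparator returns ":b:c". The two approaches are therefore only comparable when the input contains exactly one separator.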

Regex is a good option when you need to get a substring with a more complex pattern rather than a single char. But in this particular case it’s like breaking a butterfly on a wheel.

// Regex after decompiling
string input = data[num];
list.Add(Regex.Match(input, _pattern).Groups[1].Value);
num++;
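
As an illustration of a case where Regex earns its cost, consider extracting a value that follows a multi-character marker and has a constrained shape. This is a minimal sketch: the input format, the pattern and the names are assumptions, not part of the benchmark.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; the "id=<digits>" format is an assumption.
public static class RegexDemo
{
    // Capture the run of digits that follows the literal "id=".
    private static readonly Regex IdPattern = new Regex(@"id=(\d+)");

    // Returns the captured digits, or an empty string when nothing matches.
    public static string ExtractId(string input) =>
        IdPattern.Match(input).Groups[1].Value;
}
```

Calling RegexDemo.ExtractId("user=bob; id=42; role=admin") yields "42". Expressing the same rule with IndexOf and Substring would need manual scanning for the end of the digit run.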

SkipWhile is super slow because:

  1. It creates a new delegate Func<char, bool>.
  2. Enumerable.SkipWhile calls this delegate for each char until it reaches the separator, then yields the remaining chars one by one through an enumerator.
  3. Enumerable.ToArray converts IEnumerable<char> to char[].

// SkipWhile after decompiling
string source = data[num];
list.Add(new string(
    Enumerable.ToArray(
        Enumerable.SkipWhile(
            source,
            new Func<char, bool>(<SkipWhile>b__5_0)))));
num++;
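
The per-character overhead can be made visible by counting predicate invocations. The following is a rough sketch with hypothetical names; it mirrors the SkipWhile approach but is not the benchmark code.

```csharp
using System;
using System.Linq;

// Hypothetical demo class used only to expose the hidden work.
public static class SkipWhileDemo
{
    // Returns the extracted substring and how many times the predicate ran.
    public static (string Result, int PredicateCalls) Extract(string source, char symbol)
    {
        int calls = 0;

        // The lambda captures locals, so a closure and a delegate are allocated on each call.
        Func<char, bool> predicate = c => { calls++; return c != symbol; };

        // SkipWhile walks the string char by char; ToArray buffers the rest into a char[].
        char[] buffer = source.SkipWhile(predicate).ToArray();

        // The string constructor copies the buffer once more into the final string.
        return (new string(buffer), calls);
    }
}
```

For SkipWhileDemo.Extract("key:value", ':') the result is ":value" with 4 predicate calls: one for each char up to and including the separator. The remaining chars are still pulled through the enumerator one at a time and copied twice, once into the array and once into the string.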

Memory

In terms of memory allocations, ReadOnlySpan<T>, Substring and Range show the same results. The other implementations require more memory.

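| Method | Gen0 | Gen1 | Allocated | Relative to ReadOnlySpan<T> (%) |
|---|---|---|---|---|
| ReadOnlySpan<T> | 0.3901 | 0.0057 | 4.79 KB | 100 |
| Substring | 0.3901 | 0.0057 | 4.79 KB | 100 |
| Range | 0.3901 | 0.0057 | 4.79 KB | 100 |
| Split | 0.7362 | 0.0114 | 9.03 KB | 188 |
| Regex | 1.9150 | 0.0305 | 23.5 KB | 490 |
| SkipWhile | 2.2888 | 0.0305 | 28.23 KB | 589 |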
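
For a quick sanity check of allocation differences outside of BenchmarkDotNet, one rough option is GC.GetAllocatedBytesForCurrentThread. The helper below is a hypothetical sketch, not the way the numbers in the table were produced, and its absolute values will not match them.

```csharp
using System;

// Hypothetical helper for a rough allocation comparison.
public static class AllocationDemo
{
    // Returns the bytes allocated on the current thread while calling 'func' repeatedly.
    public static long Measure(Func<string> func, int iterations = 1000)
    {
        string last = func(); // warm-up call so one-time setup is not counted

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            last = func();
        }
        long after = GC.GetAllocatedBytesForCurrentThread();

        GC.KeepAlive(last); // keep the result observable
        return after - before;
    }
}
```

With string text = "key:value", comparing AllocationDemo.Measure(() => text.Substring(text.IndexOf(':'))) against AllocationDemo.Measure(() => text.Split(':')[1]) should report noticeably more bytes for the Split variant, since it allocates an array and a string per segment.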

Conclusion

The most efficient methods for extracting a substring in C# are ReadOnlySpan<T>, Substring and Range. I favor the Range operator due to its cleaner appearance compared to other implementations. However, it is worth noting that it is 1-3% slower than ReadOnlySpan<T> and Substring.