Published on
- 2 min read
Fastest way to extract a substring in C#
Today, we’ll dive back into a microbenchmarking and a concise article about performance in C#. Our focus will be on strings and the most effective way for extracting a substring from the original string.
Benchmark
In this benchmark, we’ll consider the following ways of extracting substring:
Substringmethod.Rangeoperator.Splitmethod.ReadOnlySpan<T>Struct.RegexClass;SkipWhilemethod.
For benchmarking, I used the BenchmarkDotNet library. The whole code of the benchmark class can be found here.
Results
As usual, I run the benchmark on both .NET 6 and .NET 7 platforms. The results show minimal variation between the two.
Execution time
The benchmark results
We observe that ReadOnlySpan<T>, Substring and Range operator show fairly similar performance results. Split, Regex and SkipWhile are notably slower, being 2.5, 8.5 and 23.5 times respectively.
| Method | Mean, ns | Percent |
|---|---|---|
ReadOnlySpan<T> | 687.6 | 100 |
Substring | 698.5 | 102 |
Range | 710.5 | 103 |
Split | 1696.3 | 247 |
Regex | 5830.4 | 848 |
SkipWhile | 16211.7 | 2358 |
If we’ll look at decompiled C# code, it becomes apparent that Range operator’s implementation is very similar to the implementation of Substring.
// Range Operator after decompiling
string text = data[num];
int num2 = text.IndexOf(_symbol);
string text2 = text;
int num3 = num2;
list.Add(text2.Substring(num3, text2.Length - num3));
num++;
The only difference is that Substring implementation has fewer local variables.
// Substring after decompiling
string text = data[num];
int startIndex = text.IndexOf(_symbol);
list.Add(text.Substring(startIndex));
num++;
ReadOnlySpan<T> shows better results. It looks like getting memory span and creating a new string from it is slightly faster, than getting substring by string.Substring method. I’m assuming that the reason of that is index bounds checks inside internal implementation of Substring method.
// ReadOnlySpan<T> after decompiling
string obj = data[num];
int start = obj.IndexOf(_symbol);
ReadOnlySpan<char> value = MemoryExtensions.AsSpan(obj, start);
list.Add(new string(value));
num++;
Split is slower because its internal implementation and use of this method to obtain a substring is incorrect.
// Split after decompiling
string text = data[num];
list.Add(text.Split(':')[1]);
num++;
Regex is a good option when you need to get a substring with a more complex pattern rather than a single char. But in this particular case it’s like breaking a butterfly on a wheel.
// Regex after decompiling
string input = data[num];
list.Add(Regex.Match(input, _pattern).Groups[1].Value);
num++;
SkipWhile is super slow because:
- It creates a new delegate
Func<char, bool>. Enumerable.SkipWhilecalls this delegate for each char in the string.Enumerable.ToArrayconvertsIEnumerable<char>tochar[].
// SkipWhile after decompiling
string source = data[num];
list.Add(new string(
Enumerable.ToArray(
Enumerable.SkipWhile(
source,
new Func<char, bool>(<SkipWhile>b__5_0)))));
num++;
Memory
Speaking about memory allocations, ReadOnlySpan<T>, Substring and Range shows the same results. Other implementations require more memory.
| Method | Gen0 | Gen1 | Allocated | Percent |
|---|---|---|---|---|
ReadOnlySpan<T> | 0.3901 | 0.0057 | 4.79 KB | 100 |
Substring | 0.3901 | 0.0057 | 4.79 KB | 100 |
Range | 0.3901 | 0.0057 | 4.79 KB | 100 |
Split | 0.7362 | 0.0114 | 9.03 KB | 188 |
Regex | 1.9150 | 0.0305 | 23.5 KB | 490 |
SkipWhile | 2.2888 | 0.0305 | 28.23 KB | 589 |
Conclusion
The most efficient methods for extracting a substring in C# are ReadOnlySpan<T>, Substring and Range. I favor the Range operator due to its cleaner appearance compared to other implementations. However, it is worth noting that it is 1-3% slower than ReadOnlySpan<T> and Substring.