Fastest way to extract a substring in C#
#csharp #benchmark
Today, we’ll dive back into microbenchmarking with a concise article about performance in C#. Our focus is on strings and the most effective way to extract a substring from the original string.
Benchmark
In this benchmark, we’ll consider the following ways of extracting a substring:
- Substring method.
- Range operator.
- Split method.
- ReadOnlySpan<T> struct.
- Regex class.
- SkipWhile method.
For benchmarking, I used the BenchmarkDotNet library. The whole code of the benchmark class can be found here.
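For reference, the six approaches can be sketched as plain methods. This is a minimal illustration rather than the benchmark code itself: the class and method names, the ':' separator and the regex pattern are assumptions made for the example. Each variant here skips the separator so that all six return the same text, which may differ from the benchmark's exact behavior, and each assumes the separator is present in the input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class: names, separator and pattern are illustrative assumptions.
public static class SubstringApproaches
{
    private const char Symbol = ':';

    // Captures everything after the first ':' on a single line.
    private static readonly Regex Pattern = new Regex(":(.*)");

    // string.Substring starting right after the separator.
    public static string ViaSubstring(string s) => s.Substring(s.IndexOf(Symbol) + 1);

    // Range operator with an open end.
    public static string ViaRange(string s) => s[(s.IndexOf(Symbol) + 1)..];

    // A count of 2 stops splitting after the first separator.
    public static string ViaSplit(string s) => s.Split(Symbol, 2)[1];

    // Slice as a span, then materialize a new string from it.
    public static string ViaSpan(string s) => new string(s.AsSpan(s.IndexOf(Symbol) + 1));

    // First capture group of the first match.
    public static string ViaRegex(string s) => Pattern.Match(s).Groups[1].Value;

    // LINQ: skip up to the separator, skip the separator itself, copy the rest.
    public static string ViaSkipWhile(string s) =>
        new string(s.SkipWhile(c => c != Symbol).Skip(1).ToArray());
}
```

For an input such as "key:value", every method above returns "value".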
Results
As usual, I ran the benchmark on both .NET 6 and .NET 7. The results show minimal variation between the two.
Execution time
We observe that ReadOnlySpan<T>, Substring and the Range operator show fairly similar performance. Split, Regex and SkipWhile are notably slower: about 2.5, 8.5 and 23.5 times slower than ReadOnlySpan<T>, respectively.
Method | Mean, ns | Percent |
---|---|---|
ReadOnlySpan<T> | 687.6 | 100 |
Substring | 698.5 | 102 |
Range | 710.5 | 103 |
Split | 1696.3 | 247 |
Regex | 5830.4 | 848 |
SkipWhile | 16211.7 | 2358 |
If we look at the decompiled C# code, it becomes apparent that the Range operator compiles to code very similar to the Substring version.
// Range Operator after decompiling
string text = data[num];
int num2 = text.IndexOf(_symbol);
string text2 = text;
int num3 = num2;
list.Add(text2.Substring(num3, text2.Length - num3));
num++;
The only difference is that the Substring version has fewer local variables.
// Substring after decompiling
string text = data[num];
int startIndex = text.IndexOf(_symbol);
list.Add(text.Substring(startIndex));
num++;
ReadOnlySpan<T> shows slightly better results. It looks like taking a memory span and creating a new string from it is slightly faster than calling string.Substring. I assume the reason is the index bounds checks inside the internal implementation of Substring.
// ReadOnlySpan<T> after decompiling
string obj = data[num];
int start = obj.IndexOf(_symbol);
ReadOnlySpan<char> value = MemoryExtensions.AsSpan(obj, start);
list.Add(new string(value));
num++;
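Split is slower because of its internal implementation: it scans the whole string and allocates an array plus a new string for every segment. Using it just to obtain a single substring is a misuse of the method.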
// Split after decompiling
string text = data[num];
list.Add(text.Split(':')[1]);
num++;
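Beyond raw speed, Split can also return a different result. The sketch below assumes an input with more than one ':' (the sample strings and method names are made up): Split(':')[1] yields only the second segment, whereas IndexOf plus Substring yields everything after the first separator.

```csharp
using System;

// Hypothetical helpers used only to contrast the two behaviors.
public static class SplitVersusSubstring
{
    // Everything after the first ':' (assumes the separator is present).
    public static string AfterFirstColon(string s) => s.Substring(s.IndexOf(':') + 1);

    // Second segment only; anything after a second ':' is dropped.
    public static string SecondSegment(string s) => s.Split(':')[1];
}
```

With the input "time:12:30", AfterFirstColon returns "12:30" while SecondSegment returns "12". For an input with a single separator, such as "key:value", both return "value".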
Regex is a good option when you need to extract a substring using a more complex pattern than a single char. But in this particular case, it’s like breaking a butterfly on a wheel.
// Regex after decompiling
string input = data[num];
list.Add(Regex.Match(input, _pattern).Groups[1].Value);
num++;
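To illustrate the kind of case where Regex does pay off, here is a hedged sketch with a made-up input format: the separator is '=' or ':' surrounded by optional whitespace, which a single IndexOf call cannot express. The pattern and names are assumptions, not taken from the benchmark.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical example of a pattern that is more complex than a single char.
public static class RegexExtraction
{
    // A key, then '=' or ':' with optional surrounding whitespace, then the value.
    private static readonly Regex KeyValue = new Regex(@"^\s*\w+\s*[=:]\s*(.*)$");

    // Returns the value part, or an empty string when the line does not match.
    public static string ValueOf(string line) => KeyValue.Match(line).Groups[1].Value;
}
```

Here ValueOf("name = Alice") returns "Alice" and ValueOf("port:8080") returns "8080", while a line without a separator yields an empty string.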
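SkipWhile is super slow because:
- It creates a new delegate Func<char, bool>.
- Enumerable.SkipWhile calls this delegate for each char until the predicate returns false, then yields the remaining chars one at a time through an enumerator.
- Enumerable.ToArray converts IEnumerable<char> to char[] before a new string is created from it.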
// SkipWhile after decompiling
string source = data[num];
list.Add(new string(
Enumerable.ToArray(
Enumerable.SkipWhile(
source,
new Func<char, bool>(<SkipWhile>b__5_0)))));
num++;
Memory
Speaking about memory allocations, ReadOnlySpan<T>, Substring and Range show the same results. Other implementations require more memory.
Method | Gen0 | Gen1 | Allocated | Percent |
---|---|---|---|---|
ReadOnlySpan<T> | 0.3901 | 0.0057 | 4.79 KB | 100 |
Substring | 0.3901 | 0.0057 | 4.79 KB | 100 |
Range | 0.3901 | 0.0057 | 4.79 KB | 100 |
Split | 0.7362 | 0.0114 | 9.03 KB | 188 |
Regex | 1.9150 | 0.0305 | 23.5 KB | 490 |
SkipWhile | 2.2888 | 0.0305 | 28.23 KB | 589 |
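For a rough, informal check of the same effect without a benchmarking library, allocations can be probed with GC.GetAllocatedBytesForCurrentThread. The sketch below is an assumption-laden illustration (the class and method names are hypothetical, and the exact byte counts depend on the runtime and input), not a replacement for the measurements above.

```csharp
using System;

// Hypothetical probe: measures bytes allocated by a single call on the current thread.
public static class AllocationProbe
{
    public static long Measure(Func<string> action)
    {
        _ = action(); // warm-up so one-time JIT and initialization costs are excluded
        long before = GC.GetAllocatedBytesForCurrentThread();
        _ = action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Allocates only the resulting string.
    public static long SubstringBytes(string s) =>
        Measure(() => s.Substring(s.IndexOf(':')));

    // Allocates an array plus one string per segment.
    public static long SplitBytes(string s) =>
        Measure(() => s.Split(':')[1]);
}
```

On an input such as "key:value", SplitBytes should report more bytes than SubstringBytes, in line with the roughly doubled allocation shown for Split in the table.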
Conclusion
The most efficient methods for extracting a substring in C# are ReadOnlySpan<T>, Substring and Range. I favor the Range operator due to its cleaner appearance compared to other implementations. However, it is worth noting that it is 1-3% slower than ReadOnlySpan<T> and Substring.