Hi everyone! Today I want to share some .NET 5 performance tips, backed by benchmarks.
My system:
- BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19042.985 (20H2/October2020Update)
- Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
- .NET SDK=5.0.104
I report benchmark results as percentages, where 100% is the fastest result in each comparison.
1. StringBuilder for concatenation
As you probably know, strings are immutable. So whenever you concatenate strings, a new string object is allocated, populated with content, and eventually garbage collected. All of that is expensive, which is why StringBuilder performs better for repeated concatenation, such as appending in a loop.
Benchmark example:
private static StringBuilder sb = new();

[Benchmark]
public void Concat3() => ExecuteConcat(3);
[Benchmark]
public void Concat5() => ExecuteConcat(5);
[Benchmark]
public void Concat10() => ExecuteConcat(10);
[Benchmark]
public void Concat100() => ExecuteConcat(100);
[Benchmark]
public void Concat1000() => ExecuteConcat(1000);

[Benchmark]
public void Builder3() => ExecuteBuilder(3);
[Benchmark]
public void Builder5() => ExecuteBuilder(5);
[Benchmark]
public void Builder10() => ExecuteBuilder(10);
[Benchmark]
public void Builder100() => ExecuteBuilder(100);
[Benchmark]
public void Builder1000() => ExecuteBuilder(1000);

public void ExecuteConcat(int size)
{
    string s = "";
    for (int i = 0; i < size; i++)
    {
        s += "a";
    }
}

public void ExecuteBuilder(int size)
{
    sb.Clear();
    for (int i = 0; i < size; i++)
    {
        sb.Append("a");
    }
}
Results:
- 3 string concatenations - 218% (35.21 ns)
- 3 StringBuilder concatenations - 100% (16.09 ns)
- 5 string concatenations - 277% (66.99 ns)
- 5 StringBuilder concatenations - 100% (24.16 ns)
- 10 string concatenations - 379% (160.69 ns)
- 10 StringBuilder concatenations - 100% (42.37 ns)
- 100 string concatenations - 711% (2,796.63 ns)
- 100 StringBuilder concatenations - 100% (393.12 ns)
- 1000 string concatenations - 3800% (144,100.46 ns)
- 1000 StringBuilder concatenations - 100% (3,812.22 ns)
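The benchmark above only appends and never reads the result. In application code you would normally finish with ToString(). Here is a minimal sketch of that, assuming the final length is known up front so the builder can be pre-sized. ConcatSketch and Repeat are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

public static class ConcatSketch
{
    // Builds a string made of `count` copies of `piece` (hypothetical helper).
    public static string Repeat(string piece, int count)
    {
        // Pre-sizing the builder (capacity is in chars) avoids internal buffer growth.
        var sb = new StringBuilder(piece.Length * count);
        for (int i = 0; i < count; i++)
        {
            sb.Append(piece);
        }
        // ToString() allocates the final string once.
        return sb.ToString();
    }
}
```

Calling ConcatSketch.Repeat("a", 1000) produces the same text as the += loop, but allocates one final string instead of an intermediate string per iteration.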
2. Initial size for dynamic collections
.NET provides a lot of collections, such as List<T>, Dictionary<TKey, TValue>, and HashSet<T>. All of these have dynamic capacity: they automatically expand as you add more items. When a collection reaches its capacity, it allocates a new, larger memory buffer (usually an array double the size) and copies the existing items into it. That means an additional allocation, a copy, and an old buffer for the GC to collect.
Benchmark example:
[Benchmark]
public void ListDynamicCapacity()
{
    List<int> list = new List<int>();
    for (int i = 0; i < Size; i++)
    {
        list.Add(i);
    }
}

[Benchmark]
public void ListPlannedCapacity()
{
    List<int> list = new List<int>(Size);
    for (int i = 0; i < Size; i++)
    {
        list.Add(i);
    }
}
In the first benchmark, the List starts with the default capacity and grows as items are added. In the second benchmark, the initial capacity is set to the number of items it is going to hold.
For 1000 items the results are:
- List Dynamic Capacity - 140% (2.490 us)
- List Planned Capacity - 100% (1.774 us)
Benchmarks for Dictionary and HashSet:
- Dictionary Dynamic Capacity - 233% (20.314 us)
- Dictionary Planned Capacity - 100% (8.702 us)
- HashSet Dynamic Capacity - 223% (17.004 us)
- HashSet Planned Capacity - 100% (7.624 us)
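The code for the Dictionary and HashSet benchmarks is not shown above. The sketch below assumes it mirrors the List version and simply passes the expected item count to the constructor. It also records the capacities a default-constructed List<int> goes through while growing. CapacitySketch and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class CapacitySketch
{
    // Fills a Dictionary and a HashSet that were sized up front.
    public static (Dictionary<int, string> Map, HashSet<int> Set) BuildPlanned(int size)
    {
        var map = new Dictionary<int, string>(size); // capacity constructor
        var set = new HashSet<int>(size);            // capacity constructor
        for (int i = 0; i < size; i++)
        {
            map.Add(i, i.ToString());
            set.Add(i);
        }
        return (map, set);
    }

    // Records every distinct Capacity value a default-constructed list
    // passes through while `size` items are added.
    public static List<int> ObserveListGrowth(int size)
    {
        var seen = new List<int>();
        var list = new List<int>();
        int last = -1;
        for (int i = 0; i < size; i++)
        {
            list.Add(i);
            if (list.Capacity != last)
            {
                last = list.Capacity;
                seen.Add(last);
            }
        }
        return seen;
    }
}
```

Each entry returned by ObserveListGrowth corresponds to one reallocation that the planned-capacity version avoids.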
3. ArrayPool for short-lived large arrays
Allocating arrays, and the inevitable deallocation, can be quite costly. Performing these allocations at high frequency causes GC pressure and hurts performance. An elegant solution is the ArrayPool<T> class found in the System.Buffers namespace. The idea is similar to the ThreadPool: a shared pool of arrays is kept around, which you can reuse without allocating and deallocating memory each time. The basic usage is to call ArrayPool<T>.Shared.Rent(size). This returns a regular array, which you can use any way you please; note that it may be longer than the size you asked for. When finished, call ArrayPool<T>.Shared.Return(array) to return the buffer to the shared pool.
Benchmark example:
[Benchmark]
public void RegularArray()
{
    int[] array = new int[ArraySize];
}

[Benchmark]
public void SharedArrayPool()
{
    var pool = ArrayPool<int>.Shared;
    int[] array = pool.Rent(ArraySize);
    pool.Return(array);
}
Result for ArraySize = 1000:
- Regular Array - 2270% (440.41 ns)
- Shared ArrayPool - 100% (19.40 ns)
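The benchmark rents and returns without touching the array. A more typical usage pattern might look like the sketch below, which wraps the work in try/finally so the buffer goes back to the pool even if an exception is thrown. PoolSketch and SumWithRentedBuffer are hypothetical names, and the summing is only a stand-in for real work.

```csharp
using System;
using System.Buffers;

public static class PoolSketch
{
    // Sums 0..count-1 using a rented scratch buffer.
    public static long SumWithRentedBuffer(int count)
    {
        int[] buffer = ArrayPool<int>.Shared.Rent(count);
        try
        {
            // Rent may hand back a longer array than requested, and its
            // contents are not cleared, so only the first `count` slots are used.
            for (int i = 0; i < count; i++)
            {
                buffer[i] = i;
            }

            long sum = 0;
            for (int i = 0; i < count; i++)
            {
                sum += buffer[i];
            }
            return sum;
        }
        finally
        {
            // Return the buffer even if the work above throws.
            ArrayPool<int>.Shared.Return(buffer);
        }
    }
}
```

After Return, the array must not be used again, because another caller may rent the same instance.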
4. Structs instead of Classes
Structs have several benefits when it comes to deallocation:
- When a struct is a local variable (and is not boxed or captured), it typically lives on the stack and doesn't require garbage collection at all.
- Structs are stored on the heap when they are part of a class (or any reference type). In that case, they are stored inline and are deallocated when the containing object is deallocated. Inline means the struct's data is stored as-is, as opposed to a reference type, where a pointer is stored to another location on the heap that holds the actual data. This is especially meaningful in collections, where a collection of structs is much cheaper to deallocate because it is just one buffer of memory.
- Structs take less memory than reference types because they don't carry an object header and a MethodTable pointer.
Decide whether to use a struct or a class case by case.
Benchmark example:
class VectorClass
{
    public int X { get; set; }
    public int Y { get; set; }
}

struct VectorStruct
{
    public int X { get; set; }
    public int Y { get; set; }
}

private const int ITEMS = 10000;

[Benchmark]
public void WithClass()
{
    VectorClass[] vectors = new VectorClass[ITEMS];
    for (int i = 0; i < ITEMS; i++)
    {
        vectors[i] = new VectorClass();
        vectors[i].X = 5;
        vectors[i].Y = 10;
    }
}

[Benchmark]
public void WithStruct()
{
    VectorStruct[] vectors = new VectorStruct[ITEMS];
    // At this point all the vector instances are already allocated with default values
    for (int i = 0; i < ITEMS; i++)
    {
        vectors[i].X = 5;
        vectors[i].Y = 10;
    }
}
Results:
- With Class - 742% (88.83 us)
- With Struct - 100% (11.97 us)
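To make the "stored inline" point concrete, here is a small sketch of how the two kinds of arrays behave. An array of a class type starts out as null references that each need their own allocation, while an array of a struct type already contains zero-initialized values. PointClass, PointStruct, and StructSketch are hypothetical names used only for this illustration.

```csharp
using System;

public class PointClass { public int X { get; set; } }
public struct PointStruct { public int X { get; set; } }

public static class StructSketch
{
    // An array of a class type holds references, which start as null.
    public static bool ClassSlotsStartNull()
    {
        var items = new PointClass[3];
        return items[0] == null;
    }

    // An array of a struct type holds the values themselves, zero-initialized.
    public static int StructSlotStartsAt()
    {
        var items = new PointStruct[3];
        return items[0].X;
    }

    // Assigning a struct copies its data, so the two variables are independent.
    public static (int Original, int Copy) CopyIsIndependent()
    {
        var a = new PointStruct { X = 1 };
        var b = a;
        b.X = 2;
        return (a.X, b.X);
    }
}
```

The copy-on-assignment behavior is the flip side of inline storage: large structs that are passed around a lot can cost more to copy than a reference would.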
5. stackalloc for short-lived array allocations
The stackalloc expression in C# allocates a block of memory on the stack, which makes allocation very fast and releases the memory automatically when the method returns. The element type must be an unmanaged type: primitives and structs made up only of unmanaged fields work, but classes do not.
Benchmark example:
struct VectorStruct
{
    public int X { get; set; }
    public int Y { get; set; }
}

[Benchmark]
public void WithNew()
{
    VectorStruct[] vectors = new VectorStruct[5];
    for (int i = 0; i < 5; i++)
    {
        vectors[i].X = 5;
        vectors[i].Y = 10;
    }
}

[Benchmark]
public unsafe void WithStackAlloc() // Note that unsafe context is required
{
    VectorStruct* vectors = stackalloc VectorStruct[5];
    for (int i = 0; i < 5; i++)
    {
        vectors[i].X = 5;
        vectors[i].Y = 10;
    }
}

[Benchmark]
public void WithStackAllocSpan() // When using Span, no need for unsafe context
{
    Span<VectorStruct> vectors = stackalloc VectorStruct[5];
    for (int i = 0; i < 5; i++)
    {
        vectors[i].X = 5;
        vectors[i].Y = 10;
    }
}
Results:
- With New - 303% (10.870 ns)
- With StackAlloc - 102% (3.643 ns)
- With StackAllocSpan - 100% (3.580 ns)
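Stack space is limited, so stackalloc is best kept to small buffers. One common pattern, sketched below, is to use the stack up to some threshold and fall back to a heap array above it. The threshold of 128 ints and the names StackAllocSketch and SumOfSquares are assumptions made for illustration, not recommendations from the benchmark.

```csharp
using System;

public static class StackAllocSketch
{
    // Hypothetical threshold: buffers up to this many ints go on the stack.
    private const int MaxStackInts = 128;

    public static int SumOfSquares(int count)
    {
        // Small buffers live on the stack; larger ones fall back to the heap.
        Span<int> buffer = count <= MaxStackInts
            ? stackalloc int[count]
            : new int[count];

        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = i * i;
        }

        int sum = 0;
        foreach (int value in buffer)
        {
            sum += value;
        }
        return sum;
    }
}
```

Because both branches produce a Span<int>, the rest of the method does not need to know where the memory came from.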
6. ConcurrentQueue<T> instead of ConcurrentBag<T>
Never use ConcurrentBag<T> without benchmarking. This collection was designed for very specific use cases (when most of the time an item is dequeued by the thread that enqueued it) and suffers from significant performance issues if used otherwise. If you need a concurrent collection, prefer ConcurrentQueue<T>.
Benchmark example:
private static int Size = 1000;

[Benchmark]
public void Bag()
{
    ConcurrentBag<int> bag = new();
    for (int i = 0; i < Size; i++)
    {
        bag.Add(i);
    }
}

[Benchmark]
public void Queue()
{
    ConcurrentQueue<int> queue = new();
    for (int i = 0; i < Size; i++)
    {
        queue.Enqueue(i);
    }
}
Results:
- ConcurrentBag - 165% (24.21 us)
- ConcurrentQueue - 100% (14.64 us)
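The benchmark above adds items from a single thread. As a rough sketch of the multi-threaded case these collections are meant for, the example below enqueues from several threads with Parallel.For and then drains the queue with TryDequeue. QueueSketch and ProduceThenDrain are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class QueueSketch
{
    // Enqueues 0..items-1 from multiple threads, then drains the queue on the caller's thread.
    public static (int Count, long Sum) ProduceThenDrain(int items)
    {
        var queue = new ConcurrentQueue<int>();

        // Enqueue is thread-safe, so no explicit lock is needed here.
        Parallel.For(0, items, i => queue.Enqueue(i));

        int count = 0;
        long sum = 0;
        // TryDequeue returns false once the queue is empty.
        while (queue.TryDequeue(out int value))
        {
            count++;
            sum += value;
        }
        return (count, sum);
    }
}
```

The order in which items come out depends on how the threads interleave, which is why the sketch reports only the count and the sum.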
P.S. Thanks for reading! More benchmarking coming soon!
Special thanks to and his ideas.