My area of expertise lies in real-time systems, motion capture, and character animation. I have gained experience in this field through my work in the previs and game industries, mainly using Autodesk MotionBuilder and various game engines such as Unreal Engine and Avalanche Studios' internal engine. Along the way, I have developed a strong understanding of how to use the C++ language and its compilers to meet real-time evaluation requirements.
Some time ago, I was hesitant about taking on the challenge of continuing my real-time systems work on the .NET platform. However, I soon realized that success depended not so much on the language or platform I was using as on my understanding of its limitations and on developing a "best practice" for weighing the pros and cons of the language or platform to write effective code and achieve my objectives. My best practice includes reading books and articles, learning from the work of others in public repositories, running my own benchmarks in order to draw conclusions about how to do my work in the best way, and, of course, talking to my great colleagues at work to share experience.
There are two things that I keep in mind during the development process - code readability and budgeting.
Code readability
It is important to achieve a balance between optimization and readability when writing code so that the review process and collaboration with other team members are smooth and efficient. Going too far in optimizing code can increase complexity and reduce readability, thus slowing down the development process.
Budgeting
Another challenge in developing real-time systems is the resources of a PC, especially when doing prototypes or unit tests in a small synthetic environment. It is easy to run a lot of tasks in a frame without noticing the amount of memory and CPU/GPU power being used or to overlook which parts are causing a bottleneck. Even if the unit tests are successful, it does not necessarily mean that the system is correctly configured. It is important to benchmark the memory and CPU usage and be aware of what the budget should be for the system in order to ensure that its execution, memory usage, and threading do not become a bottleneck when combined with other systems under high load.
I am pleased to see that .NET C# optimization is now getting more attention, giving developers greater control over processes and resources. Our project targets .NET Core 3.1 with C# 7.2, so I will share my best practices from that perspective with regard to features, memory, and CPU usage.
The main instruments I used to find best practices were:
- books and articles on .NET performance, and repositories of high-performance .NET code
- benchmarks using the BenchmarkDotNet package. Benchmark code answered many questions, but I had to take the execution platform into account: we also use Unity, and the results can differ from platform to platform
- profiling, ranging from Stopwatch and manually prepared counters up to third-party .NET profilers and the Unity profiler
As for Stopwatch, I personally try to avoid creating new instances of the Stopwatch class; instead, I use the static GetTimestamp method and the Frequency field, so the number of seconds can be calculated as (where lastTimestamp is a previously stored GetTimestamp value):
double seconds = (Stopwatch.GetTimestamp() - lastTimestamp) / (double)Stopwatch.Frequency;
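Expanding on that, a minimal allocation-free frame timer could look like the sketch below (the FrameTimer name and lastTimestamp field are my own):

```csharp
using System.Diagnostics;

public static class FrameTimer
{
    // Timestamp of the previous frame, in Stopwatch ticks.
    private static long lastTimestamp = Stopwatch.GetTimestamp();

    // Returns the seconds elapsed since the previous call,
    // without ever allocating a Stopwatch instance.
    public static double Tick()
    {
        long now = Stopwatch.GetTimestamp();
        double seconds = (now - lastTimestamp) / (double)Stopwatch.Frequency;
        lastTimestamp = now;
        return seconds;
    }
}
```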
One of the biggest obstacles to having a good real-time experience is Garbage Collection (GC), which can cause "hitches" in the run-time execution. As such, I believe it is important to have a zero allocation strategy to minimize the impact of GC on performance. I have read some interesting articles that discuss the challenges of making high-loaded real-time systems using .NET C#, which I will share at the end of the article.
So here are the best practices I would like to share.
- Avoid using runtime reflection for serialization. Consider compile-time source-generation modes instead, or manual low-level primitive calls such as WriteStartObject and WriteValue.
Here is an example of reflection-based serialization that I would avoid:
[Serializable]
public class Item
{
    public float a;
    public int b;
}
...
Item item = new Item() { a = 1f, b = 5 };
string jsonString = JsonSerializer.Serialize(item);
- and use low-level primitive calls like this instead (here writer is a low-level JSON writer, for example Newtonsoft.Json's JsonTextWriter):
writer.WriteStartObject();
writer.WritePropertyName("a");
writer.WriteValue(item.a);
writer.WriteEndObject();
- Or consider the compile-time source-generation modes I mentioned before. I will not show an example of such usage here, but I'll share a link to an article at the end.
- Aggressive inlining. The attribute does not mean the compiler will strictly follow the annotation; it is a hint. I use it for small and simple calls that also do not contain any exception logic:
using System.Runtime.CompilerServices;
// example target (name is illustrative): a small, branch-free helper
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static float Lerp(float a, float b, float t) => a + (b - a) * t;
- Consider using simple for loops instead of foreach and enumerables. Just a reminder: I am writing from the perspective of C# 7.2; in the latest releases of C# the situation may be different, as there have been improvements in these areas. In my case, though, writing for loops in critical real-time parts gives visibly better performance numbers.
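As an illustration of the kind of loop I mean (the helper names are mine; always confirm the difference with BenchmarkDotNet on your own target):

```csharp
using System.Collections.Generic;

public static class SumHelpers
{
    // foreach over a List<T> goes through List<T>.Enumerator; a plain
    // indexed for loop skips the enumerator entirely and, in my
    // measurements on C# 7.2, performed noticeably better in hot paths.
    public static int SumFor(List<int> values)
    {
        int sum = 0;
        for (int i = 0; i < values.Count; i++)
        {
            sum += values[i];
        }
        return sum;
    }
}
```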
- Avoid LINQ in real-time code. LINQ expressions look elegant and handy, but the downside is a performance drop from the iterator and closure allocations they create.
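A hedged sketch of replacing a LINQ chain with a plain loop (names are illustrative); the loop produces the same result without the allocations of Where/Sum:

```csharp
using System.Collections.Generic;

public static class FilterHelpers
{
    // LINQ version, concise but allocating:
    //   int total = values.Where(v => v > threshold).Sum();

    // Loop version: same result, nothing allocated on the hot path.
    public static int SumAbove(List<int> values, int threshold)
    {
        int total = 0;
        for (int i = 0; i < values.Count; i++)
        {
            if (values[i] > threshold) total += values[i];
        }
        return total;
    }
}
```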
- Avoid exceptions in the real-time part of the code.
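One common alternative, sketched here with hypothetical names, is the Try-pattern: report failure through a return value instead of throwing in a per-frame path:

```csharp
public static class SafeMath
{
    // Instead of letting a DivideByZeroException escape in a per-frame
    // path, signal failure with a bool and an out parameter.
    public static bool TryDivide(int a, int b, out int result)
    {
        if (b == 0)
        {
            result = 0;
            return false;
        }
        result = a / b;
        return true;
    }
}
```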
- Logging (an IO operation) is also expensive for real-time applications. Use simpler logic instead: count error-prone cases or keep a status variable for the current evaluation. These can then be printed out less frequently and in batches, together with other accumulated log data.
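A minimal sketch of this counting approach, with names of my own invention:

```csharp
public sealed class EvaluationStatus
{
    // Accumulate cheap counters during frame evaluation instead of
    // performing an IO-bound log write on every error.
    public int NanInputs;
    public int MissingBones;

    // Flush to the real logger at a low frequency (e.g. once per second),
    // resetting the counters for the next batch.
    public string BuildReport()
    {
        string report = $"nan-inputs={NanInputs} missing-bones={MissingBones}";
        NanInputs = 0;
        MissingBones = 0;
        return report;
    }
}
```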
- Finally, I would like to mention parallel evaluation in C#, which of course refers to async/await methods and the Task-based Asynchronous Programming model (TAP). I will not go into detail in this article; I would simply recommend studying the topic to understand the patterns.
For me, a simple dedicated thread gives a more straightforward, more readable, and functional solution. Async methods and Tasks are powerful instruments, but their role is rescheduling flow: they carry some GC overhead, generate state machines behind the scenes, and perform differently depending on the final configuration of the system. They can even slow down performance when many tasks are evaluated without an understanding of subsystem priorities. In Unity, I still think Jobs with Burst ahead-of-time compilation perform best. So I would recommend learning the topic really well, as asynchronous execution is very relevant for the hardware we have now.
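For completeness, here is a minimal sketch of the dedicated-thread approach I prefer (class and field names are my own):

```csharp
using System.Threading;

public sealed class Worker
{
    private readonly Thread thread;
    private volatile bool running = true;

    public Worker()
    {
        // A single long-lived thread: predictable scheduling, no
        // Task/state-machine allocations, at the cost of managing
        // its lifetime yourself.
        thread = new Thread(Loop) { IsBackground = true };
        thread.Start();
    }

    private void Loop()
    {
        while (running)
        {
            // ... evaluate the subsystem for one frame ...
            Thread.Sleep(1);
        }
    }

    public void Stop()
    {
        running = false;
        thread.Join();
    }
}
```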
Memory
A zero-allocation strategy means avoiding massive GC triggering and reference churn during frame evaluation. There are different strategies to tackle this; some of them are covered in the articles that I will share at the end. My personal ways to deal with it are:
- Allocate on the stack when possible for small local processing arrays, and use a pool (such as ArrayPool) for bigger ones. From what I have learned, "small" here means about 769 bytes for the overall method stack.
There is also a helper method, RuntimeHelpers.TryEnsureSufficientExecutionStack, which checks that the function has sufficient stack space for normal .NET function execution:
Span<int> arr = stackalloc int[8]; // C# 7.2: stackalloc into Span<T>, no unsafe context needed
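Combining the two ideas above, a hedged sketch (names and the 128-element threshold are my own) that tries the stack first and falls back to a heap array when stack space is questionable:

```csharp
using System;
using System.Runtime.CompilerServices;

public static class ScratchBuffers
{
    public static int SumSquares(int count)
    {
        // Small buffer and plenty of stack left: allocate on the stack.
        if (count <= 128 && RuntimeHelpers.TryEnsureSufficientExecutionStack())
        {
            Span<int> scratch = stackalloc int[count];
            return Fill(scratch);
        }
        // Otherwise fall back to the heap (a pool would also work here).
        return Fill(new int[count]);
    }

    private static int Fill(Span<int> scratch)
    {
        int sum = 0;
        for (int i = 0; i < scratch.Length; i++)
        {
            scratch[i] = i * i;
            sum += scratch[i];
        }
        return sum;
    }
}
```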
- A memory pool can be used for bigger arrays, but you have to take care to always return rented arrays to the pool, ideally in a finally block:
ItemData[] myData = ArrayPool<ItemData>.Shared.Rent(size);
try { /* use myData; note that Rent may return a larger array than requested */ }
finally { ArrayPool<ItemData>.Shared.Return(myData); }
- Avoid allocating new zero-length arrays; use the shared empty array instead:
var emptyIntArray = Array.Empty<int>();
- Use aligned memory allocations and cast them to structs:
- a handy and efficient way to reinterpret raw bytes as a Span of a struct type:
byte[] buffer = ...; // raw bytes holding serialized ItemData records
Span<ItemData> span = MemoryMarshal.Cast<byte, ItemData>(buffer);
- a very efficient way to access an individual element in such a buffer:
int offset = index * ItemData.SIZE;
ref ItemData item = ref Unsafe.As<byte, ItemData>(ref buffer[offset]);
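The snippets above assume a blittable ItemData struct; a hypothetical definition with explicit, aligned layout (field names and the SIZE value are my own) that makes the byte reinterpretation safe could look like this:

```csharp
using System.Runtime.InteropServices;

// Hypothetical ItemData: a blittable struct with a fixed, aligned layout,
// so reinterpreting raw bytes (MemoryMarshal.Cast, Unsafe.As) is safe.
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct ItemData
{
    public const int SIZE = 8; // must match the struct's marshaled size

    public float A; // 4 bytes
    public int B;   // 4 bytes
}
```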
- Reuse memory when possible. One example is to use Span and ReadOnlySpan instead of copying and trimming strings or arrays:
string path = "process.this.long.&.string";
ReadOnlySpan<char> pathSpan = path.AsSpan();
int lastDelim = pathSpan.LastIndexOf('&');
if (lastDelim > 0)
{
    // slicing a span allocates no new string
    ReadOnlySpan<char> subPath = pathSpan.Slice(0, lastDelim);
    // Console.WriteLine(subPath.ToString());
}
- Strings are expensive to use as keys; consider using Guids or hashes instead. The logic here is simple: every character of a string is 2 bytes in size, while a whole uint hash code is only 4 bytes, so comparisons and lookups become much cheaper.
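As a sketch, strings can be hashed once outside the hot path and the resulting uint used as the key from then on; FNV-1a is used here purely as an illustrative 32-bit hash (any well-distributed hash works, and the helper name is mine):

```csharp
public static class KeyHash
{
    // FNV-1a 32-bit hash: compute once per string (e.g. at load time),
    // then key dictionaries and comparisons on the uint instead of the string.
    public static uint Hash(string name)
    {
        uint hash = 2166136261u;            // FNV offset basis
        for (int i = 0; i < name.Length; i++)
        {
            hash = (hash ^ name[i]) * 16777619u; // FNV prime
        }
        return hash;
    }
}
```

Note that hashes can collide, so for safety-critical lookups it is worth validating at load time that all the keys in play hash uniquely.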
Thank you for taking the time to read through my tips and tricks! I hope they are useful to you!
References
Here is a list of references with articles and repositories that can be useful when researching the topics of real-time systems and .NET:
System.Text.Json compile-time source generation modes - https://devblogs.microsoft.com/dotnet/try-the-new-system-text-json-source-generator/
about writing zero-allocation code
about allocations and high-load systems
an example of a repository with performance-oriented code
- a physics engine written in pure C#; I think it is great work, well worth mentioning.