Making string marshaling safe for the CoreCLR garbage collector

February 14, 2023 in Engine & platform | 10 min. read

As mentioned in my previous blog, my team and I are working to bring you the latest .NET technology. This involves making existing Unity code work with the .NET CoreCLR JIT runtime from Microsoft, which includes a more advanced and efficient garbage collector (GC).

In this blog, I’ll help you understand some of the recent changes my team made to marshal string data across the managed/native boundary in a GC-safe way. For a bit of background information about why we’re doing this, check out my earlier blog about AnimationEvent marshaling.

Managed to native (and back again) – part 2

A quick recap: Unity’s engine code is written in both C# (managed) code and C++ (native) code. Crossing the boundary from managed to native code can be tricky and expensive, but it provides opportunities to deliver solutions with top-notch performance.

Take, for example, the C# String type. It’s just an array of characters, so it should be pretty simple, right? Not quite: String marshaling has lots of hidden and interesting complexities.

The implementation of Unity’s tooltip property calls into native code to get or set the proper data:

public static string tooltip
{
    get { return Internal_GetTooltip(); }
    set { Internal_SetTooltip(value); }
}

Then, Internal_GetTooltip and Internal_SetTooltip are declared in C++ code like this:

ScriptingStringPtr Internal_GetTooltip();
void Internal_SetTooltip(const core::string& value);

Here, ScriptingStringPtr is a pointer to the string object, whose memory is managed by the .NET garbage collector, and core::string is Unity’s internal representation of strings in C++ (similar to std::string). Bonus points if you’ve already spotted a few problems!

My string might not be your string

As strange as it sounds, modern programming languages have many different representations for strings. In C#, a string is represented by a 32-bit integer that stores the number of characters in the string, followed by an array of two-byte UTF16-encoded characters. Unity’s core::string in C++ uses a machine-sized integer for the number of characters, but stores those characters in an array of one-byte, UTF8-encoded values.

These different representations mean strings are not blittable, so we need to do some marshaling to get the data back and forth across this managed/native boundary. This marshaling involves allocation of a properly sized buffer of data and conversion of the character information, ensuring that values from all possible locales work properly.
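
To make the conversion step concrete, here is a minimal C++ sketch of turning UTF16 code units into UTF8 bytes. The function name Utf16ToUtf8 is hypothetical and this is not Unity's implementation; the choice to replace unpaired surrogates with U+FFFD is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper (not Unity's API): convert UTF-16 code units to UTF-8 bytes.
// Assumption of this sketch: unpaired surrogates become U+FFFD.
std::string Utf16ToUtf8(std::u16string_view in)
{
    std::string out;
    out.reserve(in.size() * 3); // upper bound: 3 bytes per UTF-16 code unit

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // High surrogate: needs a following low surrogate to form one code point.
            const bool hasLow = i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;
            if (hasLow)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
            else
            {
                cp = 0xFFFD;
            }
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // stray low surrogate
        }

        if (cp < 0x80)
        {
            out.push_back(static_cast<char>(cp));
        }
        else if (cp < 0x800)
        {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else if (cp < 0x10000)
        {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else
        {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }

    assert(out.size() <= in.size() * 3); // the reserved bound always holds
    return out;
}
```

Calling Utf16ToUtf8(u"\u20AC") yields three bytes for a single UTF16 code unit, which is why the output buffer cannot simply be sized from the input length.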

Normally, C# developers can ask the .NET runtime to handle these details, using its built-in p/invoke marshaling. Unity, though, has a custom marshaling tool that we call the bindings generator.

The ties that bind

To use this custom marshaling tool, Unity’s function calls from managed to native code employ a special feature of the .NET runtime in Unity’s fork of CoreCLR: the internal call (or icall, for short). Although not recommended for general use, an icall is a call through a function pointer with no marshaling of return values or function arguments. The native side of the icall must have intimate knowledge of the layout and location of all arguments from the managed side of the function call. While icalls are inherently unsafe, they give us the ability to squeeze out more performance than we would have with default p/invoke marshaling.

The bindings generator parses the Unity C# code and looks for extern methods, which are implemented as icalls. For each icall, it generates both .NET Intermediate Language (IL) code, which is emitted directly into managed assemblies, and C++ code, which is emitted into files later compiled by the Unity build process. Unity has about 10,000 different icalls in its codebase, so the bindings generator is a valuable tool to automate this process and provide hooks for optimization.

Let’s make tooltip better

If you missed the problems with the definition of tooltip above, don’t worry – they’re pretty subtle. First, that raw pointer to GC-managed memory (ScriptingStringPtr) won’t work with the moving CoreCLR GC. Second, Internal_SetTooltip accepts a UTF8 string, but C# will use a UTF16 string. Can we avoid a conversion there to improve performance? Let’s see.
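
To see why a raw pointer and a moving collector don't mix, consider the toy C++ model below. It is only an illustration, not how CoreCLR works internally: ToyMovingHeap and all of its members are hypothetical names, and "compaction" is simulated by copying each object to a fresh allocation. The point is that an address captured before the move no longer identifies the object afterward, while an indirect handle still does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model of a moving heap (hypothetical; not CoreCLR's design).
class ToyMovingHeap
{
public:
    using Handle = std::size_t; // stable index, survives object moves

    Handle Allocate(std::u16string value)
    {
        objects_.push_back(std::make_unique<std::u16string>(std::move(value)));
        return objects_.size() - 1;
    }

    // Look the object up through the handle, wherever it currently lives.
    const std::u16string& Resolve(Handle h) const
    {
        assert(h < objects_.size());
        return *objects_[h];
    }

    // Current address of the object, as an integer so it is safe to compare later.
    std::uintptr_t AddressOf(Handle h) const
    {
        assert(h < objects_.size());
        return reinterpret_cast<std::uintptr_t>(objects_[h].get());
    }

    // Simulated compaction: every object is copied to a new location,
    // then the old storage is released.
    void Compact()
    {
        for (auto& obj : objects_)
        {
            auto moved = std::make_unique<std::u16string>(*obj); // new location, old one still alive
            obj = std::move(moved);                              // old storage freed here
        }
    }

private:
    std::vector<std::unique_ptr<std::u16string>> objects_;
};
```

If native code had stored the address returned by AddressOf before Compact ran, it would now point at freed memory. That is the situation a raw ScriptingStringPtr would be in after the GC moves a string.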

Avoiding unnecessary work

It turns out that core::string isn’t the only way to represent a string in Unity’s native code. Unity also has a type named UTF16String, which is a 32-bit integer indicating the number of characters and an array of two-byte UTF16-encoded values (just like C#’s string representation).

When you set a tooltip via the public C# API mentioned above, the native implementation looks like this:

static void Internal_SetTooltip(const core::string& value)
{
    UTF16String str(value.c_str());
    GUIState &cState = GetGUIState();
    cState.m_OnGUIState.SetMouseTooltip(str);
    cState.m_OnGUIState.SetKeyTooltip(str);
}

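Internally, that UTF8 string in the core::string is converted back to a UTF16 representation. So the string starts as UTF16 in C#, is marshaled to UTF8, and is then converted to UTF16 again. Instead, we can change the signature of the method to accept a UTF16String directly: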

void Internal_SetTooltip(const UTF16String& str);
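
The difference between the two signatures can be sketched in standalone C++. Everything here is hypothetical and stands in for Unity's real types: std::u16string plays the role of UTF16String, and StoreTooltipFromUtf8 and StoreTooltipFromUtf16 are made-up names for the old and new shapes of the setter. The decoder replaces malformed input with U+FFFD and, to stay short, does not reject overlong encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper: decode UTF-8 into UTF-16.
// Malformed input becomes U+FFFD; overlong encodings are not rejected.
std::u16string Utf8ToUtf16(std::string_view in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const auto lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0xFFFD; // default for an invalid lead byte
        std::size_t len = 1;
        if (lead < 0x80) { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }

        if (len > 1 && i + len > in.size()) { cp = 0xFFFD; len = 1; } // truncated sequence

        for (std::size_t k = 1; k < len; ++k)
        {
            const auto next = static_cast<unsigned char>(in[i + k]);
            if ((next & 0xC0) != 0x80) { cp = 0xFFFD; len = 1; break; } // bad continuation byte
            cp = (cp << 6) | (next & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) { cp = 0xFFFD; }

        assert(i + len <= in.size());
        i += len;

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            cp -= 0x10000; // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Old shape: the caller hands over UTF-8, so storing UTF-16 costs a decode.
std::u16string StoreTooltipFromUtf8(std::string_view value)
{
    return Utf8ToUtf16(value);
}

// New shape: the caller hands over UTF-16, so storing is a plain copy.
std::u16string StoreTooltipFromUtf16(std::u16string_view value)
{
    return std::u16string(value);
}
```

Both functions end up with the same stored value; the second one simply skips the decode loop, and the managed side no longer has to encode to UTF8 first.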

The bindings generator creates some Microsoft Intermediate Language (MSIL) code which treats the C# string as a ReadOnlySpan<char> and temporarily pins that memory so the GC does not move it during the call. This makes use of its existing support for marshaling of the Span type, already built into the bindings generator. Finally, some generated C++ code causes that Span to be used as a UTF16String.

There is one important special case we need to handle: a Span is either empty or not, whereas a string can be null, or non-null but empty. So, we’ve created a special ManagedSpanWrapper type to handle a null Span. The generated MSIL code looks like this in C#:

private unsafe static void Internal_SetTooltip(string value)
{
    ManagedSpanWrapper managedSpanWrapper;
    if (!StringMarshaller.TryMarshalEmptyOrNullString(value, ref managedSpanWrapper))
    {
        fixed (char* begin = value.AsSpan())
        {
            managedSpanWrapper = new ManagedSpanWrapper(begin, value.Length);
            Internal_SetTooltip_Injected(in managedSpanWrapper);
        }
    }
    else
    {
        Internal_SetTooltip_Injected(in managedSpanWrapper);
    }
}
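
On the native side, the wrapper might be consumed along these lines. This is a sketch under stated assumptions, not Unity's generated code: the struct layout (a pointer plus a 32-bit length), the convention that a null pointer means a null string while a non-null pointer with length zero means an empty string, and the function name ViewManagedString are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical native mirror of the managed wrapper: pointer + length.
struct ManagedSpanWrapper
{
    const char16_t* begin;
    std::int32_t length;
};

// Interpret the wrapper as a UTF-16 view without copying.
// Assumed convention: begin == nullptr is a null string;
// begin != nullptr with length == 0 is an empty string.
// The view is only valid while the managed caller keeps the string pinned,
// that is, for the duration of the call.
std::optional<std::u16string_view> ViewManagedString(const ManagedSpanWrapper& span)
{
    assert(span.length >= 0);
    if (span.begin == nullptr)
    {
        return std::nullopt; // null string
    }
    return std::u16string_view(span.begin, static_cast<std::size_t>(span.length));
}
```

Returning std::optional keeps the null case distinct from the empty case, which is the distinction a plain span cannot express.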

Let the GC handle things

We mentioned before that the CoreCLR GC offers excellent performance at the cost of some restrictions on native code. Specifically, we can’t access GC-managed memory directly in native code anymore. Since strings are GC-managed in C#, we cannot return a raw pointer from Internal_GetTooltip. We saw above that, internally, Unity is already representing that tooltip value as a UTF16 string. Since that lines up well with the C# representation of a string, we can change the signature to the following:

UTF16String Internal_GetTooltip();

Now, our bindings generator can let the GC manage the string memory on the C# side, without any special code to pause the GC or pin the memory. The GC keeps full control.

Make it so

This is just one flavor of the string parameter marshaling that is done often in the Unity code base. Beyond just strings, the bindings generator must handle all of the types that can be passed from managed to native code. But it also provides a great tool to leverage the work of a small team across the large Unity codebase.

We can make sweeping changes like this to allow Unity to be safe for the CoreCLR GC, while also finding performance improvements.

Performance or safety? Choose both.

The CoreCLR runtime and GC bring the promise of increased performance across the board. Microsoft has been investing heavily in these for .NET, and we’re excited to bring these improvements to Unity users.

We expect to deliver the full performance of modern .NET applications while maintaining safety and stability in your existing code. The team will continue to apply the techniques learned during this investigation toward other managed/native boundary transitions in Unity engine code.

For more tips tied to string marshaling and CoreCLR, visit us in the forums. Or, feel free to connect with me directly on Twitter or Mastodon. And be sure to watch for new technical blogs from other Unity developers as part of the ongoing Tech from the Trenches series.
