Multithreading can drastically speed up your application, but no speedup is free: managing parallel threads requires careful programming, and without the proper precautions, you can run into race conditions, deadlocks, and even crashes.
What Makes Multithreading Hard?
Unless you tell your program otherwise, all of your code executes on the “Main Thread.” From the entrypoint of your application, it runs through and executes all your functions one after another. This limits performance, since you can only do so much if you have to process everything one at a time. Many modern desktop CPUs have six or more cores and 12 or more hardware threads, so there’s performance left on the table if you’re not utilizing them.
However, it’s not as simple as just “turning on multithreading.” Only work that can be split into independent pieces (such as the iterations of a loop) can be properly multithreaded, and there are a lot of considerations to take into account when doing so.
The first and most important issue is race conditions. These often occur during write operations, when one thread is modifying a resource that is shared by multiple threads. This leads to behavior where the output of the program depends on which thread finishes or modifies something first, which can lead to random and unexpected behavior.
These can be very, very simple. For example, maybe you need to keep a running count of something across the iterations of a parallel loop. The most obvious way to do this is creating a variable and incrementing it, but this isn’t thread safe.
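As a rough illustration of that unsafe counter, here is a minimal sketch. The variable name number and the count of 100 follow the text; the class and method names (RaceDemo, CountUnsafe) are made up for this example.

```csharp
using System;
using System.Threading.Tasks;

public static class RaceDemo
{
    // Increments a shared counter from parallel iterations with no
    // synchronization at all. RaceDemo and CountUnsafe are hypothetical names.
    public static int CountUnsafe(int iterations)
    {
        int number = 0;
        Parallel.For(0, iterations, i =>
        {
            number++; // read, add, write: three separate steps, not one atomic step
        });
        return number;
    }

    public static void Main()
    {
        // Usually prints 100, but can print less when two threads overlap.
        Console.WriteLine(CountUnsafe(100));
    }
}
```

With only 100 iterations the bug may not show up on every run, which is part of what makes it hard to track down; raising the iteration count makes lost updates much more likely.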
This race condition occurs because it’s not just “adding one to the variable” in an abstract sense; the CPU is loading the value of number into a register, adding one to that value, and then storing the result as the new value of the variable. It doesn’t know that, in the meantime, another thread was trying to do exactly the same thing and loaded a soon-to-be incorrect value of number. The two threads conflict, and at the end of a loop that runs 100 times, number may not be equal to 100.
.NET provides a feature to help manage this: the lock keyword. This doesn’t prevent making changes outright, but it helps manage concurrency by only allowing one thread at a time to obtain the lock. If a thread tries to enter a lock statement while another thread holds the lock, it will block until the lock is released.
You’re only able to lock reference types, so a common pattern is creating a lock object beforehand and using that as a substitute for locking the value type.
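A minimal sketch of that pattern, assuming the same running count as before. The names LockDemo, CountLock, and CountWithLock are invented for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class LockDemo
{
    // A dedicated reference-type object used only as the lock target,
    // because the int counter itself cannot be locked.
    private static readonly object CountLock = new object();

    public static int CountWithLock(int iterations)
    {
        int number = 0;
        Parallel.For(0, iterations, i =>
        {
            lock (CountLock)
            {
                number++; // only one thread at a time can be in here
            }
        });
        return number;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithLock(100)); // prints 100
    }
}
```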
However, you may notice that there’s now another problem: lock contention. This code is a worst-case example, but here, it’s almost exactly the same as just doing a regular for loop (actually a bit slower, since extra threads and locks add overhead). Each thread tries to obtain the lock, but only one at a time can hold it, so only one thread at a time can actually run the code inside the lock. In this case, that’s the entire body of the loop, so the lock statement removes all the benefit of threading and just makes everything slower.
Generally, you want to lock as needed whenever you need to make writes. However, you’ll want to keep concurrency in mind when choosing what to lock, because reads aren’t always thread safe either. If another thread is writing to the object, reading it from another thread can give an incorrect value, or cause a particular condition to return an improper result.
Luckily, there are a few techniques that let you keep the speed of multithreading while still using locks to avoid race conditions.
Use Interlocked For Atomic Operations
For basic operations, using the lock statement can be overkill. While it’s very useful for locking before complex modifications, it’s too much overhead for something as simple as adding or replacing a value.
Interlocked is a class that wraps some memory operations like addition, replacement, and comparison. The underlying methods are implemented at the CPU level and guaranteed to be atomic, and they are much faster than the standard lock statement. You’ll want to use them whenever possible, though they won’t entirely replace locking.
In the example above, replacing the lock with a call to Interlocked.Add() will speed up the operation a lot. This simple example still isn’t faster than an ordinary single-threaded loop, but Interlocked is useful as part of a larger operation, where it is still a speedup over lock.
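Here is one way the counter could look with Interlocked.Add() in place of the lock. This is a sketch under the same assumptions as the earlier counter; InterlockedDemo and CountWithInterlocked are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InterlockedDemo
{
    public static int CountWithInterlocked(int iterations)
    {
        int number = 0;
        Parallel.For(0, iterations, i =>
        {
            // Atomic read-modify-write; no lock object needed.
            Interlocked.Add(ref number, 1);
        });
        return number;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithInterlocked(100)); // prints 100
    }
}
```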
There are also Interlocked.Increment and Interlocked.Decrement for ++ and -- operations, which will save you a solid two keystrokes. They literally wrap Add(ref count, 1) under the hood, so there’s no specific speedup to using them.
You can also use Exchange, a generic method that will set a variable equal to the value passed to it. Be careful with this one, though: if you’re setting it to a value you computed using the original value, this isn’t thread safe, since the old value could have been modified between your read and the call to Interlocked.Exchange.
CompareExchange compares a variable against an expected value and, only if they’re equal, replaces the variable with a new value. It returns the variable’s original value either way, so you can tell whether the swap happened.
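CompareExchange is what makes “compute a new value from the old one” safe: you retry until nobody changed the variable between your read and your write. The sketch below keeps a running maximum this way. It is an illustration rather than anything from the original example, and the names CasDemo, UpdateMax, and MaxOf are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CasDemo
{
    // Atomically raises 'target' to 'candidate' if candidate is larger.
    public static void UpdateMax(ref int target, int candidate)
    {
        int seen = Volatile.Read(ref target);
        while (candidate > seen)
        {
            // Writes candidate only if target still equals 'seen'.
            // Returns whatever was in target before the call.
            int previous = Interlocked.CompareExchange(ref target, candidate, seen);
            if (previous == seen)
            {
                return; // our write went through
            }
            seen = previous; // another thread got there first; re-check and retry
        }
    }

    public static int MaxOf(int[] values)
    {
        int max = int.MinValue;
        Parallel.ForEach(values, v => UpdateMax(ref max, v));
        return max;
    }

    public static void Main()
    {
        Console.WriteLine(MaxOf(new[] { 3, 9, 4, 7 })); // prints 9
    }
}
```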
Use Thread Safe Collections
The default collections in System.Collections.Generic can be used with multithreading, but they aren’t entirely thread safe. Microsoft provides thread-safe implementations of some collections in System.Collections.Concurrent.
Among these are ConcurrentBag, an unordered generic collection, and ConcurrentDictionary, a thread-safe Dictionary. There are also concurrent queues and stacks, and OrderablePartitioner, which can split orderable data sources like Lists into separate partitions for each thread.
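As a small sketch of how two of these collections might be used from a parallel loop (the scenario and the names CollectionsDemo, CountByRemainder, and Squares are invented for illustration):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class CollectionsDemo
{
    // Counts how many numbers in [0, count) fall into each remainder bucket.
    public static ConcurrentDictionary<int, int> CountByRemainder(int count, int modulus)
    {
        var buckets = new ConcurrentDictionary<int, int>();
        Parallel.For(0, count, i =>
        {
            // Inserts 1 for a new key, otherwise applies the update function
            // to the current value; the dictionary retries on conflicts.
            buckets.AddOrUpdate(i % modulus, 1, (key, current) => current + 1);
        });
        return buckets;
    }

    // Collects results from many threads; the order of items is not defined.
    public static ConcurrentBag<int> Squares(int count)
    {
        var bag = new ConcurrentBag<int>();
        Parallel.For(0, count, i => bag.Add(i * i));
        return bag;
    }

    public static void Main()
    {
        Console.WriteLine(CountByRemainder(1000, 4)[0]); // prints 250
        Console.WriteLine(Squares(100).Count);           // prints 100
    }
}
```

Note that the update function passed to AddOrUpdate can be invoked more than once when threads collide, so it should be free of side effects.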
Look to Parallelize Loops
Often, the easiest place to multithread is in big, expensive loops. If you can execute multiple iterations in parallel, you can get a huge speedup in the overall running time.
The best way to handle this is with System.Threading.Tasks.Parallel. This class provides replacements for for and foreach loops that execute the loop bodies on separate threads. It’s simple to use, though it requires slightly different syntax:
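A minimal sketch of that syntax follows. DoSomething() here is only a hypothetical stand-in that counts the items it sees, and ParallelLoopDemo and Run are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelLoopDemo
{
    private static int processed;

    // Hypothetical loop body. It touches shared state only through
    // Interlocked, so it is safe to call from several threads at once.
    private static void DoSomething(int item)
    {
        Interlocked.Increment(ref processed);
    }

    public static int Run(IEnumerable<int> list)
    {
        processed = 0;

        // Sequential version, for comparison:
        // foreach (var item in list) { DoSomething(item); }

        Parallel.ForEach(list, item =>
        {
            DoSomething(item);
        });

        return processed;
    }

    public static void Main()
    {
        Console.WriteLine(Run(new List<int> { 1, 2, 3 })); // prints 3
    }
}
```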
Obviously, the catch here is that you need to make sure DoSomething() is thread safe and doesn’t interfere with any shared variables. However, it isn’t always as easy as just replacing the loop with a parallel loop, and in many cases you must lock shared objects to make changes.
To alleviate some of the issues with lock contention, Parallel.ForEach provides extra features for dealing with state. Basically, not every iteration is going to run on a separate thread. If you have 1000 elements, it’s not going to create 1000 threads; it’s going to make as many threads as your CPU can handle, and run multiple iterations per thread. This means that if you’re computing a total, you don’t need to lock for every iteration. You can simply pass around a subtotal variable, and at the very end, lock the object and make changes once. This drastically reduces the overhead on very large lists.
Let’s take a look at an example. The following code takes a big list of objects, and needs to serialize each one separately to JSON, ending up with a List<string> of all the objects. JSON serialization is a relatively slow process, so splitting the elements across multiple threads is a big speedup.
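The sketch below shows one way this could be written. It assumes System.Text.Json for the serialization step, which may not be the library originally used, and the names SerializeDemo, SerializeAll, results, and resultsLock are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

public static class SerializeDemo
{
    public static List<string> SerializeAll<T>(IEnumerable<T> items)
    {
        var results = new List<string>();
        var resultsLock = new object();

        Parallel.ForEach(
            // The data to loop over.
            items,
            // Creates a private, empty subtotal list for each worker.
            () => new List<string>(),
            // Loop body: serialize one element into the private subtotal,
            // then hand the subtotal on to the next iteration.
            (item, loopState, subtotal) =>
            {
                subtotal.Add(JsonSerializer.Serialize(item));
                return subtotal;
            },
            // Merges one worker's finished subtotal into the shared list.
            // The shared list is not thread safe, so this part takes a lock.
            subtotal =>
            {
                lock (resultsLock)
                {
                    results.AddRange(subtotal);
                }
            });

        return results;
    }

    public static void Main()
    {
        List<string> json = SerializeAll(new[] { 1, 2, 3 });
        Console.WriteLine(json.Count); // prints 3
    }
}
```

The output list contains every serialized element, but not necessarily in the original order, since subtotals are merged in whatever order the workers finish.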
There are a lot of arguments, and a lot to unpack here:
- The first argument takes an IEnumerable, which defines the data it’s looping over. This is a ForEach loop, but the same concept works for basic For loops.
- The first action initializes the local subtotal variable. This variable will be shared over each iteration of the loop, but only inside the same thread. Other threads will have their own subtotals. Here, we’re initializing it to an empty list. If you were computing a numeric total, you could initialize it to 0 instead.
- The second action is the main loop body. The first argument is the current element (or the index in a For loop), the second is a ParallelLoopState object that you can use to call .Break(), and the last is the subtotal variable.
- In this loop, you can operate on the element, and modify the subtotal. The value you return will replace the subtotal for the next loop. In this case, we serialize the element to a string, then add the string to the subtotal, which is a List.
- Finally, the last action takes the subtotal ‘result’ after a thread’s iterations have finished, allowing you to lock and modify a resource based on that thread’s final subtotal. This action runs once per thread (more precisely, once per task that Parallel.ForEach creates), not once per iteration, and it can run on a separate thread, so you will need to lock or use Interlocked methods to modify shared resources. Here, we call AddRange() to append the subtotal list to the final list.
One final note: if you’re using the Unity game engine, you’ll want to be careful with multithreading. Most Unity APIs can only be called from the main thread, and calling them from another thread will throw an error or crash the game. It’s possible to use multithreading sparingly by doing API operations on the main thread and switching back and forth whenever you need to parallelize something.
Mostly, this applies to operations that interact with the scene or physics engine. Vector3 math is unaffected, and you’re free to use it from a separate thread without issues. You’re also free to modify fields and properties of your own objects, provided that they don’t call any Unity operations under the hood.