Application performance can be a critical issue for many businesses. After all, server hosting costs directly affect your bottom line, so using a performance profiling tool to debug the code you run can end up saving you money.
What Problems To Look For
A “bottleneck” is any slow section of your app that holds back the rest of the otherwise faster code, much like the cap on a water bottle or a narrow road impeding traffic. Any code you write is likely to have bottlenecks somewhere, and whether they’re big or small, you can use performance profiling tools to identify them.
Every program is different, but many applications will suffer from a lot of the same problems:
- Functions called too often (where caching or scheduling would reduce the number of calls)
- IO-blocking code, usually synchronous disk access, but also excessive memory usage
- Large loops that call costly methods
- Long startup times, especially in JIT-compiled languages
- Unnecessary memory allocations, especially in runtimes with a garbage collector
- Areas that would benefit from parallelization or asynchronous programming
You’ll want to keep an eye out for all of these when examining your code with the profiler. Even if your app doesn’t have one serious, obvious bottleneck, every percentage point of improvement helps your app run faster and more efficiently, and small speedups here and there can add up to a lot over time.
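Some of these fixes are simple to sketch. Caching, for instance, addresses the first item on the list: if a function is called repeatedly with the same inputs, memoizing it means the expensive work only runs once. A minimal illustration in Python (used here purely for brevity; `expensive_lookup` is a hypothetical stand-in for a slow query or computation):

```python
from functools import lru_cache

calls = 0  # counts how often the real work actually runs

@lru_cache(maxsize=None)
def expensive_lookup(key):
    # Hypothetical stand-in for a slow query or computation
    global calls
    calls += 1
    return key * 2

for _ in range(1000):
    expensive_lookup(42)  # only the first call does the work

print(calls)  # → 1
```

Despite being called 1,000 times, the function body runs once; a profiler would show the call count collapse after adding the cache.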
There’s also the possibility that your app is bottlenecked not by the code running on the server, but by its place in your overall network. For example, if you have an API application that connects to a slow database, it doesn’t matter how fast the web server is, as it will always be waiting for slow results. Performance profilers will only help you debug issues in your code, not your overall network architecture.
How Does Profiling Work?
Performance profiling tools differ from debugging tools in a few ways. Debugging tools, like breakpoints and variable inspection, are used in IDEs for testing and problem solving during development. Profilers, by contrast, operate under the assumption that you don’t know what the issue is, and measure all of your code to find it. The profiler hooks into your application and uses a high-accuracy timer to track which functions take the longest. After running for a while, you’ll have enough data to track down what’s causing the problem.
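To make the mechanism concrete, here is a heavily simplified instrumenting profiler sketched in Python: it hooks every function call with `sys.setprofile` and accumulates time per function using a high-resolution timer. Real profilers are far more sophisticated (and much lower overhead), but the core idea is the same.

```python
import sys
import time
from collections import defaultdict

totals = defaultdict(float)  # cumulative seconds per function name
stack = []                   # (function name, entry timestamp)

def tracer(frame, event, arg):
    # The interpreter invokes this hook on every Python call/return
    if event == "call":
        stack.append((frame.f_code.co_name, time.perf_counter()))
    elif event == "return" and stack:
        name, start = stack.pop()
        totals[name] += time.perf_counter() - start

def slow():
    time.sleep(0.05)  # stands in for expensive work

def fast():
    return sum(range(100))

sys.setprofile(tracer)
slow()
fast()
sys.setprofile(None)

# Report the biggest time consumers first, like a real profiler
for name, secs in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {secs:.4f}s")
```

The output lists `slow` at the top with roughly 0.05 seconds, which is exactly the kind of ranking a real profiler presents.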
Most profilers will present data as a call stack sorted by the highest time consumers. Another common visualization is the flame graph, which displays an intuitive breakdown of the entire program’s call history.
The exact tool and method that you use will vary depending on what language or runtime you’re profiling for, and whether you need to profile applications in production environments, but the general idea is the same.
Because each profiler needs to integrate with the code that’s running, you will need to download a profiler built for the language your application uses. Some are easier to use than others, especially for managed languages like C# and Java, where it’s easier for the profiler to hook into the running application than with natively compiled languages.
Many IDEs will also have profiling tools built in on top of the standard debugging toolset, which you can use as well. Visual Studio, for example, can profile performance and memory usage in many apps.
- Java – JProfiler, IntelliJ IDEA/Eclipse/NetBeans IDEs
- Python – cProfile, Palanteer
- C# – dotTrace, Visual Studio IDE
- C, C++ – Orbit
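For instance, Python ships with cProfile in the standard library, and you can drive it programmatically. A quick sketch that profiles a hot loop and prints the top entries sorted by cumulative time (the functions here are just hypothetical examples):

```python
import cProfile
import io
import pstats

def hot_loop():
    # Hypothetical expensive inner loop
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def main():
    for _ in range(5):
        hot_loop()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time, like a profiler's "top consumers" view
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

The report shows `main` and `hot_loop` near the top, with call counts and per-call times, which is the same kind of table graphical profilers build their views from.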
If you have an idea of what might be taking a while, you can always use a stopwatch timer library to run benchmarks.
For example, Benchmark.NET can run tests on different functions with very high accuracy, and is commonly used to benchmark different algorithms against each other. You can also use a simple Stopwatch around the code you want to benchmark.
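The stopwatch pattern is easy to hand-roll in any language. Here’s a minimal sketch in Python (kept in Python for consistency with the earlier examples) using `time.perf_counter`, taking the best of several runs to reduce noise; the two string-building functions are hypothetical stand-ins for whatever code you want to compare:

```python
import time

def measure(fn, *args, repeats=5):
    """Time fn over several runs and return the best (lowest) duration."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def concat_naive(n):
    # Builds the string piece by piece
    s = ""
    for i in range(n):
        s += str(i)
    return s

def concat_join(n):
    # Joins everything in a single pass
    return "".join(str(i) for i in range(n))

print(f"naive: {measure(concat_naive, 50_000):.4f}s")
print(f"join:  {measure(concat_join, 50_000):.4f}s")
```

Taking the minimum over several repeats is a common trick for microbenchmarks, since the fastest run is the one least disturbed by background noise.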
Using a Performance Profiler
For this guide, we’ll show how to use dotTrace, a performance profiler for .NET applications that is fully featured and has most of the tools found in other profilers. Unless you’re profiling C# code, you’ll likely be using a different application, but the overall process should be similar.
Once you open the app, you’ll be able to connect to running .NET processes, or set up your own run configuration so you can launch the app from dotTrace. Launching the app from the profiler can be especially useful for debugging slow startup times.
Once you run the app, it will begin collecting data. You can let it run for as long as you want; press “Get Snapshot and Wait” to open up the analysis for the collected time period.
Once it’s open, you’ll see a lot of graphs alongside the call stack and call tree, which will probably look unreadable at first. If you’re seeing a lot of entries related to threading, locks, and waiting, that’s probably because you need to scope the view to the main thread.
The profiler picks up all threads, which are often used for background tasks that will wait for long periods of time. While these can be evidence of IO blocking issues, it’s a bit more nuanced than the profiler makes it out to be, and it really depends on what the thread is doing.
dotTrace also has a feature for filtering code based on the area of work it comes from, using the “Subsystems” filters on the left. System code, native code, and other laggy areas like reflection, collections, strings, and LINQ, can all be searched for.
In the main window, you’ll find the flame graph. This shows a breakdown of your entire application, starting at “All Calls” and dividing up the time it takes for each level of functions to execute. Some entries will be unresolved, and some will be too small to display, but you can zoom in on any function to view a closer breakdown of that call stack.
Another staple feature of performance profilers is the Call Tree, which shows a nested breakdown of the most active functions, sorted by the time it takes them to execute. Here, dotTrace also shows a percentage, which represents the chunk of overall time that a given function and its children took up.
CPU time spent in functions isn’t always the problem, especially for a language like C# with a garbage collector. dotTrace also tracks memory usage and allocations, and can be used to find what’s putting unnecessary pressure on your GC.
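Allocation tracking isn’t unique to dotTrace. As a rough illustration of the idea, Python’s built-in tracemalloc module can report how much memory a piece of code allocates, which helps spot allocation-heavy paths (a minimal sketch; `build_rows` is a hypothetical example):

```python
import tracemalloc

def build_rows(n):
    # Each iteration allocates a fresh list; in a garbage-collected
    # runtime, lots of short-lived objects like these add GC pressure.
    return [[i, i * 2, str(i)] for i in range(n)]

tracemalloc.start()
rows = build_rows(10_000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")
```

Comparing the peak figure before and after a change is a quick way to confirm that a refactor actually reduced allocations.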