Our first ANTS Memory Profiler 5 early access build is now available, and although it really is a very, very early build, it’s still fantastic for finding memory leaks, as I discovered when I set it loose on the next release of Exception Hunter, which is currently in development. In this post I’m going to take you through the steps needed to track down memory leaks with ANTS Memory Profiler 5.
For anyone who doesn’t want to be bored with the details, download information is available here:
http://www.red-gate.com/MessageBoard/viewforum.php?f=92
(We’ll post new announcements in the forums as new builds become available, along with information about exactly what’s included.)
So why do you care? What’s a memory profiler for exactly?
As I said above, ANTS Memory Profiler 5 can help you find memory leaks. In .NET, memory leaks occur if you hold on to references to objects you’re no longer using, because this prevents the garbage collector from cleaning them up. ANTS Memory Profiler helps you find these leaked objects. Once you’ve found them, you can make code changes to fix the problem, which is great because your application will become more stable, more reliable, less of a resource hog, and might even run faster.
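To make the "holding on to references" idea concrete, here is a minimal sketch. It is written in Python rather than C# purely to keep it short and runnable, and everything in it (the `Document` class, the `recently_opened` list) is invented for illustration. Note also that CPython reclaims objects through reference counting, whereas the .NET garbage collector traces from GC roots, so treat this as an analogy for the shape of the problem rather than a model of the CLR.

```python
import gc
import weakref


class Document:
    """Hypothetical stand-in for a large object the app has finished with."""

    def __init__(self, name):
        self.name = name
        self.payload = bytearray(1024 * 1024)  # roughly 1 MB of data


# A long-lived collection, comparable to a static list or cache in .NET.
recently_opened = []


def open_document(name):
    doc = Document(name)
    recently_opened.append(doc)  # the lingering reference that causes the leak
    return doc


doc = open_document("report")
probe = weakref.ref(doc)  # observes the object without keeping it alive

del doc       # the application is "finished" with the document...
gc.collect()
leaked = probe() is not None  # ...but the list still references it

recently_opened.clear()  # break the reference chain
gc.collect()
collected = probe() is None
```

The object only becomes collectable once the last strong reference to it has gone. In .NET the same pattern typically shows up with static collections, caches, and event handlers that are never unsubscribed.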
So, if you’re experiencing memory problems, download this build and follow the guide below to see how to use it to track down problems. There’s really nothing to lose.
If you don’t know much about .NET memory management, I’d recommend a bit of background reading: I’ve collected some suggestions in the references section at the end of this post. From here on in I’m going to assume you know about the garbage collector, and what a GC root is. If not, you should be able to get away with taking 10 minutes out to read the first article I’ve suggested.
Walkthrough: Using ANTS Memory Profiler 5 to track down a memory leak in Exception Hunter
We’ve been working on the next version of Exception Hunter, in a low key fashion, for some months now, but the new functionality we’ve added has given us a real headache with memory usage. I could have used some toy application for this walkthrough, but I thought something real would provide a more interesting example, albeit at the slight risk of a few people pointing and laughing at us. I’ve included plenty of screenshots so you can easily see what's happening. In most cases you can click on these to see a larger view.
There are several problem areas, but I’m going to focus on some deviant behaviour in the assemblies list: when I start off with an empty list, adding a load of assemblies and then immediately removing them leads to Exception Hunter using a lot more memory than it did to start off with. This is exactly the kind of situation where a memory profiler can really help.
On opening up the memory profiler I’m presented with a setup dialog which will look familiar to anyone who’s used ANTS Performance Profiler 4 already (fig. 1). All I need to do is point it at Exception Hunter and click Start Profiling. The profiler starts up Exception Hunter and begins collecting performance counter data (fig. 2).
Figure 1. The ANTS Memory Profiler 5 setup dialog—no surprises here I hope.
Figure 2. Whilst profiling, ANTS Memory Profiler collects performance counter data in the same way as the performance profiler. Here we see Exception Hunter’s memory usage increasing as it starts up; all OK so far.
There are some differences between this and the performance profiler, notably the addition of a Take memory snapshot button, and the absence of the events on the timeline. We’ve removed the latter because, to minimise the impact of the profiler on the profiled application, we've avoided instrumenting the JITed code as much as possible. Performance is something that’s really important to us, and has been highlighted by users of previous versions of the profiler as being a major pain point: you wanted it to be faster, so we’ve made this new version much faster. In fact, the overhead added by the new version is hardly noticeable outside of taking a snapshot.
Taking and comparing memory snapshots is a key activity when looking for memory leaks, so my approach will be as follows:
- Wait for Exception Hunter to open and finish initialising, which involves it pre-loading some framework assemblies.
- Take my first snapshot with an empty assemblies list.
- Add the Red Gate.ExceptionHunter.Logic.dll assembly for analysis.
- Wait for the assemblies list to finish populating.
- Remove all assemblies, giving me an empty list again.
- Take my second snapshot.
- Examine the comparison ANTS Memory Profiler shows me after it has finished taking and analyzing the second snapshot.
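The comparison in the last step boils down to a per-type diff of two snapshots. The sketch below shows that idea under heavy assumptions: it is written in Python for brevity, a "snapshot" is just a list of made-up `(type name, size in bytes)` pairs, and none of it reflects how ANTS Memory Profiler actually stores or processes its data.

```python
from collections import Counter


def summarise(snapshot):
    """Return per-type instance counts and total sizes for one snapshot."""
    counts, sizes = Counter(), Counter()
    for type_name, size_in_bytes in snapshot:
        counts[type_name] += 1
        sizes[type_name] += size_in_bytes
    return counts, sizes


def compare(first, second):
    """Rows of (type, live size, size diff, instance diff), largest first."""
    first_counts, first_sizes = summarise(first)
    second_counts, second_sizes = summarise(second)
    rows = []
    for type_name in set(first_counts) | set(second_counts):
        rows.append((
            type_name,
            second_sizes[type_name],                            # live size
            second_sizes[type_name] - first_sizes[type_name],   # size diff
            second_counts[type_name] - first_counts[type_name], # instance diff
        ))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up data: each entry is (type name, size in bytes) for one live object.
snapshot_1 = [("System.String", 40), ("System.Byte[]", 1000)]
snapshot_2 = [("System.String", 40), ("System.String", 60),
              ("System.Byte[]", 1000), ("System.Byte[]", 4000)]

rows = compare(snapshot_1, snapshot_2)
```

Types whose size or instance diff is large after returning to the "same" application state, here an empty assemblies list, are the ones worth investigating first.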
Figure 3 shows an empty Exception Hunter waiting for me to add some assemblies, and figure 4 shows the state of play in ANTS Memory Profiler after I’ve taken my first snapshot. The snapshot contains information about all objects on the managed heap, which generations they live in, whether or not they’re on the large object heap, and all the references linking them together. For the most part this does not include information about objects eligible for garbage collection, because we force a full collection before taking the snapshot; however, the process of garbage collection can itself create some garbage, so there may be a few unreachable objects in the snapshot. Eventually snapshots will also be able to hold member information to make finding leaks even easier. When you use the profiler you’ll notice your application pauses whilst the snapshot is being taken: this is because we need to ensure the managed heap remains in a consistent state whilst we collect the data.
Figure 3. Exception Hunter showing empty assemblies list.
Figure 4. Our first snapshot. Exception Hunter is using about 130MB.
As you can see, once ANTS Memory Profiler has finished processing the snapshot it shows you memory usage information about all the types with instances on the heap at the time the snapshot was taken. In future builds it’s likely to display a higher level summary of the snapshot instead.
Now I’m ready to add my assembly to Exception Hunter. This is shown in figure 5. Figure 6 shows the memory usage increasing to over 300MB as Exception Hunter loads the assembly and its dependencies. Figure 7 shows Exception Hunter after it has finished doing all this.
Figure 5. Adding an assembly in Exception Hunter.
Figure 6. As Exception Hunter loads the assembly, and its dependencies, the memory usage climbs to over 300MB.
Figure 7. Populated assembly list in Exception Hunter.
You can see that a lot of dependencies have been loaded, so it’s hardly surprising that the memory usage is so high, especially when you consider the depth of analysis Exception Hunter has to perform in order to accurately determine all the exceptions that could be thrown by a method.
So that’s all well and good, but look at what happens when I click File > Remove All Assemblies (fig. 8).
That’s rather alarming, and it doesn’t take a genius to work out that something is very wrong indeed! I’ve ended up using even more memory than when the assemblies list was populated. Time to take a second snapshot I think: check out the results in figure 9.
Figure 8. Whoa there, Big Momma! Exception Hunter is now so fat it has several other smaller fat applications trapped in orbit around it! You can see that although I’ve removed all the assemblies, overall memory usage has increased further to around 360MB. What the …?
Figure 9. ANTS Memory Profiler is showing me a comparison of memory usage for all classes after I’ve taken a second snapshot.
Taking the second snapshot and comparing it to the first involves collecting and collating a huge amount of data, but notice how the new version of the memory profiler doesn’t add any additional overhead to Exception Hunter’s memory usage whilst it’s doing this.
This behaviour is in marked contrast to the version 4 memory profiler which drastically increased the memory usage of the target application. ANTS Memory Profiler 5 does impose some memory overhead on the target application, but this is mainly taken up by a fixed size block used for memory mapping (approximately 20MB). The amount of memory required to track disposable objects varies depending upon how many of your objects are disposable, but since these are usually a relatively small proportion of the whole you shouldn’t notice any adverse effects from this. In most cases the amount of overhead is likely to remain pretty much constant regardless of how much memory your application uses.
Back to my snapshot comparison...
This being an EA build, there’s a bit of tweaking needed, and one of the things we need to change is the default sort column on this list: I’m going to sort by Live Size in descending order (fig. 10).
Now I’m in a position to pick a type to investigate in more detail. Most of the memory is being used up by byte arrays, and then by strings. I could probably have picked either but at the time I did this I was just poking around so I started off with strings, more out of curiosity than anything else. Regardless, starting with something that uses up a lot of the memory, or has a large size or instance count diff is a good strategy, and strings certainly fit that bill here. Picking a low level type that’s used as a building block for other types is also a good plan, given the choice.
Figure 10. I’ve sorted by Live Size (descending), and I’m going to take a look at strings in more detail. Byte arrays would also be a decent starting point. Note the large values for both instance and size diffs.
Figure 11. A list of strings in the second snapshot. None of them are big enough for that in itself to be a concern.
Figure 11 shows the list of strings that appeared after I clicked the button to view the object list. Looking at the way the Size with children bars are normalised I can quickly see that the size of individual strings is not in itself the problem, so I need to look for something else. We’ll be adding some powerful filters in future builds to make this easier, but for now a good strategy is to look for objects that are a long way from a GC root.
Distance from GC root is effectively the smallest number of objects between the object of interest and its nearest GC root: it’s actually the number of references, minus one, because the reference from the GC root itself isn’t counted. One of the interesting properties that leaked objects have is that they’re often a long way from a GC root because all the “obvious” reference paths have been broken, so this is why I’m interested in objects for which this value is large. Note that there are some data structures where this might not hold true but, in many cases, it’s still a useful heuristic.
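The definition above amounts to a breadth-first search outwards from the GC roots. The following is a small sketch of that calculation, written in Python for brevity, over an invented object graph whose node names are purely illustrative. It shows the general technique only and makes no claim about how the profiler computes the value internally.

```python
from collections import deque


def distances_from_roots(references, roots):
    """Breadth-first search outwards from every GC root at once.

    `references` maps each node to the objects it references. Returns, for
    each reachable object, the number of references on the shortest path
    from any root, minus one (the reference held by the root is not counted).
    """
    hops = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for target in references.get(node, ()):
            if target not in hops:  # first visit is the shortest path
                hops[target] = hops[node] + 1
                queue.append(target)
    return {obj: n - 1 for obj, n in hops.items() if obj not in roots}


# Hypothetical heap: a static field reaches a grid, which reaches a string
# through a chain of intermediate objects; a local variable reaches a cache.
references = {
    "static field": ["grid"],
    "local variable": ["cache"],
    "grid": ["row collection"],
    "row collection": ["row"],
    "row": ["assembly"],
    "assembly": ["string"],
    "cache": ["row"],
}
roots = ["static field", "local variable"]

distances = distances_from_roots(references, roots)
furthest = max(distances, key=distances.get)
```

In this made-up graph the string ends up three objects away from its nearest root, which makes it the first candidate when sorting by distance in descending order.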
Looking at objects that are new since the previous snapshot can also work well but, due to string interning, I’m suspicious that there may also be some older strings hanging around when they shouldn’t be. I’m therefore going to sort by the Distance from GC root column in descending order, and look at objects at the top of the list. I also sorted by the New column, but found that the objects furthest from a GC root were actually in the previous snapshot (fig. 12).
Figure 13 shows the object reference graph I created for the string at the top of this list. This graph shows the shortest path to every GC root from which this object is reachable. Again, this is somewhere we’ll be applying filtering in the future, but even at this very early stage, it’s pretty easy to deal with.
Figure 12. Taking a look at strings that are a long way from a GC root: objects that are a long way from a GC root are more likely to be leaked.
Figure 13. An object reference graph showing everything that references my string. Looks like I’ve picked one of the popular kids in class. I think I’m on to something here.
The object you’ve created the graph for will always appear at the bottom (circled), so all we need to do is zoom in there to find out what’s happening. Most of the roots shown on this graph are weak references, which the filtering we’ll be adding later will take care of. All we need to worry about is breaking the strong reference chains to leave the garbage collector free to collect the objects that are no longer strongly reachable.
Figures 14 and 15 show progressively zoomed in views of the object graph, with our string highlighted. You can see that it sits right at the end of a single long chain of references, with a lot of other stuff coming in from the top. We’ll take a look at this chain in more detail.
Figure 14. The object you create the graph for is always right at the bottom, so it’s easy to find. That long chain of references looks promising.
Figure 15. Zoomed in view showing my string at the bottom. Note the long reference chain. This looks really dodgy to me. Looking back up the chain a bit further should tell me whether or not I’m right.
Don’t worry too much about the pale blue boxes for now: these represent “strongly connected” objects, which generally means there’s some circular referencing going on, although we don’t render circular references explicitly. Note also we only show the shortest paths to root because otherwise the graph would take forever to calculate, and would be far too complicated to interpret if we did display it. Also note that you can’t start expanding things willy-nilly: the sole purpose of this graph is to show you reference paths to GC roots so that you can break them and thus enable the garbage collector to clean up your previously leaked objects.
If we look back up the chain of objects we eventually come to a type called NetTypeRepository+NetAssembly. This seems suspicious given that we removed all the assemblies from the assemblies list, and indeed figure 16 confirms our suspicions. The DevExpress grid is holding on to references to our DotNetAssembly type, even though all the data has been removed from the grid, and its data model is empty. Busted! ... Or so it would seem.
Just because I like to be dead sure about what’s happening I had a bit more of a look around to confirm that it really was the grid used as part of the assemblies list (fig. 17).
Figure 16. My suspicions are confirmed. The grid is holding on to references to the objects representing the assemblies. Naughty, naughty.
Figure 17. Just being a bit anal retentive here, but this does show that it’s definitely the assemblies grid holding on to the references. It’s unlikely to have been anything else, but there’s no harm in checking to be sure.
Figure 18. The grid's empty so what on earth is that GridRowInfoCollection doing referencing through to our assemblies?
In figure 16 we have this reference chain: GridView > GridViewInfo > GridRowInfoCollection > etc. I’m really curious about what that GridRowInfoCollection, highlighted in figure 18, is doing there, because the grid is empty, so we need to look at that.
Right now ANTS Memory Profiler doesn’t store member information so you can’t see which member variable is referencing a particular object. Don’t worry though, we’re going to add this to a later build. For now I’m just going to use .NET Reflector to find out where this might be referenced.
Figure 19 shows the two possible members I identified in GridViewInfo that might be referencing this object. The most likely culprit seems to be cachedRows because, after all, the grid is empty.
Figure 19. .NET Reflector’s analyzer showing the members I suspect. Since the grid is empty cachedRows seems most likely to me.
So I need to figure out whether there’s any way I can access and clear this collection. I do some googling and open up my Exception Hunter solution in Visual Studio.
Figure 20 shows the rather embarrassing conclusion of all this activity: yep, looks like I goofed. Not the first time, certainly won’t be the last. Still, as figure 21 shows, it’s easily fixed.
Figure 20. Looking at the control in Visual Studio and... Argh! Catastrophic pwn@g3 ricochet! ... Maybe I blamed the grid too soon because it looks like I’ve forgotten to tweak the option to switch off row caching. Oops.
Figure 21. One of my more easily fixed mistakes. Just disable row caching.
After this I rebuild, re-run, and it’s all good, right? Unfortunately no, not so much actually. I found that the same thing was still happening, despite my fix, so it looks like it might be a bug in the grid itself, or at least in the version we’re using. As a result I needed to get a bit more serious about things.
We tend to use a lot of interfaces in our code, which can feel like overkill, but in this case it really paid dividends. The DotNetAssembly class implements an interface called IPortableExecutable, which is the only thing the UI code sees. This meant that, in the UI assembly, I could write an implementation of this interface that simply wraps a WeakReference pointing to the real implementation. In some ways it’s a bit mucky, but it’s quite neat in that when the session is dumped, which it is when all assemblies are removed, the garbage collector can still collect the DotNetAssembly instances because they’re only weakly referenced by the grid. Figure 22 illustrates this.
Figure 22. You know it’s getting serious when I resort to drawing UML diagrams in Visio. This shows my final solution to the problem using a proxy object holding a weak reference to the real implementation.
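For anyone who wants to see the shape of that proxy in code, here is a rough sketch. It is written in Python rather than C# purely for brevity. The class names mirror the ones in this post, but the `name` member, the `is_alive` helper, the `"<unloaded>"` placeholder, and the example file name are all invented, and it is not the actual Exception Hunter code. In .NET the proxy would wrap a `System.WeakReference` instead of Python's `weakref.ref`.

```python
import gc
import weakref
from abc import ABC, abstractmethod


class PortableExecutable(ABC):
    """Stand-in for the IPortableExecutable interface; `name` is invented."""

    @property
    @abstractmethod
    def name(self):
        ...


class DotNetAssembly(PortableExecutable):
    """Stand-in for the real, memory-hungry implementation."""

    def __init__(self, name):
        self._name = name
        self.analysis_data = bytearray(1024 * 1024)

    @property
    def name(self):
        return self._name


class WeakReferencedAssembly(PortableExecutable):
    """Proxy handed to the UI: holds only a weak reference to the target."""

    def __init__(self, target):
        self._target = weakref.ref(target)

    @property
    def is_alive(self):
        return self._target() is not None

    @property
    def name(self):
        target = self._target()
        return target.name if target is not None else "<unloaded>"


session = [DotNetAssembly("Example.Logic.dll")]  # owns the real objects
grid_rows = [WeakReferencedAssembly(a) for a in session]  # what the grid sees

name_while_loaded = grid_rows[0].name

session.clear()  # "Remove All Assemblies": the only strong reference goes
gc.collect()

alive_after_removal = grid_rows[0].is_alive
name_after_removal = grid_rows[0].name
```

Even if the grid keeps its rows cached, all it pins in memory is the small proxy. The proxy does need to cope with its target having disappeared, which is why this sketch falls back to a placeholder value.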
The observant amongst you will realise that I’m still leaking WeakReferencedAssembly instances, but since these are tiny, it doesn’t really matter in the grand scheme of things… although yes, I will admit, it still chafes a little.
Anyway, did it actually work? See for yourself in figure 23.
Figure 23. Oh yeah! I have exorcised the daemon!! This house is kaaaaaleeeeeeeeaaaaaah!! … Er, sorry, what I mean is that using the weak reference wrapper as a proxy object appears to have fixed the problem.
See how the memory usage drops away once all the assemblies have been removed from the grid, as we’d expect it to. That’s great, because we were leaking a whopping 240MB. We haven’t completely solved the problem, because there’s still about 20MB unaccounted for, but still, a massive improvement, and actually a really easy win. What I did wasn’t tricky, and the memory profiler made it really easy to find the problem. Don’t worry though, I’ll be investigating further to track down that last 20MB!
Hopefully that’s convinced you: if you’re having memory usage problems with your application, download ANTS Memory Profiler 5 EA now, and get digging:
http://www.red-gate.com/MessageBoard/viewforum.php?f=92
We’d love to hear your feedback as well, so please post in the forum, or drop an email to memory_profiler_eap@red-gate.com.
Happy hunting!
Useful Memory Management References
Some good overviews about .NET memory management:
Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework - http://msdn.microsoft.com/en-us/magazine/bb985010.aspx
If you read nothing else, read this!
Memory Management in .NET - http://www.c-sharpcorner.com/UploadFile/tkagarwal/MemoryManagementInNet11232005064832AM/MemoryManagementInNet.aspx
CLR Inside Out: Large Object Heap Uncovered - http://msdn.microsoft.com/en-us/magazine/cc534993.aspx
This is specifically targeted at the large object heap, but actually contains a pretty good overview of how memory management and garbage collection work in general.
More detailed information about garbage collection can be found in Using GC efficiently, Parts 1 – 4:
http://blogs.msdn.com/maoni/archive/2004/06/15/156626.aspx
http://blogs.msdn.com/maoni/archive/2004/09/25/234273.aspx
http://blogs.msdn.com/maoni/archive/2004/12/19/327149.aspx
http://blogs.msdn.com/maoni/archive/2005/05/06/415296.aspx
The above is a great series of articles, and although somewhat old, they give a good idea of how things work on CLR 2.0, which is what most of us are targeting. I’ve sourced them from Maoni’s blog, which I’d heartily recommend as a great source of information: http://blogs.msdn.com/maoni/default.aspx