Most .NET developers don’t have a very rigorous understanding of how the .NET framework actually manages memory, let alone how unmanaged memory interacts with our components and libraries.
For the most part that’s not a problem, but as soon as we start introducing third-party libraries and components into our applications, we start to encounter problems. Those problems might arise because the third-party code requires large amounts of memory to operate and we don’t know how to handle it, or because our own application isn’t designed to interact properly with third-party code. It might even be that the third-party library is flat-out leaking memory, but there are also ways of dealing with that seemingly intractable problem. At Red Gate, we’ve found that plenty of .NET developers can recognise when they’ve got a memory problem, but significantly fewer are confident about actually diagnosing those problems, and they are apt to point the finger at someone else’s code when the problem is actually at the interface.
To provide some advice on how to diagnose and recognise when memory problems are in the application vs. the library vs. the interaction between the two, we invited the developers from Aspose – who are actually building third party libraries – to share what they’ve learned on the subject. For a little bit of background, Aspose are “file format experts”. They provide a powerful set of file management components which allow developers to create and manipulate the majority of popular business file formats within their applications. Naturally the list includes Microsoft Excel spreadsheets, Word documents, PowerPoint presentations, and PDFs.
Given that they also produce components for Java, Android and SharePoint, Cloud APIs, as well as rendering extensions for SSRS and JasperReports exporters, I think it’s safe to say that the Aspose developers probably have some experience with third-party components. Here’s what they had to say on the subject:
Why is Third-Party Code so Tricky?
To start with, let’s talk about why memory management can be a particularly painful problem for .NET developers when working with third-party code – either in the form of visible libraries or completely black-box components, and ideally look at some common .NET memory problems when working with third-party components.
The Joys of Garbage Collection
One of the core features of the Microsoft .NET Framework is that software developed for it makes use of the garbage collector, which enables automatic memory management. This is both a great advantage and the source of great challenges! The garbage collector hides a large portion of the memory management that the programmer would otherwise have to take care of explicitly. With a memory manager like the .NET garbage collector, we need not worry about tasks that were routine in earlier languages like C and C++, such as de-allocating memory blocks and cleaning up objects that the application no longer references.
The garbage collector does this by detecting when an object in memory can no longer be reached by the application, and then freeing the memory which that object used. Software written in C and C++ has traditionally been prone to memory leaks because developers had to allocate and free memory manually, and we mere humans are sloppy and forgetful. In Microsoft .NET, on the other hand, the framework automatically allocates available memory for new objects, and garbage collection automatically “reclaims” any unused memory. This makes memory usage safer and more efficient, and makes it much faster to write code that runs on the .NET framework. However, this doesn’t mean the software created won’t suffer from memory issues (or that the garbage collector is a perfect system, but that’s another article).
Black Box Problems in the Real World
When dealing with a third-party library you normally are using a “black box”, unless you happen to be using open source software or have access to the code in some way. This means that, if there is a memory problem, you can often narrow down and identify what part of the overall code is causing it (and therefore which library is the likely culprit), but addressing the actual issue is harder. To start with, sometimes you must take into account that what looks to be a memory leak in a third-party library is not actually what it appears – memory that is retained after the primary operation has finished can often be held on purpose to improve the overall performance of the library and to ensure a smooth user experience.
For example, let’s take a look at Aspose.Words – it’s an impressive .NET library which offers a rich API for working with Word documents, manipulating them and converting them from one format to another, and a good example of a library which manages its memory well. If you use Aspose.Words to render a document to PDF, you load the file into a class called Document, which then allows you to freely modify the document and convert it via the Document object model. If you convert the document to a fixed-page format such as PDF just once and use a memory profiler (such as Red Gate’s very own ANTS Memory Profiler), you’ll see that most of the memory used is released when the Document instance goes out of scope, but not all of it – this is to be expected.
When you convert another document to PDF and run your memory profiler, you might see that some more memory has not been released, and so on. The memory that is not released by Aspose.Words might look like a leak, but it is not a leak – Aspose.Words is actually caching some information in static fields. When designing Aspose.Words, the developers carefully selected what information should be cached to speed things up while still only having a minimal impact on their users’ systems, and that’s what you would be detecting in this instance.
For example, converting to PDF requires layout into pages, which requires measurements for each character in each font used in a document – this in turn requires finding and reading a true type font file, parsing it, and then extracting character width and other layout information. If Aspose.Words did this every time for every document converted, it would be a waste of CPU time. Likewise, if multiple documents are converted at the same time, it would also require more memory because each instance in conversion would have its own copies of fonts loaded.
So, perhaps a little ironically, Aspose.Words keeps overall resource demands down by caching character width (font data) information and reusing it over and over. The number of fonts used in any given document and the number of fonts installed on any given system are both finite (and actually quite limited in the former case), so the cache cannot grow without bound, and this behaviour is not a leak (although it looks like one at first appraisal). This is an example of how a good library, despite thoughtful design and purposeful behaviour, can still look as if it’s leaking memory.
This is a default behaviour of Aspose.Words, which suits most of their customers due to the increased performance it allows for a minimal resource trade-off. However (and this is by no means the case for all third party components), they acknowledge that some users will be very short on memory. To alleviate memory pressure for those users, they very consciously introduced an API that allows more control over the font cache, such as clearing it on demand.
While we’ve used Aspose as an example, this is a common issue for many third-party libraries – some information gets cached in static fields for performance or expediency. Good libraries come with documentation that explains this, and ideally allow more sophisticated customers who really need to control memory consumption to have at least some control over just what is cached and when.
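The caching pattern described above can be sketched in a few lines. This is only an illustration of the general idea, not Aspose's implementation: `FontMetrics`, `FontMetricsCache`, `Get` and `Clear` are hypothetical names invented for this example, and the "parsing" is a placeholder allocation.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for parsed font data; not an Aspose type.
public sealed class FontMetrics
{
    public string FontName { get; }
    public int[] CharWidths { get; }

    public FontMetrics(string fontName)
    {
        FontName = fontName;
        CharWidths = new int[65536]; // pretend this was parsed from a font file
    }
}

public static class FontMetricsCache
{
    // Static field: entries stay reachable after any individual document is released,
    // which is exactly what a profiler reports as "memory not given back".
    private static readonly ConcurrentDictionary<string, FontMetrics> Cache =
        new ConcurrentDictionary<string, FontMetrics>(StringComparer.OrdinalIgnoreCase);

    public static int Count => Cache.Count;

    // Parse once per font name, then reuse the result for every later document.
    public static FontMetrics Get(string fontName) =>
        Cache.GetOrAdd(fontName, name => new FontMetrics(name));

    // Lets memory-constrained callers trade speed for a smaller footprint.
    public static void Clear() => Cache.Clear();
}

public static class CacheDemo
{
    public static void Main()
    {
        FontMetrics first = FontMetricsCache.Get("Arial");
        FontMetrics second = FontMetricsCache.Get("arial");
        Console.WriteLine(ReferenceEquals(first, second)); // True: same cached instance
        Console.WriteLine(FontMetricsCache.Count);          // 1
        FontMetricsCache.Clear();
        Console.WriteLine(FontMetricsCache.Count);          // 0
    }
}
```

The cache is bounded by the number of distinct font names, so its size levels off rather than growing with every document, and `Clear` gives callers the kind of on-demand control mentioned above.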
How to Spot and Diagnose Common Memory Problems
As touched on previously, it’s often hard to identify whether what appears to be a memory problem is really a problem at all, or is in fact designed behaviour. Sometimes an application can chew up a lot of RAM in a pattern typical of memory leaks, and this might seem to be a buggy implementation. However, you may just have a process that rightly requires a lot of memory to function – it’s often hard to tell! Furthermore, to add to the insidiousness of the problem, you most likely won’t see your machine run out of memory (“Out of memory” exceptions are increasingly rare thanks to virtual memory and sophisticated OS memory management). What’s more likely to happen as a library starts to demand more memory is that the machine will just get slower and slower as it runs short of resources.
In Aspose’s case, their products load the document in memory and generate a virtual representation of it by creating a hierarchy of nodes, similar to how XmlDocument works. This grants the user the power to access and modify the document, but at the same time it demands a memory footprint up to 10 times the size of the original document to load it fully. With large input, the sudden burst of memory usage can appear like a memory leak or like something just going horribly wrong, but it’s just the expected behavior.
The bad news is that there isn’t a sure-fire way of spotting memory problems in these situations because you don’t always know the full details of the third-party’s expected behavior. If you trust that the creator of the component has done everything possible to avoid memory leaks and conserve memory then you can rest easy, or at least look elsewhere when you do encounter problems, but that is easier said than done.
Of course, if you have what you think is a memory problem with a third-party application, it pays to thoroughly consult the documentation or get in touch with the support people directly to try to confirm whether or not what you are seeing is a problem. After all, if the component is paid-for, the support and insight into expected behaviors should definitely be part of what you’ve bought!
Memory Problems at the Code-Component Level
There are some problems that occur at the point where your code interacts with a third-party component. This may not be because there’s an actual leak in either piece of code, but rather because of an emergent problem due to the way the two systems operate internally. If you’re trying to work out whether a memory issue is an “interaction” problem or a true memory leak, strip any code which involves the third-party component down to a simple, bare-bones set of calls that are practically “word for word” the code examples given by the software vendor. If the memory issue is still present, then it’s more likely to be a problem with the third-party component and not your own code. For another real-world example, let’s turn to the Aspose team again.
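Once the code is stripped down to the vendor's sample calls, a rough harness like the one below can help show whether memory keeps climbing or levels off. It is a sketch under stated assumptions: `LeakCheck`, `SettledBytes` and `Sample` are names made up for this example, the workload is a placeholder to be replaced with the real calls, and `GC.GetTotalMemory` only reports the managed heap, so unmanaged allocations will not show up here.

```csharp
using System;
using System.Collections.Generic;

public static class LeakCheck
{
    // Managed heap size after forcing full collections and letting finalizers run.
    public static long SettledBytes()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return GC.GetTotalMemory(true);
    }

    // Runs the operation repeatedly and records the settled heap size after each run.
    public static List<long> Sample(Action operation, int iterations)
    {
        var samples = new List<long>(iterations);
        for (int i = 0; i < iterations; i++)
        {
            operation();
            samples.Add(SettledBytes());
        }
        return samples;
    }

    public static void Main()
    {
        // Placeholder workload: replace with the vendor's sample call.
        Action workload = () => { var buffer = new byte[1000000]; buffer[0] = 1; };

        foreach (long bytes in Sample(workload, 5))
        {
            Console.WriteLine(bytes);
        }
    }
}
```

Figures that rise on the first run or two and then stay roughly flat point towards caching; figures that keep rising with every iteration are a stronger hint of a leak and are worth taking to a profiler or the vendor.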
Aspose.Words holds the entire document in memory, represented as a tree of nodes, and the entire tree will be in memory for as long as the client’s code has at least one reference to any of the nodes. So, if the client’s code wants to make sure the document is removed from memory when it is no longer used, they should make sure they do not keep references to it. If they hold a reference to just one Paragraph object, then the entire Document with all its children will be held in memory. This is similar to any libraries that use a Document Object Model system (DOM) such as System.Xml classes, and these kinds of problems should be easily identifiable by using a memory profiler.
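The same effect can be observed with the System.Xml classes mentioned above. The sketch below uses `XmlDocument` rather than Aspose.Words, and the helper names (`NodeRetentionDemo`, `LoadAndKeepOneNode`) are invented for this example. It keeps a single child node and checks, via a weak reference, whether the owning document is still alive after a forced collection.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Xml;

public static class NodeRetentionDemo
{
    // Loads a document, returns one node from it, and hands back a weak
    // reference so we can observe whether the document is still alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static XmlNode LoadAndKeepOneNode(out WeakReference documentRef)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<doc><para>first</para><para>second</para></doc>");
        documentRef = new WeakReference(doc);
        return doc.DocumentElement.FirstChild;
    }

    public static bool DocumentSurvivesWhileNodeIsHeld()
    {
        WeakReference documentRef;
        XmlNode node = LoadAndKeepOneNode(out documentRef);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        bool alive = documentRef.IsAlive;
        GC.KeepAlive(node); // the node is still referenced up to this point
        return alive;
    }

    public static void Main()
    {
        // The node points back to its parent and owner document,
        // so the whole tree stays reachable.
        Console.WriteLine(DocumentSurvivesWhileNodeIsHeld()); // True
    }
}
```

A memory profiler shows the same thing as a retention path from your field or variable, through the node, to the document.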
How to Troubleshoot Problems with Third-party Components
Of course, at some point it’s likely that you will have to deal with a leaky library, and there are a few ways you can approach that situation. More often than not, simply asking the support team or engineers who made the software is the best way to resolve any memory issues or find workarounds. Unfortunately, in the case of open source software or simply unhelpful organizations, you may still need to take measures into your own hands.
One common way to contain memory leaks is to host the third-party component in its own application domain (AppDomain), which will at least give you some control, in that you’ll have the ability to unload the entire domain where the component is hosted. So, if something is hogging the memory and won’t give it back, you have the option to contain and kill the offending component when needed, and recover the lost memory sooner than the garbage collector could on its own. You can then start the component again in a new AppDomain. It’s not an ideal solution, of course – but it does work.
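A minimal sketch of the AppDomain approach might look like this. It assumes the .NET Framework: `AppDomain.CreateDomain` is not supported on .NET Core or .NET 5 and later. `ComponentHost` and its `Convert` method are hypothetical wrappers invented for this example, and the body of `Convert` is a placeholder for the real calls into the component.

```csharp
using System;

// Hypothetical wrapper around the third-party component. Deriving from
// MarshalByRefObject means calls are proxied into the child domain
// instead of the object being copied back into the main domain.
public class ComponentHost : MarshalByRefObject
{
    public int Convert(string inputPath)
    {
        // Call the third-party library here. Anything it caches in static
        // fields lives in whichever domain this instance was created in.
        return inputPath.Length; // placeholder result
    }
}

public static class SandboxDemo
{
    public static void Main()
    {
        // .NET Framework only: create a separate domain for the component.
        AppDomain sandbox = AppDomain.CreateDomain("ComponentSandbox");
        try
        {
            var host = (ComponentHost)sandbox.CreateInstanceAndUnwrap(
                typeof(ComponentHost).Assembly.FullName,
                typeof(ComponentHost).FullName);

            Console.WriteLine(host.Convert("Test Document.docx"));
        }
        finally
        {
            // Unloading discards the domain's managed objects, statics included.
            AppDomain.Unload(sandbox);
        }
    }
}
```

Arguments and return values that cross the domain boundary need to be serializable or derive from `MarshalByRefObject`. Memory leaked by native code belongs to the process rather than the domain, so unloading is unlikely to recover it.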
How to Troubleshoot your own Application
In an ideal world, so long as you follow the correct procedures set out in the components’ documentation, you shouldn’t run into issues such as accidentally causing a memory leak or eating up memory by misusing components. That said, we rarely inhabit an ideal world, and problems do crop up.
The best way to identify where memory is being used is to use a profiler, and if that reveals that the problem is happening inside a third-party component then at least you know your code is not to blame! That’s not said to make you feel better – it helps you work out where to focus your energy. All component vendors should be responsible for ensuring memory is conserved and released when required, so if a third-party library uses any object that implements IDisposable, it should also be responsible for disposing of it. The one exception to this rule is when an object that implements IDisposable is passed to the user via the public API: when a third-party component returns you an object that implements IDisposable, it most likely means that you should dispose of it when you no longer need it. Not doing so is a common cause of memory issues.
However, it’s worth bearing in mind that an object that implements IDisposable usually wraps some unmanaged resource. If you don’t dispose of an object like that then, while it usually won’t leak because its finalizer will release the resource at some point, that might happen much later than you would like, which can leave too many unmanaged resources held for too long. On the other hand, when you (as a caller) pass an object that implements IDisposable to a third-party library, you are normally still the one responsible for disposing of it.
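The second rule, that the caller still owns what it passes in, can be illustrated with a small sketch. `SaveReport` is a hypothetical stand-in for a library method that writes to a caller-supplied stream; it is not taken from any real API.

```csharp
using System;
using System.IO;

public static class OwnershipDemo
{
    // Hypothetical library method: it uses the stream it is given
    // but deliberately does not dispose of it.
    public static void SaveReport(Stream output)
    {
        byte[] payload = { 1, 2, 3 };
        output.Write(payload, 0, payload.Length);
    }

    public static long WriteAndMeasure()
    {
        // The caller created the stream, so the caller disposes of it.
        using (var stream = new MemoryStream())
        {
            SaveReport(stream);
            return stream.Length;
        }
    }

    public static void Main()
    {
        Console.WriteLine(WriteAndMeasure()); // 3
    }
}
```

If a library's documentation says it takes ownership of an object you pass in, follow the documentation instead; the pattern above is only the usual default.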
An example of returning disposable objects in Aspose.Words is when you take an object in the drawing layer of a Word document, which is represented in the Aspose.Words DOM as a Shape. Using the Shape class, you can call a method to retrieve the image associated with the shape as a System.Drawing.Image object, which can be used for whatever purpose. However, as the Aspose team helpfully point out in their documentation, it is up to the caller to dispose of the image object when they are finished with it, and the best way to do that is with a using block (or at least try/finally), as demonstrated below:
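// Load the document and find its first shape.
Document doc = new Document("Test Document.docx");
Shape shape = (Shape)doc.GetChild(NodeType.Shape, 0, true);
// The caller owns the returned Image, so dispose of it deterministically.
using (Image image = shape.ImageData.ToImage())
{
    image.Save("Shape Image Out.bmp");
}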
Wrapping up: How to Avoid Design Problems with Third-Party Components
To avoid memory problems, component authors should be proactive about code maintenance and follow a practice of rapid release cycles – Aspose’s short release cycles mean that if they or customers find a leak or a bottleneck then they can quickly fix it. So, when using a third-party library, we suggest making sure they use good development practices, they iterate often and their new releases are robust (not introducing regression issues to be fixed in future versions). Find a vendor who provides lively and useful customer support even if they appear to cost more than the competitors who don’t seem as active or as helpful. Most times, you get what you pay for.
Additionally, try to look for a solution that is completely managed, as a lot of memory leaks originate in unmanaged code. It’s deeply important for software providers these days to provide scalable and memory-efficient code as the world moves further and further into large scale processing of data, so the memory impact of your application and the components that make it up should be something you actively bear in mind.