.NET Memory Management and Finalization

In this excerpt from his new book, Practical Performance Profiling: Improving the Efficiency of .NET Code, Jean-Philippe Gouigoux discusses the Dispose mechanism and the finalization process in the context of .NET Garbage Collection.

The System.GC class provides access to several methods related to memory management. Unless you have a great deal of experience in .NET memory management and know precisely what you are doing, it is strongly recommended that you simply forget about its very existence.

There is a lot of temptation for beginner developers to explicitly call the Collect method at the end of a process which they know is memory-intensive. Of all the cases that the author is aware of since the early versions of .NET, none has ever shown that calling the GC mechanism explicitly could give better results in memory consumption or performance than simply letting the GC do its job.

Of course, in use cases where the usage does not vary at all, explicit garbage collection can allow control of memory use in one’s application, but this comes with a high price tag:

  • Sooner or later, the use of the application will change, and the memory consumption will vary.
  • Explicit GC calls at regular intervals have the natural consequence of increasing the number of garbage collections. In the end, this increases processing time.
  • As a general rule, there is no need to limit memory consumption of an application as long as the system keeps enough memory available. In the case of a system with a high memory pressure, the GC will adapt by running passes more frequently than on a system well equipped with RAM.

In short, it is essential to let the GC do its job, and not try to improve memory use by directing the collection process in any way.

We can have much more impact by helping the GC to recycle memory efficiently, by taking care of resources as explained below.

The question of high memory use
A question often asked by beginner developers is: is it normal that such and such a .NET process uses so much memory? The feeling that the CLR does not release enough of the memory used is very common, but one must understand that, as long as memory is available, it is normal for .NET to use it. And why would it do any differently? As long as the OS can provide it with memory, there is no sense in .NET limiting its use of it: running the GC more often would take time and thus slow down the application. The only important point to check is that the same process can also run with a smaller amount of memory available.

Releasing external resources when the GC is fired

Firstly, let us make it clear that we are only talking about external resources here, such as connections to a database, memory spaces not controlled by managed code (typically in COM interoperability), or any other resources that are not controlled directly by the CLR. Indeed, contrary to what happens with C++, an object in .NET does not need to worry about whether the managed objects it references should be recycled or not. The CLR will check whether each such object is referenced by any others, and if not, it will free it as well.

By contrast, in the examples below, it is important to release resources correctly. In the case of a database connection, this means calling the closing method on the corresponding API. But in most cases, as below, this operation is explicit, and there is no need to wait for the end of the object’s life to close the connection.

Listing 1
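The listing itself is not reproduced in this extract; based on the description that follows, it presumably resembled the sketch below (the table and column names are illustrative assumptions):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

public static class TraceDumper
{
    public static void DumpTraces(string connectionString)
    {
        SqlConnection connection = new SqlConnection(connectionString);
        connection.Open();
        SqlCommand command = new SqlCommand("SELECT Message FROM Trace", connection);

        // CloseConnection ties the connection's lifetime to the reader's:
        // closing the reader automatically closes the connection
        SqlDataReader reader = command.ExecuteReader(CommandBehavior.CloseConnection);
        while (reader.Read())
            Console.WriteLine(reader.GetString(0));
        reader.Close(); // the connection is closed here as well
    }
}
```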

In the example above, the CommandBehavior.CloseConnection parameter used in the ExecuteReader method guarantees that the connection closing operation will be called automatically upon closure of the associated reader.

By contrast, we can imagine a .NET object for which we would need to initialize a connection during construction, and to close the connection only when the object is at the end of its life. To do so, there exists a way of informing the CLR that it should execute an action when freeing an object. Typically, this works like this:

Listing 2
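Again the original listing is not included here; a plausible sketch, assuming a tracing class (the Tracer name and Trace table are illustrative) that opens its connection at construction and closes it in a finalizer:

```csharp
using System.Data.SqlClient;

public class Tracer
{
    private SqlConnection connection;

    public Tracer(string connectionString)
    {
        connection = new SqlConnection(connectionString);
        connection.Open();
    }

    public void Log(string message)
    {
        SqlCommand command = new SqlCommand(
            "INSERT INTO Trace (Message) VALUES (@Message)", connection);
        command.Parameters.AddWithValue("@Message", message);
        command.ExecuteNonQuery();
    }

    // Finalizer: executed by the CLR when the GC frees the object
    ~Tracer()
    {
        connection.Close();
    }
}
```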

Obviously, this example is over-simplified: keeping the connection open throughout the life of this object would only make sense if its Log() method was destined to be called extremely frequently. In the more plausible case of the method being called irregularly, it would definitely be better to open the connection at the beginning of the function call and close it at the end.

This would remove the need to deal with closing the connection upon disposing of the instance, and would also free database connections for other uses, making the code more capable of handling high loads. But this is not the end of the matter, and one should remember that performance handling is often about choosing where to strike the balance between two extremes. In this example, one could argue that opening and closing the connection at each call takes processing time and slows the process down. In particular, opening a database connection is a heavy operation, which involves starting a new thread, calculating authorization levels, and several other complex operations.

So, how does one choose? Quite simply, by knowing the mechanisms used in database connection management. In practice, SQL Server will pool the connections, bringing better performance even if they are opened and closed frequently. When the Close instruction is called on an ADO.NET connection, the underlying object that deals with the actual database connection is in fact not abandoned, but only deactivated, and marked as available for another user. If the object is then taken from the pool, the opening of a connection is much less complex, since the object exists and the code only has to reactivate it for another use, usually only having to re-authorize it.

In short, since we have no need to deal with the object finalizer, we can write:

Listing 3
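In the spirit of the text above, the listing presumably opened and closed the connection inside each call and had no finalizer at all; a sketch under those assumptions:

```csharp
using System.Data.SqlClient;

public class Tracer
{
    private readonly string connectionString;

    public Tracer(string connectionString)
    {
        this.connectionString = connectionString;
    }

    public void Log(string message)
    {
        // Thanks to connection pooling, Open/Close per call is cheap:
        // Close only returns the underlying connection to the pool
        SqlConnection connection = new SqlConnection(connectionString);
        connection.Open();
        try
        {
            SqlCommand command = new SqlCommand(
                "INSERT INTO Trace (Message) VALUES (@Message)", connection);
            command.Parameters.AddWithValue("@Message", message);
            command.ExecuteNonQuery();
        }
        finally
        {
            connection.Close();
        }
    }
}
```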

Early release of resources

The method described above (releasing a resource upon object recycling) still has a major drawback: if the resource is precious, it is a waste to wait minutes or even hours for the GC to release it.

This is the reason behind yet another .NET mechanism: the IDisposable interface. Implementing this interface forces a class to have a Dispose() method, allowing the class instances to release resources as soon as the developer calls the method, whether it be explicitly or through the using keyword. Let us take an example:

Listing 4
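The listing is not included in this extract; a sketch of what it likely contained, assuming the same illustrative Tracer class, this time holding the connection for its lifetime and implementing IDisposable:

```csharp
using System;
using System.Data.SqlClient;

public class Tracer : IDisposable
{
    private SqlConnection connection;

    public Tracer(string connectionString)
    {
        connection = new SqlConnection(connectionString);
        connection.Open();
    }

    public void Log(string message)
    {
        SqlCommand command = new SqlCommand(
            "INSERT INTO Trace (Message) VALUES (@Message)", connection);
        command.Parameters.AddWithValue("@Message", message);
        command.ExecuteNonQuery();
    }

    // Called by the developer (or by "using") to release the resource early
    public void Dispose()
    {
        connection.Close();
    }
}
```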

The user of such an object would write code that calls the method like this:

Listing 5
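The caller-side listing presumably relied on the using keyword, along these lines (the connectionString variable is assumed to be defined elsewhere):

```csharp
using (Tracer tracer = new Tracer(connectionString))
{
    tracer.Log("Process started");
    // ... more work ...
} // Dispose() is called here, even if an exception was thrown
```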

For readers who are not used to the using keyword, the code above is exactly equivalent to this:

Listing 6
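That equivalent form is the try/finally expansion the compiler generates for a using block; a sketch of what the listing likely showed:

```csharp
Tracer tracer = new Tracer(connectionString);
try
{
    tracer.Log("Process started");
    // ... more work ...
}
finally
{
    if (tracer != null)
        ((IDisposable)tracer).Dispose();
}
```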

By using Dispose, the caller guarantees that the resources will be released as soon as possible.

Combining both operations

At this point in the evolution of our example code, something is still missing: what happens if the caller does not use the Dispose mechanism, by forgetting to include the using keyword or to call the equivalent method? Resources will not be released, even when the GC recycles the object, and there will be a resource leak.

It is thus necessary to apply both of the mechanisms we have described above, in a combined way:

Listing 7
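The listing is not reproduced in this extract; combining both mechanisms presumably looked something like this sketch (still assuming the illustrative Tracer class):

```csharp
using System;
using System.Data.SqlClient;

public class Tracer : IDisposable
{
    private SqlConnection connection;

    public Tracer(string connectionString)
    {
        connection = new SqlConnection(connectionString);
        connection.Open();
    }

    // Explicit, early release
    public void Dispose()
    {
        connection.Close();
    }

    // Safety net: the GC will call this if Dispose was forgotten
    ~Tracer()
    {
        connection.Close();
    }
}
```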

This way, the Dispose mechanism can be called explicitly to release the associated resource as soon as possible, but if for some reason this is overlooked, the GC will eventually call the finalizer. This will be done later, but it is still better than never.

Nonetheless, a seasoned developer will notice the code duplication: the finalizer and the Dispose function use the same code, which is contrary to a well-known best practice. As a result, we should combine the resource freeing code, like this:

Listing 8
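A sketch of the factored version, using the FreeResources method name mentioned in the comments below the article:

```csharp
using System;
using System.Data.SqlClient;

public class Tracer : IDisposable
{
    private SqlConnection connection;

    public Tracer(string connectionString)
    {
        connection = new SqlConnection(connectionString);
        connection.Open();
    }

    public void Dispose()
    {
        FreeResources();
    }

    ~Tracer()
    {
        FreeResources();
    }

    // Single place where the resource is actually released
    private void FreeResources()
    {
        connection.Close();
    }
}
```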

We are getting there, but there are still a few potential problems we have to deal with:

  • If Dispose is called explicitly, there is no use for the finalizer anymore, because we know it will not do anything: the resource has already been freed.
  • We should make sure that calling the method to free resources several times will not cause any problems.
  • We should take into account the fact that, when Dispose is called, the Dispose method for other managed resources should be called as well. Generally, the CLR takes care of this by using the finalizer, but in this case, we have to do it ourselves.

The final code is:

Listing 9
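The final listing is not reproduced here; the sketch below implements the three requirements in the bullet points above (it is a reconstruction consistent with the text, not necessarily the book's exact code):

```csharp
using System;
using System.Data.SqlClient;

public class Tracer : IDisposable
{
    private SqlConnection connection;
    private bool disposed = false;

    public Tracer(string connectionString)
    {
        connection = new SqlConnection(connectionString);
        connection.Open();
    }

    public void Dispose()
    {
        FreeResources(true);
        // The resource is already released: no need for the finalizer to run
        GC.SuppressFinalize(this);
    }

    ~Tracer()
    {
        FreeResources(false);
    }

    private void FreeResources(bool disposing)
    {
        if (!disposed) // guard against multiple calls
        {
            if (disposing)
            {
                // Only touch other managed objects on an explicit Dispose;
                // during finalization they may already have been collected
                connection.Close();
            }
            disposed = true;
        }
    }
}
```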

This code structure is known as the “Dispose” pattern, and is quite a standard form. Despite all the effort we have put into it, it is still not 100% complete. If we want to take care of all the possible situations, we should add one more safety feature: once Dispose has been called, the object cannot have its Log method called. A traditional modification is to set the connection to null, and then check its value in Log or any method that could use it.

Further details can be found by searching for “Dispose” and “Pattern” on the internet. There are numerous discussions on side-effects and how to avoid them, on the memory performance of each variant of the pattern, and so on. The goal of this article is not to provide the reader with a state-of-the-art summary of these discussions, but to show the link between this pattern and the performance of an application. If the pattern is not correctly implemented, there is a risk of leaking unmanaged resources and drastically reducing the application's access to them.

A last note

It is essential to stress that the fact that a process uses a great deal of memory does not mean it is unable to release it. This is a common misunderstanding of .NET memory management. As long as the OS does not restrict the CLR in its memory consumption, .NET has no reason whatsoever to run the GC at the risk of generating a drop in performance in the application.

It is perfectly normal for an application to grow in memory up until it reaches hundreds of megabytes. Even if one pass of the GC could make this drop to ten megabytes, as long as no other process needs memory, the CLR should not sacrifice even a small percentage of its time to freeing this memory. This is the origin of the reputation of .NET and Java as “memory hogs”. In fact, they are only using available resources as much as possible, while still maintaining a process to release them as much and as quickly as possible should the operating system ask for them.

Application in real life

A developer in my team created an application that processed XML in bulk. Each file was a few hundred kilobytes at most, and the corresponding instance of XmlDocument around one megabyte. The developer, who was watching memory consumption out of curiosity, was alarmed by the fact that it was growing consistently for each file processed, and asked me whether he should cancel the process before reaching an OutOfMemoryException. After growing to 700 megabytes or so, it suddenly dropped to around 100 megabytes, and this cycle repeated itself like clockwork until the end of the application. This case is a good example of how .NET works: on this machine, which had 2 gigabytes of RAM and almost no other active applications, it would have been counter-productive to have more GC activity, since the whole process would have taken a few more minutes, whereas reducing peak memory use would have made no difference at all. It is also revealing of how difficult the GC mechanism is to grasp for a developer who has not had it explained, which can cause performance issues, as explained above.



  • abellix78

    Good article
    Simply amazing, great job

  • jhonatantirado

    Good article
    I got it! Thanks for the simple explanation

  • Anonymous

    Several errors
    SqlConnection is a managed resource and should not be disposed of if disposing == false. In addition, it is standard convention to make the ‘FreeResources’ method (which should really be called Dispose(bool disposing) to follow standard practices) protected virtual so that subclasses can override it.

    You’re also failing to dispose of your SqlCommand.

    There are plenty of articles on the internet covering how to correctly implement the dispose pattern, I don’t really understand why you’ve written one that does it incorrectly. This isn’t really a good advert for your new book.


  • swells

    Deterministic Finalization
    Good article. I remember reading a similar article way back when .NET 1.0/1.1 was out and implementing this pattern as a matter of course. The article was called Deterministic Finalization.

    I have often wondered whether implementing it needlessly (i.e. when there are no precious resources to dispose of) is costing me anything I should be concerned about?

    A thought that crosses my mind though – if this were multithreading and two threads called dispose more or less simultaneously – should I be worried about locking the body of FreeResources? I cannot think of a reason why two threads might do such a thing…but it crossed my mind.

  • Stephen Leach

    Let the GC do its job
    Nice article. I would emphasise that the reason that garbage collectors allow memory usage to grow is that it’s part of a speed-space tradeoff. It’s not just that less space means that the GC runs more frequently but that the overall percentage of CPU consumed by the GC drops.

    As for implementing Disposal “needlessly” the overhead is modest but, yes, it is definitely a small overhead. The underlying reason is that, for the most popular GC algorithms, the GC only visits LIVE objects. As a consequence dead objects without a Dispose method incur ZERO overhead. Hence these types of GCs want to defer kicking in as long as possible in order to maximise the percentage of dead objects.

    But every time you add a Dispose method, you are guaranteeing a disposal overhead for every object. That overhead is small but measurable. In rare circumstances it could even matter. Having said that, it’s very unlikely to be significant in a real application.

    So I wouldn’t worry about it.

  • Anonymous[1]

    Bad example
    Do not copy this example! I agree with Anonymous, it has several errors and also bad practices.
    To add to the previously pointed out failings, the final code does not even compile (which probably means it’s not even been checked to see if it is actually correct). This is not an authoritative source.

  • swells

    Deterministic Finalization
    Thanks Stephen – one of the reasons it was applied almost as a matter of course was the fact that we had large tree structures of objects and we ‘knew’ at certain points in the program whole branches were no longer needed – so disposing of the branch took them off the finalization list so the GC could do its job quicker when it had cause to kick in.

    It also meant that if we were logically holding on to object references that we were not supposed to be we got “deliberate” and “forced” already-disposed errors should we attempt to communicate with them – highlighting an area where memory leaks would be occurring – e.g. an event hanging onto some very large tree of objects we should have cleaned up.

    The down side is that you get already-disposed errors where you would otherwise not have known…the upside being you knew you were trying to communicate with an object that should no longer exist and so must have a bug!

  • Anonymous

    The premise is wrong
    Using memory pell-mell is a “feature” of the IIS worker processes. Because there is no way to constrain the worker processes (except by making them crash when they reach a certain threshold), our development environments often slow to a crawl because they’re thrashing.

    Memory is still a scarce and finite resource, and it’s completely wrong IMHO to treat it as plentiful. Consequently, we don’t want — at least I don’t want — to emulate Microsoft’s practices …

  • Anonymous

    Calling the garbage collector
    Of all the cases that the author is aware of since the early versions of .NET, none has ever shown that calling the GC mechanism explicitly could give better results in memory consumption or performance than simply letting the GC do its job.

    That is not what we found. We have a large C# application and occasionally get out of memory errors.


  • Anonymous[1]

    Manual GC
    Out of memory errors are usually caused by the process being unable to allocate enough contiguous unused pages in its virtual address space, perhaps due to the number or size of objects created in an application, rather than a lack of memory. It’s likely the GC knows there is enough memory free, but not that there isn’t enough contiguous memory. Manually doing a GC collect from time to time might help to consolidate the memory in these situations.

  • JP Gouigoux

    Comments / answers from the author
    I am the author of the book this article has been extracted from. I will try and answer some of the questions I have seen in the comments. But first of all, thanks for writing here, it is always nice to receive some feedback!

    Secondly, this part of the book was not written as a reference about implementing the Dispose pattern, but more as a progressive explanation of the reasons behind it. In the end, the code is not 100% complete, and one can add some thread-safe checks, naming conventions, etc.

    The most important thing, in my opinion, is this idea of tradeoff that Stephen Leach talked about. The GC is always doing a tradeoff between possible pauses in the execution and memory pressure, hence the existence of two flavours of it, one for the workstation and the other for the server, where the tradeoff is not the same. Calling Dispose is also a tradeoff. I personally chose to call Dispose only for objects with a large use of external resources: database connections, fonts, etc. For example, I do not dispose SqlCommand because the tradeoff is not in favour of doing so, whereas it is almost always so in the case of a SqlConnection.

    Anonymous (May 09), I agree the protected virtual is the accepted writing for the Dispose, but not everybody develops APIs. As a “consumer-software” developer, I would tend to close the encapsulation as much as possible for pragmatic reasons: I have yet to be given a good reason for inheriting a tracing class, and I very much doubt the average programmer could do so without breaking Liskov’s principle. If there is the slightest risk of mistaken inheritance, I prefer to make it impossible. If it was not out of the subject here, I would even have made this class sealed…

    swells, I completely agree with Stephen’s answer to your concerns: the overhead is going to be minimal and correctly disposing the objects that really need it is doing 99% of the job. As I understand it, the risk of a thread problem is mitigated by the use of a finalizer thread, which is cancelled by the GC.SuppressFinalize(this).

    Anonymous (May 14), sorry about the “connexion” instead of “connection” in the final code sample. A few variables remained in French in my Visual Studio projects, and we decided to correct them directly in the book. Sadly, we missed this one (and maybe a few others)… Taking this typographic mistake as a ground for attacking my honesty and the hard work of my editor / technical reader is quite unfair, though.

    Anonymous (May 14), your analysis of memory being scarce is true on a server, but there are client-side cases where the tradeoff is not the same. One prefers not to see one’s GUI freezing, particularly when there is so much memory available on a standard PC nowadays. On the other hand, there is a much bigger memory pressure on a server, where delaying a few threads by one second is not going to be noticed by the clients on the other side of the network most of the time. In this case, I agree memory should be used with great parsimony.

    Anonymous (May 15), I did not say these cases did not exist, I only said that I have not seen any in the ten years I worked with .NET… Also, the literature is quite clear about letting the GC do its work, not only from Microsoft (http://blogs.msdn.com/b/ricom/archive/2004/11/29/271829.aspx or http://blogs.msdn.com/b/scottholden/archive/2004/12/28/339733.aspx), but also from external resources (http://www.dotnetperls.com/gc-collect, http://stackoverflow.com/questions/478167/when-is-it-acceptable-to-call-gc-collect, and many others). As I understand, you have a piece of code where using explicit GC.Collect() helped you out of an OutOfMemoryException. But again, everything is a tradeoff: what about overall performance? If you call the GC more than strictly necessary, it will have a cost in time elapsed in the process. Would you agree to share your code so that we can benchmark it with/without?

    Anonymous (May 18), the GC always ends up with memory compaction, and what causes holes in the memory is when you have pinned objects for interop. So, either you have no pinned objects, and your memory is then contiguous, or you have pinned objects and calling the GC more than it would fire up by itself will only make memory fragmentation problems happen quicker! I am afraid we come back to this standard rule we talked about before: let the GC do its work.

    I understand some of my answers could make some people sceptical, when they are attached to following programming principles “by the book”. The thing is, I come from the mechanical industry and I am more attached to evolving towards what works best. I am not saying what I come up with will be the solution for everybody. I am trying to explain how I came there, so that other people can have the same journey and come up with what is best for their own use. Everything is a tradeoff!


  • sureshgv2002

    The article would have been complete if the author had also explained GC.SuppressFinalize usage.
