19 March 2009

The Dangers of the Large Object Heap

You'd have thought that memory leaks were a thing of the past now that we use .NET. Largely true, but we can still hit problems. We can, for example, prevent memory from being recycled if we inadvertently hold references to objects that we are no longer using. However, there is another serious memory problem in .NET that can appear out of the blue, especially if you are using large object arrays. Andrew Hunter explains...

Usually, .NET developers don’t need to think too much about how their objects are laid out in physical memory: after all, .NET has a garbage collector and so is capable of automatically removing ‘dead’ objects and rearranging memory so that the survivors are packed as efficiently as possible. The garbage collector has limits to what it can do, however, and when those limits are reached the runtime can exhaust memory in a way that is surprising and confusing to any developer who is not aware of how .NET chooses to lay out objects in memory.

.NET manages memory in four different regions, known as heaps. You can think of each of these as a contiguous region of memory, though in practice .NET can create several fragmented regions for each heap. Three of the heaps, called the generation 0, 1 and 2 heaps, are reserved for small objects: in current versions of .NET, ‘small’ means objects of less than 85,000 bytes. Any object is assigned to one of these generations according to the number of garbage collections it has survived, the veterans ending up in generation 2. .NET knows that younger objects are more likely to be short-lived, and it can reduce the cost of garbage collections by initially looking only at the recently allocated objects in generation 0. Perhaps more importantly, it can also move the survivors of a collection around so that there are no gaps, ensuring that the free space available for new objects is always together in one large lump. This helps with performance: unlike unmanaged code, .NET never needs to search for a hole big enough for a new object – if there’s enough memory available, it’s always in the same place. When it needs to compact a series of objects, .NET actually copies them all to a new block of memory rather than moving them in place, which simplifies allocation; in these small heaps, the result is that the free space is always at the end, so there is never any need to scan elsewhere for a ‘hole’ big enough to store a new object.
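A quick way to see the generations in action (a small illustration, not taken from the article) is to ask the runtime which generation an object currently lives in; the number climbs each time the object survives a collection:

    object o = new object();
    Console.WriteLine(GC.GetGeneration(o));   // typically 0: freshly allocated
    GC.Collect();
    Console.WriteLine(GC.GetGeneration(o));   // typically 1: survived one collection
    GC.Collect();
    Console.WriteLine(GC.GetGeneration(o));   // typically 2: survived two collections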

The fourth heap is known as the Large Object Heap, or LOH. ‘Big’ objects go here – and as the threshold at which an object ends up on this heap is 85,000 bytes, this usually means arrays with more than about 20,000 elements. The LOH is treated separately by the garbage collector, which will generally try to reclaim space from the other heaps before tackling the giant objects that lurk here. Large objects pose a special problem for the runtime: they can’t cheaply be moved by copying, as doing so would temporarily require twice as much memory during a garbage collection. Additionally, moving multi-megabyte objects around would cause the garbage collector to take an unreasonably long time to complete.
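This threshold is easy to observe (again, a small illustration rather than code from the article): the runtime reports objects on the LOH as belonging to generation 2 from the moment they are allocated:

    byte[] small = new byte[84000];               // below the threshold: small object heap
    byte[] large = new byte[90000];               // above the threshold: large object heap
    Console.WriteLine(GC.GetGeneration(small));   // typically 0
    Console.WriteLine(GC.GetGeneration(large));   // 2, even though it has survived no collections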

.NET solves these problems by simply never moving large objects around. After large objects are removed by the garbage collector, they leave behind holes in the large object heap, causing the free space to become fragmented. When there’s no space at the end of the large object heap for an allocation, .NET searches these holes for one big enough, and extends the heap if none of them is large enough. This can become a problem. As a program allocates and releases blocks from the large object heap, the free blocks between the longer-lived blocks can become smaller. Over time, even a program that does not leak memory, and which never requires more than a fixed amount of memory to perform an operation, can fail with an OutOfMemoryException once the largest free block shrinks below the size that the program needs. In the worst cases, the amount of usable memory can shrink at a rate that seems unbelievable.

The worst case scenario

The design of the Large Object Heap means that its worst case occurs when the short-lived arrays are large and vary somewhat in size, while the longer-lived arrays are small. The variation in size makes it likely that, when a new short-lived array is allocated, it won’t fit into the space left by one of its predecessors; .NET will then have to extend the LOH to ever-greater extents. This might not seem so bad: surely the smaller blocks can be put in the large gaps that are left behind?

Well, they can, but there’s a catch. As mentioned previously, .NET prefers to put new memory blocks at the end of the heap, as it has to do less work to find a space where a block will fit. This means that when a new large block is placed at the end of the heap, the next smaller block will be allocated immediately after it – so the heap will have to be extended yet again the next time the program wants to allocate a larger block.

[Image: diagram of the large object heap growing as each new large block is placed at the end of the heap, followed by a smaller long-lived block]

What does this look like in a practical program? It seems fairly esoteric: allocate something big, then something small, keep the small thing and throw away the big thing. There are a few scenarios where this pattern can appear, and it can drastically reduce the amount of memory available to a program. The most trivial scenario could just be this:
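A minimal sketch of the pattern (the names and numbers here are illustrative): a List<int> whose backing array keeps doubling on the LOH, while long-lived 90,000-byte arrays are allocated between the grows:

    var survivors = new List<byte[]>();   // long-lived blocks that stay on the LOH
    var growing = new List<int>();        // its backing array doubles each time it fills up

    for (int i = 0; i < 10000000; i++)
    {
        growing.Add(i);                       // periodically reallocates the backing array
        if (i % 100000 == 0)
            survivors.Add(new byte[90000]);   // a long-lived block lands after it on the heap
    }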

A List object’s underlying implementation is an array, and when it needs to grow, .NET allocates a new, larger array and discards the old one. All that’s needed to start causing fragmentation problems is to put some other object on the LOH before the array needs to grow. The main effect is that you might be surprised by how small your ‘large’ list is when .NET tells you that it has run out of memory – long before the program reaches its theoretical 2GB maximum.

Perhaps a more interesting case is this: a program is designed to read in a large data file, one that varies in size but is usually around 16MB or so. It reads this into memory, processes it, and then produces a 90KB summary. It is later adapted to run as a batch job: processing many of these files and storing the results in memory. Quite a lot of data-processing applications look a bit like this. You might expect it to be able to process tens of thousands of files before memory becomes a problem – after all, only the 90KB summary is kept between files – but in testing it surprises us by failing after only a few hundred files, having used much more memory than expected.

The following program demonstrates this case. It fills up a list with 90,000-byte arrays and prints out how many bytes were allocated before .NET threw an OutOfMemoryException. It can also, optionally, allocate larger arrays between these allocations. These are short-lived, so they shouldn’t significantly affect the number of smaller arrays that can be allocated. Unfortunately, each time the program allocates one of the larger arrays, it allocates one more byte than the last time.
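A sketch along the lines of the program described (an approximation rather than the exact original listing): it keeps 90,000-byte arrays forever, allocates a roughly 16MB short-lived array between them that grows by one byte each time, and reports how much long-term data it managed to keep:

    using System;
    using System.Collections.Generic;

    class LohFragmentationDemo
    {
        static void Main()
        {
            var smallBlocks = new List<byte[]>();   // 90,000-byte arrays we keep forever
            int largeBlockSize = 16 * 1024 * 1024;  // ~16MB short-lived block
            long kept = 0;

            try
            {
                while (true)
                {
                    // Short-lived large block, one byte bigger each time through the loop.
                    // Comment out these two lines to see how much the program can keep
                    // when only the 90,000-byte blocks are allocated.
                    byte[] temp = new byte[largeBlockSize];
                    largeBlockSize++;

                    // The long-lived block - the only data we actually want to keep.
                    smallBlocks.Add(new byte[90000]);
                    kept += 90000;
                }
            }
            catch (OutOfMemoryException)
            {
                Console.WriteLine("Out of memory after keeping {0:N0} bytes", kept);
            }
        }
    }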

When run on a 32-bit Vista system, the program gives the following result:

[Image: program output showing the total bytes kept before the OutOfMemoryException in each scenario – growing large blocks, fixed-size large blocks, and no large blocks]

Oh dear. When the program tries to allocate the growing larger blocks, it’s a disaster – it only manages to end up with around 20MB of long-term storage, out of a theoretical maximum of 2GB! Forcing garbage collections barely helps at all. Even in the case where the larger blocks don’t grow, the amount of memory the program can use is reduced by over half. If the large blocks are taken away entirely, the program easily creates 1.6GB of data.

The last case, in which the blocks don’t grow but the amount of available memory is still reduced, is interesting. When the LOH fills up and the program wants to allocate a new small block, .NET tries to re-use free space before it grows the heap: this means that the gap where the larger block was previously being put becomes too small for it, and the LOH has to grow in a similar way to before. It’s not as critical, but it still results in the amount of available memory slowly going down.

Switching to a 64-bit system improves matters: the program will be able to use a lot more memory before it fails. In some ways, though, the situation is worse – because that memory is still committed to the program, the operating system will page it out to disk, considerably reducing performance for every program on the system and perhaps eventually running out of disk space.

It is difficult to diagnose this problem with the tools that are currently available. The best approach at the moment is to combine a memory profiler with the performance counters. The ‘Large Object Heap size’ performance counter tells you how many bytes are allocated to the large object heap. This counter includes free space, so to discover whether memory really has run out or whether the available memory has simply become too fragmented to be used, you will need a memory profiler. A good profiler will tell you the total size of all of the live objects on the large object heap. If there is a big discrepancy between the two numbers, then it is possible that your program is suffering from heap fragmentation. Unfortunately, this isn’t guaranteed: there’s no problem if there’s a large amount of free space at the end of the heap – .NET just won’t clean it up until it has to, for performance reasons.
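Reading that counter from code is straightforward. A minimal sketch, assuming the standard ‘.NET CLR Memory’ counter category (the instance name is normally the process name; check the exact counter name on your system):

    using System;
    using System.Diagnostics;

    class LohCounterCheck
    {
        static void Main()
        {
            // The .NET memory counters are published per process instance.
            string instance = Process.GetCurrentProcess().ProcessName;

            var lohSize = new PerformanceCounter(
                ".NET CLR Memory", "Large Object Heap size", instance);

            // This figure includes free (fragmented) space as well as live objects,
            // which is why it needs to be compared with what a memory profiler reports.
            Console.WriteLine("Large Object Heap size: {0:N0} bytes", lohSize.NextValue());
        }
    }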

Working around the problem

It might seem that nothing can be done to solve this problem apart from periodically stopping and restarting the afflicted program. This is probably the case if there is really no choice other than to deal with very large arrays of objects (or very large strings), but often it is possible to reduce or eliminate the need to use the large object heap, or to use it in such a way as to prevent fragmentation from occurring.

The best way to prevent fragmentation from occurring is to ensure that no objects above 85k in size are used for permanent storage. This ensures that the free space in the large object heap will always end up in one big chunk as short-lived objects are removed from play.

If large data structures need to live for a long time, and especially if they need to grow over time, the best approach is simply to consider using or writing a different data structure to store them. Depending on the element size, an array can typically hold somewhere between 10,000 and 20,000 elements before it crosses the 85,000-byte threshold and ends up on the large object heap, so a very effective way to store 100,000 entries might be to store ten arrays each containing 10,000 elements: none will end up on the large object heap, so no fragmentation will occur. This could be written as a class implementing IList, which would make it easy to drop in transparently to replace existing code.
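A minimal sketch of such a structure (the name, chunk size and members shown here are illustrative; a real drop-in replacement would implement the rest of IList<T>):

    using System.Collections;
    using System.Collections.Generic;

    // Stores elements in many small arrays so that no single array crosses the
    // 85,000-byte LOH threshold (pick the chunk size to suit the element type).
    public class ChunkedList<T> : IEnumerable<T>
    {
        private const int ChunkSize = 8192;
        private readonly List<T[]> chunks = new List<T[]>();
        private int count;

        public int Count { get { return count; } }

        public void Add(T item)
        {
            if (count % ChunkSize == 0)
                chunks.Add(new T[ChunkSize]);   // each chunk stays on the small object heaps
            chunks[count / ChunkSize][count % ChunkSize] = item;
            count++;
        }

        public T this[int index]
        {
            get { return chunks[index / ChunkSize][index % ChunkSize]; }
            set { chunks[index / ChunkSize][index % ChunkSize] = value; }
        }

        public IEnumerator<T> GetEnumerator()
        {
            for (int i = 0; i < count; i++)
                yield return this[i];
        }

        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    }

Because the individual chunks never reach the LOH, they can be compacted by the normal generational collector, and discarding the whole structure leaves no holes behind.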

Depending on the design of the code, a simpler approach might be to make the ‘temporary’ but large structures more permanent. If a program needs to deal with a set of large files, it will make more efficient use of memory if it allocates enough space for the largest one up front and then re-uses that buffer for every file, instead of allocating just enough for each file and throwing the memory away afterwards. Many of the classes in System.Collections have a ‘Capacity’ property to facilitate this design pattern, and thread-local static variables can be used (with care) to help share a large data structure in multithreaded applications.
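For example, a batch job like the file-processing one above could keep a single reusable buffer per thread and only grow it when a larger file comes along. A minimal sketch (the class and method names here are invented for illustration):

    using System;
    using System.IO;

    static class FileBatchProcessor
    {
        // One reusable buffer per thread; [ThreadStatic] stops threads from
        // sharing (and trampling on) the same array.
        [ThreadStatic]
        private static byte[] buffer;

        public static void ProcessFile(string path)
        {
            int length = (int)new FileInfo(path).Length;

            // Grow only when a file is bigger than anything seen so far, instead of
            // allocating a fresh multi-megabyte array for every file.
            if (buffer == null || buffer.Length < length)
                buffer = new byte[length];

            using (FileStream stream = File.OpenRead(path))
            {
                int read = stream.Read(buffer, 0, length);
                // ... parse the first 'read' bytes and build the small summary here ...
            }
        }
    }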

It is unfortunate that the CLR garbage collector deals with large objects in this manner. The effect is similar to a memory leak, but one where measuring the size of the objects in memory gives no clue as to the cause: a memory profiler will tell you that there’s no problem, while Task Manager tells you that the program’s memory usage is growing ever larger – which could be due to this issue, or to the behavior of unmanaged code. What is needed instead is a way of finding out where objects are in memory and how they are affecting the free memory available to .NET. This can be done with some detective work: code can be inspected for areas where large amounts of memory are likely to be allocated, and a memory profiler can be used to help discover those areas. Once they are known, a debugger can be used to step through the code, and some pen and paper can give an idea of what the large object heap looks like, which can be used to decide on the best way forward.

There is a feature on the drawing board for ANTS Memory Profiler 5 that will help diagnose this issue and distinguish it from an unmanaged memory leak, and there are plans for features in future versions that will help isolate the specific cause once the issue is diagnosed.

Andrew Hunter is a Software Engineer at Red Gate who is responsible for much of the recent rewrite of ANTS Performance Profiler and ANTS Memory Profiler. Before that, he wrote the SQL Layout utilities for SQL Refactor/SQL Prompt. He has been described as resident master of the dark .NET arts. Dereferenced in a freak accident, he is forced to spend his days hiding from the garbage collector.

  • Ran Davidovitz

    Using dump to understand fragmentation
    Thank you for a great post. I have dedicated my last weekend to reading several LOH-related articles, mainly by Maoni and Tess, and yours is also very good.

    I wanted to add one point: you can also use WinDbg on a dump to understand the level of fragmentation you have – see http://msdn.microsoft.com/en-us/magazine/cc534993.aspx or http://blogs.msdn.com/maoni/archive/2006/11/08/correlating-the-output-of-eeheap-gc-and-address.aspx

    thanks

  • Andre

    Cheating the LOH ?
    Great post. Thank you.

    I just wonder if dividing large objects into smaller ones is perhaps counterproductive?
    The LOH has been introduced to prevent large objects being moved around. If large objects are divided into smaller ones, they will effectively be moved around when a GC occurs.

    OK, if I have the choice between a crash and slower execution, I will choose the slower execution.
    The problem isn’t a new one at all; all native heaps suffer from the same problem.

    It’s good to know that the old problems still exist. You’ve found a good solution, although I think it shouldn’t be used as a general solution. A more performant one, IMHO, would be to use preallocated caches or to try to make the LOs (large objects) equal in size, so that they still reside on the LOH and holes can be reused by newly allocated LOs.

  • Andrew Hunter

    Re: Cheating the LOH
    .NET won’t actually move objects around that much in practice – this is one of the advantages of using a generational garbage collector, and one particular advantage of allocating a large object in many small chunks is that the runtime can move around small parts at a time. Additionally, LOH collections are typically the most expensive that .NET does, so the effect on performance is not clear cut.

    Preallocating buffers will certainly work if it’s done at program initialisation but may not be suitable for larger, more complicated programs that deal with a variety of different data types, or where the amount of data is not easily predicted at program startup – growing large arrays are probably the most common cause of fragmentation in practical programs.

    Trying to keep objects all around the same size can help if care is taken about the order so that objects that are allocated together are collected together. However, the problem still exists: .NET prefers to put new objects at the end of the heap so holes don’t get filled up if the heap is growing, which tends to exacerbate any fragmentation that might have occurred.

    The final example in the program above shows that total possible memory consumption is still cut by a third if none of the blocks being allocated grow. The problem there is that .NET starts to fill up the gaps, then the LOH grows and it stops again.

    Ironically, it’s the point at which the gaps get filled that cause the heap to grow and the fragmentation to get worse, as the large block no longer fits in the existing heap. Preallocating the large block would fix the problem in this example.

  • Richard Minerich

    Low Fragmentation Heap
    We’ve managed to mitigate this issue somewhat by trying to keep our dirty work in the unmanaged space where memory allocation can be manually controlled. It’s a shame that it’s necessary. Is anything being done in 4.0 to help?

  • Ulf

    Double array problem
    Great article.

    There is however another gotcha that is not so well known.
    The limit of 85k seems to be true for all data types except doubles. For doubles the limit is 1000 elements, i.e. 8k.

    This can easily be discovered by this simple code:
    double[] normalHeap = new double[999];
    Console.WriteLine("double array with 999 elements. Generation: " + GC.GetGeneration(normalHeap));
    double[] largeHeap = new double[1000];
    Console.WriteLine("double array with 1000 elements. Generation: " + GC.GetGeneration(largeHeap));

    The printout is:
    double array with 999 elements. Generation: 0
    double array with 1000 elements. Generation: 2

    Microsoft’s response to this is:
    “It’s not a bug – we do this for performance reasons because doubles are accessed faster when they are aligned on a 8-byte boundary which large objects are”

  • Anonymous

    Yikes–Too Fragile
    Until MS really solves this problem, I’m not thrilled with trying to work around it by managing the LOH for them. It’s potentially too fragile with multiple developers getting into the same code and not knowing the gotchas just so. I’d be inclined to use disk-based solutions like StreamWriter or StringWriter.

    Charlie
    http://www.NoonmarkAntiques.com

  • Anonymous

    String ’em up
    The same issues apply to large strings.

    The memory usage patterns you describe are all too familiar. I use the Domino COM API to interact with Lotus Notes, specifically the NotesDXLExporter class. The Export method returns the contents of an email message as an XML string. Email messages vary hugely in size, many are over 85Kb, and some are very large. Consequently, processing a large sample of email messages often exhausts memory.

    Is there any way of pre-allocating strings to use the LOH memory more efficiently?

  • Nuri

    Great explanation and topic
    Thank you for the post. Very informative.
    Seems to me that the point of programming in managed code is ease and speed of development. When it comes to performance, deeper uncommon knowledge is required (such as the above post and ensuing discussion). One thing to keep in mind is that when large objects are created, they are created for a reason and should not be juggled around much during program lifetime. If this is not the case, then more careful object layout can be constructed. For instance, the post mentions creating an IList instead of just one big array. Another might be linked lists or trees using references rather than contiguous memory.
    Also, if a lot of memory is used and discarded (such as large string parsing or processing) one might revert to using byte array and BitConverter.* functions to map in and out of a very large buffer, thereby taking over memory management from the framework and having to do your own. This may sound extreme, but when performance is important, one has to simply bite the bullet and spend more time on thinking through the structure and usage scenario of objects over the program lifetime.
    I had some rather large data structures in web services and sites that required high performance. We ended up claiming a lot of memory upfront but reusing it again and again. The speed benefit at the cost of a few hundred MB RAM was well worth it. GC never re-claimed it or locked it up in scavenging and it worked great.

  • Hatchman

    Interesting article
    Wouldn’t a simple way to get around this problem be to just use an out-of-memory cache?

    I’ve always provisioned for this in designs when dealing with large objects that still need to be retained within a session or process. Performance is usually pretty good if it’s done intelligently enough as well.

    Those who have been around long enough (programmed back in the day when you did your own memory allocation and de-allocation) tend to not be sold on the garbage collection marketing hype and adopt this approach. Good to hear that it is still a valid solution

  • Moby Disk

    A fix: Just for the exercise of it
    Just for the fun of it, I added some code to the bottom of the loop to reallocate the small blocks periodically. Indeed, that does solve the problem.

    // Periodically reallocate the small blocks
    if ((count % 50) == 0)
    {
        // Reallocate recently created small blocks
        for (int i = count - 50; i < smallBlocks.Count; i++)
        {
            // Make a copy of the array
            byte[] oldBytes = smallBlocks[i];
            byte[] newBytes = new byte[oldBytes.Length];
            oldBytes.CopyTo(newBytes, 0);

            // Put the new one into the list
            smallBlocks[i] = newBytes;
        }
    }

    This gets it to 1800MB+ before the app dies.

  • Nuri

    Re: using cache
    I am a huge advocate of cache, and centralized or distributed cache used well can help your app scale and perform.

    Out-of-process-cache means IO.
    IO is factors slower than memory access.

    Also, if the consuming application is in managed code, then when it reads / receives/processes the object from cache it would need to allocate memory for it. So no solution here.

    A remoting or RMI approach – where an external service or server hosts the full object in memory but doesn’t supply the whole data; rather, the service performs operations on the large object and returns small data to the caller – might be viable.

    As mentioned by others here, one could also use unmanaged code to marshal the data “external” to the CLR but still in process. The boundary crossing to unmanaged code should be less than hitting the wire.

  • Chakravarthy

    Other Sizes
    Andrew,

    that’s an excellent article showcasing the actuals.

    btw, can you throw some light on the size of objects that can be accommodated on each gen?

    thanks for reading

  • philippg

    how to identify the condition?
    great article.
    i’ll have to look further into my code!

    assuming i start off with “process explorer”, only:
    is there a way to tell if i’m suffering from those symptoms described in your article?

    i’m not really sure how to read all the details in process explorer, so i’ll simply provide some sample data from my application:

    process tab “performance”:
    – virtual memory: private bytes = 450 MB
    – virtual memory: virtual size = 950 MB
    – physical memory: working set = 350 MB

    process tab “.NET”:
    – .NET CLR Memory: Large Object Heap Size = 250 MB

    my application suffers from loads of OutOfMemoryException even though there seems to be sufficient unallocated RAM available.

  • najak

    MS should fix .NET — this is an easy-to-fix Flaw!
    Why can’t Microsoft provide us with a simple API in .NET called:

    GC.CompactLargeObjectHeap()
    even if this method causes the application to hang for 10 seconds or more — it beats the heck out of crashing and requiring Application restart to clean up the memory!

    There are a great many applications that use 3rdparty libraries which make allocations on the LOH. For example, even Microsoft’s own XNA fragments the heck out of the LOH each time you do “new Texture2D()” or “Texture.FromFile()” without any remedy!

    So if you are trying to create a “Game Engine” in .NET — this is a real pain in the neck, to have our LOH leak away memory without any way to stop it. All we can do is “slow it down”, unless we decide to “make our own library” and forget about XNA. Surely this is not what Microsoft wants!

    Many months of headache would be resolved for us (and others!) if MS would simply provide us with this one method; I’ll say it again:

    GC.CompactLargeObjectHeap()
    And while I’m making this suggestion, I would suggest the MS take it one step further, and allow this command:

    GC.DoLargeObjectHeapCompactionWhenOutOfMemory = true; // would be false by default
    So we would just set this to true for “Visual3D Game Engine” (our product), and then automatically, when the fragmentation causes the first OOM exception, .NET would automatically compact the LOH, and then retry the allocation!

    Wow, now that would be aligned with all the .NET hype about “it just works” and “easy to use”… and would only cause this “performance hit” for extreme cases, where the alternative is an OOM crash, forcing a restart of the application!

    Please Microsoft — do this for us!

    Please do not announce your ignorance by saying the trite naive advice of “just use better memory management” — yeesh! This is very inappropriate advice! You would have to also say “write ALL your own 3rdparty libraries” (e.g. XNA)!!!!

    Why hasn’t Microsoft already provided this behavior in .NET???? What would it hurt? It would save us from these very frustrating and time-consuming band-aid solutions, which can at best “slow down the memory leaks” but never fully fix them.

    The solution we are requesting would be harmless to those who don’t want to use it, and provides a refreshing salvation to those who really do want to use it. Please, Please, Please, provide this in .NET 4.0+.

  • TruePath

    Page Cache
    Either you are using an old OS/cpu that doesn’t support 64 bit addressing (or a cheap version of win 7) or MS has radically failed to take proper advantage of 64 bit addressing.

    Win 7 ultimate lets a process address up to 192 gigs, snow leopard 16TB and linux 32 exabytes. You are never going to realistically chew through that entire address space in the LOH. Thus the strategy that MS implemented for allocation sounds like the right one and it’s only some artificial limit that is getting in the way.

    Remember, just because you’ve allocated a page of memory doesn’t mean the OS has to always back that up with a page of RAM. The OS is perfectly free to swap pages out to disk or leave blank pages unbacked. The LOH should be allocating an integer number of pages (4K) for each object. When an object in the LOH is freed by the GC, the pages allocated to it should be returned to the OS so they no longer need any backing store and don’t take up disk space or strain the page allocation structures.

    Indeed, the LOH on a 64-bit system should be substantially faster and less resource-intensive than the normal heaps, since no objects are ever copied and all objects get an integer number of pages. Unlike the regular heaps, fragmentation isn’t really an issue since this gets handled for you by the hardware-supported page tables. Indeed, it might be optimal to never GC the LOH until virtual memory limits are hit.

    True, such a strategy would end up writing huge parts of your LOH out to disk, but that happens in a second concurrent process. As long as you aren’t allocating objects in the LOH and dirtying their pages faster than your disk can write them, you shouldn’t see any slowdown except a minor competition for disk IO and a small impact from larger OS page maps. The system won’t thrash because most of those pages really are free and thus are never again read back from disk. Without a GC marking objects in the LOH, no false page access by the GC occurs. Thus the LIFO algorithm employed by the page cache is a pretty decent GC that trades GC overhead for some disk writes and (maybe) an occasional read.

    I guess it would be superior to hold the GC metadata and any embedded pointers for objects in the LOH on a separate page from the rest of the data (it can precede it if you want and the rest of the page can be used for other heaps/metadata) so the GC can still free the pages in the LOH without triggering any unneeded page loads. Since objects in the LOH have few if any pointer members/elements (or are all pointers and must be scanned by the GC anyway) one can segregate those from the data and get the best of both worlds: no disk writes and no false page loads for the GC.