Click here to monitor SSC

Simon Cooper

Why enumerator structs are a really bad idea

Published Wednesday, May 19, 2010 7:25 PM

If you've ever poked around the .NET class libraries in Reflector, you probably would have noticed that the generic collection classes all have implementations of their IEnumerator as a struct rather than a class. As you will see, this design decision has some rather unfortunate side effects...

As is generally known in the .NET world, mutable structs are a Very Bad Idea; and there are several other blogs around explaining this (Eric Lippert's blog post explains the problem quite well). In the BCL, the generic collection enumerators are all structs, but they need to be mutable as they have to keep track of where they are in the collection as they're enumerating. This bit me quite hard when I was coding a wrapper around a LinkedList<int>.Enumerator. It boils down to this code:

sealed class EnumeratorWrapper : IEnumerator<int> {

    private readonly LinkedList<int>.Enumerator m_Enumerator;

    public EnumeratorWrapper(LinkedList<int> linkedList) {
        m_Enumerator = linkedList.GetEnumerator();
    }

    public int Current {
        get { return m_Enumerator.Current; }
    }

    object System.Collections.IEnumerator.Current {
        get { return Current; }
    }

    public bool MoveNext() {
        return m_Enumerator.MoveNext();
    }

    public void Reset() {
        ((System.Collections.IEnumerator)m_Enumerator).Reset();
    }

    public void Dispose() {
        m_Enumerator.Dispose();
    }
}

The key line here is the MoveNext method. When I initially coded this, I thought that the call to m_Enumerator.MoveNext() would alter the enumerator state in the m_Enumerator class variable and so the enumeration would proceed in an orderly fashion through the collection. However, when I ran this code it went into an infinite loop - the m_Enumerator.MoveNext() call wasn't actually changing the state in the m_Enumerator variable at all, and my code was looping forever on the first collection element. It was only after disassembling that method that I found out what was going on

The MoveNext method above results in the following IL:

.method public hidebysig newslot virtual final instance bool MoveNext() cil managed
{
    .maxstack 1
    .locals init (
        [0] bool CS$1$0000,
        [1] valuetype  
            [System]System.Collections.Generic.LinkedList`1/Enumerator<int32> CS$0$0001)
    L_0000: nop 
    L_0001: ldarg.0 
    L_0002: ldfld valuetype
        [System]System.Collections.Generic.LinkedList`1/Enumerator<int32> 
            EnumeratorWrapper::m_Enumerator
    L_0007: stloc.1 
    L_0008: ldloca.s CS$0$0001
    L_000a: call instance bool 
        [System]System.Collections.Generic.LinkedList`1/Enumerator<int32>::MoveNext()
    L_000f: stloc.0 
    L_0010: br.s L_0012
    L_0012: ldloc.0 
    L_0013: ret 
}

Here, the important line is 0002 - m_Enumerator is accessed using the ldfld operator, which does the following:

Finds the value of a field in the object whose reference is currently on the evaluation stack.

So, what the MoveNext method is doing is the following:

public bool MoveNext() {
    LinkedList<int>.Enumerator CS$0$0001 = this.m_Enumerator;
    bool CS$1$0000 = CS$0$0001.MoveNext();
    return CS$1$0000;
}

The enumerator instance being modified by the call to MoveNext is the one stored in the CS$0$0001 variable on the stack, and not the one in the EnumeratorWrapper class instance. Hence why the state of m_Enumerator wasn't getting updated.

Hmm, ok. Well, why is it doing this? If you have a read of Eric Lippert's blog post about this issue, you'll notice he quotes a few sections of the C# spec. In particular, 7.5.4:

...if the field is readonly and the reference occurs outside an instance constructor of the class in which the field is declared, then the result is a value, namely the value of the field I in the object referenced by E.

And my m_Enumerator field is readonly! Indeed, if I remove the readonly from the class variable then the problem goes away, and the code works as expected. The IL confirms this:

.method public hidebysig newslot virtual final instance bool MoveNext() cil managed
{
    .maxstack 1
    .locals init (
        [0] bool CS$1$0000)
    L_0000: nop 
    L_0001: ldarg.0 
    L_0002: ldflda valuetype 
        [System]System.Collections.Generic.LinkedList`1/Enumerator<int32> 
            EnumeratorWrapper::m_Enumerator
    L_0007: call instance bool 
        [System]System.Collections.Generic.LinkedList`1/Enumerator<int32>::MoveNext()
    L_000c: stloc.0 
    L_000d: br.s L_000f
    L_000f: ldloc.0 
    L_0010: ret 
}

Notice on line 0002, instead of the ldfld we had before, we've got a ldflda, which does this:

Finds the address of a field in the object whose reference is currently on the evaluation stack.

Instead of loading the value, we're loading the address of the m_Enumerator field. So now the call to MoveNext modifies the enumerator stored in the class rather than on the stack, and everything works as expected.

Previously, I had thought enumerator structs were an odd but interesting feature of the BCL that I had used in the past to do linked list slices. However, effects like this only underline how dangerous mutable structs are, and I'm at a loss to explain why the enumerators were implemented as structs in the first place. (interestingly, the SortedList<TKey, TValue> enumerator is a struct but is private, which makes it even more odd - the only way it can be accessed is as a boxed IEnumerator!).

I would love to hear people's theories as to why the enumerators are implemented in such a fashion...

Note to self: never ever ever code a mutable struct.

Comments

 

Jason Haley said:

Interesting Finds: May 20, 2010
May 20, 2010 6:10 AM
 

Simon Cooper said:

My previous blog post went into some detail as to why calling MoveNext on a BCL generic collection enumerator...
May 21, 2010 4:51 AM
 

Federico said:

Only thing that comes to mind as to why is memory allocation: a struct is always on the stack or embedded in a containing object. Since enumerators are used quite often, anything saving on memory calls and heap fragmentation is useful (and structs cannot be derived, so size is fixed).
And about readonly: ever seen a const method in C++? Those methods can be called on objects declared as const, and should guarantee that they don't modify the object (except when const_casting<> it...).
An iterator over a const object is mutable, but what it references is not, and that's precisely the purpose of it.
When I tried to use readonly as const, to reference a complex object (an array usually) as unmutable, it wouldn't work, because what is readonly is the member variable, not what it refers to.
Only meaningful application for readonly are simple value objects: numbers and strings; for all the rest forget readonly.
May 23, 2010 12:15 AM
 

Interesting Finds: May 24, 2010 « Hank Wallace said:

May 24, 2010 1:58 PM
 

qwertie said:

I use mutable structs a lot -- for small pieces of data, e.g. fixed-point numbers or (X,Y) points -- they are much more efficient than classes. All you and Eric Lippert have taught me is that marking mutable structs "readonly" is a bad idea.

The C# compiler and BCL could automatically prevent the problem you had by defining some sort of "MutatorAttribute" for methods on structs that modify the struct. If MoveNext() had this attribute, the C# compiler could have automatically issued a warning/error that you are attempting to mutating a readonly struct.

This is already a non-issue for mutable properties, because the C# compiler won't allow you to call a setter on a read-only struct.
April 12, 2011 1:52 AM
You need to sign in to comment on this blog

About Simon Cooper

Simon joined Red Gate as a software developer in 2007. He started off in the SQL Tools division, working on SQL Compare 7 and 8, moved onto Schema Compare for Oracle, and is now in the .NET division working on SmartAssembly.
<May 2010>
SuMoTuWeThFrSa
2526272829301
2345678
9101112131415
16171819202122
23242526272829
303112345
Automated Script-generation with Powershell and SMO
 In the first of a series of articles on automating the process of building, modifying and copying SQL... Read more...

Converting String Data to XML and XML to String Data
 We all appreciate that, in general, XML documents or fragments are held in strings as text markup. In... Read more...

Geek of the Week: Don Syme
 With the arrival of F# 3.0 Microsoft announced a wide range of improvements such as type providers that... Read more...

How to Document and Configure SQL Server Instance Settings
 Occasionally, when you install identical databases on two different SQL Server instances, they will... Read more...

What's the Point of Using VARCHAR(n) Anymore?
 The arrival of the (MAX) data types in SQL Server 2005 were one of the most popular feature for the... Read more...