Click here to monitor SSC

Inside the Concurrent Collections: ConcurrentStack

Published 12 January 2012 12:35 pm

The first concurrent collection we’ll look at is ConcurrentStack. This is conceptually the same as System.Collections.Generic.Stack, but is geared towards occasional concurrent modifications.

Now, in these posts I won’t be looking to explain what every method does; just like my other explorations of various collection types, I’ll be concentrating on the core implementation and concepts used by the collections, and glossing over the more sordid implementation details and scaffolding required to support an ICollection.

So, without further ado, let’s get started:

ConcurrentStack

There are three basic operations you can perform on a stack:

  1. Push: Push a new value onto the top of the stack
  2. TryPeek: Peeking at the top value. In ConcurrentStack, this is quite a simple method.
  3. TryPop: Popping the top value off the stack

ConcurrentStack supplements these with PushRange and TryPopRange for pushing and popping an array of values, ToArray, and the standard ICollection methods.

The stack is implemented as a singly-linked list, with nodes represented by the ConcurrentStack<T>.Node private class:

private class Node {
    internal T m_Value;
    internal Node m_Next;
}

The top of the stack is referenced by the volatile m_head variable. An empty stack is represented by a null value in m_head. For example, if you push the integers 1, 2 and 3 onto the stack in that order, the nodes will look like this:

If you then pop a single value off the stack, m_head will point at the ’2′ node.

Pushing values

So, there are three operations required to push a new value onto the stack:

  1. Create a new Node object to hold the new value
  2. Setting the node’s m_next variable to point to the current head node
  3. Changing m_head to point to the newly-created node.

And, straight away, we’ve got the possibility of a race condition. Say Thread 1 is pushing the value ’4′ onto the stack. It’s just executed step 2:

Now, Thread 2 is scheduled on the CPU and is pushing the value ’5′. Thread 2 executes steps 1, 2 and 3:

Thread 1 then executes step 3:

The node successfully pushed by Thread 2 has disappeared from the stack.

So, what we want to do is perform step 3 only if the head hasn’t changed in the meantime. If it has been changed, then go back to step 2 to use the updated head node as the value of m_next. Furthermore, this all has to be done atomically so other threads can’t see the intermediate state.

This is where Interlocked.CompareExchange comes into play. If you recall my previous post, CompareExchange performs the following as an atomic operation:

public static T CompareExchange<T>(ref T location, T value, T comparand)
  where T : class {
    T origValue = location;
    if (location == comparand)
        location = value;
    return origValue;
}
Or, in words:

I think location has the same value as comparand. If it is, replace it with value. Return me what location actually is.

This allows us to do the ‘check and assign a new head, else try again’ operation atomically, like so:

public void Push(T item) {
    Node node = new Node(item);
    do {
        node.m_next = m_head;
    }
    while (Interlocked.CompareExchange(ref m_head, node, node.m_next) != node.m_next);
}

Or, as words:

  1. Create a new node
  2. Set the node’s next field to the current head
  3. atomically{I think the current head is the value of the node’s next field. If it is, replace it with the new node}. If the head was’t actually the same as the next field, go back to 2.

This has eliminated the race condition we had between steps 2 and 3. Only if the head hasn’t been changed is the new node pushed onto the stack as the new head. This loop repeats until CompareExchange successfully sets the head to the new node.

This can easily be generalised to a range of values in PushRange – we can simply change step 1 to construct the list of nodes holding the values being pushed, then using the m_next field of the tail node, and the top node as the new head, like so:

So that’s pushing values onto the stack. What about removing them?

Popping values

As you can expect, popping items from the stack can be done in a similar way to pushing them:

Node topNode;
do {
    topNode = m_head;
}
while (Interlocked.CompareExchange(ref m_head, m_head.m_next, topNode) != topNode);

However, it’s not quite so simple as that. For one thing, the stack may be empty, so m_head might be null. For another, you need to be able to pop several items off the stack as a single atomic operation, which requires navigating several m_next references at once. We need to be slightly more clever about it.

Let’s cover checking for an empty stack first. Checking for null is easy, right? if (m_head == null) return false; So we put this at the start of the pop method:

public bool TryPop(out T item) {
    if (m_head == null) {
        item = default(T);
        return false;
    }

    Node topNode;
    do {
        topNode = m_head;
    }
    while (Interlocked.CompareExchange(ref m_head, m_head.m_next, topNode) != topNode);
    
    item = topNode.m_value;
    return true;
}

However, again, we’ve got a race condition. The null check only happens once. If another thread were to empty the stack while the current thread is inside the while loop, this leaves us with a NullReferenceException lying in wait when we try and get m_head.m_next.

The simple way to fix this is to do the null check inside the loop, eliminating the race condition:

public bool TryPop(out T item) {
    Node topNode;
    do {
        topNode = m_head;
        
        if (topNode == null) {
            item = default(T);
            return false;
        }
    }
    while (Interlocked.CompareExchange(ref m_head, m_head.m_next, topNode) != topNode);
    
    item = topNode.m_value;
    return true;
}

Sorted. So, what about popping multiple values off the stack? Again, like the null check, this has to be performed inside the while loop using the ‘snapshot’ of the head we’ve taken in topNode:

public int TryPopRange(T[] items, int count) {
    int i;
    Node topNode;
    do {
        topNode = m_head;
        Node curNode = topNode;
        i=0;

        while (i<count && curNode != null) {
            items[i] = curNode.m_value;
            curNode = curNode.m_next;
            i++;
        }
    }
    while (Interlocked.CompareExchange(ref m_head, curNode, topNode) != topNode);
    
    return i;
}

In fact, this idea of taking the current head, doing some work on it, then atomically swapping it if it hasn’t been changed in the meantime can be generalised:

  1. Take an atomic snapshot of the current state
  2. Do some work on the snapshot locally to the current thread
  3. Atomically make the work visible to all other threads, if the snapshot you took is still a valid representation of the state
  4. If it isn’t, go back to the start and try again

This is, at its core, how ConcurrentStack works.

Performing atomic operations

So, what underlying aspects of the system let us perform these operations atomically?

  1. Object references are atomic
    This lets us take an atomic snapshot of the current state simply by storing the value of m_head in a local variable; m_head will always be pointing at a valid node (or null), it won’t be ‘half-changed’.
  2. Interlocked.CompareExchange is atomic
    This ensures the state-check-and-switch won’t be afflicted by race conditions with other threads.
  3. m_head is a volatile variable
    m_head is marked as volatile. This means that any changes to that variable by one thread are instantly visible to all other threads on the system; there’s no caching to get in the way. Any threads that are attempting to modify m_head will be forced to try again, and they won’t overwrite the changes already made.

Gory implementation details

Now, I said at the start that I wouldn’t be covering gory implementation details of these collections. However, in this case, there are some important considerations.

If you decompile ConcurrentStack, you’ll notice there’s a Push method and a PushCore method (similarly for TryPop and TryPopRange). This is a small but significant performance detail; ConcurrentStack assumes there aren’t going to be any thread collisions. The public methods all optimistically attempt to perform the operation once. Only if another thread gets in the way do the corresponding Core methods get executed. It is these that contain the full loop logic, along with calls to SpinWait.SpinOnce to try and get the other threads out of the way before they try again.

In fact, TryPopCore goes one step further, and implements a randomised exponential backoff to reduce the chances of several threads all waiting the same amount of time. It does mean that, although ConcurrentStack is lockless, and so eliminates the risk of deadlock, the performance will suffer if there are many threads all modifying the stack at the same time, as each method will have to try several times before its operation succeeds.

That’s the essence of how ConcurrentStack is implemented. Next time, I’ll take a look at the second lockless concurrent collection, ConcurrentQueue, which goes about achieving thread-safety in quite a different way.

Footnote

I’m now on twitter! If you want to get in touch, comment on the series, or suggest other things I could look at, my username is @simonmcooper

2 Responses to “Inside the Concurrent Collections: ConcurrentStack”

  1. Norbert Raus says:

    Thanks for nice insight in concurrent stack.

    I think it’s worth adding few points

    * fairness – when taking about concurrency this is often an interesting characteristics. In this case ConcurrentQueue is not fair with respect to accessing thread, meaning that it is conceptually possible to have a livelock. It’s tries to minimise the chance for livelock by using backoff algorithm (as you mention) with random spin waits,

    * mindset – from my experience, working with lock-free structure exposes you to slightly different approach, I mean although the structure looks the same, some guidelines are not as good as for synchronised versions. So often you need to know the internal behaviours to effectively use lock-free class. On a surface class behaves the same but the result can be sometimes surprising due to the various internal assumptions like “safe” thread races/structure snapshots etc. (ConcurrentDictionary is an interesting example with AddOrUpdate method for which the assumption is that add/update value factories shouldn’t have any side effects – I hope you will cover ConcurrentDictionary as well :) )

    * performance – I haven’t measured it by myself, but according to Microsoft performance guidelines ConcurrentBag performs equally good and even better for mixed-mode Producer-Consumer scenarios. I think it’s worth checking those guidelines since not all concurrent collections outperforms synchronised counterparts,
    you can find link here: http://blogs.msdn.com/b/pfxteam/archive/2010/04/26/9997562.aspx

    Thanks,
    Norbert

  2. Norbert Raus says:

    Sorry I meant ConcurrentStack rather than ConcurrentQueue/ConcurrentBag ;)

Leave a Reply