|
|
-
Posted Tuesday, January 24, 2012 10:03 PM |
ConcurrentQueue is, like ConcurrentStack, a lockless collection, in that it is implemented without using any locks at all. However, the semantics required for a queue impose a quite different approach; unlike ConcurrentStack, which has a single point of concurrent contention, a queue can be changed at both the head and tail. This means that at least two variables are involved, most likely more. The simple approach of atomically modifying a single variable won't work here.
What does System.Collections.Generic.Queue do?
Well, let's have a look at the non-concurrent queue. This is implemented as a circular buffer. However, this design is very hard to use in a lockless way; when a new item is enqueued and the backing array is full, the array is doubled in size to accommodate the new item.
To resize the array, the entire contents of the queue has to be read sequentially without using locks and copied into a brand new array by a single thread, during which other threads can enqueue and dequeue items many times. This is very tricky to implement without being subject to race conditions and still ensure high performance.
So, instead, ConcurrentQueue forgoes the circular buffer, and uses a linear array instead. More specifically, it needs to be a linear array with a (conceptually) infinite capacity - although there will only be a finite number of items stored in the queue, items can be enqueued and dequeued an unlimited number of times. Such an array can't be created as a single object.
However, you don't need to keep the entire infinite array in memory, you only need the section of the array that is actually storing items. By splitting the array into finite segments you can create and discard segments as items get enqueued and dequeued to the queue. This is what ConcurrentQueue does.
Queue Segments
To demonstrate, I'll be using a queue with a segment size of 4. To start off with, three items are enqueued. These are added to the leftmost available slot of the tail segment:
Three more items are then enqueued. A new segment is created and assigned to the tail:
And another three:
Five items are then dequeued from the head. The head segment is now empty, so it is discarded:
Using this method, the illusion of an infinite linear array can be created using a finite number of segments.
Achieving thread-safety
ConcurrentQueue uses a segment size of 32, and along with the backing array each segment stores the index items are to be added (m_high) and where they are to be removed (m_low):
private class Segment {
volatile T[] m_array;
volatile int m_high;
volatile int m_low;
}
Thread-safety, in this case, depends on the operation of both enqueuing and dequeuing. So we'll look at the operations in isolation, then put them both together.
Enqueuing
To enqueue an item, you need to:
- Increment the
m_high variable. This is now the index the item should be inserted at.
- Assign the item to the specified slot in
m_array.
The key to this is step 1. If step 1 can be done atomically, returning the new incremented value, that gives each thread trying to enqueue an item at the same time separate slots. Step 2 can then be performed concurrently by each thread without having to worry about conflicts. Interlocked.Increment performs this increment atomically:
public void Enqueue(T item) {
int insertIndex = Interlocked.Increment(ref m_high);
m_array[insertIndex] = item;
}
That's it! Simple, really...
Dequeuing
Dequeuing acts in a similar way to enqueuing:
- Increment the
m_low variable. The previous index is the index of the item to be dequeued.
- Return the item stored at the specified slot in
m_array.
And, similarly, if each thread trying to dequeue an item can be atomically 'assigned' a slot by performing step 1 using Interlocked.Increment, then step 2 can be performed concurrently by several threads:
public T Dequeue() {
// Increment returns the incremented value of the variable
int index = Interlocked.Increment(ref m_low);
return m_array[index-1];
}
But hang on, if we're dequeuing items, we need to check whether the queue is empty first. So checks need to be added to determine when a queue is empty. This depends on the segment state that represents an empty segment. In the case of ConcurrentQueue, this state is when m_low > m_high.
The exact reasons for this specific state representing an empty segment are a gory implementation detail; if you're wondering the reasons for this, then start by working out exactly what m_high and m_low represent.
public bool TryDequeue(out T item) {
if (m_low > m_high) {
item = default(T);
return false;
}
int index = Interlocked.Increment(ref m_low);
item = array[index-1];
return true;
}
All good? Well, no. We've actually introduced a subtle race condition. If we've got a queue with one item in it, and two threads trying to dequeue that one item, then if both threads perform the if (m_low > m_high) check before m_low has been incremented, one will dequeue the item and one will successfully dequeue on an empty queue.
In ConcurrentStack, a similar race condition was solved by using Interlocked.CompareExchange; taking a snapshot, doing some work, then atomically making that work visible to other threads only if the state is still valid. By judicious use of local variables and loops, we can do the same thing here:
public bool TryDequeue(out T item) {
int index;
do {
index = m_low;
// check if the queue is empty
if (index > m_high) {
item = default(T);
return false;
}
// I think m_low has the same value as index. If it is, replace it with index+1.
// Return to me what m_low actually was.
if (Interlocked.CompareExchange(ref m_low, index+1, index) == index) {
// success - index is now a valid index to dequeue, specific to this thread
item = m_array[index]
return true;
}
// m_low has been changed in the meantime. Go back to the start and try again.
}
while (true);
}
In this case, CompareExchange is acting as a conditional Increment.
Combining the two
This is where it gets complicated. The most obvious issue is another race condition, between enqueuing and dequeuing. To reiterate, the steps performed are:
Enqueuing:
- Increment the
m_high variable. This is now the index the item should be inserted at.
- Assign the item to the specified slot in
m_array.
Dequeuing:
- Check if the queue is empty (
m_low > m_high). If it is, return false.
- Increment
m_low (if m_low hasn't changed) and return the item.
Start with an empty queue. Thread 1 executes step 1 of enqueuing; m_high is incremented. Thread 2 then tries to dequeue an item; step 1 of dequeuing thinks the queue has items because m_low <= m_high. Step 2 of dequeuing then executes, and the item dequeued is the null value, because thread 1 hasn't yet assigned the enqueued item to the backing array.
Hmm, how to solve this? There's more than one variable involved here; the race condition involves the relationship between two variables, m_high and m_low, not a single variable being changed behind the scenes, which was the problem solved previously using CompareExchange and loops.
Well, we can't swap round the order of operations for enqueuing, because step 1 is what atomically assigns a slot when several threads are enqueuing items at once.
What we need is some sort of notification that enqueuing has finished assigned the item to the backing array. Then the dequeuing thread can wait for that notification before dequeuing the item from the queue:
Enqueuing
- Increment the
m_high variable. This is now the index the item should be inserted at.
- Assign the item to the specified slot in
m_array.
- Set state indicating the item at that index has been enqueued.
Dequeuing:
- Check if the queue is empty (
m_high >= m_low). If it is, return false.
- Increment
m_low (if m_low hasn't changed).
- Wait for the state to be set at the index to dequeue.
- Return the item.
This is the role played by the m_state array in ConcurrentQueue.Segment. Each segment has an int[] m_state alongside the T[] m_array; a 1 in a particular slot indicates the item has been set, and a 0 (the default value) indicates it isn't set. This turns the Enqueue method into this:
public void Enqueue(T item) {
int insertIndex = Interlocked.Increment(ref m_high);
m_array[insertIndex] = item;
m_state[insertIndex] = 1;
}
and TryDequeue into this:
public bool TryDequeue(out T item) {
int index;
do {
index = m_low;
if (index > m_high) {
item = default(T);
return false;
}
if (Interlocked.CompareExchange(ref m_low, index+1, index) == index) {
// busywait until m_state[index] is set
while (m_state[index] == 0) {}
item = m_array[index]
return true;
}
}
while (true);
}
By marking both m_array and m_state as volatile, this stops the JIT reordering accesses to those arrays, ensuring that the invariant implicit in this code (m_state[index] == 1 if and only if m_array[index] has an item in it) doesn't get broken by the JIT or processors reordering instructions.
Tweaking the control flow a bit, and sprinkling calls to SpinWait.SpinOnce() around as the abstraction of a busywait, and you've got the code of the enqueue and dequeue methods of ConcurrentQueue.
Final notes
There's a couple of things I need to briefly mention. Firstly, I haven't yet covered growing the queue. Creating new segments when the current segment gets full is largely handled by enqueuing. When an item is added to the last slot in a segment, the Enqueue method creates a new segment and assigns it as the next segment in the queue using the m_next variable. When the dequeue method reaches the end of the current segment, it waits for the m_next variable to be set before looking inside that segment to see if there's an item to dequeue.
Secondly, the eagle-eyed among you might have noticed that dequeued items don't get removed from the backing array; they are still referenced from the array when items are dequeued. I can't see any specific threading reason for this, other than giving one less location for possible threading issues. However, more importantly, this guarantees that the invariant I mentioned above is never be broken, which can be important if you're reasoning about the collection.
Keeping references around longer than they need isn't so big an issue; the items referenced in the array are dereferenced all together when the containing segment is dereferenced. However it is something to bear in mind if you have large objects referenced in a ConcurrentQueue for a long time; they might not be garbage collected when they otherwise could be.
Concurrent principles
So what principles can we extract from this examination of ConcurrentQueue?
Separating threads
Increment and CompareExchange are used to atomically assign each thread performing an action a separate slot. Each thread can then do work on that slot concurrently without conflicting with each other.
Invariants
An invariant is used to eliminate a race condition between enqueuing and dequeuing, in this case, m_state[index] == 1 iff m_array[index] has an item in it. This ensures enqueuing and dequeuing on the same slot works as it should. Note that this invariant is maintained even at the cost of keeping things in memory longer than they should.
volatile used as a memory barrier
Furthermore, volatile is used to add memory barriers ensuring the JIT doesn't reorder m_state and m_array array accesses and inadvertantly break the invariant.
Well, that's the core of the ConcurrentQueue class. I haven't even begun to cover the supporting methods, and glossed over most of the gory details. These I leave for you to explore and figure out. It's time to move on! We finally get some locks involved, and peer under the covers at ConcurrentDictionary.
|
-
Posted Tuesday, January 17, 2012 5:16 PM |
As a brief interlude from my Concurrent Collections series, I thought I would give an roundup of how the lean startup experiments have been progressing. As you can expect, there's been some good aspects and some bad aspects.
The experiments so far
After lots of discussions, arguments, posing and ruling out hypotheses, we came up with two 'starter' hypotheses we could test fairly easily:
- Customers don't agree to send data on how they use SmartAssembly; which features they use, the versions of .NET they have installed, etc, because the consent dialog doesn't make it clear the data is all anonymous and aggregated.
This is a prequisite for further experimentation. SmartAssembly isn't a webapp, with google analytics and web logs telling us everything. In order to conduct experiments on SmartAssembly, we need to know how customers are using it once it is installed on their machines, and they need to confirm it's ok for us to collect that data. Our current acceptance rate for usage reporting on SmartAssembly itself is quite low, so we needed to improve this for future experiments.
- Customers aren't using feature usage reporting and error reporting within their own applications because those options are swamped amongst all the obfuscation options.
Experiment 1
Hypothesis: Customers don't agree to send data on how they use SmartAssembly because the dialog asking for consent doesn't make it clear the data is all anonymous
The experiment for this is pretty obvious - improve the wording on the consent dialog. This change was applied to SmartAssembly 6.5. We would compare signup rates with 6.2 to see what effect, if any, this change would have.
Result: Inconclusive
We found after 6.5 had been released that we weren't collecting the right data from our download and install process to be able to accurately calculate an acceptance percentage. One of the quirks of the existing feature usage instrumentation is that the answer users give to the consent dialog is used for all future versions of the product with the same major version.
Since 6.5 is a minor version upgrade from 6.2, this means we couldn't differentiate between an existing customer downloading to upgrade from <6.2 to 6.5 (who wouldn't be presented with the new consent dialog), and a new user downloading for the first time. The data we collected couldn't be interpreted one way or the other; there were too many other variables.
Experiment 2
Hypothesis: Customers aren't using feature usage reporting and error reporting within their own applications because those options are swamped amongst all the obfuscation options
To perform this experiment, we produced a version of SmartAssembly that only had merging, signing, feature usage and error reporting available. We only wanted to present this version to customers whom we knew were downloading SmartAssembly for the error and/or feature usage reporting. The only place we could reasonably guarantee this was a reporting-specific landing page. So, the download link on that page would be A/B tested between the reporting-only and standard version of SmartAssembly.
Result: Not enough data
Our limited scope for this test - one specific landing page amongst the many pages on SmartAssembly where customers could download - meant we got very few people downloading the reporting-only version of SmartAssembly; not enough for any conclusion to be drawn one way or the other.
We had also added a 'cop-out' link on the download page so people could guarantee to get the standard version, if they did happen to download the reporting-only version and wonder where all the obfuscation went. We suspect a lot of users clicked this link; unfortunately, it was untracked, so we don't really know.
It's not all bad news...
So, the first experiments we performed didn't really work. However, that doesn't mean they weren't useful. We worked through a lot of the infrastructure issues that restricted our experimentation, and, most importantly, we learnt things:
- Make sure you know what the experiment is for, and what data you will collect to validate or refute your hypotheses. Check the data you will be collecting will make sense.
- For an A/B test, ensure you will have a sensible number of users on both sides to able to draw conclusions.
- Don't allow users to opt-out of experimental builds, even if it's not marketed as an 'experimental' version. Don't blur the boundaries of experimental and non-experimental builds.
So what now?
We've decided we're going down the wrong route. We shouldn't be trying to target existing users of SmartAssembly, who are looking primarily for an obfuscation tool, we need to target new users. We've decided to pivot and create a new product, a new website, and a new analytics platform. If you're interested, there are samples of the kind of data you could have on your applications at http://www.applicationmetrics.com, as well as a sign up to receive more information and take part in our experiments.
|
-
Posted Thursday, January 12, 2012 12:35 PM |
The first concurrent collection we'll look at is ConcurrentStack. This is conceptually the same as System.Collections.Generic.Stack, but is geared towards occasional concurrent modifications.
Now, in these posts I won't be looking to explain what every method does; just like my other explorations of various collection types, I'll be concentrating on the core implementation and concepts used by the collections, and glossing over the more sordid implementation details and scaffolding required to support an ICollection.
So, without further ado, let's get started:
ConcurrentStack
There are three basic operations you can perform on a stack:
Push: Push a new value onto the top of the stack
TryPeek: Peeking at the top value. In ConcurrentStack, this is quite a simple method.
TryPop: Popping the top value off the stack
ConcurrentStack supplements these with PushRange and TryPopRange for pushing and popping an array of values, ToArray, and the standard ICollection methods.
The stack is implemented as a singly-linked list, with nodes represented by the ConcurrentStack<T>.Node private class:
private class Node {
internal T m_Value;
internal Node m_Next;
}
The top of the stack is referenced by the volatile m_head variable. An empty stack is represented by a null value in m_head. For example, if you push the integers 1, 2 and 3 onto the stack in that order, the nodes will look like this:
If you then pop a single value off the stack, m_head will point at the '2' node.
Pushing values
So, there are three operations required to push a new value onto the stack:
- Create a new
Node object to hold the new value
- Setting the node's
m_next variable to point to the current head node
- Changing
m_head to point to the newly-created node.
And, straight away, we've got the possibility of a race condition. Say Thread 1 is pushing the value '4' onto the stack. It's just executed step 2:
Now, Thread 2 is scheduled on the CPU and is pushing the value '5'. Thread 2 executes steps 1, 2 and 3:
Thread 1 then executes step 3:
The node successfully pushed by Thread 2 has disappeared from the stack.
So, what we want to do is perform step 3 only if the head hasn't changed in the meantime. If it has been changed, then go back to step 2 to use the updated head node as the value of m_next. Furthermore, this all has to be done atomically so other threads can't see the intermediate state.
This is where Interlocked.CompareExchange comes into play. If you recall my previous post, CompareExchange performs the following as an atomic operation:
public static T CompareExchange<T>(ref T location, T value, T comparand)
where T : class {
T origValue = location;
if (location == comparand)
location = value;
return origValue;
}
Or, in words:
I think location has the same value as comparand. If it is, replace it with value. Return me what location actually is.
This allows us to do the 'check and assign a new head, else try again' operation atomically, like so:
public void Push(T item) {
Node node = new Node(item);
do {
node.m_next = m_head;
}
while (Interlocked.CompareExchange(ref m_head, node, node.m_next) != node.m_next);
}
Or, as words:
- Create a new node
- Set the node's next field to the current head
- atomically{I think the current head is the value of the node's next field. If it is, replace it with the new node}. If the head was't actually the same as the next field, go back to 2.
This has eliminated the race condition we had between steps 2 and 3. Only if the head hasn't been changed is the new node pushed onto the stack as the new head. This loop repeats until CompareExchange successfully sets the head to the new node.
This can easily be generalised to a range of values in PushRange - we can simply change step 1 to construct the list of nodes holding the values being pushed, then using the m_next field of the tail node, and the top node as the new head, like so:
So that's pushing values onto the stack. What about removing them?
Popping values
As you can expect, popping items from the stack can be done in a similar way to pushing them:
Node topNode;
do {
topNode = m_head;
}
while (Interlocked.CompareExchange(ref m_head, m_head.m_next, topNode) != topNode);
However, it's not quite so simple as that. For one thing, the stack may be empty, so m_head might be null. For another, you need to be able to pop several items off the stack as a single atomic operation, which requires navigating several m_next references at once. We need to be slightly more clever about it.
Let's cover checking for an empty stack first. Checking for null is easy, right? if (m_head == null) return false; So we put this at the start of the pop method:
public bool TryPop(out T item) {
if (m_head == null) {
item = default(T);
return false;
}
Node topNode;
do {
topNode = m_head;
}
while (Interlocked.CompareExchange(ref m_head, m_head.m_next, topNode) != topNode);
item = topNode.m_value;
return true;
}
However, again, we've got a race condition. The null check only happens once. If another thread were to empty the stack while the current thread is inside the while loop, this leaves us with a NullReferenceException lying in wait when we try and get m_head.m_next.
The simple way to fix this is to do the null check inside the loop, eliminating the race condition:
public bool TryPop(out T item) {
Node topNode;
do {
topNode = m_head;
if (topNode == null) {
item = default(T);
return false;
}
}
while (Interlocked.CompareExchange(ref m_head, m_head.m_next, topNode) != topNode);
item = topNode.m_value;
return true;
}
Sorted. So, what about popping multiple values off the stack? Again, like the null check, this has to be performed inside the while loop using the 'snapshot' of the head we've taken in topNode:
public int TryPopRange(T[] items, int count) {
int i;
Node topNode;
do {
topNode = m_head;
Node curNode = topNode;
i=0;
while (i<count && curNode != null) {
items[i] = curNode.m_value;
curNode = curNode.m_next;
i++;
}
}
while (Interlocked.CompareExchange(ref m_head, curNode, topNode) != topNode);
return i;
}
In fact, this idea of taking the current head, doing some work on it, then atomically swapping it if it hasn't been changed in the meantime can be generalised:
- Take an atomic snapshot of the current state
- Do some work on the snapshot locally to the current thread
- Atomically make the work visible to all other threads, if the snapshot you took is still a valid representation of the state
- If it isn't, go back to the start and try again
This is, at its core, how ConcurrentStack works.
Performing atomic operations
So, what underlying aspects of the system let us perform these operations atomically?
- Object references are atomic
This lets us take an atomic snapshot of the current state simply by storing the value of m_head in a local variable; m_head will always be pointing at a valid node (or null), it won't be 'half-changed'.
Interlocked.CompareExchange is atomic
This ensures the state-check-and-switch won't be afflicted by race conditions with other threads.
m_head is a volatile variable
m_head is marked as volatile. This means that any changes to that variable by one thread are instantly visible to all other threads on the system; there's no caching to get in the way. Any threads that are attempting to modify m_head will be forced to try again, and they won't overwrite the changes already made.
Gory implementation details
Now, I said at the start that I wouldn't be covering gory implementation details of these collections. However, in this case, there are some important considerations.
If you decompile ConcurrentStack, you'll notice there's a Push method and a PushCore method (similarly for TryPop and TryPopRange). This is a small but significant performance detail; ConcurrentStack assumes there aren't going to be any thread collisions. The public methods all optimistically attempt to perform the operation once. Only if another thread gets in the way do the corresponding Core methods get executed. It is these that contain the full loop logic, along with calls to SpinWait.SpinOnce to try and get the other threads out of the way before they try again.
In fact, TryPopCore goes one step further, and implements a randomised exponential backoff to reduce the chances of several threads all waiting the same amount of time. It does mean that, although ConcurrentStack is lockless, and so eliminates the risk of deadlock, the performance will suffer if there are many threads all modifying the stack at the same time, as each method will have to try several times before its operation succeeds.
That's the essence of how ConcurrentStack is implemented. Next time, I'll take a look at the second lockless concurrent collection, ConcurrentQueue, which goes about achieving thread-safety in quite a different way.
Footnote
I'm now on twitter! If you want to get in touch, comment on the series, or suggest other things I could look at, my username is @simonmcooper
|
-
Posted Thursday, January 05, 2012 2:41 PM |
The concurrent collections, located in the System.Collections.Concurrent namespace, were introduced in .NET 4 as thread-safe collections that could be used without locking, and there are plenty of posts and articles out on the interwebs giving an overview of the collections and how to use them.
Instead, I'll be focusing these posts on how the concurrent collections are implemented; how they achieve thread-safety and the principles behind their implementation. Not only because I find that sort of thing quite interesting, but also to give you ideas about how you might use these principles in your own code.
Note however, that writing bulletproof thread-safe collections is hard. Really hard. Think carefully about somehow using one of the existing collections before trying to write or adapt your own.
What is a 'thread-safe' collection?
First of all, we need to understand what we mean by 'thread-safe'. Well, let's start with the repository of all human knowledge - wikipedia:
A piece of code is thread-safe if it only manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time
OK, well, as an example, if m_Collection is some sort of 'thread-safe' ICollection, what will the result of these two threads running concurrently do?
| Thread 1 |
Thread 2 |
m_Collection.Add(m_Item); |
bool removed = m_Collection.Remove(m_Item); |
The answer depends on exactly which order Thread 1 and Thread 2 run with respect to each other. So, whether m_Item is in m_Collection after these two threads have run depends entirely on the whim of the operating system. Thread 2 could try and remove an item that's not in the collection, setting removed to false, then Thread 1 adds the item to the collection. Or Thread 1 could run first, adding the item, which is then immediately removed by Thread 2, setting removed to true. Thread 1 could then merrily carry on executing, assuming m_Item is in m_Collection, not knowing it's been immediately removed by Thread 2.
That, however, is an implementation detail of whoever wrote the code for Thread 1 and Thread 2. The important thing is that, after these two threads have run the above code in some order, m_Collection will either have the item in it, or not; it won't be in some corrupted half-state where some internal datastructures think it has the item and some don't, so that (for example) m_Collection.Contains(m_Item) returns false but the enumerator still returns the item. So, I propose that this is what is meant by a thread-safe collection:
A thread-safe collection is one that can be modified by multiple threads at the same time without corrupting itself.
Achieving concurrency
There is a very simple way of writing a thread-safe collection - implement ICollection<T> by wrapping a List<T> that locks the backing list object on every method call (if you're unclear about what locks are, I'll be covering them below). This means that only one thread can access and modify the backing list at once; thread-safety is therefore maintained. However, this isn't a very good solution; many threads could be blocked by one costly list resize operation. This solution doesn't have very good concurrency.
Ideally, we want collections that don't keep threads waiting. As we'll see, two of the collections (ConcurrentStack and ConcurrentQueue) achieve thread-safety using no locks at all, and the other collections minimise the chances of two threads blocking on the same lock.
Locking and synchronization primitives
You can't just create a lockless thread-safe datastructure out of nowhere, you need some underlying support from the hardware, operating system and runtime to achieve this. Before we look at the collections themselves, we need to understand these primitives and what guarantees they provide.
- Atomic types
Some types in .NET are specified as being 'atomic'. That is, if you change the value of a variable of an atomic type, it is guaranteed to appear to change value instantaneously; another thread won't be able to see the variable 'half-changed'. Object references and primitive types of 4 bytes or shorter are always atomic (ints, chars, booleans, floats etc), but 8-byte primitives are only atomic when running on a 64-bit process. The atomicity of object references is especially important to lockless collections, as we'll see in later posts.
- Volatile variables
I discussed volatility in a previous post. To recap, marking a variable as volatile means that:
- The JIT compiler won't cache the variable in a cpu register; it will always read and write to it using the original memory location. This is important when multiple threads are changing a variable at the same time, so that any changes made to the variable in one thread are immediately picked up by other threads.
- A read or write to a volatile location introduces a memory barrier at that point, so that other reads and writes won't be reordered with respect to each other. This is important later on, as we'll see.
- Locks
Mutual-exclusion locks are one of the more basic synchronization primitives. Each object has a lock associated with it, and by locking on an object you only allow one thread to have 'ownership' of that lock at a time. This allows only one thread to execute the section of code protected by that lock at a time.
When the currently executing thread 'releases' the lock, another thread (and only one other thread) is then allowed to take control of the lock and execute the protected code. Any other threads waiting to take the lock are blocked and cannot continue executing until it is their turn to take the lock.
C# has special syntax for locking on an object:
private readonly object m_LockObj = new object();
public void SynchronizedMethod() {
lock (m_LockObj) {
// protected code
}
}
C# actually compiles this method to something like this:
public void SynchronizedMethod() {
bool lockTaken = false;
object lockObj = m_LockObj;
try {
System.Threading.Monitor.Enter(lockObj, ref lockTaken);
// protected code
}
finally {
if (lockTaken)
System.Threading.Monitor.Exit(lockObj);
}
}
This uses the System.Threading.Monitor class to implement the lock. The call to Monitor.Enter blocks the thread until it can take control of the lock, and the lock is released by Monitor.Exit.
System.Threading.Interlocked
Interlocked provides various methods for performing operations that wouldn't normally be atomic in an atomic fashion. For example, Interlocked.Read allows an 8-byte long to be read as an atomic operation (remember, 8-byte primitives are only atomic on 64-bit processes), Interlocked.Add allows you to perform a = a + b (aka a+=b) atomically, Interlocked.Decrement performs a = a - 1 (aka --a) atomically, etc.
The most important of these is Interlocked.CompareExchange. This family of methods performs the following operation atomically (using the generic overload as an example):
public static T CompareExchange<T>(ref T location, T value, T comparand)
where T : class {
T originalValue = location;
if (location == comparand)
location = value;
return originalValue;
}
This might not seem like a particularly useful atomic operation, but it is crucial to the lockless collections, as we'll see.
System.Threading.SpinWait
Introduced in .NET 4, this structure encapsulates the idea of a 'spinwait':
while (!variableThatShouldBeTrueToContinue) {}
This keeps the thread continually spinning round the while loop, repeatedly checking whether another thread has set variableThatShouldBeTrueToContinue to true. However, performing such a spinwait gives no guarantees that the thread that is meant to set variableThatShouldBeTrueToContinue will be given the chance to execute; the operating system could, if it wishes, simply keep on executing the spinwait and not switch to other threads that actually change this variable.
System.Threading.SpinWait gets round this problem by, as well as simply spinning, occasionally calling Thread.Sleep and Thread.Yield. This will respectively encourage and force the operating system to execute other threads, giving a chance for the spinning condition to be satisfied.
Using the concurrent collections
At this point it's also worth pointing out that using a thread-safe collection will not make the rest of your code thread-safe and free of race conditions; it is not a panacea. As in the example at the top of this post, using a thread-safe collection does not stop a race condition between adding and removing the same item from the collection. Using a thread-safe collection simply means you have one less thing to worry about when writing threaded code. You still have the rest of your code to worry about!
That's it for introductions; in the next post, we'll look at the simplest of the concurrent collections, ConcurrentStack.
|
-
Posted Friday, December 23, 2011 2:02 PM |
If you've ever had a poke around System.dll or System.Core.dll in Reflector, you may have noticed TypeForwardedToAttributes applied to the assembly:
[assembly: TypeForwardedTo(typeof(Lazy<>))]
[assembly: TypeForwardedTo(typeof(LazyThreadSafetyMode))]
[assembly: TypeForwardedTo(typeof(Action))]
[assembly: TypeForwardedTo(typeof(Action<,>))]
[assembly: TypeForwardedTo(typeof(Action<,,>))]
[assembly: TypeForwardedTo(typeof(Action<,,,>))]
This post has a look at what these are, and how they're implemented.
Type forwards
TypeForwardedToAttribute is part of a feature introduced in .NET 2 - Type forwarding. As the documentation says, this is a feature that allows a type to be moved to a different assembly without having to recompile assemblies using that type. This is used extensively by the class libraries when types were moved from System.Core.dll to mscorlib.dll between .NET 3.5 and 4, allowing assemblies compiled against .NET 2 and 3.5 to run on the .NET 4 framework as-is, without having to be recompiled.
However, if you think about it, this is much more than a simple attribute; this is a core change to the CLR type resolution mechanism. Every type that is resolved in an assembly first has to check for the existance of a type forward indicating the type has been moved somewhere else.
This isn't something that can easily be represented in a simple attribute; TypeForwardedToAttribute is actually an example of a pseudo custom attribute.
ExportedType
First, a bit of background. The ExportedType metadata table was originally designed to be used in multi-module assemblies as part of the 'public contract' of an assembly; the manifest module for the assembly would contain an entry in ExportedType for every public type defined in other modules, comprising
TypeName & TypeNamespace: the exported type's full name
Implementation: the module (or nested ExportedType) the type can be found.
TypeDefId: the index within the TypeDef table in the module the type is located.
This allows any tool referencing the multi-module assembly to only need to look in the manifest module to find every public type defined in the assembly and where it can be found; it doesn't have to scan every module comprising the assembly.
When .NET 2 came along, ExportedType was repurposed to store type forwards as well. If Implementation is a reference to an assembly rather than a module, then that assembly is the new location of the type named by TypeName and TypeNamespace (type forwards don't allow you to change the type name or namespace).
So, when the C# compiler sees a type forward in System.Core.dll specified by the attribute
[assembly: TypeForwardedTo(typeof(Action))]
the Action type is resolved using the normal C# type resolution rules to [mscorlib]System.Action, and the compiler generates an entry in ExportedType like so:
TypeName: Action
TypeNamespace: System
Implementation: assembly reference to mscorlib
TypeDefId: 0 (type forwards don't use this field)
this entry is then followed by the core CLR type resolution mechanism so that any references to [System.Core]System.Action are transparently redirected to [mscorlib]System.Action at runtime; assemblies using the forwarded type don't have to be recompiled.
TypeForwardedFrom
Finally, TypeForwardedFromAttribute is the counterpart to TypeForwardedToAttribute; it specifies the assembly a type has been forwarded from (using an assembly name string, rather than a direct metadata assembly reference). However, unlike TypeForwardedTo, this is a normal attribute, has no effect on CLR type resolution, and exists primarily for bookkeeping purposes.
Type forwards may seem like a niche feature aimed at library writers, but they are invaluable in the right circumstances. In the BCL, they allow programs compiled against the .NET 2 and 3.5 frameworks to run on the .NET 4 framework without needing to be recompiled.
|
-
Posted Monday, December 12, 2011 1:02 PM |
Normally, virtual method overrides in .NET are done implicitly; if a subclass has a virtual method with the same name and signature as a virtual method in a base class, then the method in the subclass overrides the method in the base class:
.class public BaseClass {
.method public instance virtual string Method1(int32 i1, object o1) { ... }
}
.class public SubClass extends BaseClass {
// this method implicitly overrides BaseClass::Method1
.method public instance virtual string Method1(int32 i1, object o1) { ... }
}
This is in contrast to C# and VB, where a 'base' virtual method has to be declared as virtual, and all overrides of that method in subclasses have to be declared as override. But, similarly to IL, exactly which method it overrides in a base type depends on the name and signature of the methods concerned.
Explicit interface implementations
So what happens with explicit interface implementations? Well, IL and the assembly metadata allow you to specify a MethodImpl, which forces a method in the class to override one with a different name (but the same signature) in a base class. C# uses this mechanism to implement explicit interface overrides; this C#:
public interface IInterface {
string MyMethod(string s1);
}
public class MyClass : IInterface {
public string MyMethod2(string s1) { ... }
// explicit implementation of IInterface.MyMethod
string IInterface.MyMethod(string s1) { ... }
}
is compiled to this IL:
.class public interface abstract IInterface {
.method public abstract virtual instance string MyMethod(string s1) {}
}
.class public MyClass implements IInterface {
// normal MyMethod2 method
.method public instance string MyMethod2(string s1) {}
// explicit implementation of IInterface.MyMethod
.method private virtual final
instance string MyApp.IInterface.MyMethod(string s1) {
.override IInterface::MyMethod
...
}
}
There are several interesting things to note here:
- The
.override directive specifies the explicit override. Here, it forces MyClass::MyApp.IInterface.MyMethod to override IInterface::MyMethod.
- The explicit override method is named
MyApp.IInterface.MyMethod. There is no necessity for any explicit overrides to be named anything in particular, but using this name format guarantees there won't be any name conflicts in the same class; neither with normal C# code, as that can't create method names with dots in, nor with any other explicit implementations, as the name contains the C# fully qualified name of the overridden method.
- The explicit override method is marked
virtual final. In order to take part in any virtual method lookup, a method has to be declared virtual, even if the lookup information is 'forced' by an .override directive.
- The explicit override method is
private. Although implicit overrides have to have at least the same accessibility as the method they're overriding (ie you can't implicitly override a public method with an internal one), these rules don't apply to explicit overrides. In this case, the explicit override can only be called within MyClass or through the IInterface interface.
Abusing explicit overrides
Although .override is used by the C# compiler for explicit interface implementation, it is not limited to that. In fact, the only specific rules governing the use of .override are:
- The signatures of the methods have to match (as I explored in an earlier post, you can't implement override variance using explicit overrides). Explicit overrides only allow you to change the name.
- The overridden method has to be in the type hierarchy of the containing class.
- Both overridden and overriding methods have to be virtual.
- The overridden method can't be final.
That's it. In particular, notice how there aren't any restrictions on whether the overridden method is on an interface or concrete base class:
.class public ToStringExplicitOverride extends [mscorlib]System.Object {
.method private virtual instance string ExplicitToString() {
.override [mscorlib]System.Object::ToString
ldstr "Explicit override of object.ToString!"
ret
}
}
or on the number of .override declarations on a particular method:
.class public ClassA {
.method public virtual instance void Method1(string s, object o) { ... }
.method public virtual instance void Method2(string s, object o) { ... }
}
.class public ClassB extends ClassA {
.method public virtual instance void Method3(string s, object o) {
.override ClassA::Method1
.override ClassA::Method2
}
}
or that the overriding method has to be private:
.class public ClassA {
.method public virtual instance void Method1(string s, object o) { ... }
}
.class public ClassB extends ClassA {
.method public virtual instance void Method2(string s, object o) {
.override ClassA::Method1
}
}
.class public ClassC extends ClassB {
.method public virtual instance void Method2(string s, object o) {
// this method will be invoked whenever ClassA::Method1
// is called on an instance of ClassC
}
}
or that the use of explicit and implicit overrides are exclusive of each other:
.class public ClassA {
.method public virtual instance void Method1(string s, object o) { ... }
.method public virtual instance void Method2(string s, object o) { ... }
}
.class public ClassB extends ClassA {
.method public virtual instance void Method1(string s, object o) {
.override ClassA::Method2
// implements ClassA::Method1 implicitly, and ClassA::Method2 explicitly
// this method will be invoked if ClassA::Method1 or ClassA::Method2
// is called on an instance of ClassB
}
.method public virtual instance void Method2(string s, object o) {
// normally, this would implicitly override ClassA::Method2
// but this method will NOT be called if ClassA::Method2 is invoked
// as Method1 explicitly overrides ClassA::Method2 instead
}
}
or any combination of these. The use of explicit overrides is limited only by the compiler writer's ingenuity.
Alternate explicit override syntax
As well as specifying .override inside the method body, you can also specify it in the class itself:
.class public EqualsClass extends [mscorlib]System.Object {
.method public virtual instance bool AlternateEquals(object o) { ... }
.override [mscorlib]System.Object::Equals with instance bool AlternateEquals(object)
}
This leads onto the question as to whether you can use this syntax to link two disparate arms of a type's inheritance heirarchy. As an example using implicit implementations:
.class public interface IInterface {
.method public virtual abstract instance string Method1() {}
}
.class public abstract BaseClass {
.method public virtual instance string Method1() {
ldstr "foo"
ret
}
}
.class public SubClass extends BaseClass implements IInterface {
// SubClass implements IInterface::Method1 using BaseClass::Method1
}
but does this work for explicit implementations?
.class public interface IInterface {
.method public virtual abstract instance string Method1() {}
}
.class public abstract BaseClass {
.method public virtual instance string Method2() {
ldstr "foo"
ret
}
}
.class public SubClass extends BaseClass implements IInterface {
.override IInterface::Method1 with instance string BaseClass::Method2()
}
well, ilasm compiles it successfully, but it fails verification, and the CLR fails to load the dll:
[MD]: Error: Class implements interface but not method
(class:0x02000004; interface:0x06000001; method:0x0069004d).
[token:0x09000001]
[MD]: Error: MethodImpl has body from another TypeDef
(token=0x02000003).
[token:0x00000001]
Although this can be represented in the assembly metadata, the CLR specification disallows explicit overrides using methods in a base class (section 10.3.2). The overriding method must be in the same type as the .override directive.
This is a shame; this seems like a rather arbitary limitation, given that implicit implementations using base class methods work without errors. It would be interesting to hear from the CLR team as to why this limitation exists in the specification.
In the meantime, you can create a stub method in the implementing class to call the correct method on a base class (which is a technique sometimes used by the C# compiler).
|
-
Posted Monday, November 21, 2011 2:27 PM |
Over the next few weeks, we'll be performing experiments on SmartAssembly to confirm or refute various hypotheses we have about how people use the product, what is stopping them from using it to its full extent, and what we can change to make it more useful and easier to use. Some of these experiments can be done within the team, some within Red Gate, and some need to be done on external users.
External testing
Some external testing can be done by standard usability tests and surveys, however, there are some hypotheses that can only be tested by building a version of SmartAssembly with some things in the UI or implementation changed. We'll then be able to look at how the experimental build is used compared to the 'mainline' build, which forms our baseline or control group, and use this data to confirm or refute the relevant hypotheses.
However, there are several issues we need to consider before running experiments using separate builds:
Ideally, the user wouldn't know they're running an experimental SmartAssembly. We don't want users to use the experimental build like it's an experimental build, we want them to use it like it's the real mainline build. Only then will we get valid, useful, and informative data concerning our hypotheses.
There's no point running the experiments if we can't find out what happens after the download. To confirm or refute some of our hypotheses, we need to find out how the tool is used once it is installed. Fortunately, we've applied feature usage reporting to the SmartAssembly codebase itself to provide us with that information.
Of course, this then makes the experimental data conditional on the user agreeing to send that data back to us in the first place. Unfortunately, even though this does limit the amount of useful data we'll be getting back, and possibly skew the data, there's not much we can do about this; we don't collect feature usage data without the user's consent. Looks like we'll simply have to live with this.
What if the user tries to buy the experiment? This is something that isn't really covered by the Lean Startup book; how do you support users who give you money for an experiment? If the experiment is a new feature, and the user buys a license for SmartAssembly based on that feature, then what do we do if we later decide to pivot & scrap that feature? We've either got to spend time and money bringing that feature up to production quality and into the mainline anyway, or we've got disgruntled customers. Either way is bad. Again, there's not really any good solution to this.
Similarly, what if we've removed some features for an experiment and a potential new user downloads the experimental build? (As I said above, there's no indication the build is an experimental build, as we want to see what users really do with it). The crucial feature they need is missing, causing a bad trial experience, a lost potential customer, and a lost chance to help the customer with their problem. Again, this is something not really covered by the Lean Startup book, and something that doesn't have a good solution.
So, some tricky issues there, not all of them with nice easy answers. Turns out the practicalities of running Lean Startup experiments are more complicated than they first seem!
|
-
Posted Tuesday, November 15, 2011 6:00 PM |
There's a new movement rumbling around Red Gate Towers - the Lean Startup. At its core is the idea that you don't have to be in a company with single-digit employees to be an entrepreneur; you simply have to (being blunt) not know what you should be doing. Specifically, you accept that you don't know everything you need to know in order to create a useful, successful & profitable product.
This is something that Red Gate has had problems with in the past; we've created products that weren't aimed at the correct market, or didn't solve the problem the user had (although they solved the problem we thought the users had, or the problem the users thought they had). As a result, these products weren't as successful as they could have been.
The ideas at the core of the Lean Startup help to combat this tendency to build large, well-engineered products that solve the wrong problem. You need to actually test your hypotheses about what the users and the market needs, rather than just running a project based on those untested assumptions. Furthermore, these tests need to be done as fast as possible (on the order of a week) so that, if necessary, you can change the direction of the project without wasting effort going down a dead end. Over time, as more tests are done and more hypotheses are confirmed or refuted, the project moves towards something that solves users' actual problems.
However, re-aligning the development teams that operate within Red Gate along these lines does itself have some issues; we've got very good at doing large, monolithic releases, with a feature set decided well in advance. Currently it takes about 2 weeks to do install & release testing before a release; this is clearly not practicable for a team doing weekly, or even daily releases. There's also many infrastructure issues to be solved; in our source control, build system, release mechanism, support pages & documentation, licensing system, update system, and download pages. All these need modifications to allow the fast releases necessary for each experiment.
Not only do we have to change our infrastructure, we have to change our mindset. Doing daily releases means each release won't get nearly as much testing as 'standard' releases. As a team, we have to be prepared that there will be releases that have bugs and issues with them; not only do we have to be prepared to change direction with every experiment we do, but we have to be ready to fix any bugs that are reported very quickly as well.
The SmartAssembly team is spearheading this move towards leanness within the company, using Feature Usage Reporting (FUR). We think this is a cracking feature that will really help developers
learn how people use their products, but we need to confirm this
hypothesis. So, over the next few weeks, we'll be running a variety of experiments on SmartAssembly to either confirm or refute our hypotheses concerning how people use SmartAssembly and apply FUR to their own products.
In the rest of this series, I'll be documenting how the experiments we perform get on, and our experiences with applying the Lean Startup model to a mature product like SmartAssembly.
|
-
Posted Monday, November 14, 2011 11:10 AM |
You may have noticed that not a lot has happened to SmartAssembly in the past few months. However, the team has been very busy behind the scenes working on an entirely new version of SmartAssembly.
SmartAssembly 6.5
Over the past few releases of SmartAssembly, the team had come to the realisation that the current 'architecture' - grown organically, way before RedGate bought it, from a simple name obfuscator over the years into a full-featured obfuscator and assembly instrumentation tool - was simply not up to the task. Not for what we wanted to do with it at the time, and not what we have planned for the future.
Not only was it not up to what we wanted it to do, but it was severely limiting our development capabilities; long-standing bugs in the root architecture that couldn't be fixed, some rather...interesting...design decisions, and convoluted logic that increased the complexity of any bugfix or new feature tenfold.
So, we set out to fix this. Earlier this year, a new engine was written on which SmartAssembly would be based. Over the following few months, each feature was ported over to the new engine and extensively tested by our existing unit and integration tests. The engine was linked into the existing UI (no easy task, due to the tight coupling between the UI and old engine), and existing RedGate products were tested on the new SmartAssembly to ensure the new engine acted in the same way. The result is SmartAssembly 6.5.
The risks of a rearchitecture
Are there risks to rearchitecting a product like SmartAssembly? Of course. There was a lot of undocumented behaviour in the old engine, and as part of the rearchitecture we had to find this behaviour, define it, and document it. In the process we found some behaviour of the old engine that simply did not make sense; hence the changes in pruning & obfuscation behaviour in the release notes. All the special edge cases we had to find, document, and re-implement.
There was a chance that these special cases would not be found until near the end of the project, when everything is functionally complete and interacting together. By that stage, it would be hard to go back and change anything without a whole lot of extra work, delaying the release by months. We always knew this was a possibility; our initial estimate of the time required was '4 months, ± 4 months'. And that was including various mitigation strategies to reduce the likelihood of these issues being found right at the end. Fortunately, this worst-case did not happen.
However, the rearchitecture did produce some benefits. As well as numerous bug fixes that we could not fix any other way, we've also added logging that lets you find out exactly why a particular field or property wasn't pruned or obfuscated. There's a new command line interface, we've tested it with WP7.1 and Silverlight 5, and we've added a new option to error reporting to improve the performance of instrumented apps by ~10%, at the cost of inaccurate line numbers in reports.
So? What differences will I see?
Largely none. SmartAssembly 6.5 produces the same output as SmartAssembly 6.2. The performance of 6.5 will be much faster for some users, and generally the same as 6.2 for the remaining. If you've encountered a bug with previous versions of SmartAssembly, I encourage you to try 6.5, as it has most likely been fixed in the rearchitecture. If you encounter a bug with 6.5, please do tell us; we'll be doing another release quite soon, so we'll aim to fix any issues caused by 6.5 in that release.
Most importantly, the new architecture finally allows us to implement some Big Things with SmartAssembly we've been planning for many months; these will fundamentally change how you build, release and monitor your application. Stay tuned for further updates!
|
-
Posted Friday, October 21, 2011 11:03 AM |
Although the collections classes introduced in .NET 2, 3.5 and 4 cover most scenarios, there are still some .NET 1 collections that don't have generic counterparts. In this post, I'll be examining what they do, why you might use them, and some things you'll need to bear in mind when doing so.
BitArray
System.Collections.BitArray is conceptually the same as a List<bool>, but whereas List<bool> stores each boolean in a single byte (as that's what the backing bool[] does), BitArray uses a single bit to store each value, and uses various bitmasks to access each bit individually. This means that BitArray is eight times smaller than a List<bool>. Furthermore, BitArray has some useful functions for bitmasks, like And, Xor and Not, and it's not limited to 32 or 64 bits; a BitArray can hold as many bits as you need.
However, it's not all roses and kittens. There are some fundamental limitations you have to bear in mind when using BitArray:
It's a non-generic collection. The enumerator returns object (a boxed boolean), rather than an unboxed bool. This means that if you do this:
foreach (bool b in bitArray) { ... }
Every single boolean value will be boxed, then unboxed. And if you do this:
foreach (var b in bitArray) { ... }
you'll have to manually unbox b on every iteration, as it'll come out of the enumerator an object. Instead, you should manually iterate over the collection using a for loop:
for (int i=0; i<bitArray.Length; i++) {
bool b = bitArray[i];
...
}
- Following on from that, if you want to use BitArray in the context of an
IEnumerable<bool>, ICollection<bool> or IList<bool>, you'll need to write a wrapper class, or use the Enumerable.Cast<bool> extension method (although Cast would box and unbox every value you get out of it).
- There is no
Add or Remove method. You specify the number of bits you need in the constructor, and that's what you get. You can change the length yourself using the Length property setter though.
- It doesn't implement
IList. Although not really important if you're writing a generic wrapper around it, it is something to bear in mind if you're using it with pre-generic code.
However, if you use BitArray carefully, it can provide significant gains over a List<bool> for functionality and efficiency of space.
OrderedDictionary
System.Collections.Specialized.OrderedDictionary does exactly what you would expect - it's an IDictionary that maintains items in the order they are added. It does this by storing key/value pairs in a Hashtable (to get O(1) key lookup) and an ArrayList (to maintain the order). You can access values by key or index, and insert or remove items at a particular index. The enumerator returns items in index order.
However, the Keys and Values properties return ICollection, not IList, as you might expect; CopyTo doesn't maintain the same ordering, as it copies from the backing Hashtable, not ArrayList; and any operations that insert or remove items from the middle of the collection are O(n), just like a normal list.
In short; don't use this class. If you need some sort of ordered dictionary, it would be better to write your own generic dictionary combining a Dictionary<TKey, TValue> and List<KeyValuePair<TKey, TValue>> or List<TKey> for your specific situation.
ListDictionary and HybridDictionary
To look at why you might want to use ListDictionary or HybridDictionary, we need to examine the performance of these dictionaries compared to Hashtable and Dictionary<object, object>. For this test, I added n items to each collection, then randomly accessed n/2 items:
So, what's going on here? Well, ListDictionary is implemented as a linked list of key/value pairs; all operations on the dictionary require an O(n) search through the list. However, for small n, the constant factor that big-o notation doesn't measure is much lower than the hashing overhead of Hashtable or Dictionary.
HybridDictionary combines a Hashtable and ListDictionary; for small n, it uses a backing ListDictionary, but switches to a Hashtable when it gets to 9 items (you can see the point it switches from a ListDictionary to Hashtable in the graph). Apart from that, it's got very similar performance to Hashtable.
So why would you want to use either of these? In short, you wouldn't. Any gain in performance by using ListDictionary over Dictionary<TKey, TValue> would be offset by the generic dictionary not having to cast or box the items you store, something the graphs above don't measure.
Only if the performance of the dictionary is vital, the dictionary will hold less than 30 items, and you don't need type safety, would you use ListDictionary over the generic Dictionary. And even then, there's probably more useful performance gains you can make elsewhere.
|
-
Posted Wednesday, October 05, 2011 5:59 PM |
Apart from Dictionary<TKey, TValue>, there's two other dictionaries in the BCL - SortedDictionary<TKey, TValue> and SortedList<TKey, TValue>. On the face of it, these two classes do the same thing - provide an IDictionary<TKey, TValue> interface where the iterator returns the items sorted by the key. So what's the difference between them, and when should you use one rather than the other? (as in my previous post, I'll assume you have some basic algorithm & datastructure knowledge)
SortedDictionary
We'll first cover SortedDictionary. This is implemented as a special sort of binary tree called a red-black tree. Essentially, it's a binary tree that uses various constraints on how the nodes of the tree can be arranged to ensure the tree is always roughly balanced (for more gory algorithmical details, see the wikipedia link above).
What I'm concerned about in this post is how the .NET SortedDictionary is actually implemented. In .NET 4, behind the scenes, the actual implementation of the tree is delegated to a SortedSet<KeyValuePair<TKey, TValue>>. One example tree might look like this:
Each node in the above tree is stored as a separate SortedSet<T>.Node object (remember, in a SortedDictionary, T is instantiated to KeyValuePair<TKey, TValue>):
class Node {
public bool IsRed;
public T Item;
public SortedSet<T>.Node Left;
public SortedSet<T>.Node Right;
}
The SortedSet only stores a reference to the root node; all the data in the tree is accessed by traversing the Left and Right node references until you reach the node you're looking for. Each individual node can be physically stored anywhere in memory; what's important is the relationship between the nodes.
This is also why there is no constructor to SortedDictionary or SortedSet that takes an integer representing the capacity; there are no internal arrays that need to be created and resized. This may seen trivial, but it's an important distinction between SortedDictionary and SortedList that I'll cover later on.
And that's pretty much it; it's a standard red-black tree. Plenty of webpages and datastructure books cover the algorithms behind the tree itself far better than I could. What's interesting is the comparions between SortedDictionary and SortedList, which I'll cover at the end.
As a side point, SortedDictionary has existed in the BCL ever since .NET 2. That means that, all through .NET 2, 3, and 3.5, there has been a bona-fide sorted set class in the BCL (called TreeSet). However, it was internal, so it couldn't be used outside System.dll. Only in .NET 4 was this class exposed as SortedSet.
SortedList
Whereas SortedDictionary didn't use any backing arrays, SortedList does. It is implemented just as the name suggests; two arrays, one containing the keys, and one the values (I've just used random letters for the values):
The items in the keys array are always guarenteed to be stored in sorted order, and the value corresponding to each key is stored in the same index as the key in the values array. In this example, the value for key item 5 is 'z', and for key item 8 is 'm'.
Whenever an item is inserted or removed from the SortedList, a binary search is run on the keys array to find the correct index, then all the items in the arrays are shifted to accomodate the new or removed item.
For example, if the key 3 was removed, a binary search would be run to find the array index the item was at, then everything above that index would be moved down by one:
and then if the key/value pair {7, 'f'} was added, a binary search would be run on the keys to find the index to insert the new item, and everything above that index would be moved up to accomodate the new item:
If another item was then added, both arrays would be resized (to a length of 10) before the new item was added to the arrays.
As you can see, any insertions or removals in the middle of the list require a proportion of the array contents to be moved; an O(n) operation. However, if the insertion or removal is at the end of the array (ie the largest key), then it's only O(log n); the cost of the binary search to determine it does actually need to be added to the end (excluding the occasional O(n) cost of resizing the arrays to fit more items).
As a side effect of using backing arrays, SortedList offers IList Keys and Values views that simply use the backing keys or values arrays, as well as various methods utilising the array index of stored items, which SortedDictionary does not (and cannot) offer.
The Comparison
So, when should you use one and not the other? Well, here's the important differences:
- Memory usage
SortedDictionary and SortedList have got very different memory profiles. SortedDictionary...
- has a memory overhead of one object instance, a bool, and two references per item. On 64-bit systems, this adds up to ~40 bytes, not including the stored item and the reference to it from the
Node object.
- stores the items in separate objects that can be spread all over the heap. This helps to keep memory fragmentation low, as the individual node objects can be allocated wherever there's a spare 60 bytes.
In contrast, SortedList...
- has no additional overhead per item (only the reference to it in the array entries), however the backing arrays can be significantly larger than you need; every time the arrays are resized they double in size.
That means that if you add 513 items to a SortedList, the backing arrays will each have a length of 1024. To conteract this, the TrimExcess method resizes the arrays back down to the actual size needed, or you can simply assign list.Capacity = list.Count.
- stores its items in a continuous block in memory. If the list stores thousands of items, this can cause significant problems with Large Object Heap memory fragmentation as the array resizes, which SortedDictionary doesn't have.
- Performance
Operations on a SortedDictionary always have O(log n) performance, regardless of where in the collection you're adding or removing items. In contrast, SortedList has O(n) performance when you're altering the middle of the collection.
If you're adding or removing from the end (ie the largest item), then performance is O(log n), same as SortedDictionary (in practice, it will likely be slightly faster, due to the array items all being in the same area in memory, also called locality of reference).
So, when should you use one and not the other? As always with these sort of things, there are no hard-and-fast rules. But generally, if you:
- need to access items using their index within the collection
- are populating the dictionary all at once from sorted data
- aren't adding or removing keys once it's populated
then use a SortedList. But if you:
- don't know how many items are going to be in the dictionary
- are populating the dictionary from random, unsorted data
- are adding & removing items randomly
then use a SortedDictionary.
The default (again, there's no definite rules on these sort of things!) should be to use SortedDictionary, unless there's a good reason to use SortedList, due to the bad performance of SortedList when altering the middle of the collection.
|
-
Posted Friday, September 16, 2011 6:51 PM |
To many people, System.Collections.Generic.Dictionary<TKey,TValue> is just a useful collection. In this post, I'll be looking inside that collection and see how it really works.
Dictionary is based on a hashtable; for the rest of this post, I'll assume you know roughly how a hashtable works. The Wikipedia article, as the source of all knowledge algorithmical, provides a good overview. It will also help if you've got the class open in Reflector so you can see what's going on yourself.
The basics
The core of the Dictionary type is a standard hashtable, straight out of an algorithms book. However, it's interesting design comes from how it deals with hash collisions, using a variant of chaining.
When you add an item to a hash table, the hash code of that item determines which slot in a backing array the item is stored. In the perfect case, every item would have a separate hash code, and so every item would have a separate slot in the array. To find an item, you simply hash it and go to that slot in the backing array.
However, two or more items can hash to the same hash code, and you get a hash collision. When this happens, and you're using chaining, the extra item is stored in a linked list hanging off the array slot. How does the .NET Dictionary handle this?
Storing entries
To store its data, the Dictionary uses an array of Entry structs:
private struct Entry {
public TKey key;
public TValue value;
public int hashCode;
public int next;
}
key and value should be self-explanatory. hashCode provides a fast lookup to the original hash code for the functions that need it, and next is used when dealing with hash collisions.
So, the backing array is a Entry[] called entries. However, where the entries go in this array isn't determined by the hash code. There's actually a separate array of ints called buckets alongside entries. How is this used?
Hashing items
When adding an item, the hash code is generated modulo the current array size, and that determines the slot the item is stored in. However, that slot is not the one in entries, it is actually the one in buckets. The value in buckets at the hashed index is then the index of the slot in entries that the data is actually stored at, and that is simply assigned to the next free slot in the array.
As an example, say you've got a Dictionary with a capacity of 5:
You add an item to it, that has a hash code of 3, modulo the size of the dictionary. The actual item goes in the first available slot in entries, and the slot at index 3 in buckets then stores the index at which the item is stored in entries; in this case, 0:
You then add two more items to it, with hash codes 0 and 2, respectively:
Nice and simple. But, you then add a fourth item that also hashes to 3. Now, we've already added an item with a hash code of 3; we've got a hash collision, so the Dictionary will have to chain it. However, it doesn't create a separate linked list, it stores the index of the next item in the chain in the next value of the Entry struct like so:
If we were to add another item, also with a hash code of 3, the chain would be extended:
This means, that to search for an item with a particular hash code, you go to the Entry pointed to by the buckets value at that hash code, and follow the next pointers until you find the item you need, or the null pointer (signified by a next of -1; see the FindEntry method).
Removing items
So that's adding items. What about removing items?
Any item in the dictionary can be removed at any time, which is different to adding items, where they simply get put on the end of the entries array. If you've got lots of additions and removals, this can lead to a lot of wasted space as items in the middle of the array get removed, and new items get added to the end. Well, this is where a variable called freeList comes into play.
When an item is removed, the slot it occupied gets added to the freelist chain. This is a chain of entries, indexed by a single freeList variable, which marks array slots where additional items can go. This is chain is also linked by the next pointer in the Entry structure.
So, this is what it would look like if you removed the blue item:
then the orange item:
Then, if another item is added, instead of adding to the end of the array (and causing a full rehash of the contents), it can simply use the slot pointed to by the freeList variable:
Using this indirect mechanism means that the internal arrays don't need to be resized, and everything re-hashed, until you actually have more items than you can store in the entries array. It also doesn't cause much performance degredation when the dictionary gets full, which the non-generic System.Collections.Hashtable had problems with.
Some more random observations...
Clear doesn't reset the size of the arrays; if you clear a dictionary with 100,000 items in it, the backing arrays will still be sized to store 100,000 items, potentially wasting a lot of space.
The dictionary enumerator simply iterates down the entries array. This is why, when you're just adding items to a dictionary, they are returned in the order they were added. Only when you remove items and the freelist becomes involved are items returned in non-consecutive order.
However, this is very much an implementation detail, and can change in future versions of .NET; do not rely on this behaviour in your own code! Always treat the the enumeration order of any hashtable as random.
|
-
Posted Thursday, July 07, 2011 2:50 PM |
Developers might write good code, but no matter how good they are the result will always have bugs in it. It's up to the testers in the team to make sure the final product is as bug-free as it can be.
Deciding what to test
Within a project there's normally no official documentation produced, no official record of what the project should accomplish. The closest thing we get to documentation is the greenlight presentation slides (I'll be covering the greenlight process in a later post). This means there's no specification to validate against, no way of confirming that the project is 'complete'. So how does a tester know what to test in the first place?
Within Red Gate, the same team normally stays on the same product for several minor and major releases (as an example, the same team has worked on SQL Source Control since it's inception back in 2009). Everyone within the team has an intimite knowledge of what the tool does, what problems it solves, how customers use it, and what the main bugs and issues are. Testers are as much a part of the team as the developers or project manager.
This means testers simply don't need a specification. They know how customers use the tool, what bugs they're running into, how new features should interact with existing ones, and (roughly) how they actually work. The testers are fully involved in the project from day 1. This gives the testers deep knowledge of the application and application domain, so they know where and what they should test to ensure the final product works.
How to test a product
As with other things at Red Gate, there's no officially-mandated way of testing a product, no documentation that needs to be produced. It's entirely up to the testers to decide how they want to test the product. Most of the time this will involve writing automated unit and integration tests to run on the build server with each new build. When testing the user interface, this usually means manual testing of all the different parts of the UI to make sure it all behaves sensibly whatever actions the user performs (although some testers are experimenting with automated UI testing).
As a product matures, it gains more and more tests. Not just tests for new features, but tests covering specific bugs encountered by customers. SQL Compare, our oldest existing product, now has over 30,000 unit and integration tests, whereas SmartAssembly (the product I'm working on) was acquired by Red Gate a couple of years ago, and has about 1000 unit tests and counting; Jason's writing more all the time.
At the end of the project, it's up to the testers to give the final go-ahead to release the product to customers. They need to be satisfy themselves that the product is as bug-free as it can be and the new features work as they should, on every configuration the product will be run on, using whatever tools and methods they see fit. Only then does the installer go up on the website, and the online documentation and product web pages updated with the new features and new documentation.
Testers are an integral part of the project team, and only they decide when the product can be released. They act as Red Gate's gatekeepers and quality control.
|
-
Posted Friday, June 24, 2011 5:14 PM |
As I discussed in my previous posts, divisions and project teams within Red Gate are allowed a lot of autonomy to manage themselves. It's not just the teams though, there's an awful lot of freedom given to individual employees within the company as well.
Reasonableness
How Red Gate treats it's employees is embodied in the phrase 'You will be reasonable with us, and we will be reasonable with you'. As an employee, you are trusted to do your job to the best of you ability. There's no one looking over your shoulder, no one clocking you in and out each day. Everyone is working at the company because they want to, and one of the core ideas of Red Gate is that the company exists to 'let people do the best work of their lives'. Everything is geared towards that.
To help you do your job, office services and the IT department are there. If you need something to help you work better (a third or fourth monitor, footrests, or a new keyboard) then ask people in Information Systems (IS) or Office Services and you will be given it, no questions asked. Everyone has administrator access to their own machines, and you can install whatever you want on it. If there's a particular bit of software you need, then ask IS and they will buy it.
As an example, last year I wanted to replace my main hard drive with an SSD; I had a summer job at school working in a computer repair shop, so knew what to do. I went to IS and asked for 'an SSD, a SATA cable, and a screwdriver'. And I got it there and then, even the screwdriver. Awesome. I screwed it in myself, copied all my main drive files across, and I was good to go. Of course, if you're not happy doing that yourself, then IS will sort it all out for you, no problems.
If you need something that the company doesn't have (say, a book off Amazon, or you need some specifications printing off & bound), then everyone has a expense limit of £100 that you can use without any sign-off needed from your managers. If you need a company credit card for whatever reason, then you can get it.
This freedom extends to working hours and holiday; you're expected to be in the office 11am-3pm each day, but outside those times you can work whenever you want. If you need a half-day holiday on a days notice, or even the same day, then you'll get it, unless there's a good reason you're needed that day. If you need to work from home for a day or so for whatever reason, then you can. If it's reasonable, then it's allowed.
Trust issues?
A lot of trust, and a lot of leeway, is given to all the people in Red Gate. Everyone is expected to work hard, do their jobs to the best of their ability, and there will be a minimum of bureaucratic obstacles that stop you doing your work. What happens if you abuse this trust?
Well, an example is company trip expenses. You're free to expense what you like; food, drink, transport, etc, but if you expenses are not reasonable, then you will never travel with the company again. Simple as that. Everyone knows when they're abusing the system, so simply don't do it. Along with reasonableness, another phrase used is 'Don't be a ***'. If you act like a ***, and abuse any of the trust placed in you, even if you're the best tester, salesperson, dev, or manager in the company, then you won't be a part of the company any more.
From what I know about other companies, employee trust is highly variable between companies, all the way up to CCTV trained on employee's monitors. As a dev, I want to produce well-written & useful code that solves people's problems. Being able to get whatever I need - install whatever tools I need, get time off when I need to, obtain reference books within a day - all let me do my job, and so let Red Gate help other people do their own jobs through the tools we produce.
Plus, I don't think I would like working for a company that doesn't allow admin access to your own machine and blocks Facebook!
|
-
Posted Wednesday, June 15, 2011 5:53 PM |
The vast majority of Red Gate is on the first and second floors (the second and third floors in US parlance) of an office building in Cambridge Business Park (here we are!). As you can see, the building is split into three sections; the two wings, and the section between them.
As well as being organisationally separate, the four divisions are also split up in the office; each division has it's own floor and wing, so everyone in the division is working together in the same area (.NET and DBA on the left, SQL Tools and New Business on the right). The non-divisional parts of the business share wings with the smaller divisions, again keeping each group together.
The canteen
One of the downsides of divisionalisation is that communication between people in different decisions is greatly reduced. This is where the canteen (aka the SQL Servery) comes in. Occupying most of the central section on the first floor, the canteen provides free cooked lunch every day, and is where everyone in the company gathers for lunch.
The idea is to encourage communication between the divisions; having lunch with people in a different division you wouldn't otherwise talk to helps people keep track of what's going on elsewhere in the company. (I'm still amazed at how the canteen staff provide a wide range of superbly cooked food for over 200 people out of a kitchen in which, if you were to swing a cat, it would get severe head injuries.). There's also table tennis and table football tables that anyone can use, provided you can grab them when they're free!
Office layout
Cubicles are practically unheard of in the UK, and no one, including the CEOs, has separate offices. The entire office is open-plan, as you can see in this youtube video from when we first moved in (although all the empty desks are now full!). Neil & Simon, instead of having dedicated offices, move between the different divisions every few months to keep up to date with what's going on around the company; sitting with a division gives you a much better overall impression of how the division's doing than written status reports from the division heads.
There's also the usual plethora of meeting rooms scattered around the place; when we first moved in in 2009 we had a competition to name them all. We've got Afoxalypse A & B, Seagulls A & B, Traffic Jam, Thinking Hats, Camelids A & B, Horses, etc. All the meeting rooms have pictures on the walls corresponding to their theme, which adds a nice bit of individuality to otherwise fairly drab meeting rooms. Generally, any meeting room can be booked by anyone at any time, although some groups have priority in certain rooms (Camelids B is used a lot for UX testing, the Interview Room is used for, well, interviews).
And, as you can see from the video, each area has various pictures, post-its, notes, signs, on the walls to try and stop it being a dull office space. Yes, it's still an office, but it's designed to be as interesting and as individual as possible.
|
|
|