Wednesday, December 31, 2008

Implementing a Thread Safe cache using the Parallel Extensions

Caching objects based on a key is a very common task in software development, making it both thread safe and scalable however is not quite trivial. That's why I implemented a generic implementation of such a cache, to handle the concurrency issues I used the Parallel Extensions to the .NET framework (PFX).
[Update: See this post for a cache that works with .Net 3.5 without PFX]
A implementation pattern I have seen quite often, uses a dictionary to store cached items based on a cache key and looks like the following:
   1: Dictionary<int, Item> _cachedItems = new Dictionary<int, Item>();
   2: object _lock = new object(); 
   3:  
   4: public Item GetItem(int id)
   5: {    
   6:     Item result;    
   7:     if(!_cacheItems.TryGetValue(id, out result))    
   8:     {        
   9:         lock(_lock)        
  10:         {            
  11:             if(!_cacheItems.TryGetValue(id, out result))
  12:             {                
  13:                 result = CreateItem(id); // does the actual expensive work of creating the item
  14:                 _cacheItems.Add(id, result);
  15:             }
  16:         }
  17:     }    
  18:     return result;
  19: }
This implementation uses a pattern known as 'double check locking', this allows for items already  in the cache to be retrieved without using a lock, only if the item is not found it acquires a lock, checks again for the item having been added by another thread between checking the first time and acquiring the lock, and then creates the item and stores it for future use.

Thursday, December 11, 2008

F# to ship as part of Visual Studio 2010

To be honest I don’t really like to blog about stuff that is already ‘out there’ on the web. There are however some topics that I personally think are important and don’t always get the attention they deserve. These topics mostly concentrate around parallel programming (see my blogroll in the sidebar). So for these topics I will make an exception.
Today Don Syme announced that F# will be released with Visual Studio 2010. I had expected this news to get out at the PDC last October, but for some reason we were kept in the dark a little longer.
I think this is a mayor step in bringing functional programming to mainstream software development.

Tuesday, December 2, 2008

More context information in Linq queries

Last week I posted on how to use an elements sequence number in Linq queries. In the project I just finished working on, we wrote a lot of Linq queries where we needed even more information related to an elements position in the sequence, like comparing it to the the element that came directly before or directly after it.
To be concrete: the input sequence contains all (ordered) historical versions of the same insurance policy object. From this sequence we need to filter all items for which a specific property has changed since the previous version. This could be implemented like:
   1: Version previous = null;
   2:  
   3: foreach (var version in Versions)
   4: {
   5:     if (previous != null && previous.amount != version.amount)
   6:     {
   7:         // do something with this item
   8:     }
   9:     previous = version;
  10: }
Once you have gotten addicted to the declarative style of Linq, writing code like this just doesn't cut it anymore. I wanted to be able to use the filtered items as a sequence so it can be used in a Linq query.
Using the pattern I described last week this could be written like
   1: Versions.select((item, index) => new {item, index}).Skip(1)
   2:         .where(v=>Versions.ElementAt(v.index - 1).amount != v.item.amount)
This query finds each items index number and uses it to find the preceding item using the ElementAt() method. This makes it possible to compare each item to it's previous item in the where clause. The Skip(1) is needed because the first element obviously doesn't have a previous item.
The ElementAt() method however is not very efficient, because in most cases to find a single item it iterates the sequence from the begin to the requested item. Finding each item's preceding item by re-looping the sequence again can get you into serious performance issues when using larger sets.
Because I needed this kind of queries a lot in this project, I wanted to make then both easy to write and efficient in execution. So I wrote an extension method I called WithContext() that wraps each input item in a container object, along with some information about its context in the sequence. This allows for the code above to be written as:
   1: Versions.WithContext().Where(v=>v.Previous != null && v.Previous.Amount != v.Current.Amount)
To select the duration of each version, based on its own StartDate and the StartDate of the next item, I can now write:
   1: Versions.WithContext().Select(v.Next.StartDate - v.Current.StartDate)
WithContext() takes an IEnumerable and returns an IEnumerable>. ElementWithContext is a simple class that provides properties to retrieve the Current item as well as the Previous and the Next. While at it I added properties to get all Preceding and all Following items, as well as the sequence number.
   1: public class ElementWithContext
   2: {
   3:     public T Previous { get; private set; }
   4:     public T Current { get; private set; }
   5:     public T Next { get; private set; }
   6:     public int Index { get; private set; }
   7:  
   8:     public IEnumerable Preceding { get; private set; }
   9:     public IEnumerable Following { get; private set; }
  10:  
  11:     internal ElementWithContext(T previous, T current, T next,
  12:         int index, IEnumerable preceding, IEnumerable following)
  13:     {
  14:         Current = current;
  15:         Previous = previous;
  16:         Next = next;
  17:         Index = index;
  18:         Preceding = preceding;
  19:         Following = following;
  20:     }
  21: }
The implementation of WithContext() looks a bit like the code in the first sample. It loops the input sequence and for each element it yields a new ElementWithContext. To find the next element I did a kind of 'look ahead' in the for loop. To do this I tweaked the input sequence so that it represents all 'Next' items' . I did this by first adding an empty element at the end (the last item does not have a Next item) and then skip the first item (the first 'next' item is the second in the sequence). This way WithContext() conforms to Linq's 'deferred execution' model by not taking more items from the input then necessary.
   1: static public IEnumerable> WithContext(this IEnumerable source)
   2: {
   3:     // initialize the previous and current item for the first source element
   4:     T previous = default(T);
   5:     T current = source.FirstOrDefault();
   6:     int index = 0;
   7:  
   8:     // Loop all 'Next' items
   9:     foreach (T next in source.Union(new[] { default(T) }).Skip(1))
  10:     {
  11:         yield return new ElementWithContext(previous, current, next,
  12:                 index, source.Take(index), source.Skip(index + 1));
  13:  
  14:         previous = current;
  15:         current = next;
  16:         index++;
  17:     }
  18: }
One more item that I'll keep handy on my personal utility belt :-)
Oh, and thanks to Erno for pointing me to this Live Writer Add-In for the code snippets. I hope this helps reading these way to long code samples (sorry for that).

Sunday, November 23, 2008

Selecting or filtering on the sequence number in Linq

A few weeks ago I was writing a Linq query where I needed each items sequence number (or index) as a part of the filter expression in a Where operation. To simplify the actual problem let's say I needed to filter out only the odd items. One (wrong) way to do this would be:
int i = 0;
var filtered = input.Where(x => i++ % 2 == 1);
This will appear to work at first but it only works correct the first time the filtered sequence is iterated. I will not get into this to deep here, but it has to do with the lambda updating the outer variable which is a bad idea.
Fortunately there is a correct (and easier) way to do this. As it turns out the Where extension method has an extra overload that excepts a lambda function with two input parameters. When you use this overload the lambda receives an extra input parameter which indicates the sequence number of the current item. The correct code looks like this.
var filtered = input.Where((x, i) => i % 2 == 1);
As I wanted to write the rest of the query in Linq comprehension syntax (yes it does have some advantageous sometimes), I wanted to write something like
var filtered = from x in input
               // other Linq stuff           
               where ???? % 2 == 1
               select x;
Comprehension syntax however does not have a built in way to refer to the sequence number (as far as I could find out in 15 min with Google). Bummer :-(
I did find that the Select method has an overload just like the Where method, that accepts a lambda with a sequence number parameter. This allows the sequence number to be used in the query's output.
var withSequenceNumber = input.select((item, index) => new {item, index})
Just like in the where clause, it is not possible to use the sequence number in comprehension syntax's select clause. It is however possible to mix normal C# syntax and comprehension syntax and make the sequence number available to all the comprehension syntax operators:
var filtered = from x in input.select((item, index) => new {item, index})
              // other Linq stuff           
              where x.index %2 == 1
              select x.item;
By first selecting each item along with its sequence number into an anonymous type, the rest of the query can refer to the sequence number as x.index and the item itself as x.item. As an extra bonus the original sequence numbers of the items before filtering are still available after filtering, so they could be used in the select clause to be presented in the ouput.

Thursday, October 16, 2008

Parallel Extensions for .Net Framework will be included in .Net 4.0

I has been a while since I wrote a post on parallel programming. Since that time a lot has happend in field of parallel programming.
Microsoft has been working hard on implementing the Parallel Extensions for .Net Framewok (PFX) and has now anounced that it will be a core part of the next version of the .Net Framework (right into mscorlib.dll). See the post on the PFX blog PFX blog . If you are interested in Parallel programming I strongly recomend reading up on the PFX blog. I also expect more details to be anounced at the upcoming PDC (to bad I won't be there).
FPX includes a task scheduler that promisses to scale well beyond two of four CPU's / cores and allowes LINQ to objects queries to be executed parallel (PLINQ). The later, and probably lesser known aditions include so called Coordination Data Structures (CDS) these include thread safe collection classes, like stacks an queues. One of my personal favorites of the CDS is LazyInit. LazyInit is a smart little helper class that makes it easy to implement Lazy Initialization of fields or local variables, of course this is done thread safe so that it can be used in parallel scenario's.
Can't wait to use these new goodies in real life projects!

Friday, June 6, 2008

More fun with Extension Methods and Lambda’s: Creating a generic tree visitor

One of the most common data structures in Software Engineering is the tree. In the .Net framework for instance you can find them in the file system, the controls of a win-forms application, nodes in an XML document etc.
A common task with trees is to traverse all nodes to look for a specific node or perform an operation on each node. This is usually called visiting. Many tree like data structures unfortunately don't have a built-in visiting mechanism, so we usually need to build this ourselves with something like a recursive method.
What I wanted to do is to write an generic implementation to visit any recursive tree. While the basic algorithm to do this is not very complicated, there are two factors that make it difficult to make the implementation generic, the first is that the type of the nodes is different for each tree, the second is that the way to obtain the children of a node is different for each tree. C# 2.0 solved the first problem with generics, and now C# 3.0 has solved the second problem with Lambda expressions.
A generic tree visitor should allow me to iterate all nodes as a single Enumerable, which of course has the advantage that it can be used it a foreach or a Linq query. So the code I Would like to write to visit a tree would be something like the following:
foreach (var node in root.VisitDepthFirst(n => n.GetChildNodes()))
{
    // …
}

 
Where VisitDepthFirst is a generic extension method that operates on the root object. The extension method receives a Lambda expression that is used on a node to retrieve all it's children. For most trees this is a single property or method which return a collection of all child objects.
See the implementation of the depth first visitor below
static public IEnumerable VisitDepthFirst(this T root, FuncIEnumerable> childSelector)
{
    if (root == null) yield break;
    // we return the root
    yield return root;

    // then we do a recursive Depth first search of all children and yield each item we find
    foreach (var child in childSelector(root).SelectMany(c => c.VisitDepthFirst(childSelector)))
    {
       yield return child;
    }
}

When testing this in a Win-forms application I first tried the following
foreach (var control in groupBox2.VisitDepthFirst(c => c.Controls))
{
    control.Text = "Done";
}
This however leads to a a few compiler errors. The first is has to do with the fact that groupBox2 is of type GroupBox and not Control therefore the compiler expects the Lambda expression to return IEnumerable<GroupBox > instead of IEnumerable<Control>. The second problem is that the ControlsCollection does not implement a generic IEnumerable<> at all but in stead only the non-generic IEnumerable.
To fix this I had to be a bit more explicit about the type of the node and use Linq's Cast<>() Extension method to convert the Controls collection to IEnumerable<Control>. The following is the correct way to do this.
foreach (var control in groupBox2.VisitDepthFirst<Control>(c => c.Controls.Cast<Control>())){    control.Text = "Done";}
Likewise visiting all nodes in an xml document is done as follows
foreach (XmlNode childNode in xmlDocument.VisitDepthFirst<XmlNode>(n => n.ChildNodes.Cast<XmlNode>()))
{

}
To be complete I also created a breath-first search. I don't know if I'll ever use it, but why not.
static public IEnumerable VisitBreathFirst(this T root, FuncIEnumerable> childSelector)
{
    if (root == null) yield break;

    Queue open = new Queue();
    // the open queue contains the items that have already been visited, but who's children have not

    yield return root;
    open.Enqueue(root);
    while (open.Count >0)
    {
        T item = open.Dequeue();
        foreach (T child in childSelector(item))
        {
            yield return child;
            open.Enqueue(child);
        }
    }
}

While I think this is a nice example of generics and Lambda's, I have to be hones that I'm not quite sure if this should have been implemented as Extension Methods. The problem here is that these extension methods operate on any object without restriction, which makes them pop up in intellisense all over the place once you add the using directive. The alternative ofcource is to make it a normal static method and loose the syntactic sugar of extension methods. 
Any comments on that one?

Saturday, February 2, 2008

Using C# 3.0 Extension methods and Lambda expressions to avoid nasty Null Checks

In the project I am currently working on we use a domain model with a quite deep nested structure of classes, code like the following is quite common.
// do something with the Adress
Console.Writeline(Contract.Parties.Client.Adress.Street);

The problem with this is that not all Contacts have a related Parties object and if we do have a Parties object sometimes the Client object is Null. This lead to writing a lot of code code like the following:
if (Conract != Null &&
    Contract.Parties != null &&
    Contract.Parties.Client != null &&
    Contract.Parties.Client.Adress != null)
{
    // do something with the Adress
    Console.Writeline(Contract.Parties.Client.Adress.Street);
}

There are a few problems with this code, the first of which is that it is quit verbose. Second it evaluates the same properties over and over again. In order to reduce the double work we could introduce some extra local variables and create a separete if statement for every step, this will make it even more verbose than it is allready.
Of course C# 3.0 has a lot of new syntax tricks, like extension methods, lambda expression and type inference. I had seen some nice examples (mostly Linq related) but had not been using these features to the max myself. Somehow I got the idea that these features would help me solve the Null checking problem.
I figured that the code I wanted to write was something like the following.
if (Contract
    .NullSafe(c => c.Parties)
    .NullSafe(p => p.Client)
    .NullSafe(c => c.Address) != null)
{
    // do something with the adress
    Console.Writeline(Contract.Parties.Client.Adress.Street);
}

The idea is that all parts of the path are expressed in separate lambda expressions which are chained together. If any link in the chain returns null, the rest of the path is never evaluated and the result of the whole expression will also be null.
All I had to do to make this possible was write one single extension method that would operate on any object, I was quite amazed that I could do this with verry little code that, besides some generic stuff, was actually quite simple.
        public static TResult NullSafe(this T target, Func func)
        {
            if (target != null)
                return func(target);
            else
                return default(TResult);
        }

All the generics make it look a lot more complicated than it actually is. This extension method receives two arguments, the first of which is the object it operates on and the second is a generic Func delegate (the lambda expression). The return type of the extension method is automatically inferred from the return type of the lambda that is passed in. This allows you to call NullSafe again on the result of the first call and use IntelliSense to type p.Client in the second lambda.
The nice thing about extension methods is, besides that they can operate on any existing class without changing it, that they can also operate on Null references without causing a NullReferenceException!! This makes it possible to test for null inside the extension method. If the object it is called on is not null it evaluates the lambda expression and returns the result. Otherwise it returns the default value of the expected return type, for reference types this is Null.
To make life even better I created another overload of NullSafe that was even simpler

        public static void NullSafe(this T target, Action action)
        {
            if (target != null)
                action(target);
        }

Instead of a generic Func delegate this one receives a Generic Action delegate and returns void.
This removes the need to retype the whole path inside the if, it actually removes the if altogether.

Contract
    .NullSafe(c => c.Parties)
    .NullSafe(p => p.Client)
    .NullSafe(c => c.Address)
    .NullSafe(a =>
    {
        // do something with the adress
        Console.Writeline(a.Street);
    });

To bad for me, my current project is still on C# 2.0 so I'll keep doing it old school style :-(
Exercise
If you would also like some practice with C# 3.0, I would like challenge you to write something that allows me to do the same thing for IEnumerable, I would like to write something like the following
foreach (Order order in Contract
    .NullSafe(c => c.Parties)
    .NullSafe(p => p.Client)
    .NullSafe(c => c.Orders)
    {
        // do something with order
        Console.Writeline(o.Amount);
    }

Where Client.Orders is of type IEnumerable. The idea is that I want to iterate every Order in Contract.Parties.Client.Orders. If at any stage in the path (including the Orders collection) we encounter a Null reference we will just loop an empty enumerator and do nothing.