Tuesday, December 2, 2008

More context information in Linq queries

Last week I posted on how to use an elements sequence number in Linq queries. In the project I just finished working on, we wrote a lot of Linq queries where we needed even more information related to an elements position in the sequence, like comparing it to the the element that came directly before or directly after it.
To be concrete: the input sequence contains all (ordered) historical versions of the same insurance policy object. From this sequence we need to filter all items for which a specific property has changed since the previous version. This could be implemented like:
   1: Version previous = null;
   3: foreach (var version in Versions)
   4: {
   5:     if (previous != null && previous.amount != version.amount)
   6:     {
   7:         // do something with this item
   8:     }
   9:     previous = version;
  10: }
Once you have gotten addicted to the declarative style of Linq, writing code like this just doesn't cut it anymore. I wanted to be able to use the filtered items as a sequence so it can be used in a Linq query.
Using the pattern I described last week this could be written like
   1: Versions.select((item, index) => new {item, index}).Skip(1)
   2:         .where(v=>Versions.ElementAt(v.index - 1).amount != v.item.amount)
This query finds each items index number and uses it to find the preceding item using the ElementAt() method. This makes it possible to compare each item to it's previous item in the where clause. The Skip(1) is needed because the first element obviously doesn't have a previous item.
The ElementAt() method however is not very efficient, because in most cases to find a single item it iterates the sequence from the begin to the requested item. Finding each item's preceding item by re-looping the sequence again can get you into serious performance issues when using larger sets.
Because I needed this kind of queries a lot in this project, I wanted to make then both easy to write and efficient in execution. So I wrote an extension method I called WithContext() that wraps each input item in a container object, along with some information about its context in the sequence. This allows for the code above to be written as:
   1: Versions.WithContext().Where(v=>v.Previous != null && v.Previous.Amount != v.Current.Amount)
To select the duration of each version, based on its own StartDate and the StartDate of the next item, I can now write:
   1: Versions.WithContext().Select(v.Next.StartDate - v.Current.StartDate)
WithContext() takes an IEnumerable and returns an IEnumerable>. ElementWithContext is a simple class that provides properties to retrieve the Current item as well as the Previous and the Next. While at it I added properties to get all Preceding and all Following items, as well as the sequence number.
   1: public class ElementWithContext
   2: {
   3:     public T Previous { get; private set; }
   4:     public T Current { get; private set; }
   5:     public T Next { get; private set; }
   6:     public int Index { get; private set; }
   8:     public IEnumerable Preceding { get; private set; }
   9:     public IEnumerable Following { get; private set; }
  11:     internal ElementWithContext(T previous, T current, T next,
  12:         int index, IEnumerable preceding, IEnumerable following)
  13:     {
  14:         Current = current;
  15:         Previous = previous;
  16:         Next = next;
  17:         Index = index;
  18:         Preceding = preceding;
  19:         Following = following;
  20:     }
  21: }
The implementation of WithContext() looks a bit like the code in the first sample. It loops the input sequence and for each element it yields a new ElementWithContext. To find the next element I did a kind of 'look ahead' in the for loop. To do this I tweaked the input sequence so that it represents all 'Next' items' . I did this by first adding an empty element at the end (the last item does not have a Next item) and then skip the first item (the first 'next' item is the second in the sequence). This way WithContext() conforms to Linq's 'deferred execution' model by not taking more items from the input then necessary.
   1: static public IEnumerable> WithContext(this IEnumerable source)
   2: {
   3:     // initialize the previous and current item for the first source element
   4:     T previous = default(T);
   5:     T current = source.FirstOrDefault();
   6:     int index = 0;
   8:     // Loop all 'Next' items
   9:     foreach (T next in source.Union(new[] { default(T) }).Skip(1))
  10:     {
  11:         yield return new ElementWithContext(previous, current, next,
  12:                 index, source.Take(index), source.Skip(index + 1));
  14:         previous = current;
  15:         current = next;
  16:         index++;
  17:     }
  18: }
One more item that I'll keep handy on my personal utility belt :-)
Oh, and thanks to Erno for pointing me to this Live Writer Add-In for the code snippets. I hope this helps reading these way to long code samples (sorry for that).

No comments:

Post a Comment