Friday, June 6, 2008

More fun with Extension Methods and Lambda’s: Creating a generic tree visitor

One of the most common data structures in Software Engineering is the tree. In the .Net framework for instance you can find them in the file system, the controls of a win-forms application, nodes in an XML document etc.
A common task with trees is to traverse all nodes to look for a specific node or perform an operation on each node. This is usually called visiting. Many tree like data structures unfortunately don't have a built-in visiting mechanism, so we usually need to build this ourselves with something like a recursive method.
What I wanted to do is to write an generic implementation to visit any recursive tree. While the basic algorithm to do this is not very complicated, there are two factors that make it difficult to make the implementation generic, the first is that the type of the nodes is different for each tree, the second is that the way to obtain the children of a node is different for each tree. C# 2.0 solved the first problem with generics, and now C# 3.0 has solved the second problem with Lambda expressions.
A generic tree visitor should allow me to iterate all nodes as a single Enumerable, which of course has the advantage that it can be used it a foreach or a Linq query. So the code I Would like to write to visit a tree would be something like the following:
foreach (var node in root.VisitDepthFirst(n => n.GetChildNodes()))
{
    // …
}

 
Where VisitDepthFirst is a generic extension method that operates on the root object. The extension method receives a Lambda expression that is used on a node to retrieve all it's children. For most trees this is a single property or method which return a collection of all child objects.
See the implementation of the depth first visitor below
static public IEnumerable VisitDepthFirst(this T root, FuncIEnumerable> childSelector)
{
    if (root == null) yield break;
    // we return the root
    yield return root;

    // then we do a recursive Depth first search of all children and yield each item we find
    foreach (var child in childSelector(root).SelectMany(c => c.VisitDepthFirst(childSelector)))
    {
       yield return child;
    }
}

When testing this in a Win-forms application I first tried the following
foreach (var control in groupBox2.VisitDepthFirst(c => c.Controls))
{
    control.Text = "Done";
}
This however leads to a a few compiler errors. The first is has to do with the fact that groupBox2 is of type GroupBox and not Control therefore the compiler expects the Lambda expression to return IEnumerable<GroupBox > instead of IEnumerable<Control>. The second problem is that the ControlsCollection does not implement a generic IEnumerable<> at all but in stead only the non-generic IEnumerable.
To fix this I had to be a bit more explicit about the type of the node and use Linq's Cast<>() Extension method to convert the Controls collection to IEnumerable<Control>. The following is the correct way to do this.
foreach (var control in groupBox2.VisitDepthFirst<Control>(c => c.Controls.Cast<Control>())){    control.Text = "Done";}
Likewise visiting all nodes in an xml document is done as follows
foreach (XmlNode childNode in xmlDocument.VisitDepthFirst<XmlNode>(n => n.ChildNodes.Cast<XmlNode>()))
{

}
To be complete I also created a breath-first search. I don't know if I'll ever use it, but why not.
static public IEnumerable VisitBreathFirst(this T root, FuncIEnumerable> childSelector)
{
    if (root == null) yield break;

    Queue open = new Queue();
    // the open queue contains the items that have already been visited, but who's children have not

    yield return root;
    open.Enqueue(root);
    while (open.Count >0)
    {
        T item = open.Dequeue();
        foreach (T child in childSelector(item))
        {
            yield return child;
            open.Enqueue(child);
        }
    }
}

While I think this is a nice example of generics and Lambda's, I have to be hones that I'm not quite sure if this should have been implemented as Extension Methods. The problem here is that these extension methods operate on any object without restriction, which makes them pop up in intellisense all over the place once you add the using directive. The alternative ofcource is to make it a normal static method and loose the syntactic sugar of extension methods. 
Any comments on that one?