Wednesday, December 31, 2008

Implementing a Thread Safe cache using the Parallel Extensions

Caching objects based on a key is a very common task in software development; making it both thread safe and scalable, however, is not quite trivial. That's why I wrote a generic implementation of such a cache. To handle the concurrency issues I used the Parallel Extensions to the .NET Framework (PFX).
[Update: See this post for a cache that works with .Net 3.5 without PFX]
An implementation pattern I have seen quite often uses a dictionary to store cached items based on a cache key and looks like the following:
Dictionary<int, Item> _cachedItems = new Dictionary<int, Item>();
object _lock = new object();

public Item GetItem(int id)
{
    Item result;
    if (!_cachedItems.TryGetValue(id, out result))
    {
        lock (_lock)
        {
            // check again, another thread may have added the item in the meantime
            if (!_cachedItems.TryGetValue(id, out result))
            {
                result = CreateItem(id); // does the actual expensive work of creating the item
                _cachedItems.Add(id, result);
            }
        }
    }
    return result;
}
This implementation uses a pattern known as 'double-checked locking'. It allows items already in the cache to be retrieved without taking a lock; only if the item is not found does it acquire the lock, check again for the item having been added by another thread between the first check and acquiring the lock, and then create the item and store it for future use.
There are some problems with this implementation. The first and most important is that the MSDN documentation only guarantees the dictionary to be thread safe for multiple concurrent readers, not for reads and a write at the same time. In this implementation the first TryGetValue is done outside the lock and could conflict with an Add from another thread. To get this right you would need more sophisticated locking, for instance a ReaderWriterLockSlim, or you would have to lock the entire cache even when only reading.
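For illustration, a minimal sketch of what a corrected version using a ReaderWriterLockSlim could look like (CreateItem again stands in for the expensive creation work from the example above; this is just one possible way to fix the reader/writer conflict):

Dictionary<int, Item> _cachedItems = new Dictionary<int, Item>();
ReaderWriterLockSlim _rwLock = new ReaderWriterLockSlim();

public Item GetItem(int id)
{
    Item result;
    _rwLock.EnterReadLock();
    try
    {
        // reads now happen under a (cheap) read lock instead of no lock at all
        if (_cachedItems.TryGetValue(id, out result))
            return result;
    }
    finally
    {
        _rwLock.ExitReadLock();
    }

    _rwLock.EnterWriteLock();
    try
    {
        // check again, another thread may have added the item in the meantime
        if (!_cachedItems.TryGetValue(id, out result))
        {
            result = CreateItem(id); // note: still runs inside the write lock
            _cachedItems.Add(id, result);
        }
        return result;
    }
    finally
    {
        _rwLock.ExitWriteLock();
    }
}

Note that this only fixes the reader/writer conflict; the expensive CreateItem() call still runs under the write lock, which is exactly the second problem.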
Another problem is that the CreateItem() method, which does the actual expensive work to get the data we want to cache (e.g. call a web service), runs inside the lock that synchronizes access to the cache. This means that even if multiple different items are requested, only one item will be created at a time, which can kill the very performance you wanted to improve by caching. Finally, this implementation requires the same pattern to be (wrongly) implemented over and over again, because the code to create an item usually differs per scenario.
To solve these problems once and for all I created a generic CacheDictionary using PFX. This CacheDictionary does not allow for items to be directly added or retrieved, instead the Fetch() method takes the key of the requested item and a delegate that will create the item if needed. If the item is found in the cache it will be returned, if not the provided delegate is invoked to create the item and store it for later use.
public class CacheDictionary<TKey, TValue>
    where TValue : class // needs to be a ref type due to current limitation of LazyInit<T>
{
    ConcurrentDictionary<TKey, LazyInit<TValue>> _cacheItemDictionary = new ConcurrentDictionary<TKey, LazyInit<TValue>>();

    public TValue Fetch(TKey key, Func<TValue> producer)
    {
        LazyInit<TValue> cacheItem;
        if (!_cacheItemDictionary.TryGetValue(key, out cacheItem))
        {
            cacheItem = new LazyInit<TValue>(() => producer(), LazyInitMode.EnsureSingleExecution);

            if (!_cacheItemDictionary.TryAdd(key, cacheItem))
            {
                // while we never remove items, if TryAdd fails the item must already be present
                cacheItem = _cacheItemDictionary[key];
            }
        }
        return cacheItem.Value;
    }
}
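From calling code this is used roughly as follows (a sketch with hypothetical types; Customer and GetCustomerFromService just stand in for any reference type and any expensive factory delegate):

CacheDictionary<int, Customer> _customerCache = new CacheDictionary<int, Customer>();

public Customer GetCustomer(int customerId)
{
    // the delegate is only invoked when the customer is not in the cache yet,
    // and at most once per key even when multiple threads request it concurrently
    return _customerCache.Fetch(customerId, () => GetCustomerFromService(customerId));
}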
To store the items I used the ConcurrentDictionary provided with the September 2008 CTP of PFX. Internally the ConcurrentDictionary does all the required synchronization magic, so I don't need to write a single lock myself. The ConcurrentDictionary looks a bit like the normal Dictionary, but the methods are a bit different. It does, for instance, not support Add() and Remove() methods, only TryAdd() and TryRemove(). This is because with a ConcurrentDictionary you can never be sure of its state while other threads may be adding or removing items at any moment.
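As a small illustration of that Try-style API (using the method names as they appear in the CTP; exact signatures may still change before PFX ships):

ConcurrentDictionary<int, string> dict = new ConcurrentDictionary<int, string>();

// TryAdd returns false instead of throwing when the key is already present
bool added = dict.TryAdd(1, "one");

// TryGetValue works just like on the normal Dictionary
string value;
if (dict.TryGetValue(1, out value))
{
    Console.WriteLine(value);
}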
To use the ConcurrentDictionary as a store for the cache, the first step is to try to get the item with TryGetValue(). If it is not found, the new item is 'created' and I try to add it to the cache with TryAdd(). This might fail if another thread beat me to it; in that case I can be sure the item exists (I don't support removes yet) and simply get it using the indexer.
Inside the ConcurrentDictionary I don't store the cached items directly. Instead I wrap the actual items in another class introduced by PFX: LazyInit. LazyInit allows a very simple implementation of lazy object initialization. When creating an instance of LazyInit, the constructor expects a delegate that will create the item as needed. In this case I use the delegate that was provided in the call to Fetch() (due to some issues in the current CTP it can't be passed directly). When the LazyInit.Value property is called for the first time, the delegate is executed and the result is stored in that instance of LazyInit for any subsequent calls. Internally LazyInit has its own synchronization logic which makes sure each item is created only once, and this synchronization is done per instance of the LazyInit class. By using LazyInit here I'm sure each cache item is created only once, without holding a lock on the whole cache while doing so. Furthermore, it allows me to 'try' to store the instance of the LazyInit object in the cache before actually creating the item itself. Only if the TryAdd() succeeds is this instance of LazyInit actually used to create the item.
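In isolation the behaviour looks roughly like this, using only the constructor and Value property as they appear in the CacheDictionary code above (Item and CreateItem refer to the first example; the CTP API may still change):

// the delegate is only stored here, not invoked
LazyInit<Item> lazyItem = new LazyInit<Item>(
    () => CreateItem(42),                 // expensive factory, as in the first example
    LazyInitMode.EnsureSingleExecution);  // ensures the delegate runs only once, even under contention

// the first access to Value runs the delegate; later accesses return the stored result
Item first = lazyItem.Value;
Item second = lazyItem.Value; // same instance, CreateItem is not called again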
Note: In the current CTP of PFX, LazyInit has some limitations, the most important one being that it only allows T to be a reference type. That is why this CacheDictionary currently also requires TValue to be a reference type. It is expected that this limitation will be removed when PFX ships.
This CacheDictionary currently does not support expiration policies like, for instance, the ASP.NET web cache does. However, I found the ASP.NET cache not to be very versatile, because it only has one global cache store for an entire AppDomain and only allows string keys. Maybe pluggable expiration policies will be something for a next version of CacheDictionary; there are, however, a lot of scenarios where expiration is not really an issue, and it is of course always possible to flush the entire CacheDictionary by just creating a new instance :-). Any suggestions for improvement are welcome.
