I recently mentioned that Improving is planning an upcoming product release. The product, Or, is an open source toolkit for object-relational mapping and database-related code generation. We view it as a competitor to NHibernate. Although it probably won't be as feature-rich in the near term, it will definitely hit a useful sweet spot. It will also give Improving a platform to discuss a number of design decisions, corners of the .NET Framework, and, of course, object-relational mapping techniques.
As part of the release, we wanted sample applications that demonstrate its use (in lieu of the quality documentation that should follow). We decided tactically that, rather than create a canned application, it made more sense to start with some existing open-source sample applications and port them to use the framework. We would then do a before/after comparison to see whether and how the changes improved each application (gratuitous marketing: Improving. It's what we do.) For the first release, we chose Microsoft's own ASP.NET sample application, TimeTracker, and Dottext (.Text), the popular open-source blog server. Todd is hacking up TimeTracker, and I am taking on the .Text code base.
Now.
Before I get into this, I think the following disclaimer needs to be shared. Design is full of trade-offs, and many designs are [thank you, Mr. Weinberg] the way they are because they got that way. It is probably reasonable to assume that, with the exception of a few choice pieces of code, there is always a way to improve on an existing design. That is one of the core tenets of our brand, Improving: there is *always* room for Improving. I respect and applaud Microsoft for releasing the sample TimeTracker app as open source to provide a concrete example of how to use ASP.NET. Even more, I respect and applaud Scott Watermasysk for giving his code to the community, even though he reserved the right to take it back and, under new employment, appears to have done so. We are giving our code to the community as well with the release of Or. We fully expect that someone not so close to our code could look at it and find tons of opportunities for Improving the design, and we welcome that. If you have great ideas that you want to share, email or at improvingtech.com and ask to become a committer. We are going to be very picky about who gets on the committer list, so we expect you to prove yourself with real, live code.
Now.
Back to the application I have been focusing on: .Text. .Text is a blogging platform that supports a variety of syndication formats and multiple blogs. It is backed by a SQL Server database and has a very consistent design, which makes it fairly easy to understand and modify. At first glance, all of this makes it appear to be a good candidate for a port to Or. As I dug deeper, however, I found a number of twists that made the port challenging. I was also distracted by a number of coding techniques that I don't typically practice. For example, I almost never create typesafe collections. I want Or to support typesafe collections, but I haven't really taken the time to test them out, so I dug a little deeper into .Text to understand how it uses them. This post is the result of that investigation.
Typesafe collections provide a [go figure] typesafe data structure for managing multiple instances of the same type of object. I really didn't see typesafe collections much before I started developing with Microsoft technologies. To me, typesafe collections have the potential to provide a couple of benefits. First, you can restrict the type of objects that are passed into a collection, which ensures that the objects you get out are of that type as well. Second, you can go further in your implementation than the average [.NET] typesafe collection does, and back it with a typesafe array. That lets you eke a small bit of performance out of your collection. You avoid the cost of casting from object back to your contained type on every read, which costs at least a castclass IL instruction (isinst, for an as-cast), and that is fairly expensive relative to some other IL ops. If the majority of the time in your application is spent iterating over collections of objects, this could have a pretty significant impact on responsiveness. This has never [ever... EVER] been the bottleneck for me, but your mileage may vary. I can imagine plenty of CPU-intensive cases that might yield such results. I also know close personal friends who had to deal with similar problems in Java many moons ago, but we all know that *that* was because it was implemented in Java, right, guys? Come on? You're with me, aren't you? [Of course not.]
For what it's worth, my experience has shown that the real bottleneck is almost always I/O, typically disk first and network second. The exception is when you're doing something really stupid, er, I mean, naive, er, I mean innocent, like accessing properties on distributed objects, which everyone should know by now is a bad idea. Some would say distributed objects are a bad idea, period. I definitely prefer [text-based] message passing à la services or REST. Back to our story.
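Returning briefly to the array-backed idea mentioned above, here is a minimal sketch of what going "further than the average implementation" might look like. It assumes a stripped-down KeyWord class with only a Word field. The KeyWordArrayCollection name is hypothetical, and this is not .Text code. The indexer reads straight out of a KeyWord[], so no cast back from object is needed on the way out.

```csharp
using System;

// Hypothetical stand-in for .Text's KeyWord; only Word is modeled here.
public class KeyWord
{
    public string Word;
    public KeyWord(string word) { Word = word; }
}

// Sketch (hypothetical name): a typesafe collection backed by a KeyWord[]
// instead of an object-based ArrayList, so reads need no downcast.
public class KeyWordArrayCollection
{
    private KeyWord[] items = new KeyWord[4];
    private int count = 0;

    public int Count { get { return count; } }

    public KeyWord this[int index]
    {
        get
        {
            if (index < 0 || index >= count)
                throw new ArgumentOutOfRangeException("index");
            return items[index]; // typed array read: no castclass emitted
        }
    }

    public void Add(KeyWord value)
    {
        if (count == items.Length)
        {
            // Grow by doubling the capacity when the array is full.
            KeyWord[] bigger = new KeyWord[items.Length * 2];
            Array.Copy(items, bigger, count);
            items = bigger;
        }
        items[count++] = value;
    }
}
```

Stores into a reference-type array generally still carry a runtime covariance check. Any savings therefore show up mainly on reads, which is where the iteration-heavy case spends its time anyway.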
As I dug deeper into the typesafe collections, I noticed a number of things. First, the code that accesses the collections in .Text almost always looks like the following:
KeyWordCollection kwc = GetKeyWords();
if (kwc != null && kwc.Count > 0)
{
    KeyWord kw = null;
    for (int i = 0; i < kwc.Count; i++)
    {
        kw = kwc[i];
        entry.Body = ReplaceFormat(entry.Body, kw.Word, kw.GetFormat,
            kw.ReplaceFirstTimeOnly, kw.CaseSensitive);
    }
}
Since I wasn't responsible for designing the code, I can only presume that it is written this way for performance reasons. First, the null check is a paranoid-but-reasonable defensive measure. Second, the Count property from CollectionBase (which ends up delegating to an internal ArrayList, for what it is worth) is pretty darn cheap, as calls go. Compare that with this slightly cleaner, syntactic-sugar-coated version. It assumes that GetKeyWords() will at least return an empty collection (which it does):
foreach (KeyWord kw in GetKeyWords())
{
    entry.Body = ReplaceFormat(entry.Body, kw.Word, kw.GetFormat,
        kw.ReplaceFirstTimeOnly, kw.CaseSensitive);
}
This cleaner version hides the fact that, under the covers, an enumerator object is created, which is certainly more expensive than a null check and a Count check. Both implementations hide the fact that the reference is downcast, i.e., there is a castclass instruction [and its friends]. In the typesafe collection, that downcast occurs in the indexer. I could just as easily do the same checks before the foreach loop to gain the modicum of performance in the null case. Once again, I feel compelled to state that none of this has ever been material to an application I have written, but if you think it may be, PAYF. Again, if the typesafe implementation is backed by a typesafe array, a little more performance can be had. Given all that, it would appear that the only benefit left to having a typesafe collection is ensuring that only particular types are allowed into the collection. So we get to this point and have to ask ourselves a question. Are the hundred or so lines of code needed to build a typesafe collection the way .Text does it worth the effort, just to ensure that we only add the types we expect to add? For me, the answer is "No." This is largely because I trust myself to only add types that are homogeneous at some level in a hierarchy below object. I rarely have a situation where lots of different code adds objects to the same collection. When I do, the collection is usually an instance variable on another class, and I can protect it by implementing a mutator along the lines of:
using System.Collections;

public class Company
{
    // The only way in: callers can add Employee instances and nothing else.
    public void Add(Employee e)
    {
        employees.Add(e);
    }

    private IList employees = new ArrayList();
}
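A slightly fuller, self-contained sketch of the same idea follows. Employee is a hypothetical stand-in, and the EmployeeCount and Employees members are illustrative additions, not part of the example above. Handing out ArrayList.ReadOnly(...) lets callers iterate the employees without letting them sneak arbitrary objects into the underlying list, so Add(Employee) stays the only way in.

```csharp
using System;
using System.Collections;

// Hypothetical Employee type, for illustration only.
public class Employee
{
    public string Name;
    public Employee(string name) { Name = name; }
}

public class Company
{
    // Untyped storage; the typed Add below is the only public way to fill it.
    private IList employees = new ArrayList();

    public void Add(Employee e) { employees.Add(e); }

    // Illustrative addition: how many employees have been added.
    public int EmployeeCount { get { return employees.Count; } }

    // Illustrative addition: a read-only view, so callers can enumerate
    // but any Add/Remove through it throws NotSupportedException.
    public IList Employees { get { return ArrayList.ReadOnly(employees); } }
}
```

The design choice here is to enforce the type at the one mutator rather than in a dedicated collection class. Reads still pay the cast in foreach (Employee e in company.Employees), which, as discussed above, has rarely mattered in practice.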
I went through and deleted all of the typesafe collections and modified the code that accessed them to use foreach statements. The code looked considerably cleaner to me, and the number of classes and files to maintain went down as well. I *did* have to add back a PagedList implementation and an ImageCollection, because they actually seemed to differentiate themselves. For the ImageCollection, I didn't extend CollectionBase, but instead just held on to an IList instance. So, the moral of the story is that typesafe collections can add value to a system, but they come at a cost. That cost should be weighed against the effort needed not only to [generate] the code, but also to maintain and, especially, to understand the system. Fewer extraneous types means fewer things to understand, which means a shorter ramp-up. I have included an example of a typical typesafe collection implementation from the .Text source code to demonstrate the amount of code we are talking about. It isn't a huge amount of code, until you realize that there are 14 other classes of identical size. Fortunately, the code is very much boilerplate, and there are many templating tools, e.g. CodeSmith, QuickCode, CodeRush, etc., that let you generate said boilerplate.
using System;
using System.Collections;
using System.Xml;

namespace Dottext.Framework.Components
{
    [Serializable]
    public class KeyWordCollection : CollectionBase
    {
        public KeyWordCollection()
        {
        }

        public KeyWordCollection(KeyWordCollection value)
        {
            this.AddRange(value);
        }

        public KeyWordCollection(KeyWord[] value)
        {
            this.AddRange(value);
        }

        public KeyWord this[int index]
        {
            get { return ((KeyWord)(this.List[index])); }
        }

        public int Add(KeyWord value)
        {
            return this.List.Add(value);
        }

        public void AddRange(KeyWord[] value)
        {
            for (int i = 0; (i < value.Length); i = (i + 1))
            {
                this.Add(value[i]);
            }
        }

        public void AddRange(KeyWordCollection value)
        {
            for (int i = 0; (i < value.Count); i = (i + 1))
            {
                this.Add((KeyWord)value.List[i]);
            }
        }

        public bool Contains(KeyWord value)
        {
            return this.List.Contains(value);
        }

        public void CopyTo(KeyWord[] array, int index)
        {
            this.List.CopyTo(array, index);
        }

        public int IndexOf(KeyWord value)
        {
            return this.List.IndexOf(value);
        }

        public void Insert(int index, KeyWord value)
        {
            List.Insert(index, value);
        }

        public void Remove(KeyWord value)
        {
            List.Remove(value);
        }

        public new KeyWordCollectionEnumerator GetEnumerator()
        {
            return new KeyWordCollectionEnumerator(this);
        }

        public class KeyWordCollectionEnumerator : IEnumerator
        {
            private IEnumerator _enumerator;
            private IEnumerable _temp;

            public KeyWordCollectionEnumerator(KeyWordCollection mappings)
            {
                _temp = ((IEnumerable)(mappings));
                _enumerator = _temp.GetEnumerator();
            }

            public KeyWord Current
            {
                get { return ((KeyWord)(_enumerator.Current)); }
            }

            object IEnumerator.Current
            {
                get { return _enumerator.Current; }
            }

            public bool MoveNext()
            {
                return _enumerator.MoveNext();
            }

            bool IEnumerator.MoveNext()
            {
                return _enumerator.MoveNext();
            }

            public void Reset()
            {
                _enumerator.Reset();
            }

            void IEnumerator.Reset()
            {
                _enumerator.Reset();
            }
        }
    }
}
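For contrast with the CollectionBase boilerplate above, here is a rough sketch of the composition approach described for ImageCollection. It holds an IList and exposes only what callers need, instead of deriving from CollectionBase. The Image class here is a hypothetical placeholder with a single Title field, not .Text's actual type. The member set is likewise illustrative, not the real ImageCollection.

```csharp
using System;
using System.Collections;

// Hypothetical placeholder for an image entity; not .Text's real class.
public class Image
{
    public string Title;
    public Image(string title) { Title = title; }
}

// Composition instead of inheritance: wrap an IList and expose a small,
// typed surface rather than deriving from CollectionBase.
public class ImageCollection : IEnumerable
{
    private IList images = new ArrayList();

    public void Add(Image image) { images.Add(image); }

    public int Count { get { return images.Count; } }

    public Image this[int index] { get { return (Image)images[index]; } }

    // Delegating to the inner list's enumerator is enough for foreach;
    // the loop variable's declared type drives a cast on each element.
    public IEnumerator GetEnumerator() { return images.GetEnumerator(); }
}
```

Because GetEnumerator simply returns the inner list's enumerator, foreach (Image i in images) performs the cast per element. That is the same trade-off discussed above, in exchange for a fraction of the code.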