December 10, 2007
» TODOs in code considered harmful

Twice within the last week, I've found an easily preventable bug in my code. The problem, in both cases, was that I put a TODO comment in the code, fully intending to come back later and fix up whatever issue I put the TODO in for. If you've developed software for any amount of time, you know the rest of the story. I never went back in, completely forgot about the TODOs, and blissfully shipped the code. Over a year later, my forgotten TODOs came back to bite me in the form of bugs. Luckily for me in both cases the bugs were very minor - but still quite annoying.

TODOs in code should be considered harmful. They're bugs waiting to happen, and they're so easy to forget about. There's a sort of pathology I see a lot in code in which the author hits a particularly sticky problem (or perhaps just something they can't be bothered to do) and they just slap in a big TODO comment. For me, no more TODOs. They're not visible enough, not noticeable enough, and they're a lame excuse for not doing the right thing in the first place.

There are a couple of better alternatives to TODO comments. The first is to instead file a new bug against yourself to fix whatever issue you would have written the TODO for. This has much better visibility than TODOs. It also allows you to estimate, prioritize, and schedule the needed work. Depending on your tools, doing this will range from cumbersome to elegant. A very nice IDE/issue tracking integration feature would be to allow for creating new issues from within the IDE, in the context of a file that you're editing.

Another alternative is to write some quick, naive code that is observably incorrect. Again, this has much better visibility than TODOs, since anyone using the software should be able to tell at a glance that things aren't right. Throwing an exception would fit in this category, as long as the exception is observable to a user/tester of the software. However, in most cases you want to write code that actually works, but has some behavior that reminds everyone that additional work still needs to be done before the feature is complete. This allows testing / other work to progress, but doesn't allow the issue to fall off the radar.
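
As a rough sketch of both approaches in Java (ShippingCalculator and its methods are hypothetical names made up for illustration, not code from a real project), the first method fails loudly and the second works but advertises that it is unfinished:

```java
import java.util.Locale;

// Hypothetical example: the class, methods, and rates are illustrative only.
final class ShippingCalculator {

    // Option 1: fail loudly. Anyone who exercises this path sees the gap at once.
    static double internationalRate(double weightKg) {
        throw new UnsupportedOperationException(
                "International shipping rates are not implemented yet");
    }

    // Option 2: return a working but obviously provisional result, so testing
    // can continue while the unfinished work stays visible in the output.
    static String formatDomesticQuote(double weightKg) {
        double flatRate = 5.00; // naive placeholder: ignores the weight entirely
        return String.format(Locale.ROOT,
                "$%.2f [PROVISIONAL: flat rate, weight of %.1f kg ignored]",
                flatRate, weightKg);
    }
}
```

Either way, the reminder lives in the program's behavior rather than in a comment that nobody reads.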

If you do testing or QA, here's an easy way to find bugs. Take a look through your codebase for TODO comments. For each comment you find, determine if the comment is 6 months or older, if the author of the comment is no longer with the team, etc. For those comments that have fallen by the wayside, there's an excellent chance you'll uncover some bugs around them. Personally, as a developer, I am going to be much more disciplined about not using TODO comments in my own work.

Further reading:
Point: http://c2.com/cgi/wiki?TodoCommentsConsideredHarmful
Counterpoint: http://c2.com/cgi/wiki?TodoCommentsConsideredUseful

August 16, 2007
» Simple Concurrency Guidelines for Designing APIs

When designing an API, one of the considerations you usually have to address is concurrency. In other words, for every class in the API, what is the class's threading policy? At the very minimum, an API should document how it behaves with regard to concurrent access, and even better is an API designed with concurrent access in mind.

Writing a good API that is concurrency-friendly is hard. More than anything, it requires lots of reasoning about how all of the moving pieces will work together under concurrent access. In this article I’m going to discuss a few simple guidelines you can follow when designing APIs and reasoning about concurrency. The specifics of what I’ll discuss apply to Java, although the general concepts probably apply equally well to .NET and other similar environments.

The goal is, in general, to choose the design that is the easiest to reason about and still has acceptable concurrent performance and API usability. Here are four guidelines that can be used as a starting point when thinking about what the threading policy of a class should be:

1) For non-collection-oriented classes, default to immutable
2) For collection-oriented classes, default to mutable and thread safe
3) Prefer collections of immutable elements
4) For collections of mutable elements, make copies and treat the contained elements as immutable internally

These guidelines will help you to design classes that make it easier for the consumer of your API to reason about concurrency policy. This reasoning is made possible by clear documentation of each class’s design for concurrent access. Failure to consider the threading policy for a class leads to under-documented APIs, unexpected behavior at runtime, and obscure bugs that are hard to reproduce. Documenting the concurrency of an API is essential in order to guide the API’s consumers to correctly using the API.

Note that documenting an API as “not thread safe” is a valid design choice. It may not be the best choice (depending on the API). However, it is better to document that a class isn’t thread safe than to make the API consumer guess or rely on undocumented behavior that could change.

In the discussion below, I make a distinction between collection-oriented classes and non-collection-oriented classes. Collection-oriented classes contain multiple similar child objects (elements). They are usually easy to recognize since they often contain methods to add elements, remove elements, find elements, and perform similar operations across the contained elements. Collection-oriented classes shouldn’t be confused with composite classes that are made by composing together dissimilar classes.

Now I’ll go over each guideline. Remember that ultimately, we are looking at a class and trying to determine what a reasonable threading policy for that class might be.

For non-collection-oriented classes, default to immutable

Immutable classes are one of the best design choices you can make when designing an API. For classes that aren’t collection-oriented, default to using an immutable design.

Only design mutable classes if it seems that API consumers would be greatly inconvenienced by the immutable classes. Usually though, immutable non-collection-oriented classes aren’t a big hassle for users.

Immutable classes have the inherent property of being safe for simultaneous access from multiple threads. You don’t have to do any internal or external locking. This is the first big advantage of immutable classes – thread safety for free and no performance hit under concurrent access.

Another big advantage of immutable classes is that when you use them, you can more easily build a mental model of the system you’re designing. Since immutable classes are very easy to reason about, it takes less mental RAM to think about how instances will behave at runtime - they can only ever be in a single state (post construction).

Immutable classes are also a great way to set API consumer expectations. One of the most frustrating things about an API is when some class, say a service of some kind, takes a mutable object as initialization data. What happens after you hand the mutable configuration data off to the service? Are you still allowed to modify the configuration data? Does the service care? Will it break? Using an immutable class design for the configuration data instead solves these problems.

Designing immutable classes is extremely easy in Java. Briefly, here are the requirements you should satisfy when designing a class in order to call it immutable:

1) All fields in the class should be declared final
2) All fields should be either immutable objects or mutable objects that are not mutated outside of a constructor
3) The this reference does not escape a constructor
4) No mutable objects passed to a constructor are retained by the instance or any of its components
5) No mutable objects escape the instance

Satisfying these requirements is not the only way to design a thread safe immutable object, but other techniques are much harder to explain and require deep knowledge of Java’s memory model.
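
Here is a minimal sketch of a class that follows the five requirements above. ServiceConfig and its fields are hypothetical names chosen for illustration; the comments point back to the requirement each line is meant to satisfy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical configuration class; the name and fields are illustrative only.
final class ServiceConfig {
    private final String host;          // requirement 1: every field is final
    private final int port;
    private final List<String> aliases; // requirement 2: never mutated after construction

    ServiceConfig(String host, int port, List<String> aliases) {
        this.host = host;
        this.port = port;
        // requirement 4: copy the caller's list instead of retaining it
        this.aliases = Collections.unmodifiableList(new ArrayList<>(aliases));
        // requirement 3: "this" is not handed to any other object in here
    }

    String host() { return host; }

    int port() { return port; }

    // requirement 5: only an unmodifiable view ever leaves the instance
    List<String> aliases() { return aliases; }

    // A "change" produces a new instance and leaves the receiver untouched.
    ServiceConfig withPort(int newPort) {
        return new ServiceConfig(host, newPort, aliases);
    }
}
```

A service that receives a ServiceConfig never has to wonder whether the caller will change it later, which is exactly the expectation-setting benefit described earlier.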

For collection-oriented classes, default to mutable and thread safe

When designing a collection-oriented class, the default choice should be to write a mutable container that can be safely accessed by multiple threads concurrently. Most collection-oriented classes should be mutable since that’s what the API consumer will expect. When using a collection-oriented class, the most common reason is because you want to modify the collection (add or remove elements) or modify the contained elements themselves. An immutable collection-oriented object makes consumers jump through hoops to do this.

Often, collection-oriented classes in an API will be used only from a single thread for many scenarios. It can be tempting not to perform the necessary locking to make these classes thread safe, since that locking will be unnecessary for the majority of usage scenarios. In this case, one valid design decision would be to skip the locking and document the collection as not being thread safe. However, doing this penalizes the users of the class who are in concurrent-access scenarios, since the burden of implementing thread safety is now on them.

I recommend going ahead and doing the internal locking to make these types of classes thread safe. On modern Java runtimes, the cost of uncontended synchronization is extremely low (the book Java Concurrency in Practice, or JCIP, talks about this in detail). Even when the common case is single-thread use, it’s better to design a class to be thread safe if there are potential concurrent scenarios. If a profile reveals that the locking is a hotspot when using the class, then you have a good reason to avoid it. Otherwise, assume that locks are essentially free when uncontended.
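
A minimal sketch of what internal locking can look like, assuming plain synchronized methods are good enough for the class in question. TagRegistry is a hypothetical name used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collection-oriented class; names are illustrative only.
final class TagRegistry {
    private final List<String> tags = new ArrayList<>(); // guarded by "this"

    // The duplicate check and the insert happen under one lock, so two
    // threads can never both add the same tag.
    synchronized boolean add(String tag) {
        if (tags.contains(tag)) {
            return false;
        }
        return tags.add(tag);
    }

    synchronized boolean remove(String tag) {
        return tags.remove(tag);
    }

    synchronized int size() {
        return tags.size();
    }

    // Callers get a snapshot they can iterate over without holding the lock.
    synchronized List<String> snapshot() {
        return new ArrayList<>(tags);
    }
}
```

Single-threaded callers use this class exactly as they would an unsynchronized one, and concurrent callers do not need to add any locking of their own.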

Although it’s not the first choice I’d make, for some APIs immutable collection-oriented classes may not be a bad decision. You get all of the advantages mentioned above for immutable classes. If API consumers will not often need to mutate the collection or the collection’s elements, doing this may make sense. For example, some collection-oriented objects are often just passed around to other parts of the API, and rarely changed. If you decide to design an immutable collection-oriented class, be sure to document this very explicitly. Also, avoid method names that imply that the receiver is being changed as they can be confusing to a casual user of the API. An immutable collection-oriented class should not have a "public void add(Foo f)" method since the signature of that method implies that it alters the receiver.
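
To illustrate the naming point, here is a small sketch of an immutable collection-oriented class. Route and withStop are hypothetical names; the idea is only that the method name and return type make it clear a new instance comes back.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable collection-oriented class; names are illustrative only.
final class Route {
    private final List<String> stops;

    Route(List<String> stops) {
        this.stops = Collections.unmodifiableList(new ArrayList<>(stops));
    }

    // Deliberately not "void add(String stop)": the receiver is never changed,
    // and the returned Route is the one that contains the extra stop.
    Route withStop(String stop) {
        List<String> copy = new ArrayList<>(stops);
        copy.add(stop);
        return new Route(copy);
    }

    List<String> stops() {
        return stops;
    }
}
```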

Prefer collections of immutable elements

Whenever possible, collection-oriented classes should contain immutable elements. This helps reduce confusion about the API, since it is clear that the state of the elements can’t be changed while the collection contains them. I’ve found that designs that use small, immutable objects as building blocks and contain them in mutable containers tend to be very robust and easy to understand.

The biggest reason this is important is that most non-trivial collection-oriented classes have one or more invariants that they must enforce. For instance, a collection-oriented class may store elements that each have an identifier of some sort, and the collection may guarantee that contained elements will have unique identifiers. This is just an example – often the constraints can be much more complicated. If the contained elements are externally mutable, it will be very hard or even impossible for the collection to enforce those invariants.

Mutable collections of immutable objects are also very fast to make copies of. Since the contained elements are immutable, copying the collection only involves making a shallow copy. Supporting copies of collections is important for many APIs, so anything that makes this easier and faster is a win.
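
Here is a sketch of a mutable, thread-safe container of immutable elements that enforces a unique-identifier invariant and supports cheap copies. Account and AccountBook are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical immutable element.
final class Account {
    private final String id;
    private final String owner;

    Account(String id, String owner) {
        this.id = id;
        this.owner = owner;
    }

    String id() { return id; }

    String owner() { return owner; }
}

// Hypothetical mutable container that guarantees unique ids.
final class AccountBook {
    private final Map<String, Account> byId = new LinkedHashMap<>(); // guarded by "this"

    // The invariant holds forever because an Account's id can never change.
    synchronized boolean add(Account account) {
        if (byId.containsKey(account.id())) {
            return false;
        }
        byId.put(account.id(), account);
        return true;
    }

    synchronized Account find(String id) {
        return byId.get(id);
    }

    synchronized int size() {
        return byId.size();
    }

    // A shallow copy is enough: both books can safely share the same elements.
    synchronized AccountBook copy() {
        AccountBook result = new AccountBook();
        result.byId.putAll(byId);
        return result;
    }
}
```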

For collections of mutable elements, make copies and treat the contained elements as immutable internally

Unfortunately, it’s not always possible to design using only collections of immutable elements. Often, for one reason or another, the collection must contain mutable elements. One case of this is a collection-oriented class where the elements themselves are collection-oriented. No matter the specifics, there is a design pattern you can follow in this case.

The collection should make copies of the mutable elements as they are added or retrieved, ensuring that no external clients have a reference to the actual contained instances. In other words, every time a mutable element is added to the collection, the collection makes a copy and adds the copy instead. Every time a mutable element would be obtained, a copy is obtained instead. Internally, the collection should treat the contained elements as though they were immutable and should not call any method on the elements that could change them. By doing this, the collection will contain “effectively immutable” elements. The collection is then free to enforce constraints on the contained elements and know that no external client can break the constraints.

I can attest that this kind of design works well, but it does require some careful API documentation. Without proper documentation, clients may expect that they can obtain a contained element, mutate it, and have those changes automatically show up in the collection. Instead, this kind of design facilitates a more transactional usage. Clients obtain an instance, perform some changes to it, and then must add that instance back in to the collection. This usually isn’t a huge burden on the clients as long as expectations are set correctly.
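
The copy-in, copy-out pattern might look something like the sketch below. Profile and ProfileStore are hypothetical names, and the copy constructor is just one assumed way of copying an element.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable element.
final class Profile {
    private final String id;
    private String displayName;

    Profile(String id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }

    // Copy constructor used by the store below.
    Profile(Profile other) {
        this(other.id, other.displayName);
    }

    String id() { return id; }

    String getDisplayName() { return displayName; }

    void setDisplayName(String displayName) { this.displayName = displayName; }
}

// Hypothetical collection that never shares its internal instances.
final class ProfileStore {
    private final Map<String, Profile> byId = new HashMap<>(); // guarded by "this"

    // Copy on the way in; replaces any stored profile with the same id.
    synchronized void put(Profile profile) {
        byId.put(profile.id(), new Profile(profile));
    }

    // Copy on the way out; returns null when the id is unknown.
    synchronized Profile get(String id) {
        Profile stored = byId.get(id);
        return stored == null ? null : new Profile(stored);
    }
}
```

A client that wants to rename a profile calls get, changes the copy, and then calls put again. Until that final put, the store's view of the profile is unchanged.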

That covers the 4 guidelines. Note that these guidelines are really meant to only cover simple cases – however, the simple cases make up the bulk of most APIs. There are certainly complex classes that don’t fall easily into one of the categories above, and they will need to be designed with more thought.

August 5, 2007
» A Layered Conceptual Model for Character Encoding

In today’s global software industry, character encoding issues are frequently encountered on almost all software projects. One factor that causes trouble when discussing and solving character encoding issues is terminology. In order for developers to share problems and successes with each other, a generally agreed-upon set of terms is needed. Another problem is that different character encoding standards work differently. It can often be useful to abstract away the details and talk in general about how character encoding standards work. By doing this, it can be easier to understand the mechanics of a specific character encoding by fitting it into a more general framework.

You might be familiar with the OSI seven layer model for describing network protocols. Many computer science and software engineering education programs teach this model (e.g., in an introductory networking class). This model is useful because it abstracts away the specific details of particular network protocols and provides a generalized stack of layers that all protocols can be thought of as having.

A similar abstract layered model can be used to define and discuss character encoding standards. In this article, I’ll define and explain a layered model for character encoding. By learning this model, you’ll be able to more easily diagnose and solve character encoding issues, as well as gain the ability to easily understand new encodings by fitting them into an existing mental model. The model I describe here is defined by the Unicode standard in the Unicode Technical Report #17, where it is known simply as the “Character Encoding Model”. Even though this model is defined by the Unicode standard, it is a general purpose conceptual model and can be used for character encoding standards other than Unicode.

Before discussing the individual layers of the model, it’s useful to remember what character encoding is, in a very basic way. At the risk of overstating the obvious, the point of character encoding is to go from a sequence of characters to a sequence of bits. The bits could be used in memory, persisted on a disk, or transferred across a network. At some point in the future the process is reversed, and the bits are decoded back into characters. By breaking this mechanism down into abstract layers, it is easier to understand all of the different transforms involved.

The first and most basic layer is an Abstract Character Repertoire. This defines a collection of characters that are described and given names. For instance, phrases like “the XYZ alphabet” and “the script used by XYZ” could be said to define character repertoires. An important aspect of character repertoires is that in no way do they define representations of characters. A repertoire simply defines member characters by naming them.

One point to note – a character repertoire assumes a definition of what a “character” is. For the purposes of this article, I’m going to hand-wave over the entire concept of character identity and what a character truly is. That is a topic that could easily make up an article all by itself. For now, whatever definition of “character” you have in your head is good enough to understand this layered model.

The second layer is a Coded Character Set. This is the first level at which actual encoding takes place: each character in the character repertoire is given a unique integer number to represent it, called a code point. Therefore, the coded character set level defines the first representation of each character, which is simply an integer. In other words, a coded character set defines a mapping from characters in a character repertoire to code points.

A coded character set, by its nature, defines a code space. The code space is the domain of the code points, and defines the minimum and maximum code point values. For large repertoires, it can be helpful to break the code space up into smaller sub-sections and give those sub-sections names.

Representing a sequence of characters as a sequence of numbers gets us a little closer to our ultimate goal of a sequence of bits, but it’s not a huge difference from characters. When compared to a sequence of bits, a number sequence is still a pretty abstract concept. The code points in a coded character set may have very different magnitudes (e.g., 10 vs. 10000), and a coded character set says nothing about how to represent these abstract integers as bits.

The third layer is a Character Encoding Form. A character encoding form transforms a sequence of code points into a sequence of equal-sized integers called code units. This is the first level at which bits are introduced into the encoding: code units are called equal-sized because the code unit size is expressed as a fixed number of bits. The size of code units may vary from character encoding to character encoding, but for a particular character encoding form the size is fixed. So when you hear the term n-bit character encoding, it refers to a character encoding form in which the code units are n bits long.

It’s easy to confuse the concepts of code points and code units for a few reasons. For one, they are both integers. For another reason, many character encodings use an identity mapping as a character encoding form, in which each code point value is equal to the code unit value. In such character encodings, the character encoding form is said to be one-to-one (i.e., one code point maps to one code unit). To remember the difference between the two concepts, keep a few things in mind. A code point is an abstract integer (e.g., 17), or just a point on some number line. A code unit is a fixed-size integer (e.g., 17 expressed as an 8-bit value or 0x11). Even though many encodings map one code point to one code unit, such a one-to-one mapping is not the case for all encoding standards.
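
Java happens to make this distinction visible, because a Java String is a sequence of 16-bit code units. The sketch below (the class and method names are my own, for illustration) counts both for a character that fits in one code unit and for one that needs two.

```java
// Hypothetical demo class; only the String and Character calls are real Java API.
final class CodePointsVsCodeUnits {

    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // String.length() counts 16-bit code units, not code points.
    static int codeUnitCount(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        String latinA = "A";
        String attic = new String(Character.toChars(0x10140));

        System.out.println(codePointCount(latinA) + " / " + codeUnitCount(latinA)); // 1 / 1
        System.out.println(codePointCount(attic) + " / " + codeUnitCount(attic));   // 1 / 2
    }
}
```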

The fourth layer is a Character Encoding Scheme, which maps individual code unit values to specific sequences of bits. For encoding standards in which the code units are of length 8 bits or less, the character encoding scheme layer typically does nothing. For encoding standards in which the code units are longer than 8 bits, the encoding scheme must map the code unit values into a sequence of bytes. This is where endianness issues arise – in this case a character encoding scheme specifies the ordering of the sequence of bytes for a code unit value.
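
As a quick illustration of the byte-ordering question, the sketch below serializes a single 16-bit code unit with two different encoding schemes. ByteOrderDemo and hex are hypothetical helper names; the charsets are the standard ones that ship with Java.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing that the same code unit can be
// serialized in two different byte orders.
final class ByteOrderDemo {

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign, a single code unit 0x20AC

        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_16BE))); // 20 AC
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_16LE))); // AC 20
    }
}
```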

The UTR #17 model also defines an optional fifth layer called a Transfer Encoding Syntax. This layer is different from the previous four layers. A transfer encoding syntax is almost always separate from and orthogonal to the other four layers, and is often not specified as part of a character encoding standard but is used in addition to a defined standard. The most common use of a transfer encoding syntax is to apply some sort of post-processing to the sequence of bytes produced by the other four layers. For example, the sequence of bytes may be compressed to save space (e.g., according to an algorithm such as LZW). Or, the sequence of bytes may be further encoded by an algorithm so that it can be more easily transmitted over certain media (e.g., an algorithm like Base64).

It’s most useful to think of a transfer encoding syntax as a completely optional and separate fifth layer that can be added on to a stack of the other four layers.
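
A small sketch of a transfer encoding syntax stacked on top of the other layers: the text is first encoded to bytes, and Base64 is then applied to those bytes as a separate step. TransferEncodingDemo is a hypothetical name, and the choice of UTF-16BE for the lower layers is arbitrary.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class: Base64 is applied after character encoding is done.
final class TransferEncodingDemo {

    static String toBase64(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_16BE); // the four core layers
        return Base64.getEncoder().encodeToString(encoded);        // the optional fifth layer
    }

    static String fromBase64(String base64) {
        byte[] encoded = Base64.getDecoder().decode(base64);       // undo the fifth layer
        return new String(encoded, StandardCharsets.UTF_16BE);     // then decode the characters
    }
}
```

Note that the Base64 step knows nothing about characters. It would work the same way on bytes produced by any other character encoding.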

To summarize the layers, an abstract character repertoire defines a set of named characters. A coded character set encodes a sequence of those characters as a sequence of abstract integer code points. A character encoding form represents the character sequence as a sequence of fixed-length integer code units. Finally, a character encoding scheme then represents the character sequence as a sequence of bytes.

Now that the levels have been defined, it is possible to give a slightly more precise definition of a character encoding standard. A character encoding standard specifies a stack of these four layers that when combined ultimately maps from a sequence of abstract characters in a repertoire to a sequence of bytes.

To further explain these layers, here are a few examples using several character encoding standards that many software professionals will be familiar with.

First, consider a character encoding known as windows-1252. This encoding defines a character repertoire of roughly 250 characters (a handful of the 256 available positions are unassigned) that are in the Latin alphabet and used in languages such as English (primarily), French, German, etc. This encoding defines a coded character set that maps each character in the repertoire to an integer value between 0 and 255. Further, since each code point value is between 0 and 255, a very straightforward character encoding form is used in which each code point value maps to an 8 bit code unit having the same value, which is obtained by 0-padding out each integer code point value to 8 bits. The character encoding scheme layer does nothing since the code units are only 8 bits in length.

As you can see, in a very simple character encoding standard such as windows-1252, some of the layers blur together or appear to be unused. This is a reflection of the fact that more complicated character encodings exist in which those layers are more distinct.

As a second example, consider a standard in which all of the layers are easily seen – the UTF-16 standard as defined by Unicode. The character repertoire defined by Unicode is huge. Unlike most other character encoding standards, Unicode is an attempt to include all useful characters in its repertoire – spanning languages, cultures, and even history. Unicode defines a coded character set in which each Unicode character is given a code point in the range from 0 to 0x10FFFF. Unicode code point values are often written in the form "U+hexadecimal code point value" (e.g., U+0041). UTF-16 defines a character encoding form that uses 16 bit code units. Each code point maps to either one or two code units. Finally, the encoding scheme specifies how the sequences of code units should be serialized as sequences of bytes. UTF-16 is actually a family of encoding standards in which the individual standards in the family differ only in the encoding scheme (in other words, they differ in byte ordering). For example, UTF-16BE uses an encoding scheme in which code units are serialized in big-endian form.

It is also useful to observe the transformation of a character as the representation moves through the layers. Consider the character “A” as encoded by windows-1252. “A” is included in windows-1252’s repertoire, and is given the code point 65 (coded character set layer). The character encoding form layer maps code point 65 to the code unit 0x41. The character encoding scheme layer does nothing since the code unit 0x41 serializes as a single byte (01000001 in binary).
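
The same walk-through can be checked from Java, with two caveats. First, codePointAt reports the Unicode code point, which for “A” happens to match the windows-1252 value of 65. Second, this sketch assumes the runtime ships a charset named "windows-1252", which common JDKs do but the platform does not strictly guarantee. Windows1252Demo and its helpers are hypothetical names.

```java
import java.nio.charset.Charset;

// Hypothetical demo class following "A" through the layers.
final class Windows1252Demo {

    // Assumes the windows-1252 charset is available in this runtime.
    static byte[] encode(String s) {
        return s.getBytes(Charset.forName("windows-1252"));
    }

    // Renders one byte as eight binary digits, padding with leading zeros.
    static String toBinary8(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte[] bytes = encode("A");

        System.out.println("A".codePointAt(0));   // 65
        System.out.println(bytes.length);         // 1
        System.out.println(toBinary8(bytes[0]));  // 01000001
    }
}
```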

Now consider the character U+10140 (greek acrophonic attic one quarter) as encoded by UTF-16BE. This character is an ancient Greek number character that has only historical significance. I picked it at random since I wanted a character that would map to more than one UTF-16 code unit. This character is included in Unicode’s character repertoire and given the code point U+10140 (coded character set layer). UTF-16 maps the code point U+10140 to the two code units 0xD800 0xDD40 (character encoding form layer). The UTF-16BE encoding maps the two code units to the byte sequence 0xD8 0x00 0xDD 0x40 (character encoding scheme layer).
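
Here is a sketch that reproduces those three steps in Java. Utf16Layers and its helper methods are hypothetical names; Character.toChars and the UTF-16BE charset are standard Java.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class following U+10140 through the layers.
final class Utf16Layers {

    // Character encoding form: one code point becomes one or two 16-bit code units.
    static char[] codeUnits(int codePoint) {
        return Character.toChars(codePoint);
    }

    // Character encoding scheme: code units become bytes in big-endian order.
    static byte[] bigEndianBytes(int codePoint) {
        return new String(codeUnits(codePoint)).getBytes(StandardCharsets.UTF_16BE);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] units = codeUnits(0x10140);

        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]); // D800 DD40
        System.out.println(hex(bigEndianBytes(0x10140)));                 // D8 00 DD 40
    }
}
```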

By learning this layered model for character encodings, you will gain both an understanding of how character encodings work and a mental model you can apply when things go wrong. It will also be easier to discuss character encoding issues with other software professionals since you can share a common set of terms. Finally, when learning new character encodings you can easily fit them into an existing framework, comparing and contrasting them with encodings you are familiar with.

May 7, 2007
» Crazy Book Ideas

Every so often (more often than I'd admit) I come up with a software development book idea. This idea is almost always for a book that has no equivalent currently on the market. For a short time I dream about pitching my idea to O'Reilly, Manning, Apress, or Pragmatic. However, I usually realize pretty quickly there is a reason my book idea isn't already written.

Here I present to you my latest book idea, complete with a chapter listing.

Whiteboarding for Software Developers

1. Know Your Equipment: A Guide to Whiteboards, Markers, and Erasers
2. The Good, the Bad, and the Ugly: Drawing for Software Developers
3. Whiteboarding UML: A Beginner's Guide
4. Persistence: How To Save Your Diagrams Across Erases
5. Color: How To Add Clarity and Flair to Your Drawings
6. Whiteboard Etiquette: Using Shared Whiteboards in a Collaborative Environment (thanks to Scott Ambler)
7. Presenting: How To Effectively Incorporate A Whiteboard When Speaking To A Group
8. Under Pressure: How To Not Flop During Interview Whiteboarding
9. Cleaning and Care: Maintaining Your Whiteboard
10. Alternatives: When Not To Use A Whiteboard

You may think I'm joking about this book idea. However, if I came across this book in a bookstore it would be an immediate buy.

April 30, 2007
» Does this ever happen to you?

I don't know what it is. Somehow sitting in front of the computer interferes with my thought process. This happens to me all the time. I'll be sitting at my desk, thinking through some programming problem. I'm not getting anywhere. Then I do something that makes me get up from the desk, like walk over to a colleague's office or walk down the hall to the restroom. Invariably the solution I was looking for pops into my head shortly after leaving my computer. This happens subconsciously - I'm not thinking about the problem at all when the solution suggests itself.

Today it happened again. I spent the last few hours at work trying to figure something out. Finally around 5:00 I gave up in disgust (I hate leaving on that note!) and headed home. About 2 minutes after leaving my desk (I was halfway through the parking lot) the solution I was searching for hit me.

I don't know what it is, but somehow when you're sitting in front of a computer for extended periods of time, your brain gets in a rut. It seems like all that's needed to break out of the rut is to get away from the computer and stop thinking about the problem, even for a very short period of time (1-2 minutes). I've talked with other software professionals who've noticed basically the same thing.

So next time you're blocked on a problem, try to stop thinking about the problem for a little while. Take a short walk or something. Don't stay in front of the computer - checking email or blogs doesn't get you out of the rut. You have to physically step away from the computer and completely take your mind off of the problem.

April 12, 2007
» A class by any other name...

Stop whatever you're doing for a moment, and take a look at the books on your bookshelf that you have to help you develop software. If you don't have a bookshelf or aren't a book kind of person, just play along anyway :-).

There's a book that should be on your bookshelf, and I think there's a good chance it's not. The purpose of this entry is to convince you to add this book to your bookshelf. It's a book that you need to have close at hand, ready to pull out at a moment's notice. It will be a help no matter what domain you work in, and is useful across programming languages.

The book is a thesaurus, and if you don't currently use one when writing code I'm going to try to convince you to start.

It's a well accepted and somewhat obvious fact that you're not writing code solely for the benefit of the computer that will run it. In fact, the most important audience for your code is the programmers who will maintain it in the future (including your future self). The code should be optimized for reading. Write once, read many.

Coming up with good names for things in code is incredibly important. Good, of course, is subjective. Your domain, programming language, and team culture all play a part in deciding what "good" means for you. Spending a little extra time up front to come up with good names pays dividends down the road.

Here are two easy steps you can take, starting now, to improve the naming in the codebases you work on:

1) Take a little time to think of good names.
The more visible the thing you're naming, the more important it is to have a good name for it. I'm thinking mostly about public, shared classes, as they are the most important things to create good names for. However, method names, variables, etc, are all very important too.

In order to come up with a good name, you need to have a clear understanding of the role of the class (or method name, etc. - from here on out I'll just use class but feel free to substitute whatever construct you like) you're naming. If the class doesn't have a well defined role don't waste time trying to name it. First refactor your code and split out responsibilities, and then come up with good names.

Be creative. This is where that thesaurus comes in. The English language has many ways of expressing a concept. Use them. Use them all, or make use of nuances. Whatever works well for your situation. There's no reason that program code needs to have such a limited vocabulary as is often used.

2) Refactor mercilessly to keep names up to date.
Things change. Classes change behavior. Aspects are added or removed. Don't be so in love with your names that you can't change them when they're no longer appropriate. Many modern IDEs have excellent refactoring support, allowing you to safely perform renames and automatically take care of updating references.

Whatever you do, don't let names get out of date. There's nothing worse than trying to make sense of unfamiliar code when the names don't align with the functionality. Keep in mind that even though you may have a great understanding of the codebase, the newcomer to your team, the future maintenance programmer, or yourself down the road won't necessarily have that same understanding.

Here are a few other things to keep in mind:

Good names should be unique, or close to it. This is especially important when you have a large codebase and people will often open types by doing a search by name (Ctrl+Shift+T in Eclipse). Here's where that thesaurus comes in handy again. It should be relatively easy to come up with a name that's unique enough to not get confused with other names that are important in your system. Here's an Eclipse-specific tip: you can tell Eclipse to filter out certain packages when searching for classes by name. For instance, if you are not a Swing/AWT programmer and don't want to see java.awt.List come up every time you search for "List", you can filter out the java.awt package. Go to Window -> Preferences -> Java -> Appearance -> Type Filters.

As you name things in your codebase, you're creating a taxonomy and a common nomenclature for your team. You will have to refer to these things by name to your team members, and they will come up in discussions, design meetings, support issues, etc. Are you choosing names that make sense in this context? Sit back from the keyboard for a moment, and try to talk about a class or several classes in complete English sentences. Assuming this works, many of those words you used are great candidates for names in your system. Choose names that are close to your domain and express what you're trying to accomplish. A great name is better than a weak name with a large comment block that attempts to explain the role of the class.

A great book to check out along these lines is Domain Driven Design by Eric Evans. The book talks in great detail about focusing the development process around the concept of a strong domain. Choosing good names is a part of that.


January 23, 2007
» Tips for Pair Programming

I'll start off by saying I am a big proponent of pair programming. It's a very effective programming style when used in moderation. I read stories about some places that pair all of the time for everything - while I've never worked in that kind of environment, I can't imagine it would be too pleasant. There are too many programming activities that involve "think time" in which you're not talking, not typing, not reading, but simply processing. Depending on the kind of work you're doing, this think time may be a big or small part of your day, but either way it's an important part.

However, pair programming is great for some tasks. Lately I've been pairing with a colleague to hammer out the finishing touches on a feature for an upcoming release, and pairing to do this is working really well. Along the way, I've done a few small things to make the overall pair programming experience as effective as possible.

Environment
If you're going to be pair programming for any length of time (more than an hour or two), getting the environmental factors right is crucial. Pairing in a noisy environment isn't as effective as pairing with fewer distractions. At the same time, realize that pairing creates a fair amount of extra noise in the environment that wouldn't otherwise be there (especially if you work in an office environment), so be respectful of your non-pairing coworkers.

A clean and open work space is a must. Both programmers have to feel comfortable - if your counterpart has to dig through last week's sandwich wrappers and who knows what else in order to find a place to work, you're probably not going to have an effective pairing experience. If you will be doing some extended pairing, spend some time before you start cleaning up your work environment. By reducing the amount of clutter and having clean work surfaces, your pair will feel much more comfortable and willing to invest in the effort.

Along these same lines, consider closing programs which pop up intrusive notifications for the duration of the pairing session. This includes things like mail readers and feed readers that have the habit of popping up temporary windows to alert you of new items. This will only distract from the pairing session and your pairing partner probably doesn't care to see the notifications anyway. If you're worried about missing important mail, just take a 10 minute break every couple of hours and check for any urgent messages then.

Another point is that you need lots of collaboration material. At a minimum this means a large, clean whiteboard with plenty of markers and an eraser. It's surprising to notice the psychological effect of a whiteboard that's been freshly cleaned with whiteboard cleaner compared to one that's dirty and lightly erased with a felt eraser. Also good to have on hand is plenty of pads of paper and pens for jotting down quick notes and diagrams.

Input Devices
Ideally pair programming doesn't work like driver's ed. Each participant should be just that - a participant and not an observer. It's great that you can show off your mad programming skillz to your pairing partner, but that isn't really the point of the exercise. The ideal pairing setup involves dual everything - 2 keyboards, 2 mice, and 2 monitors. In my case I had only a single monitor handy but there were plenty of extra mice and keyboards lying around, so I plugged in a second set of input devices and we were off to the races.

Having both partners able to code without having to play chair tetris or keyboard shuffle saves a lot of time. Often this means one of the programmers can pick up in the middle of a block of code where the other left off, and you don't have to context switch at all. This also means that you're not bumping elbows when trying to scroll around through a class (personal space is important to many programmers :-) ).

If you are going to use only a single monitor, be sure to position the monitor so that it can be easily seen by both. Often this is a different position from what is optimal for a single programmer so don't be afraid to slide the monitor around when starting the pairing session to find the best spot.

Of course it goes without saying that you need to have two comfortable chairs and plenty of space to sit in. If you want to go all out there is always the PairOn. :-)



Fonts
A prerequisite for this kind of extended pairing session is to increase the font size in the tools you'll be using together. This is less of an issue if you're using multiple monitors, but it's still worth considering. That 10 point font you use with your 21" 1600x1200 monitor doesn't look nearly as good when your face isn't 4 inches from the screen. In Eclipse, I set the text font to be Consolas at 14pt, which works great for pairing off of a single monitor. Consolas looks great on LCD monitors with ClearType enabled, and if you are legally allowed to install it on your machine I can't recommend it enough. Of course, the downside of larger fonts is that you can't see as much code at a time, but it means that you don't get eyestrain or headaches and both programmers have a good view.



ZoomIt
Finally, I highly recommend installing the ZoomIt utility from Sysinternals on the computer before beginning the pairing session. This utility was written to help aid presentations, but it's perfect for pair programming.

ZoomIt has two extremely useful features for pairing. First, with the press of a hotkey, you can use the mouse wheel to zoom in on any portion of the screen. Even if you've bumped up the font sizes as I suggested above, there will probably still be times that a little zooming is needed to narrow in on something. If you or your pairing partner need to often lean closer to the monitor to try to read something, ZoomIt will be a huge help.

Secondly, ZoomIt has a great annotation feature that allows you to draw on the screen. This can be great for pairing - instead of smudging up the screen every time you need to point out something, simply press a hotkey and use the annotation feature instead. This is a really underrated feature for pairing - how many times have you wanted to point at a section of the code or highlight a particular block? Again, this feature is all hotkey driven and the ZoomIt utility overall is very well done.



Conclusion
Keep in mind that pair programming can be mentally draining. Often a 6 hour work day pairing is easily the equivalent of an 8 hour work day alone, especially if you don't pair often. When done right, you can temporarily boost your effective output, but when done wrong, it can be an excuse to slack. Also I've found that during pairing you will often discover tasks that one or the other of you needs to do, but doing those tasks as a pair would be a waste of time. The best thing to do is to keep a list of "action items" that are related to the task at hand but will be completed outside of a pairing session. If you do this, be sure to put appropriate TODOs in the code and keep your partner informed about the progress of your action items.

January 17, 2007
» If you need ShouldNeverHappenException, you're calling a bad API

Over on Artima, Bill Venners posted a question about exception handling. When an API declares that an exception could be thrown, it is saying that something could go wrong. This is, of course, a conditional declaration (if an exception was thrown every time, the API would not be very useful). As it happens, often the API makes a guarantee that under certain conditions, an exception will never be thrown. As a client of the API, how do you handle the case when you're calling in and are guaranteed to be in a case where an exception will not happen?

Here's (roughly) the example from the linked article:


// Processes the given input string using the given character encoding.
// If the given encoding is not supported, throws an UnsupportedEncodingException.
// The "UTF-8" encoding is guaranteed to always be supported.

public String doSomething(String input, String encoding) throws UnsupportedEncodingException
{
    ...
}


Java's checked exceptions make this situation worse. In other languages that support exceptions but don't have the concept of compile-time checking of exception handling, developers usually ignore this case. And after all, why not? If you are calling an API in such a way that an exception is guaranteed to not be thrown, then there is no reason to put in exception handling code. Unless, of course, you are programming in Java and the exception is checked, in which case you must handle the exception whether you want to or not, even when you are guaranteed by the API that your handling code will never get called.

The usual thing to do here is to simply handle the "impossible" exception by rethrowing it wrapped in an unchecked exception, like RuntimeException. This looks something like:

try
{
    doSomething("an input value", "UTF-8");
}
catch (UnsupportedEncodingException shouldNeverHappen)
{
    // according to the API docs, "UTF-8" is always supported
    throw new RuntimeException(shouldNeverHappen);
}


But I'd like to look at this from the point of view of the API producer and not the API consumer. The fact that the API consumer has this problem about how to handle an "impossible" exception indicates an API bug. The fix is to split the API into multiple methods that have different guarantees, adding the following method in addition to the method above:

// Processes the given input string using the "UTF-8" character encoding.
// To process with another character encoding, see doSomething(String input, String encoding).

public String doSomethingUtf8(String input)
{
    ...
}

Since the API is smart enough to guarantee a special case (no UnsupportedEncodingException) for UTF-8 in the API documentation, why not make that special case explicit? As often happens (e.g. java.net.URLEncoder.encode), the special non-exception case is the normal case. Why not optimize for the normal case?

There are many ways to do this. If there are multiple "good" values that will result in no exception being thrown, create a version of the API that takes an enumerated value (and doesn't throw) and one that takes a simple data type like String (and throws).
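
Here's a rough sketch of what that enumerated version could look like. TextProcessor and KnownEncoding are names I made up for illustration, and the method body just round-trips the input through the encoding as a placeholder for real work. The only real JDK pieces are Charset, StandardCharsets, and the String methods.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical enum listing the encodings the API promises to support.
enum KnownEncoding {
    UTF_8(StandardCharsets.UTF_8),
    UTF_16(StandardCharsets.UTF_16),
    US_ASCII(StandardCharsets.US_ASCII);

    final Charset charset;

    KnownEncoding(Charset charset) {
        this.charset = charset;
    }
}

// Hypothetical API class; the "processing" is a placeholder round trip.
final class TextProcessor {

    // Enumerated version: every value is known to be supported,
    // so there is no checked exception for the caller to handle.
    static String doSomething(String input, KnownEncoding encoding) {
        byte[] bytes = input.getBytes(encoding.charset);
        return new String(bytes, encoding.charset);
    }

    // String version: the name could be anything, so the checked exception stays.
    static String doSomething(String input, String encodingName)
            throws UnsupportedEncodingException {
        byte[] bytes = input.getBytes(encodingName);
        return new String(bytes, encodingName);
    }
}
```

A caller in the normal case writes TextProcessor.doSomething("an input value", KnownEncoding.UTF_8) with no try/catch at all, and only callers that take an encoding name from outside pay the cost of handling the exception.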

Of course, if the API didn't use checked exceptions, this would all be a moot point. I rarely throw checked exceptions when writing new code, and encourage others to do the same. Checked exceptions are useful in certain situations, and I enjoy having them in the Java tool chest. However, they're vastly overused and in most cases APIs that throw checked exceptions should be converted to throw unchecked. I've touched on this point before, and there's plenty of opinion in agreement (and disagreement :-) ) out there on the web.

BTW, in the example code above, you could make an argument that the exception should really be an IllegalArgumentException (which is unchecked). UnsupportedEncodingException extends IOException, which is a bit strange to throw as an argument validation exception when IllegalArgumentException exists for that purpose.

January 16, 2007
» State-based vs. Interaction-based Unit Testing

State-based and interaction-based unit tests differ greatly in style, although they each have the same end goal, which is verification of a unit of code. The ultimate difference between the two is what attributes of a unit of code are tested in order to consider whether or not the unit of code is correct. It's important for a software developer to know both styles and to understand the differences between the two.

State-based testing
A state-based unit test is written in a style of unit testing that many software developers would be familiar with. In fact it wouldn't be far off to call this "traditional" unit testing. In a state-based test, the first step is to initialize the unit under test. This initialization may include creation of test data and graphs of supporting objects necessary to exercise the unit under test. The test then exercises the unit by calling methods on it. When the test has finished exercising the unit, assuming no errors have yet been raised, the test then proceeds to verify expected state of the unit. In JUnit parlance, this verification is usually done using the assert*() methods. No matter what testing harness is used, this verification takes the form of testing state and raising errors if the actual state differs from expected state.

A little more about those supporting objects alluded to in the previous paragraph. One of the challenges of unit testing is to sufficiently decouple the multiple units of code that make up an application so that each unit can be tested individually. Otherwise, you end up with a "unit" test that's really more like an integration test. Often this decoupling can be hard. A variety of techniques have been developed to help in this decoupling, but the most important technique is to program to interfaces instead of concrete classes. There should be an interface at all of the coupling boundaries of your unit of code. If this single condition is met, writing effective unit tests will become much easier. Often the supporting objects used by a test harness to properly exercise a unit are stubs. A stub is a trivial implementation of an interface that exists for the purpose of collecting state during a unit test.

Stubs are most often written by hand, although they don't have to be. Usually you will end up creating a graph of stubs and trivial real objects that parallels the graph of real objects (both complex and trivial) present when the application is running. As the unit under test uses the stubs, the stubs collect state that can be verified at the end of the test. We'll come back to stubs later on when we talk about mock objects. For now, keep in mind that stubs are often created by hand and exist to collect state for verification purposes.

How a state-based unit test works shouldn't be a surprise to anyone who's ever written a unit test. One of the advantages of this style of unit testing is its simplicity. It doesn't take long to teach someone how to write tests in this style, and the overall behavior of the test is intuitive. It kind of feels like the way we might test something outside of the realm of software development.

The important thing to note is what criteria the state-based unit test is considering about the unit under test. It is, not surprisingly, all about state. If the unit under test reaches a certain state, it is considered correct. The state-based unit test doesn't really care how the unit got to that state, but only that it is there at the end. The phrase "the end justifies the means" is apt here. The means (behavior of the unit) doesn't matter nearly as much as the end (state). In fact, the only thing a state-based unit test verifies about the means is that the behavior of the unit didn't include raising any errors during the test.
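
To make the three steps concrete, here's a minimal sketch of a state-based test with a hand-written stub. Everything in it (BankAccount, AuditLog, StubAuditLog, and the check helper standing in for JUnit's assert*() methods) is hypothetical and only there for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Coupling boundary: the unit talks to this interface, not to a concrete class.
interface AuditLog {
    void record(String entry);
}

// Hand-written stub: a trivial implementation that only collects state.
final class StubAuditLog implements AuditLog {
    final List<String> entries = new ArrayList<>();

    @Override
    public void record(String entry) {
        entries.add(entry);
    }
}

// The unit under test.
final class BankAccount {
    private final AuditLog log;
    private long balanceInCents;

    BankAccount(AuditLog log) {
        this.log = log;
    }

    void deposit(long cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        balanceInCents += cents;
        log.record("deposit:" + cents);
    }

    long balanceInCents() {
        return balanceInCents;
    }
}

final class BankAccountStateTest {
    static void depositsAddUp() {
        // 1. Initialize the unit and its supporting objects.
        StubAuditLog log = new StubAuditLog();
        BankAccount account = new BankAccount(log);

        // 2. Exercise the unit.
        account.deposit(500);
        account.deposit(250);

        // 3. Verify the end state; how the unit got there is not checked.
        check(account.balanceInCents() == 750, "balance");
        check(log.entries.size() == 2, "audit entries");
    }

    // Stand-in for JUnit's assert*() methods so the sketch has no dependencies.
    static void check(boolean condition, String what) {
        if (!condition) {
            throw new AssertionError("unexpected state: " + what);
        }
    }
}
```

Notice that the test would still pass if deposit() were rewritten to work in a completely different way, as long as the balance and the collected log entries come out the same.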

Are state-based unit tests effective? Absolutely. The best evidence is that most unit tests today are written in this style. The end state of a unit is in many cases very significant from a verification standpoint. The end user of an application certainly cares more about state than behavior (here, behavior being that of internal objects and not external behavior of the application itself). When you're using an online banking application the bottom line is that you want your balance to be the correct one.

Interaction-based testing
An interaction-based unit test is different. The easiest way to explain would be to say that an interaction-based test verifies the behavior of a unit instead of verifying the unit's end state. From the point of view of an interaction-based test, the correctness of a unit is based on how it interacts with its neighbors, and not on the internal state of the unit.

An interaction-based unit test first initializes the unit under test. This is done by creating "fake" stand-ins for all of the unit's immediate neighbors. A neighbor is any object that the unit under test passes messages to (calls methods on). These stand-ins are called mock objects, and are usually created by a test framework library. When the initialization is complete, the only "real" thing in the graph of objects is the unit under test. Everything that the unit is hooked up to is a mock object - capable of receiving the same messages as the real object.

There is then a second step to the initialization of an interaction-based unit test. After all of the mock objects have been initialized, expectations are set on the mocks. This is exactly what it sounds like - the test code programs the mocks and tells them what messages to expect from the unit under test. This can include many things such as what order to expect messages in, what the parameters to method calls should look like, and often how the mock should respond to these messages.

Contrast mocks and stubs. I know that in many places these two terms are used synonymously, but they are really very different from each other. A mock is often generated by a test framework library, while a stub is often created by hand. The internal state of a mock doesn't matter at all - only the expectations it has about the messages it receives. A stub exists to collect state. Stubs are often used in conjunction with "real" objects. For instance, if an existing real object is very simple and lightly coupled to the rest of the code base, it is often brought in by testing code as a supporting object to the unit under test. A stub is normally only created when the real object can't be used in a test harness for various reasons (coupling, external dependencies, etc). Mocks are used exclusively in an interaction-based unit test - all of the unit's immediate neighbors are mocks, even when the real object is trivial.

The final stage in an interaction-based unit test exercises the unit. During this phase, the test code is invoking methods on the unit under test, which itself is interacting with the mock objects. A test error will be raised by the mocks if expectations are not met. An expectation can fail to be met for a variety of reasons, including if methods are called in the wrong order, if parameters have unexpected values, if the wrong methods are called, or if the right methods are not called. At the end of this phase the interaction-based unit test completes. There is no state-checking of the unit under test. If all expectations of the mock objects have been met, then the unit has been verified from the point of view of the test.
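
Here's a minimal sketch of those phases. To keep it self-contained the mock is rolled by hand, which is not how a mock library such as JMock does it; a real library generates the mock and has a much richer expectation syntax. PaymentGateway, MockPaymentGateway, and Checkout are all hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Neighbor of the unit under test.
interface PaymentGateway {
    boolean charge(String customerId, long cents);
}

// Hand-rolled mock: programmed with expected messages, fails on anything else.
final class MockPaymentGateway implements PaymentGateway {
    private final Deque<String> expected = new ArrayDeque<>();
    private final Deque<Boolean> responses = new ArrayDeque<>();

    // Expectation: this call should arrive next, and the mock answers with 'response'.
    void expectCharge(String customerId, long cents, boolean response) {
        expected.addLast(customerId + ":" + cents);
        responses.addLast(response);
    }

    @Override
    public boolean charge(String customerId, long cents) {
        String actual = customerId + ":" + cents;
        if (expected.isEmpty()) {
            throw new AssertionError("unexpected call: charge(" + actual + ")");
        }
        String next = expected.removeFirst();
        if (!next.equals(actual)) {
            throw new AssertionError("expected charge(" + next + ") but got charge(" + actual + ")");
        }
        return responses.removeFirst();
    }

    // Fails if an expected message never arrived.
    void verify() {
        if (!expected.isEmpty()) {
            throw new AssertionError("missing call: charge(" + expected.peekFirst() + ")");
        }
    }
}

// The unit under test.
final class Checkout {
    private final PaymentGateway gateway;

    Checkout(PaymentGateway gateway) {
        this.gateway = gateway;
    }

    // Charges the deposit first, then the remainder only if the deposit succeeded.
    void purchase(String customerId, long depositCents, long remainderCents) {
        if (gateway.charge(customerId, depositCents)) {
            gateway.charge(customerId, remainderCents);
        }
    }
}

final class CheckoutInteractionTest {
    static void chargesDepositThenRemainder() {
        // Initialize: the only "real" object is the unit; its neighbor is a mock.
        MockPaymentGateway gateway = new MockPaymentGateway();
        gateway.expectCharge("alice", 100, true);
        gateway.expectCharge("alice", 900, true);

        // Exercise: the mock raises an error on any wrong or out-of-order call.
        new Checkout(gateway).purchase("alice", 100, 900);

        // No state is checked on Checkout itself, only that every expectation was met.
        gateway.verify();
    }
}
```

Checkout has no state worth asserting on here, which is exactly the kind of code where this style earns its keep.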

Are interaction-based unit tests effective? To really answer that question you have to look at the motivation for this style of testing. Object-oriented development might have been called behavior-driven development if it had been invented in this decade. (I know that there is an existing methodology termed behavior-driven development, but it is really based on OO being done right if you look closely). What is a more useful measure of the correctness of a unit - its behavior or its state? An OO purist would hopefully answer that behavior is key while state should be internal and encapsulated. An interaction-based unit test verifies behavior, while a state-based unit test verifies state.

Interaction-based unit tests are great for completely isolating the unit under test, and thereby are very true unit tests. A properly done interaction-based test cannot be testing anything other than the unit, while it is not hard for a state-based test to rely on side effects and be a little sloppy about boundaries between the unit under test and other units in the application. Interaction-based testing is most useful when applied uniformly throughout a code base. The objects that you create mocks for in an interaction-based test will themselves need interaction-based tests in order to feel confident about the code base as a whole.

Deciding which style to use
One of the premises of this article is that it's important for today's software developer to understand the large differences between state-based and interaction-based unit testing. However, for any of this to be of practical use, you will eventually have to make decisions about which style to use when writing a test.

I find that interaction-based testing feels a little unnatural. You can certainly make the argument that an interaction-based unit test is more tightly coupled to the implementation of a unit than a state-based unit test is (the counter-argument is that this is a good thing, not a bad thing). I've also noticed that interaction-based unit tests tend to have a lot more plumbing code than state-based tests. Coming back to a state-based unit test after a few months is usually easy - coming back to an interaction-based test often involves lots of inspection to get back up to speed. In other words, I posit that interaction-based unit tests have a higher maintainability cost than state-based tests. I think a lot of this is a reflection of the current state of languages and libraries in which interaction-based tests are written. This is still a fairly new testing style, and over time much improvement will be made in the libraries and frameworks that support it.

Finally, I'd like to suggest that different kinds of code benefit differently from each style. Some code is very stateful, and is best tested with a state-based test. Other code is more stateless and can be fully tested only with an interaction-based style. Perhaps the right question isn't about knowing which style is better in an absolute sense, but more about being able to recognize which style will be most effective for a particular piece of code.

Further reading
Martin Fowler: Mocks Aren't Stubs
If interaction-based unit testing is new to you, start by reading this article. You're likely also going to have to spend some time writing interaction-based tests against a code base you know well before the differences between the two styles start to sink in. This is one of those cases where "armchair programming" isn't going to help - you have to dive in and try it to fully appreciate the differences between the two styles.

Behavior-Driven Development (BDD)
Many of the motivations behind interaction-based testing come from an increasing focus on behavior and less on state. From the linked site: "It must be stressed that BDD is a rephrasing of existing good practice, it is not a radically new departure."

Mock Roles, not Objects (PDF)
A paper written by the authors of JMock, a popular Java mock object library. The paper is short and very readable. I highly recommend it to anyone trying to understand the purpose of mock objects and how they differ from stubs and state-based testing.

State vs. Interaction Based Testing Example
A blog entry by Nat Pryce, one of the developers on the JMock project, that gives an example of a state-based and an interaction-based unit test for the same piece of code.

July 24, 2006
» the more things change...

One of the books I've been reading lately is Software Conflict 2.0. It's a collection of essays by Robert Glass that was originally published as a book in 1990. The "2.0" version of the book contains all the original (unedited) essays along with a handful of short retrospectives written by Glass for the second version of the book.

Robert Glass is a prolific author, and I've enjoyed his work before. The collection of essays in this book is no exception: despite being 15 years old they feel incredibly relevant to software engineering today. Particularly interesting are the portions of the book where Glass relates anecdotes of the early days of software development (room-sized computers, etc) and successfully ties the stories into today's industry landscape. As a young practitioner in the software field, it's often easy for me to think that the issues of today are somehow new and original. Reading older material like this is enlightening: the issues (the very same issues) that we discuss, debate, and write articles about today were being discussed, debated, and written about decades ago. In many ways, the software industry as a whole has re-learned the same lessons over and over again. I'll save my opinions about why I think that happens for another blog post.

One of the authors Glass makes frequent reference to in his essays is David Parnas. I've heard of Parnas before but have never made the time to read anything he wrote. So today I searched Amazon and found a collection of Parnas' most influential papers entitled Software Fundamentals. I've added the book to my soon-to-read list: I think it's high time I become familiar with some of Parnas' ideas and writing.

I'd recommend reading Software Conflict 2.0 to anyone with a general interest in software engineering. For more, check out the book site where you can sample a few of the essays:
http://www.developerdotstar.com/books/software_conflict_glass.html

May 9, 2006
» API Design: The Principle of Audience

The Principle of Audience states that an API must be designed with its audience in mind. The audience of an API is the people that will be writing code against it.

All too commonly, APIs are designed primarily as a way to expose some underlying functionality at some level of abstraction. This style of API design results in APIs that are either too general or too large for easy use – they're functionality-oriented instead of audience-oriented.

Let's say that you're writing an API around a large, complex piece of software. You could ask one of two questions in order to drive the API design:
1) What functional areas of this software do I want to expose?
2) What kinds of users wish to interact with this software, and what are they trying to get done with it?

Driving an API with the first question can result in an API that is complete but not easy to use and learn. Driving with the second question results in an API that is scenario-focused and was written with purpose and intention. It all boils down to one fact: people are writing to an API because they're trying to get some work done and then move on.

How Audience Drives API Design

Once you've identified your audience for your API, you can start to make audience-focused decisions to drive the design of the API.

One issue to consider is whether the API should be high-level or low-level. A low-level API may do little more than wrap existing code and provide a consistent interface to it. This low-level API usually has an interface style that's identical to the code being wrapped (for example, procedural code wrapped by a procedural API), and it often maps one-to-one with concepts in the underlying code. The main purpose of such a low-level API is to provide a level of indirection so that future changes can be made to the underlying code without breaking existing clients of the API.

A higher-level API may provide a totally different way of working with the existing functionality. A high-level API may have a completely different interface style than the wrapped code (like an object-oriented interface to a procedural library). It may not map one-to-one with concepts in the wrapped code: the API may invent concepts of its own and expose them to clients, and it may hide or suppress concepts in the underlying code that clients of the API shouldn't be concerned with.

The high-level or low-level decision comes down to audience. If your audience is already familiar with the software the API is being written for, a low-level API probably makes sense. If the audience is unfamiliar with the existing software, a high-level API provides a needed conceptual level of indirection.
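
To show the difference in code, here's a small sketch with one piece of procedural code wrapped two ways. LegacyLedger is a made-up stand-in for the existing software, and LedgerApi and Ledger are equally hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the existing procedural code: integer handles and static functions.
final class LegacyLedger {
    private static final Map<Integer, List<long[]>> OPEN = new HashMap<>();
    private static int nextHandle = 1;

    static int open() {
        int handle = nextHandle++;
        OPEN.put(handle, new ArrayList<>());
        return handle;
    }

    static void post(int handle, int accountNo, long cents) {
        OPEN.get(handle).add(new long[] {accountNo, cents});
    }

    static long total(int handle, int accountNo) {
        long sum = 0;
        for (long[] entry : OPEN.get(handle)) {
            if (entry[0] == accountNo) {
                sum += entry[1];
            }
        }
        return sum;
    }

    static void close(int handle) {
        OPEN.remove(handle);
    }
}

// Low-level API: same procedural style, one-to-one with the wrapped functions.
// Its value is the indirection, so LegacyLedger can change without breaking clients.
final class LedgerApi {
    static int ledgerOpen() { return LegacyLedger.open(); }
    static void ledgerPost(int handle, int accountNo, long cents) { LegacyLedger.post(handle, accountNo, cents); }
    static long ledgerTotal(int handle, int accountNo) { return LegacyLedger.total(handle, accountNo); }
    static void ledgerClose(int handle) { LegacyLedger.close(handle); }
}

// High-level API: object-oriented, hides the handle, and invents an Account concept
// that does not exist in the wrapped code.
final class Ledger implements AutoCloseable {
    private final int handle = LegacyLedger.open();

    Account account(int number) {
        return new Account(number);
    }

    @Override
    public void close() {
        LegacyLedger.close(handle);
    }

    final class Account {
        private final int number;

        private Account(int number) {
            this.number = number;
        }

        Account deposit(long cents) {
            LegacyLedger.post(handle, number, cents);
            return this;
        }

        long balance() {
            return LegacyLedger.total(handle, number);
        }
    }
}
```

Someone who already knows the legacy code will feel at home with LedgerApi. Someone who has never seen it can write ledger.account(7).deposit(500).balance() without ever learning what a handle is.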

Another audience issue to consider is whether an API is going to be internal (to an organization) or external. An external API probably needs to be versioned, and definitely needs different documentation than an internal API. People using an internal API probably have access to the authors of the API, and often can make changes to the API to fix bugs or add additional functionality.

Audience Considerations

When you write an audience-oriented API, there are some considerations to keep in mind. Be sure to provide starting points. Think about the API from the perspective of a user who is coming to it for the first time. Where should they start? There should be jumping-off points both in terms of the documentation and the API itself. The documentation should at least have an index, a 5-minute tutorial section, and a FAQ. The FAQ should consist of real questions that are frequently asked – it's going to be hard to come up with this list until the API has been used by people other than its authors.

As an author of an API, you will find it difficult to predict where these jumping-off points should be, or even if there are enough of them. The best way is to team up with another developer you know who is completely new to the API, and have them attempt to write a program against it. Observe where they start, and at what points they struggle with the API. Getting another person to sit down with your API and critique it is going to be much more valuable than trying to predict where the pain points are by guessing.

A great way to provide jumping-off points is a suite of tests that ship with your API and exercise all of its functionality. These tests could be very simple (they probably aren't unit tests), just calling the API and verifying the result, perhaps mocking up the back-end to some degree. Developers using your API can consult the tests to see how various things are done.

Ensure that your API ships with as many development aids as possible. Understand what IDEs and environments your audience will be using to write against your API and deploy code in. Make sure that your API ships with whatever help and documentation is appropriate for these environments. For example, if your audience is going to be using an IDE that offers code completion or contextual code hints, make sure that your API ships with the necessary hooks to integrate into this IDE and provide that kind of support. The whole idea is to make it as easy as possible for your audience to code against your API.

Multiple APIs

So what happens when you've thought about your audience, and you realize that there are multiple audiences each with different needs? Instead of trying to build a single, one-size-fits-all API that covers all the needs, consider writing multiple, distinct APIs for your product.

Let's say that you're writing an API for some software that manages a large amount of data and performs interesting computations on it. One audience of the API might be people who want to write functionally equivalent client applications (perhaps on unsupported platforms), so you would want to write an API that exposes most of your core functionality and concepts. However, another audience may simply want to get data in or out of your data store. For this audience, an import / export oriented API would make a lot more sense.

A lot of business integration is about just moving data out of one system and into another. Users of an API for this reason don't care about all the features and concepts of your application – they simply want a clean API to either get their data out of your system, or put their data into your system. Given that business data far outlasts the applications that process it, every application that manages customer data should have an API for importing and exporting that data in as neutral a form as possible.
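To make the distinction concrete, here is a minimal sketch of a narrow import / export API sitting next to a richer one. Everything in it is invented for illustration: the names DataExchange and InMemoryStore, the toy key/value data model, and the choice of tab-separated text as the "neutral form" (keys are assumed to contain no tabs or newlines).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical narrow API for the audience that only wants to move data.
interface DataExchange {
    void exportTo(Writer out) throws IOException;
    int importFrom(Reader in) throws IOException;
}

// Toy in-memory store standing in for the real application.
final class InMemoryStore implements DataExchange {
    private final Map<String, String> records = new TreeMap<>();

    // Stand-ins for the richer, feature-oriented API other audiences would use.
    void put(String key, String value) { records.put(key, value); }
    String get(String key) { return records.get(key); }

    // Writes one "key<TAB>value" line per record.
    @Override
    public void exportTo(Writer out) throws IOException {
        for (Map.Entry<String, String> entry : records.entrySet()) {
            out.write(entry.getKey() + "\t" + entry.getValue() + "\n");
        }
    }

    // Reads "key<TAB>value" lines and returns how many records were imported.
    @Override
    public int importFrom(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        int imported = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // this sketch silently skips malformed lines
            }
            records.put(line.substring(0, tab), line.substring(tab + 1));
            imported++;
        }
        return imported;
    }
}
```

An integrator who only needs to move data can code against DataExchange alone and never learn the rest of the application's concepts.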

Web Services

The Principle of Audience is one reason I believe a currently popular approach to web services won't work. I call this concept the “build it and they will come” approach to web services. The idea is that a company will invest lots of time and money into writing web services that expose their data or services. This is done without a clear customer (audience) in mind, with the hope of it being useful for future business integration. I'd be willing to bet that most of the time these web services are only used internally in a company (probably with both endpoints on the same platform), and are either scrapped or completely revised if integration with external entities is ever needed.

The problem with “build it and they will come” web services is that no one knows exactly who “they” are. These web service projects are often written either because web services are popular right now, or because having web service support becomes a marketing bullet point. Without a clear audience in mind, these web services become at best toys to try out new technology, and at worst a maintenance time sink with little business value.

A final note – one counter-point to an audience-oriented API might be phrased: Isn't the whole point of an API that we can write a library and then people will use it in all sorts of interesting ways we've never thought of? My response is that an API will only be useful for an unforeseen purpose by coincidence. You can certainly take steps to make the API flexible and extendable, but in the end the best APIs are those that are designed ahead of time for their audience.

May 4, 2006
» API Design

I'm going to start a series of entries about API design. Being a consumer of many APIs and producer of a few, I've accumulated a number of opinions about how to design APIs. But before I dig into the nuts and bolts of this topic, a little background is in order.

What is an API?

API is an acronym that stands for Application Programming Interface. You can drop the application and programming part and think of an API as an interface. An API is a public interface to a system (where “system” could be at many different levels of granularity). Often the term API is used to describe the interface of a third party library or a programming environment, but the most common APIs you will encounter are the APIs you and your coworkers write on a daily basis. Even if you're writing code that's only going to be used inside your own codebase or organization, you're still writing an API for yourself and your colleagues.

Types of APIs

If an API cannot change in sync with its consumers, then it's a fundamentally different kind of API than one that can. In other words, if you control both the API code and the client code, you have a different set of problems than if you only control the API. In both situations, the principles of API design that I'll discuss apply, but in the situation where you don't control the clients, there are many additional challenges (like versioning).

If you are publishing an API which will be used by clients you don't control, then the costs of bad API design are much greater. Once you've published an API, you must choose between backwards compatibility and breaking clients, which isn't an easy choice. But even when you're writing both the API and the clients, it is still usually beneficial to put some thought into the APIs you're writing. An API designed with intention is better than an implicit or ad-hoc API.

However, proper API design is hard and costly. It's not always the right thing to do: you wouldn't normally design an API for prototype code. It's often better to extract APIs from v1 code rather than to try to begin by designing an API.

Principles of API Design

I've grouped a number of design guidelines into common principles, and I'll be writing future entries on them:

The principle of audience: all APIs must be designed with their intended audience in mind, and a one-size-fits-all API is not possible

The principle of least surprise: the effects of using an API should be as unsurprising as possible

The principle of least resistance: an API should make the easiest way of doing something the right way of doing it

The principle of resilience: clients should not be allowed to break an API

The principle of consistency: an API should be as consistent as possible in all aspects

The principle of feedback: APIs should fail fast and provide precise feedback when things go wrong

The principle of abstraction: APIs should limit the number of abstraction leaks and should not require the client to be an expert in the system

The principle of responsibility: an API is a contract that comes with client responsibilities and API responsibilities

April 18, 2006
» UML

UML (Unified Modeling Language) is a commonly used modeling tool in many software development organizations. UML has been around for about a decade (read about the history of UML), and it’s enjoyed relative popularity during that time. UML is often misapplied in software projects, but when used appropriately, it can be a valuable tool.

What UML is Good At

UML is best used as a partial description of an object-oriented model. For reasons I’ll explore further below, using UML to completely document every aspect of a model, down to the last class and method, is at best a waste of time and at worst damaging to a project. UML is great for concisely capturing a portion of a model, bringing a coworker up to speed on a design, or for quick throwaway sketches.



I often use UML to help capture some model knowledge that I don’t want to forget. Perhaps there’s a complex interaction between a set of classes, or an inheritance hierarchy that’s just a little too deep and not easily understandable. By diagramming out a small portion of the model, I may be able to capture enough model knowledge to save time when I come back to the model in the future. The diagram serves to jog my memory about the model, and this task is what I use UML the most often for.

A short, simple UML diagram is also one of the best tools for collaboration between team members. Say a colleague comes into my office to discuss a design. I could spend 30 minutes or more in a verbose, verbal discussion of the design. Alternatively, we could go through the code together and spend a lot of time coming up to speed on the design. However, by drawing a UML diagram on my whiteboard, I can convey the same information in a more concise form. This saves time, and allows my colleague and me to get right to the issue at hand instead of spending too much time on background details.

It’s helpful to think of UML as a common language between you and your coworkers. Having a team that can quickly share information and designs in visual terms that everyone understands and can comment on is very valuable. Informal design reviews are common for many teams, and the ability to represent a design in common terms makes design reviews flow much more smoothly. In this usage, UML is a lot like design patterns – it gives a team a common lexicon in order to express concepts that everyone is already familiar with.

UML also excels at “back of the napkin” type sketches. These kinds of diagrams are sloppy, imprecise, and usually thrown away quickly. They’re used to quickly jot down a few thoughts about a design or model, or perhaps to prototype a design. Often by spending 5 minutes or less making such a sketch, I can spot problems with a design that wouldn’t have been apparent right away if I had jumped directly into an implementation.

What UML Isn’t So Good At

In college, I got my introduction to UML through a professor who knew the modeling language quite well. We used a nimble little software tool called Rational Rose :-) . Although I’m going to use my college experience as an example of a misapplication of UML, I want to stress that my professor was actually quite a good teacher of UML (and software engineering in general), and I learned many modeling concepts that I still use on a daily basis.

The idea was that you would do a big up front design before you wrote a line of code. Rational Rose was used to create a very large, very complex class diagram that contained each class, each method, and all interaction between classes. We were taught to iterate on this design diagram for a long period of time (relative to the implementation phase), refining the UML model as we received feedback. Eventually, the model was supposed to reach a state of completeness, at which time you could perform the simple task of translating the model into code.



During my last year of college, all of the graduating software engineers worked on a senior project together. There were about 20 of us in the class, and we had two semesters to complete our software project. The entire first semester was to be spent creating a UML model of our project. The class was split into groups, and each group was responsible for a functional area of the application. By the end of the first semester, we had a combined class diagram with a few hundred classes that showed every aspect of the classes and their relationships. The idea was that we’d come back for the second semester, crank our huge design through the implementation machine, and wind up with our finished application.

If you’ve ever been in a situation like this, I’m sure you already know where the story is going. As soon as we tried implementing our design, we ran into lots and lots of those small annoyances called “implementation details”. We realized that our model was imprecise and left out lots of important aspects. Often we hadn’t thought of these aspects, but sometimes the model simply wasn’t capable of expressing them. We also had integration and performance problems. Our modeling didn’t do a good job of capturing interactions at layer boundaries, and the model left a lot to implementation details there. Since our chosen architecture had been completely on paper for all of the design phase, we didn’t realize that it had some poor performance characteristics until we started implementing the application.

We eventually finished the project with moderate success, but in order to do so we had to cut a lot of ties to the model. The original expectation was that we would finish with a working implementation and a UML model that accurately described it, but the end result was a working implementation and a model that wasn’t really very close to it. I have a feeling that our result was not at all uncommon among projects done using a similar methodology.

Big Design Up Front

A big UML design up front often seems to be used as an attempt to shorten the implementation phase of a software project. The idea is that by concentrating all of the creativity and developer experience into a diagram, that diagram can then be put through a mindless machine of sorts and useful code will come out the other end.

In practice, this simply doesn’t work. UML is not a replacement for implementation. When properly used, it can supplement implementation, but UML is not an alternate form of code. UML and code have separate roles, and they each do certain things well. Code is unambiguous (sometimes wrong, but always precise) and always up to date. A UML diagram is neither of these.



The up-to-date issue is something I think a lot of teams run into. Assuming that you’ve done a big up front UML design and are now implementing it, what do you do when your implementation must deviate from the model? Obviously this can happen for a number of reasons: perhaps the model neglected to take something into account, or the implementation revealed additional aspects that needed modeling, or the design had to change due to external needs. In any case, it’s rare for the UML model to stay current with the implementation. On teams that keep the UML up to date, I imagine they invest many man-hours in updating the UML model.

A certain class of UML tools called round-trip UML tools are supposed to help here. I don’t have much personal experience with these kinds of tools so I’m not going to say much about them. The idea behind them is that a UML diagram and an implementation are both just different “views” of the same design, and the tool allows a modification to one view of the design to change the other view. I’ve certainly heard lots of second-hand evidence that these tools simply don’t work as advertised.

Other Problems with UML

In his book Domain Driven Design, Eric Evans makes a point about the inadequacy of UML to capture the big picture. UML is very good at capturing a representation of objects. This representation can be very precise, showing you exactly what objects are like and what object relationships are like. But UML is not good at capturing the meaning of a design – the design’s intention, and what the objects are meant to do.

Think about giving a very precise, detailed description of a foreign object to a child who had never seen the object before. For instance, imagine describing a radial arm saw in a woodshop to a 4 year old. The description could be full of detail and precision, and would convey an accurate representation of the radial arm saw. It could include the relationship between the saw blade and the saw body, the direction and speed at which the saw blade spins, and the rotational ability of the saw arm. But even with such an accurate representation of the saw, the description would probably tell the child very little about what the saw is for.



UML diagrams can answer what questions, and sometimes can answer how questions. But often, the more important questions are why questions. A UML diagram can’t tell you why a particular class is modeled the way it is, or what tradeoffs were considered in order to arrive at a design.

UML is not even always good at representation. One problem is that once you start getting beyond simple relationships, class diagrams can become exceedingly complex. There are lots of UML class diagram symbols that I don’t know, don’t want to, and have never needed to use. There are also a lot of representational concepts that UML just kind of punts on. You can tell what these concepts are by looking for << this kind of text >> in a UML class diagram. In a complicated class diagram you’ll see these all over the place, and it’s really just a regression to a textual description of a concept inside a diagram.

Something I’ve noticed that tells a lot about the misapplication of UML is that large UML diagrams almost always require some text to accompany them to explain things. If UML really captured large designs well, the diagram would stand alone and wouldn’t require a supplement in order for it to have meaning. Of course, any diagram or model always has context, and the context needs to be explained, but what I’m referring to goes beyond context and into interpretation.

A UML Tool I Like

Even though I’ve had a lot to say about the misapplication of UML, I don’t want to throw the baby out with the bathwater. Just because UML is often misused isn’t a reason for avoiding it completely. Design patterns are misused more often than UML, but it would be foolish to refuse to ever use a design pattern because of widespread misapplication.

The UML tool that I use most often is Violet. It’s nice because it’s so simple that it requires no training or reading of manuals. It’s also cross platform, so I can share Violet diagrams with my coworkers on other platforms. It allows me to quickly get the job done and the tool stays out of the way.

A tool like Violet is not going to scale to doing large diagrams, and it doesn’t have any sort of reverse engineering, code generation, or round-trip tooling built in. But for simple diagramming, the kind that UML is really useful for, it’s hard to beat.

March 16, 2006
» Variations on Publish / Subscribe

Introduction

The publish / subscribe pattern, also known as the observer pattern, is one of the most widely used of the originally cataloged design patterns. It’s a pattern that decouples an object that generates events (the subject) from objects interested in reacting to those events (observers or listeners).

The observer pattern is used more often in client-side GUI applications than other types of programs, but it is a generally applicable pattern. The reason it’s used more in GUI apps is that these kinds of programs tend to be externally event driven, so modeling the internals of the application with an event-driven approach feels very natural and intuitive. Another common use of this pattern is to decouple different logical layers of a program from each other, in order to support different functionality or even different applications that reuse a common base layer.

Even though the observer pattern is widely known and used, there are lots of innovative variations on the pattern that don’t appear as often. I’m going to catalog a few of these variations, but the following list is by no means comprehensive. There are lots of permutations of publish / subscribe out there.

All the variations I’m about to describe have one thing in common: they encapsulate the logic needed to store a collection of listeners and send events to them, so the code that does this doesn’t need to be duplicated throughout an application. I call this code a multicaster. So when you see the term multicaster used in the discussion below, it’s equivalent to “the code that manages a collection of listeners and fires events to them”. I call it a multicaster because the subject (event generator) uses the multicaster to push an event out to a collection of listeners for that event.

Making a Multicaster Look Like a Listener

Consider a very simple interface for a listener object that receives an event:


public interface MyEventListener
{
    public void onEvent(MyEvent event);
}


One interesting multicaster implementation is to have the multicaster itself implement the listener interface. The “fire” method simply sends the event to any listeners being managed by the multicaster. With this implementation, the multicaster presents the illusion that there is only one listener, and that the event is being sent once to that listener.

There are two basic ways to implement this variation. You can either hard-code the relevant listener interfaces into your multicaster implementation, or the multicaster implementation can be made more general-purpose by using a runtime proxy approach like Java dynamic proxies.

For an example of the first (non-proxy) approach, take a look at the class java.awt.AWTEventMulticaster that ships with the Java standard library. This is a multicaster implementation that looks like a listener – it directly implements 17 different AWT listener interfaces.
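For the MyEventListener interface shown earlier, a hand-written multicaster of this kind could look roughly like the sketch below. The class name MyEventMulticaster, the empty MyEvent placeholder, and the use of CopyOnWriteArrayList are assumptions made for the example; this is not how AWTEventMulticaster is implemented internally.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// MyEvent is not defined in the text; an empty placeholder is assumed here.
class MyEvent {}

interface MyEventListener {
    void onEvent(MyEvent event);
}

// Hypothetical hand-written multicaster that is itself a MyEventListener.
class MyEventMulticaster implements MyEventListener {
    // CopyOnWriteArrayList allows listeners to be added or removed
    // while an event is being delivered.
    private final List<MyEventListener> listeners = new CopyOnWriteArrayList<>();

    void add(MyEventListener listener) { listeners.add(listener); }
    void remove(MyEventListener listener) { listeners.remove(listener); }

    // "Firing" is just the listener method: forward the event to every
    // listener managed by this multicaster.
    @Override
    public void onEvent(MyEvent event) {
        for (MyEventListener listener : listeners) {
            listener.onEvent(event);
        }
    }
}

final class MyEventMulticasterDemo {
    public static void main(String[] args) {
        MyEventMulticaster multicaster = new MyEventMulticaster();
        multicaster.add(event -> System.out.println("first listener notified"));
        multicaster.add(event -> System.out.println("second listener notified"));

        // The subject only ever sees a single MyEventListener.
        MyEventListener subjectsView = multicaster;
        subjectsView.onEvent(new MyEvent());
    }
}
```

The cost of this style is that every listener interface needs its own forwarding code, which is the problem the proxy-based variant tries to remove.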

Probably the biggest downside of the multicaster-as-listener approach is that a general-purpose multicaster must use runtime proxies (since the Listener type isn’t known at design time), and is always going to be slower than a hand-written multicaster because of the runtime proxy generation and overhead. However, this performance hit is acceptable for many systems, and with techniques such as byte code manipulation, the overhead of runtime proxies can be greatly reduced.

You can find a good example of this approach on the following web site: http://www.t-deli.com/listeners.html. This general-purpose Multicaster implementation generates a runtime proxy that implements a Listener interface. The implementation uses either Java dynamic proxies or (optionally) a 3rd party library to create proxies by bytecode generation.
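As a rough illustration of the proxy-based variant, here is a sketch built on java.lang.reflect.Proxy from the standard library. It is not the code from the site above; ProxyMulticaster and GreetingListener are hypothetical names, and the sketch assumes every method on the listener interface returns void.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener interface used only for the demo below.
interface GreetingListener {
    void onGreeting(String text);
}

// Hypothetical general-purpose multicaster: T can be any listener
// interface whose methods all return void.
final class ProxyMulticaster<T> {
    private final List<T> listeners = new CopyOnWriteArrayList<>();
    private final T proxy;

    ProxyMulticaster(Class<T> listenerInterface) {
        InvocationHandler handler = (self, method, args) -> {
            // equals, hashCode and toString are answered for the proxy itself.
            if (method.getDeclaringClass() == Object.class) {
                switch (method.getName()) {
                    case "equals": return self == args[0];
                    case "hashCode": return System.identityHashCode(self);
                    default: return "ProxyMulticaster" + listeners;
                }
            }
            // Any other call is a listener method: replay it on each listener.
            for (T listener : listeners) {
                try {
                    method.invoke(listener, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow what the listener itself threw
                }
            }
            return null; // listener methods are assumed to be void
        };
        this.proxy = listenerInterface.cast(Proxy.newProxyInstance(
                listenerInterface.getClassLoader(),
                new Class<?>[] { listenerInterface },
                handler));
    }

    void add(T listener) { listeners.add(listener); }
    void remove(T listener) { listeners.remove(listener); }

    // The object a subject holds on to: it looks like one ordinary listener.
    T asListener() { return proxy; }
}

final class ProxyMulticasterDemo {
    public static void main(String[] args) {
        ProxyMulticaster<GreetingListener> multicaster =
                new ProxyMulticaster<>(GreetingListener.class);
        multicaster.add(text -> System.out.println("first: " + text));
        multicaster.add(text -> System.out.println("second: " + text));
        multicaster.asListener().onGreeting("hello");
    }
}
```

Each call through the proxy pays for a reflective invocation per listener, which is the overhead described above.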

Managing Lots of Event Types

When designing part of a system that uses the publish / subscribe pattern, it’s common to get to the point where you have more than a handful of individual event types, and it can become tedious to write multicasting code for each event type (even if much of that code is reused in some way).

A nice solution to this problem is a multicaster API that has the same level of complexity whether you have 1 or 100 different event types. For instance, consider the following multicaster API:


public interface Multicaster
{
    public void registerEventType(Class eventType, boolean hierarchical);
    public void registerListener(Object listener);
    public void fireEvent(Object event);
}


In my current project, we’ve written a multicaster implementation with a very similar API to the one above. The multicaster makes a few assumptions about the signature of listener methods: namely, each listener method is a void method that takes a single parameter of an event type. This Multicaster works through reflection. Before you can begin doing real work with it, you need to tell it what events look like. This is done by registering event types (or event base classes if your event types have a hierarchy). Once the multicaster knows which types are valid events, it can reflectively examine each listener added to it to see if the listener can accept any of the known event types. The registered listeners are categorized according to the event types they can receive, and when an event is fired on the multicaster it broadcasts the event to the appropriate listeners.

The big win with a multicaster implementation like this one is that adding new events to the system is trivial and safe – none of the eventing infrastructure needs to be modified in order to support a new event type. Listener interfaces are optional with this design (since “listenability” is dynamically determined) but we still use them for clarity.
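The reflective mechanics described above can be sketched as follows. This is not our project's actual code. The class ReflectiveMulticaster, the UserEvent hierarchy, and AuditListener are hypothetical, and the meaning given to the hierarchical flag (subclasses of the registered type also count as events) is an assumption. The sketch is not thread-safe.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event hierarchy and listener, used only for illustration.
class UserEvent {}
class UserCreated extends UserEvent {}
class UserDeleted extends UserEvent {}

class AuditListener {
    int created;
    int any;
    public void onCreated(UserCreated event) { created++; }
    public void onAnyUserEvent(UserEvent event) { any++; }
}

final class ReflectiveMulticaster {
    // Registered event type -> whether its subclasses also count as events.
    private final Map<Class<?>, Boolean> eventTypes = new HashMap<>();
    // Listener methods grouped by the event type they accept.
    private final Map<Class<?>, List<Handler>> handlersByType = new HashMap<>();

    private static final class Handler {
        final Object target;
        final Method method;
        Handler(Object target, Method method) { this.target = target; this.method = method; }
    }

    void registerEventType(Class<?> eventType, boolean hierarchical) {
        eventTypes.put(eventType, hierarchical);
    }

    // Finds public void methods that take exactly one known event type.
    void registerListener(Object listener) {
        for (Method method : listener.getClass().getMethods()) {
            if (method.getReturnType() != void.class || method.getParameterCount() != 1) {
                continue;
            }
            Class<?> parameterType = method.getParameterTypes()[0];
            if (isEventType(parameterType)) {
                method.setAccessible(true); // allow listeners whose class is not public
                handlersByType.computeIfAbsent(parameterType, key -> new ArrayList<>())
                        .add(new Handler(listener, method));
            }
        }
    }

    // Delivers the event to every listener method whose parameter accepts it.
    void fireEvent(Object event) {
        if (!isEventType(event.getClass())) {
            throw new IllegalArgumentException("Unregistered event type: " + event.getClass());
        }
        for (Map.Entry<Class<?>, List<Handler>> entry : handlersByType.entrySet()) {
            if (!entry.getKey().isInstance(event)) {
                continue;
            }
            for (Handler handler : entry.getValue()) {
                try {
                    handler.method.invoke(handler.target, event);
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new IllegalStateException("Listener method failed", e);
                }
            }
        }
    }

    private boolean isEventType(Class<?> type) {
        for (Map.Entry<Class<?>, Boolean> entry : eventTypes.entrySet()) {
            Class<?> registered = entry.getKey();
            if (registered == type || (entry.getValue() && registered.isAssignableFrom(type))) {
                return true;
            }
        }
        return false;
    }
}
```

With UserEvent registered as hierarchical, firing a UserCreated event reaches both methods on AuditListener, while a UserDeleted event reaches only onAnyUserEvent. Adding a new UserEvent subclass requires no change to the multicaster.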

Dealing with Concurrency

Here’s what a simple implementation of a multicaster could look like:


public class Multicaster
{
    private Set<Listener> listeners = new HashSet<Listener>();

    public synchronized void addListener(Listener listener) { listeners.add(listener); }
    public synchronized void removeListener(Listener listener) { listeners.remove(listener); }
    public synchronized void fireEvent(Event event)
    {
        for (Listener listener : listeners)
        {
            listener.receiveEvent(event);
        }
    }
}


Note that all the methods use the synchronized keyword to ensure the class works correctly while being used by multiple threads. This implementation is correct, but has potentially poor performance and isn’t appropriate for a general-purpose Multicaster implementation. The reason is that when the Multicaster fires an event on a listener, it is effectively yielding control to that listener for some unknown amount of time – during which the event firing thread holds the lock on the Multicaster object. This approach forces other threads that simply want to subscribe or unsubscribe a listener to wait for the firing thread.

A common solution to this performance problem is to create a copy of the collection of listeners before firing an event on them. The Multicaster lock is held only long enough to make the copy, and then released while the event is being fired. The fireEvent method above would change to look like this:


public void fireEvent(Event event)
{
    Set<Listener> copy;
    synchronized(this)
    {
        // Hold the lock only long enough to copy the listeners.
        copy = new HashSet<Listener>(listeners);
    }

    // The lock is released while the listeners run.
    for (Listener listener : copy)
    {
        listener.receiveEvent(event);
    }
}


This solution is nice because it narrows the window of time during which the lock is held by an event firing thread. This implementation can be improved further by keeping the copy of the collection around, and only re-creating the copy when the real collection has been changed. For many uses of publish / subscribe, the ratio of publication to subscription is high, so only copying when necessary can yield a large performance gain.
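A minimal sketch of the "only copy when necessary" refinement is shown below. The class name SnapshotMulticaster and the placeholder Event and Listener types are assumptions made so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Event and Listener are not defined in the text; minimal placeholders are assumed.
class Event {}

interface Listener {
    void receiveEvent(Event event);
}

final class SnapshotMulticaster {
    private final Set<Listener> listeners = new LinkedHashSet<>();
    // Cached copy handed to firing threads; null means "stale, rebuild on next fire".
    private List<Listener> snapshot;

    synchronized void addListener(Listener listener) {
        if (listeners.add(listener)) {
            snapshot = null; // invalidate only when the set really changed
        }
    }

    synchronized void removeListener(Listener listener) {
        if (listeners.remove(listener)) {
            snapshot = null;
        }
    }

    void fireEvent(Event event) {
        List<Listener> current;
        synchronized (this) {
            if (snapshot == null) {
                snapshot = new ArrayList<>(listeners); // copy once per change
            }
            current = snapshot;
        }
        // The lock is released here. A snapshot is never mutated after it is
        // built, so it can be walked while other threads subscribe or unsubscribe.
        for (Listener listener : current) {
            listener.receiveEvent(event);
        }
    }
}
```

The standard library's java.util.concurrent.CopyOnWriteArrayList makes a similar tradeoff by copying its backing array on every write, so reads and iteration need no locking.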

In his newest book, Allen Holub presents a very elegant solution to the concurrency problem with publish / subscribe. He manages to come up with a Multicaster implementation that has good performance under high concurrency, and doesn’t suffer from the “copying the listeners too often” problem. In fact, Holub’s implementation doesn’t use the synchronized keyword at all. Instead, the synchronization is “built into” the data structure he used.

The implementation stores the listeners in a linked list, adding new listeners to the head of the list. Events are fired by walking down the list and firing on each listener. If another thread comes in and adds a new listener, it will be able to immediately modify the head of the list without impacting the thread already walking the list to fire listeners. The remove listener case is a little more complicated, and involves making a new copy of the section of the linked list up to the point where the listener to be removed is. The remove case can also happen at the same time that a firing thread is walking down the original list.
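The linked-list idea can be sketched as below. This is not Holub's code. It is a reconstruction from the description above, and it assumes an AtomicReference with compare-and-set for swapping the head of the list, which is a choice made for this sketch. The Event and Listener types are placeholders.

```java
import java.util.concurrent.atomic.AtomicReference;

// Event and Listener are not defined in the text; minimal placeholders are assumed.
class Event {}

interface Listener {
    void receiveEvent(Event event);
}

final class LinkedMulticaster {
    // Immutable node: once built it never changes, so a thread walking
    // the list needs no lock.
    private static final class Node {
        final Listener listener;
        final Node next;
        Node(Listener listener, Node next) { this.listener = listener; this.next = next; }
    }

    private final AtomicReference<Node> head = new AtomicReference<>();

    // New listeners go on the head; threads already walking the list
    // started from the old head and are not affected.
    void addListener(Listener listener) {
        Node oldHead;
        do {
            oldHead = head.get();
        } while (!head.compareAndSet(oldHead, new Node(listener, oldHead)));
    }

    void removeListener(Listener listener) {
        Node oldHead;
        Node newHead;
        do {
            oldHead = head.get();
            newHead = without(oldHead, listener);
        } while (newHead != oldHead && !head.compareAndSet(oldHead, newHead));
    }

    // Copies the nodes in front of the first match and shares the tail after it.
    // Returns the same node it was given when the listener is not in the list.
    private static Node without(Node node, Listener listener) {
        if (node == null) {
            return null;
        }
        if (node.listener.equals(listener)) {
            return node.next;
        }
        Node rest = without(node.next, listener);
        return rest == node.next ? node : new Node(node.listener, rest);
    }

    // Walks whatever list was current when firing started, newest listener first.
    void fireEvent(Event event) {
        for (Node node = head.get(); node != null; node = node.next) {
            node.listener.receiveEvent(event);
        }
    }
}
```

Because nodes are never modified, a firing thread always sees a consistent list, either the one before a concurrent add or remove, or the one after it.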

This implementation demonstrates an important principle: instead of using synchronization keywords and primitives as a band-aid approach, applied externally to protect access to a data structure, it’s much better to design the data structure to be concurrent in the first place. The code comprising Holub’s Multicaster impl