August 1, 2008
» Caching Database Queries in Java

Caching database queries can reduce, and sometimes eliminate, the performance degradation caused by slow database access. I have just finished an article that discusses the steps needed to address performance problems caused by heavy database queries, such as optimizing and caching the queries, and provides code examples.

One thing to keep in mind is that a plain query cache usually should be bigger than an entity cache because the variability of the keys is higher (query text plus parameters).

A more advanced approach is the Cache of Caches pattern. The first (or front) cache uses the SQL query text as a key and holds second-level caches. Each second-level cache holds the actual query results for a given parameter set.



Here, the queries executed more often have a better chance of avoiding eviction. Upon eviction from the front cache the query result cache should be destroyed.
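
To make the pattern concrete, here is a minimal sketch of a cache of caches. It is not taken from the article mentioned above: the names LruCache and QueryCache, the LRU eviction policy, and the use of a parameter list as the second-level key are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical LRU map built on LinkedHashMap's access-order mode. */
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = keep entries in access order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Dropping a front-cache entry also discards its whole second-level cache.
        return size() > maxEntries;
    }
}

/** Front cache keyed by SQL text; each value is a result cache keyed by parameters. */
class QueryCache<R> {
    private final LruCache<String, LruCache<List<Object>, R>> front;
    private final int maxParameterSetsPerQuery;

    QueryCache(int maxQueries, int maxParameterSetsPerQuery) {
        this.front = new LruCache<>(maxQueries);
        this.maxParameterSetsPerQuery = maxParameterSetsPerQuery;
    }

    /** Returns the cached result, or null on a miss. */
    synchronized R get(String sql, List<Object> params) {
        LruCache<List<Object>, R> results = front.get(sql); // also marks the query as recently used
        return results == null ? null : results.get(params);
    }

    synchronized void put(String sql, List<Object> params, R result) {
        LruCache<List<Object>, R> results = front.get(sql);
        if (results == null) {
            results = new LruCache<>(maxParameterSetsPerQuery);
            front.put(sql, results);
        }
        results.put(new ArrayList<>(params), result); // copy so later changes to params cannot corrupt the key
    }
}
```

Every lookup touches the front cache entry for the query text, so frequently executed queries stay at the "young" end of the front cache, which is what gives them a better chance of avoiding eviction.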

If cache lookup speed is of utmost importance, distributing the query cache across several machines may be a better option.

Regards,

Slava Imeshev

May 15, 2008
» Cacheonix at JavaOne Days 2 and 3


Talking to the Crowd
Those two days were filled with talking to show attendees who were interested in distributed caching and data grids. It was a great pleasure, because practically everyone we talked to had a decent understanding of the problems of large-scale data management, of the main sources of scalability bottlenecks such as databases and slow data sources, and of how Cacheonix could help them.

It had its funny moments. I was discussing a pretty complex issue with someone when two streams of people who wanted to talk came at us from both sides. Serguei and I simultaneously moved towards them to say "hang around for a sec", and it looked as if we had jumped away from the guy we were just talking to. There was a very awkward second when that person found himself standing alone in an empty space. I quickly came back and had to apologize. Moral: don't try to talk to more than one person at a time.

Hi Competitors
Most of the competitors did show up to say "Hi". Cameron Purdy of Tangosol Coherence fame stopped by and we had a nice talk, primarily about the life after Tangosol.

Artima and Javalobby
I have finally had a chance to talk to the guys and gals from Artima and Javalobby in person.


Tear Down
Tearing down and packing up took some 20 minutes. Simplicity rules.

Summary
It was a great show and we are looking forward to next year.

May 9, 2008
» Cacheonix at JavaOne Day 1: My cache is bigger than yours


The floor
I should say that Sun did a great job organizing the floor. Or maybe it was just the strategic location we picked. At any rate, we liked the format.

Traffic
For us the booth traffic literally took off. There were times when I was talking to four people simultaneously.



On goodies
We brought cool electronic clocks (challenging to set up, though :). After trying to just lay the goodies out, we quickly found that about 50 of them were gone in about four minutes. So it's a great way to fill the Java crowd's swag bags, but it is totally useless from the company's point of view. Don't do it. After seeing this not work, we handed the clocks mostly to the guys and gals we talked to. That seemed to work better.

Running to Caltrain
The Pavilion was open until 8:00 PM. After having a few drinks we ended up running to the SF Caltrain station to catch the 8:33 train. It's a 10-minute run with a few traffic lights. It was good exercise, but it didn't feel that good given the wine. Clearly, I should resume going to the gym.

May 8, 2008
» Cacheonix at JavaOne Day 0: The fastest booth set up ever



I guess it's been my fastest booth set up ever. Setting up the booth for Cacheonix took about 15 minutes from entering the Pavilion to saying goodbye to the security at the entrance and driving back home.

We decided to take simplicity to the extreme and go only with a table, a projector and a screen:



This turned out to be a great choice. We set up a looping slide show in PowerPoint to show eight variants of the display panel in a 10-minute cycle. A quick run demonstrated that two images were totally off in the actual lighting environment of the floor. The rest of the images were perfect. The setup gave us a crisp, bright and effective display. Bye-bye, expensive roll-ups. See how our booth stands out compared to the rest:



One thing I would do differently: the environment definitely allowed for a higher-resolution image. Instead of buying the $499 800x600 Dell 1201MP projector, I would get the slightly more expensive $599 1024x768 Dell 1409X.

May 2, 2008
» Coloring Local Java Variables in IntelliJ

By default, local variables in IntelliJ are not colored in any way. As a result, they blend in with the rest of the code, and you have to make a small but real mental effort to identify them when reading the code.

The good news is that IntelliJ allows setting the color of local variables. Here is a snippet of the configuration screen:



The challenge is to come up with a coloring that looks natural and goes with the rest of the color scheme. After trying to find a color that would fit into the already busy scheme, I settled on "black bold" for local variables. It is not very intrusive, and you can still spot the local variables easily:



April 2, 2008
» CruiseControl 2.7.2 Released

Jeffrey Fredrick just announced that CruiseControl 2.7.2 is now available for download: http://tinyurl.com/2zm9mz.

There are lots of bug fixes, lots of changes to the Dashboard and some new plug-ins, but the bit of most interest to me was this (from the release notes):

TeamFoundationServer source control
----------------------
* Fix compatibility with Microsoft Visual Studio Team Foundation Server 2008 (CC-735). Submitted by Martin Woodward.

This was to work around an issue that came up when using CruiseControl (Java version) to talk to a TFS 2008 server (TFS 2005 worked fine and still does). If you are attempting to use CruiseControl with TFS 2008 then you should go with CruiseControl 2.7.2. For that matter, if you are using CruiseControl.NET with TFS then you should also take a look at the latest release of the TFS integration, as that contains the same fix allowing you to happily talk to a 2008 version of Team Foundation Server (also using the TFS 2008 client APIs).

Anyway, congratulations to the CruiseControl team on the 2.7.2 release!

February 26, 2008
» Fixing the dreaded "libXp.so.6: cannot open shared object file: No such file or directory"


If you, like us, deal with older versions of the JDK on newer versions of Linux from time to time, you may get this message when running the JDK:



"libXp.so.6: cannot open shared object file: No such file or directory"



The way to fix it is to run the following two commands:


sudo yum install libXp

sudo yum install xorg-x11-deprecated-libs



Hope this helps.



Regards,



Slava Imeshev

January 2, 2008
» [Java] Don’t call subclass methods from a superclass constructor

When designing a class for subclassing, it’s important to avoid calling any method that the subclass could or must override from the superclass constructor. This includes any non-final public or protected methods (either abstract or concrete). The only methods that are appropriate to call from the superclass constructor are final methods or private methods in the superclass.

Unfortunately, this mistake is all too easy to make – I’ve made it myself lots of times. Usually, this is done because the superclass has some invariant that must be enforced, and part of enforcing the invariant involves calling at least one method that the subclass could or must override.

For example, consider the following contrived class:


class SuperClass {
    private final FooBar fooBar;
    private String message;

    public SuperClass(FooBar x) {
        fooBar = x;
        message = computeMessage(x);
    }

    protected String computeMessage(FooBar x) {
        return x.someMethod() + "..."; // details of the message are not important for this example
    }
}


This class exhibits the problem this entry describes. Class ‘SuperClass’ is designed for subclassing, but it calls an overridable method from its constructor. The intention of this class is to enforce the invariant that the ‘message’ field will never be null after construction, and computing the correct value for the ‘message’ field involves the subclass. Even though this is a contrived example, many needs for calling subclass methods from a superclass constructor have this form in essence.

This is bad because in Java, superclass constructors run before subclass constructors (that is, a subclass must invoke super() before other statements in its constructors). By invoking a subclass method, code in the subclass will be invoked before the subclass gets a chance to initialize itself. The subclass will see default values for all its fields, and will not have access to any data passed in the subclass constructor.

For example, consider the following subclass of the class given above:


class SubClass extends SuperClass {
    private MessageHelper helper;

    public SubClass(FooBar x, MessageHelper helper) {
        super(x);
        this.helper = helper;
    }

    protected String computeMessage(FooBar x) {
        return helper.getMessage(x); // NullPointerException!
    }
}


In this example, the subclass takes a helper object in its constructor. The helper object is used in the overridden method. However, because the superclass constructor calls the overridden method, the subclass does not have a chance to assign the helper object to a field (its constructor has not run yet). This requires the subclass to either guard for this condition (but what should it do when the condition happens?) or contain an NPE bug as in the example above.
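
The ordering can also be observed in a small self-contained sketch. The classes Base, Derived and ConstructorOrderDemo are hypothetical stand-ins (they are not the classes above) chosen so that the code runs without FooBar or MessageHelper:

```java
class Base {
    private final String message;

    Base() {
        // Calls an overridable method before any subclass constructor body has run.
        message = describe();
    }

    protected String describe() {
        return "base";
    }

    String message() {
        return message;
    }
}

class Derived extends Base {
    private final String label;
    private int size = 42; // field initializers also run only after super() returns

    Derived(String label) {
        super();
        this.label = label;
    }

    @Override
    protected String describe() {
        // Invoked from Base(): label is still null and size is still 0.
        return label + "/" + size;
    }
}

class ConstructorOrderDemo {
    public static void main(String[] args) {
        System.out.println(new Derived("widget").message()); // prints "null/0"
    }
}
```

Here the overridden method happens not to dereference the uninitialized field, so instead of a NullPointerException the object silently ends up with a wrong message, which can be even harder to track down.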

Fortunately, this problem is easy to fix with the following design pattern. First, observe that there are two issues at play here: (1) enforcing an invariant and (2) designing for subclassing. The best way to satisfy both issues is to make the superclass abstract and require (through documentation) that all subclasses enforce the invariant in their constructor.

For example, here are versions of ‘SuperClass’ and ‘SubClass’ from above that fix the problem:


abstract class SuperClass {
    private final FooBar fooBar;
    private String message;

    protected SuperClass(FooBar x) {
        fooBar = x;
    }

    protected final void initialize() {
        message = computeMessage(fooBar);
    }

    protected abstract String computeMessage(FooBar x);
}

class SubClass extends SuperClass {
    private MessageHelper helper;

    public SubClass(FooBar x, MessageHelper helper) {
        super(x);
        this.helper = helper;
        initialize(); // subclass MUST call initialize
    }

    protected String computeMessage(FooBar x) {
        return helper.getMessage(x); // no NullPointerException
    }
}


The superclass is made abstract to avoid the possibility of creating an instance in which the invariant is not satisfied. This is OK, since a fundamental part of this design is to allow for subclassing. In addition, all subclasses must call the protected initialize() method from their subclass constructor. If they fail to do so, the superclass-subclass contract is broken and invariants can’t be guaranteed.

The fix may be considered a little cumbersome. It requires the creation of at least one other class (since before the fix, the superclass could be used directly). The fix also requires (but can’t enforce through the compiler) that subclasses all follow certain rules. However, these tradeoffs are much better than having the situation in which subclass code is invoked before the subclass can initialize itself. By the way, I am pretty sure this issue is also discussed in Effective Java (I’d be sure, but I don’t have a copy handy at the moment).




November 14, 2007
» On Red Hat and Sun Collaborating On Open Source Java

According to the press release, Red Hat announced today that it is joining the OpenJDK community.

This is an interesting topic, though the background, as I see it, may be marketing rather than technology.

Nowadays press releases are purely a marketing engine. Sun is a big shop, and Red Hat is not exactly a small one either, at least in terms of mind share. This means that the release has gone through thorough polishing and that, whatever the message is, it is aimed at analysts and [possibly] shareholders.

This is total speculation, but what about Red Hat trying to ride the wave of disappointment caused by the lack of Java 6 on Mac OS X? Is it possible that Red Hat is displaying full collaboration with Sun, in contrast to "non-collaborative" Apple, which Red Hat may see as a rival in the non-Windows OS market?

For Sun it would be just "good publicity never hurts".

I am considering these options because, let us be serious, there haven't been any real problems running Java on Red Hat or any other Linux distribution for years. It takes me 2 minutes or less to get a Java app running on Red Hat, even compared with "native" Perl, Ruby or PHP apps that require downloading hundreds of megabytes of dependencies.

Regards,

Slava Imeshev

October 15, 2007
» Managed Environments

When writing Java code, it’s useful to differentiate between two types of targets: a “normal” environment and a “managed” environment [1]. The difference between the two is simple. In a normal environment, you (the person writing the code) call the main() method [2]. In a managed environment, you do not. Managed environments are sometimes called container environments because they usually follow a containment or hosting model. In this model, the host container is the code that contains the main() method, and independent units of third-party code (hereafter plugins[3]) are managed by the container.

One of the most familiar examples of managed environments to most Java developers is the application server model used to host server-side Java applications. Such an application server can range from a full-blown JEE container implementation to a smaller-scale operation that hosts servlets with possibly a few EE features. Another example of a managed environment that most Java developers will have interacted with is the Eclipse IDE. Eclipse is entirely built around a container model (for the last few versions, that model has been OSGi). Users of Eclipse need not know or care about this, but anyone who writes plugins for Eclipse must understand the model and its implications.

It’s helpful to think about the differences between managed and normal environments. In theory, the environment should be mostly transparent to the code executing in that environment. In practice, however, there are some issues that need to be considered. This is especially true if you are writing library-level code: some libraries are much more “container-friendly” than others. If you are writing application-level code this may or may not be an important issue for you. Occasionally, you might have a single code base that you want to use in both types of environments.

The biggest difference between the two types of environments is the classloader layout. Normal Java environments are usually pretty simple in this regard [4]. Without going into unnecessary detail, this layout contains a bootstrap classloader that loads the system classes (like java.*) and an application classloader that loads everything else, including the classes that you write and any libraries those classes depend on. In contrast, a managed environment usually implies a more complicated classloader hierarchy. There will typically be a separate classloader for each plugin. There will often also be one or more classloaders in the hierarchy to support the container itself, and one or more classloaders that are shared among all of the plugins [5].

Why is the classloader layout important to think about? Two big reasons: visibility and static variables. Visibility is a simple property that has to do with the relationship between classloaders. Class A has visibility of class B if A’s classloader is able to load class B. Visibility becomes important in a variety of contexts. For example, many APIs will instantiate objects for you. In order to do this, they might need to have visibility of the class to be instantiated [6]. Static variables are associated with a class instead of an instance of that class, and are often used to obtain global variable semantics in Java. However, static variables aren’t really that similar to global variables – they do have a scope, and that scope is the class they are associated with. Such variables are only visible to classes that have visibility to the static variable class. If two different classloaders in the application each load a class that defines a static variable, there will be two instances of that variable. This situation may or may not be anticipated by the author of the class.
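
The static variable point is easy to demonstrate. The sketch below is only an illustration under several assumptions: Counter, IsolatingLoader and StaticScopeDemo are hypothetical names, the loader is a deliberately simplified stand-in for a plugin classloader, the compiled classes are assumed to be available as resources on the classpath, and InputStream.readAllBytes() requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Method;

/** A class that keeps "global" state in a static variable. */
class Counter {
    private static int count;

    public static int increment() {
        return ++count;
    }
}

/** Hypothetical plugin-style loader that defines its own copy of one named class. */
class IsolatingLoader extends ClassLoader {
    private final String isolatedName;

    IsolatingLoader(String isolatedName, ClassLoader parent) {
        super(parent);
        this.isolatedName = isolatedName;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!name.equals(isolatedName)) {
            return super.loadClass(name, resolve); // normal parent-first delegation
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                String resource = name.replace('.', '/') + ".class";
                try (InputStream in = getParent().getResourceAsStream(resource)) {
                    if (in == null) {
                        throw new ClassNotFoundException(name);
                    }
                    byte[] bytes = in.readAllBytes();
                    c = defineClass(name, bytes, 0, bytes.length); // a new class, distinct per loader
                } catch (IOException e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}

class StaticScopeDemo {
    /** Calls the static increment() method on the copy of the class owned by the given loader. */
    static int incrementVia(ClassLoader loader, String className) {
        try {
            Method m = loader.loadClass(className).getMethod("increment");
            m.setAccessible(true); // the class is package-private and belongs to another loader
            return (int) m.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader app = Counter.class.getClassLoader();
        String name = Counter.class.getName();
        ClassLoader pluginA = new IsolatingLoader(name, app);
        ClassLoader pluginB = new IsolatingLoader(name, app);
        incrementVia(pluginA, name);
        System.out.println(incrementVia(pluginA, name)); // prints 2
        System.out.println(incrementVia(pluginB, name)); // prints 1: pluginB has its own count
    }
}
```

Each loader defines its own Counter class, so each has its own count variable. Code that assumed a single process-wide counter would be wrong as soon as the class was loaded by more than one classloader.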

In the “normal” environment, it’s hard to make things go wrong classloader-wise. Any libraries your application code has dependencies on are pretty much guaranteed to have visibility of your code, since there’s essentially just one flat classloading space. Any static variables behave mostly like global variables for the same reason. In fact, you can pretty much ignore many classloader issues in a normal environment.

In a managed environment, there is much more potential for things to go wrong. As mentioned above, each plugin will have its own classloader. Most managed environments have complicated rules or configuration options that govern the relationships between the plugin classloaders, any shared classloaders, the container classloaders, and the bootstrap classloader. Since there is not a flat classloading space, visibility issues come into play. For instance, a library may not be able to instantiate an object because the library’s classloader does not have visibility of the object’s class. A class may be loaded twice, and code may then try to use both versions of the class in the same context (leading to very confusing ClassCastExceptions!). In short, all kinds of problems can happen, and debugging them can make for hours and hours of fun.

A particularly interesting problem that often occurs in managed environments is an insidious type of memory leak. Managed environments often provide for some amount of dynamic loading and unloading of plugins at runtime. This is most often implemented by discarding the plugin classloader when the plugin is unloaded. The plugin classloader contains references to all of the plugin classes, which in turn contain references to any static variables defined by the plugin. If all goes well, the garbage collector can reclaim the plugin classloader and everything referenced by it. However, it only takes one outside reference to a single object from the plugin to keep the classloader from being garbage collected. This is because every object has a reference to its class, and every class has a reference to the classloader that loaded it. It turns out to be very non-trivial to cleanly unload a plugin at runtime – just perform a web search for “classloader memory leaks” to read all sorts of war stories. The Java language and libraries are just not designed for that kind of thing. Luckily, most managed environments do not have core functionality that depends on dynamic unloading [7].

Another problem occurs with the issue of configuration. Let me skip straight to an example: using Java system properties for configuration simply does not work in a managed environment [8]. System properties have global scope – when a system property is set, the value is visible to every other part of the application. In particular, system properties span classloader visibility. Suppose a managed environment contains plugin A and plugin B, both of which depend on a library which is placed in a shared classloader space. Assume further that said library relies on system properties for configuration. This is bad because plugin A can now affect plugin B’s configuration.
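
Here is a minimal sketch of that scenario. The library SharedLogger, its property name "sharedlogger.level", and the two plugin classes are all made up for illustration; only System.setProperty and System.getProperty are real APIs.

```java
/** Hypothetical shared library that reads its configuration from a system property. */
class SharedLogger {
    static final String LEVEL_KEY = "sharedlogger.level"; // assumed property name

    static String level() {
        return System.getProperty(LEVEL_KEY, "INFO");
    }
}

/** Hypothetical plugin that wants verbose logging. */
class PluginA {
    static void start() {
        System.setProperty(SharedLogger.LEVEL_KEY, "DEBUG");
    }

    static String effectiveLevel() {
        return SharedLogger.level();
    }
}

/** Hypothetical plugin that wants quiet logging. */
class PluginB {
    static void start() {
        System.setProperty(SharedLogger.LEVEL_KEY, "ERROR");
    }

    static String effectiveLevel() {
        return SharedLogger.level();
    }
}

class SystemPropertyClashDemo {
    public static void main(String[] args) {
        PluginA.start();
        PluginB.start(); // silently overwrites plugin A's setting
        System.out.println(PluginA.effectiveLevel()); // prints "ERROR", not the "DEBUG" that A asked for
    }
}
```

Whichever plugin starts last wins, and neither plugin can tell that its configuration was changed behind its back. A container-friendly library would accept its configuration through an explicit object or parameter instead.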

So when writing Java code, whether a library or not, consider the target environment of your code. Managed environments are becoming more and more common. Managed environment techniques like custom classloaders that would have been considered advanced 5 years ago are now much more commonplace, which means you’re much more likely to run into them. Managed environments, long the standby of the server-side Java space, are now increasingly popular on the desktop. The Java community is starting to take a lot of interest in standardizing managed environments [9]. If you’re writing application-level code, understand which type of environment you’re targeting. If you’re writing a library, take the time to write container-friendly code.

Footnotes:


[1] The term “managed” here means something different than another common use of the word: in the .NET world, the term “managed” refers to code that uses automatic memory management as opposed to manual allocation and freeing. That’s not the meaning I’m using in this article.


[2] That is, the public static void main(String[] args) method. In other words, you control the entry point of the Java application.


[3] Plugins may not be the best term, but “containees” or “independent third-party units of code” was just too unwieldy to use throughout the rest of the article. If you’re a server-side person, just substitute “web application” everywhere you see “plugin”.


[4] But not always. As non-trivial applications grow, they almost always eventually start playing magic classloader tricks to support various features. At that point, they start to look a lot more like managed environments from the perspective of some of the code.


[5] For example, see this document for a description of the classloader hierarchy in one managed environment.


[6] Of course, a well-written API that does this will not require such visibility. Instead, it will either require that a classloader that does have visibility be passed, or it will make use of hacks such as the thread context classloader.


[7] For instance, most JEE containers support hot redeploying of applications. This functionality is very useful during development but rarely enabled when the application is put into production.


[8] Despite the fact that it does not work, many libraries do it anyway. You’ve been warned! ;-)


[9] See OSGi, JSR 277, and JSR 291.

August 16, 2007
» Simple Concurrency Guidelines for Designing APIs

When designing an API, one of the considerations you usually have to address is concurrency. In other words, for every class in the API, what is the class's threading policy? At the very minimum, an API should document how it behaves with regard to concurrent access, and even better is an API designed with concurrent access in mind.

Writing a good API that is concurrency-friendly is hard. More than anything, it requires lots of reasoning about how all of the moving pieces will work together under concurrent access. In this article I’m going to discuss a few simple guidelines you can follow when designing APIs and reasoning about concurrency. The specifics of what I’ll discuss apply to Java, although the general concepts probably apply equally well to .NET and other similar environments.

The goal is, in general, to choose the design that is the easiest to reason about and still has acceptable concurrent performance and API usability. Here are four guidelines that can be used as a starting point when thinking about what the threading policy of a class should be:

1) For non-collection-oriented classes, default to immutable
2) For collection-oriented classes, default to mutable and thread safe
3) Prefer collections of immutable elements
4) For collections of mutable elements, make copies and treat the contained elements as immutable internally

These guidelines will help you to design classes that make it easier for the consumer of your API to reason about concurrency policy. This reasoning is made possible by clear documentation of each class’s design for concurrent access. Failure to consider the threading policy for a class leads to under-documented APIs, unexpected behavior at runtime, and obscure bugs that are hard to reproduce. Documenting the concurrency of an API is essential in order to guide the API’s consumers to correctly using the API.

Note that documenting an API as “not thread safe” is a valid design choice. It may not be the best choice (depending on the API). However, it is better to document that a class isn’t thread safe than to make the API consumer guess or rely on undocumented behavior that could change.

In the discussion below, I make a distinction between collection-oriented classes and non-collection-oriented classes. Collection-oriented classes contain multiple similar child objects (elements). They are usually easy to recognize since they often contain methods to add elements, remove elements, find elements, and perform similar operations across the contained elements. Collection-oriented classes shouldn’t be confused with composite classes that are made by composing together dissimilar classes.

Now I’ll go over each guideline. Remember that ultimately, we are looking at a class and trying to determine what a reasonable threading policy for that class might be.

For non-collection-oriented classes, default to immutable

Immutable classes are one of the best design choices you can make when designing an API. For classes that aren’t collection-oriented, default to using an immutable design.

Only design mutable classes if it seems that API consumers would be greatly inconvenienced by the immutable classes. Usually though, immutable non-collection-oriented classes aren’t a big hassle for users.

Immutable classes have the inherent property of being safe for simultaneous access from multiple threads. You don’t have to do any internal or external locking. This is the first big advantage of immutable classes – thread safety for free and no performance hit under concurrent access.

Another big advantage of immutable classes is that when you use them, you can more easily build a mental model of the system you’re designing. Since immutable classes are very easy to reason about, it takes less mental RAM to think about how instances will behave at runtime - they can only ever be in a single state (post construction).

Immutable classes are also a great way to set API consumer expectations. One of the most frustrating things about an API is when some class, say a service of some kind, takes a mutable object as initialization data. What happens after you hand the mutable configuration data off to the service? Are you still allowed to modify the configuration data? Does the service care? Will it break? Using an immutable class design for the configuration data instead solves these problems.

Designing immutable classes is extremely easy in Java. Briefly, here are the requirements you should satisfy when designing a class in order to call it immutable:

1) All fields in the class should be declared final
2) All fields should be either immutable objects or mutable objects that are not mutated outside of a constructor
3) The this reference does not escape a constructor
4) No mutable objects passed to a constructor are retained by the instance or any of its components
5) No mutable objects escape the instance

Satisfying these requirements is not the only way to design a thread safe immutable object, but other techniques are much harder to explain and require deep knowledge of Java’s memory model.
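
As a sketch of what the five requirements look like in practice, consider the configuration-data case mentioned earlier. The class ServiceConfig and its fields are hypothetical; they exist only to illustrate the checklist.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable configuration object for some service. */
final class ServiceConfig {
    // Requirement 1: every field is final.
    private final String host;
    private final int port;
    // Requirement 2: the list is never mutated after the constructor finishes.
    private final List<String> tags;

    ServiceConfig(String host, int port, List<String> tags) {
        // Requirement 3: "this" is not passed to anyone from inside the constructor.
        this.host = host;
        this.port = port;
        // Requirement 4: copy the caller's mutable list instead of retaining it.
        this.tags = Collections.unmodifiableList(new ArrayList<>(tags));
    }

    String host() {
        return host;
    }

    int port() {
        return port;
    }

    // Requirement 5: only an unmodifiable view of the private copy escapes.
    List<String> tags() {
        return tags;
    }

    /** "Changing" a value means creating a new instance; the receiver is untouched. */
    ServiceConfig withPort(int newPort) {
        return new ServiceConfig(host, newPort, tags);
    }
}
```

A caller can hand such an object to a service and keep modifying its own list afterwards without affecting the service, which answers the "am I still allowed to modify it?" question by construction.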

For collection-oriented classes, default to mutable and thread safe

When designing a collection-oriented class, the default choice should be to write a mutable container that can be safely accessed by multiple threads concurrently. Most collection-oriented classes should be mutable since that’s what the API consumer will expect. The most common reason to use a collection-oriented class is that you want to modify the collection (add or remove elements) or modify the contained elements themselves. An immutable collection-oriented object makes consumers jump through hoops to do this.

Often, collection-oriented classes in an API will be used only from a single thread for many scenarios. It can be tempting not to perform the necessary locking to make these classes thread safe, since that locking will be unnecessary for the majority of usage scenarios. In this case, one valid design decision would be to skip the locking and document the collection as not being thread safe. However, doing this penalizes the users of the class who are in concurrent-access scenarios, since the burden of implementing thread safety is now on them.

I recommend going ahead and doing the internal locking to make these types of classes thread safe. On modern Java runtimes, the cost of uncontended synchronization is extremely low (JCIP talks about this in detail). Even when the common case is single-thread use, it’s better to design a class to be thread safe if there are potential concurrent scenarios. If a profile reveals that the locking is a hotspot when using the class, then you have a good reason to avoid it. Otherwise, assume that locks are essentially free when uncontended.
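
A minimal sketch of this kind of internal locking might look like the following. Roster and RosterDemo are hypothetical names, and plain synchronized methods are just one simple way to do the locking:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical collection-oriented class: mutable, thread safe, holds immutable Strings. */
final class Roster {
    private final List<String> names = new ArrayList<>(); // guarded by "this"

    /** Adds the name unless it is already present; the check and the add are one atomic step. */
    public synchronized boolean add(String name) {
        if (names.contains(name)) {
            return false;
        }
        names.add(name);
        return true;
    }

    public synchronized boolean remove(String name) {
        return names.remove(name);
    }

    public synchronized int size() {
        return names.size();
    }

    /** Returns a copy so callers can iterate without holding the lock. */
    public synchronized List<String> snapshot() {
        return new ArrayList<>(names);
    }
}

class RosterDemo {
    /** Starts the given number of threads, each adding distinct names, and returns the final size. */
    static int addConcurrently(Roster roster, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    roster.add("worker" + id + "-" + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return roster.size();
    }
}
```

Single-threaded callers pay only for uncontended lock acquisition, while concurrent callers get a class whose uniqueness invariant holds without any external locking.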

Although it’s not the first choice I’d make, for some APIs immutable collection-oriented classes may not be a bad decision. You get all of the advantages mentioned above for immutable classes. If API consumers will not often need to mutate the collection or the collection’s elements, doing this may make sense. For example, some collection-oriented objects are often just passed around to other parts of the API, and rarely changed. If you decide to design an immutable collection-oriented class, be sure to document this very explicitly. Also, avoid method names that imply that the receiver is being changed as they can be confusing to a casual user of the API. An immutable collection-oriented class should not have a "public void add(Foo f)" method since the signature of that method implies that it alters the receiver.

Prefer collections of immutable elements

Whenever possible, collection-oriented classes should contain immutable elements. This helps reduce confusion about the API, since it is clear that the state of the elements can’t be changed while the collection contains them. I’ve found that designs that use small, immutable objects as building blocks and contain them in mutable containers tend to be very robust and easy to understand.

The biggest reason this is important is that most non-trivial collection-oriented classes have one or more invariants that they must enforce. For instance, a collection-oriented class may store elements that each have an identifier of some sort, and the collection may guarantee that contained elements will have unique identifiers. This is just an example – often the constraints can be much more complicated. If the contained elements are externally mutable, it will be very hard or even impossible for the collection to enforce those invariants.

Mutable collections of immutable objects are also very fast to make copies of. Since the contained elements are immutable, copying the collection only involves making a shallow copy. Supporting copies of collections is important for many APIs, so anything that makes this easier and faster is a win.

For collections of mutable elements, make copies and treat the contained elements as immutable internally

Unfortunately, it’s not always possible to design using only collections of immutable elements. Often, for one reason or another, the collection must contain mutable elements. One case of this is a collection-oriented class where the elements themselves are collection-oriented. No matter the specifics, there is a design pattern you can follow in this case.

The collection should make copies of the mutable elements as they are added or retrieved, ensuring that no external clients have a reference to the actual contained instances. In other words, every time a mutable element is added to the collection, the collection makes a copy and adds the copy instead. Every time a mutable element would be obtained, a copy is obtained instead. Internally, the collection should treat the contained elements as though they were immutable and should not call any method on the elements that could change them. By doing this, the collection will contain “effectively immutable” elements. The collection is then free to enforce constraints on the contained elements and know that no external client can break the constraints.

I can attest that this kind of design works well, but it does require some careful API documentation. Without proper documentation, clients may expect that they can obtain a contained element, mutate it, and have those changes automatically show up in the collection. Instead, this kind of design facilitates a more transactional usage. Clients obtain an instance, perform some changes to it, and then must add that instance back in to the collection. This usually isn’t a huge burden on the clients as long as expectations are set correctly.
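The copy-on-add, copy-on-retrieve design described above can be sketched roughly as follows. This is only a minimal illustration: the `Item` element type and the `ItemRegistry` collection are hypothetical names invented for this example, and the invariant being protected is simply that identifiers stay unique.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mutable element: both fields can change after construction.
final class Item {
    private String id;
    private String name;

    Item(String id, String name) {
        this.id = id;
        this.name = name;
    }

    // Copy constructor used by the collection to make defensive copies.
    Item(Item other) {
        this(other.id, other.name);
    }

    String getId() { return id; }
    String getName() { return name; }
    void setId(String id) { this.id = id; }
    void setName(String name) { this.name = name; }
}

// Hypothetical collection that keeps ids unique and never hands out its own instances.
final class ItemRegistry {
    private final Map<String, Item> itemsById = new LinkedHashMap<>();

    // Stores a copy, so later changes to the caller's instance cannot reach the registry.
    void put(Item item) {
        Item copy = new Item(item);
        itemsById.put(copy.getId(), copy);
    }

    // Returns a copy (or null if absent), so callers cannot mutate the stored instance.
    Item get(String id) {
        Item stored = itemsById.get(id);
        return stored == null ? null : new Item(stored);
    }

    int size() {
        return itemsById.size();
    }
}
```

A client that wants to change an element follows the transactional flow: call `get`, modify the returned copy, then call `put` again. Until that second `put`, nothing the client does to its copy is visible inside the registry.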

That covers the 4 guidelines. Note that these guidelines are really meant to only cover simple cases – however, the simple cases make up the bulk of most APIs. There are certainly complex classes that don’t fall easily into one of the categories above, and they will need to be designed with more thought.

June 29, 2007
» Today's Big Launch

The blogosphere today is buzzing with news of the other launch happening today, but there is one a bit closer to my own heart - Eclipse 3.3 (Europa) has been released.  I'm downloading it right now.

I've been running Eclipse 3.3 since Milestone 3 at the start of the year, and as the releases have been coming out it has been getting better and better.  Interestingly, the download site today has broken down Eclipse into separate versions geared towards downloading the parts that different audiences are interested in - bringing the straight "Eclipse IDE for Java Developers" down to 78MB.  The version I need, "Eclipse for RCP/Plug-in Developers", is a more substantial 153MB.

My immediate need in Eclipse 3.3 was support for the Windows Vista native UI widgets (including things like the Vista tree control).  The version of SWT that was shipping at the time of the Vista launch had a weird bug which caused the JVM to crash randomly, but it was fixed early in the 3.3 codebase.  SWT in the 3.3 release has also got a version which renders using WPF rather than Win32.  I'm still not really sure what the reasoning behind a WPF version is, but it is funny to compile Teamprise Explorer against this WPF version of SWT and then zoom in using the magnification tool in Vista and see everything stay smooth because it is vector based.  Performance sucks with the WPF version - but still.  With Teamprise Explorer compiled against the 3.3 Win32 SWT libraries, performance is super with the application looking more native on Vista than ones written using .NET 3.0.

Over the next few weeks I'm also going to try looking into some of the additional Europa projects.  The whole organization of the Eclipse Open Source project is very interesting to watch.  Today sees the simultaneous launch of 21 separate open source projects - many of which have dependencies on other projects.  The complexity is very interesting and yet (from the outside at least) seems to work impressively well.  Eclipse has been very good at doing releases every year, with substantial improvements as well as incremental changes.

As I write, I am 89% done downloading.  I'll let you know how I get on.  If anyone is queuing for the other launch, be sure to let me know how that goes.

March 9, 2007
» Embedded Beanshell for Runtime Diagnostics

For the Teamprise product line, I've always had a focus on providing comprehensive runtime diagnostics. This feature gets used any time we have a customer report a problem. With just a few clicks, the customer can produce a zip archive containing a complete set of diagnostic data that our support team can peruse through. This kind of thing really cuts down on the back-and-forth usually involved with customer support. I should really do a complete blog entry sometime on the entire diagnostic system, since it's a pretty neat piece of engineering. But today, I'm going to talk about just one aspect of our runtime diagnostic system.

When a customer invokes our support dialog, one of the tabs is labeled "BeanShell" and looks like this:


BeanShell is a popular script engine for Java. It interprets a lightweight, dynamically typed scripting language that has syntax very similar to Java syntax. BeanShell has a small footprint and is perfect for being embedded in other applications. Embedded BeanShell could be used for many purposes - a plug-in system, an administrative console, and lots of others. In the Teamprise support dialog, we use Embedded BeanShell to allow us to run arbitrary diagnostic commands at runtime.

The basic idea is that a Teamprise contact can provide a customer with a snippet of BeanShell code that was composed to track down a problem. The customer pastes the snippet into the BeanShell dialog, and presses the eval button. Any results of the snippet are logged as well as displayed in the dialog. The customer can then either send us their logs or report on what the output was.

Of course, for this kind of thing to be useful, the BeanShell script must be able to access important objects in the system. BeanShell has the perfect feature for this - when setting up the interpreter programmatically, you can predefine certain variables that the script can then access. We predefine variables for the Eclipse workspace, the TFS connection object, and lots of other important "root" objects in our system.

Just as an example, here is a snippet of BeanShell. When run in our support dialog this will print out all the work item queries for the first Team Project on the server:


workItemClient = tfsConnection.getClient(com.teamprise.core.WorkItemClient.class);
project = workItemClient.getProjects().getProjects()[0];
for (query : project.getStoredQueries().getQueriesByScope(null)) {
    print(project.getName() + "/" + query.getName() + ":\n" + query.getQueryText() + "\n");
}

This script makes use of the predefined tfsConnection variable, which is our connection to the TFS server.

No matter how comprehensive a diagnostic reporting system is, there will always be data you don't have the forethought to collect at development time. Using an embeddable scripting language like this gives us control to run diagnostic code at runtime and allows our customers to quickly run new diagnostic tests that we create for them.

October 3, 2006
» Locale sensitive String sorting in Java

So, the day after I get made a Microsoft MVP I do two posts about Java - go figure.  Anyway, today I had one of those moments where you realize that you didn't understand something you thought you did, and that a lot of the code you've written over the past 10 years probably doesn't work as well as you thought...  All this with the humble String.compareTo method.

Take the following strings:-

  • charlotte
  • Chloé
  • Raoul
  • Real
  • Réal
  • Rico

In .NET, if you want to perform a standard case insensitive, dictionary based comparison between two strings then you can use the String.Compare method.  This does a culture based, case insensitive comparison.

In Java, if you were to use the Comparable interface, which makes use of the standard String.compareTo method, to sort a list, you would end up with:-

  • Chloé
  • Raoul
  • Real
  • Rico
  • Réal
  • charlotte

That is because compareTo looks at the Unicode value of each character and sorts on that - which for those of us that tend to live in the ASCII range tends to work ok (except that lowercase letters come after the uppercase ones) - however if you have a language that uses one of the many other characters it doesn't work so well.  If you had a language where M comes before A in the alphabet you are totally screwed.

This is where you should be using the java.text.Collator class in Java.  The Collator class does locale sensitive string comparisons - i.e. allowing you to do a dictionary based sort of a set of strings.
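To make the difference concrete, here is a small sketch that sorts the names from the list above both ways. The class and method names (`CollatorSortExample`, `sortByCodeUnit`, `sortByCollator`) are made up for this example, and picking `Locale.ENGLISH` is just an assumption; in real code you would pass whatever locale your users expect.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollatorSortExample {
    // The names from the example above; \u00e9 is the escaped form of the accented e.
    static final List<String> NAMES = Arrays.asList(
            "Rico", "R\u00e9al", "Real", "Raoul", "Chlo\u00e9", "charlotte");

    // Natural ordering: String.compareTo, which compares character code values.
    static List<String> sortByCodeUnit(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    // Locale-sensitive ordering: Collator is a Comparator, so it can be passed to sort.
    static List<String> sortByCollator(List<String> input, Locale locale) {
        List<String> copy = new ArrayList<>(input);
        Collator collator = Collator.getInstance(locale);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortByCodeUnit(NAMES));
        System.out.println(sortByCollator(NAMES, Locale.ENGLISH));
    }
}
```

With the default collator strength, the second list should come out in the dictionary order shown at the top of this post (charlotte first, Réal between Real and Rico), while the first reproduces the compareTo ordering.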

Dope.  One of those classes I should have been using for a while...  I thought I was just being dumb, but then a couple of other people I mentioned this to were not aware of the issue so I thought it worth a blog post.

» Assert in Eclipse

One of the things that Java IDEs have always had over Visual Studio is the ability to target older versions of the VM from the latest and greatest versions of the tools.  For example, I develop in Eclipse 3.2 day to day, but I target Eclipse 3.0 on Java 1.4 for compilations and to debug against.  That way I get errors in Eclipse 3.2 if I try to use a method that isn't in the Eclipse 3.0 object model.  Very useful.  That said - I had a problem recently because my IDE was telling me that I couldn't use the "assert" keyword which was introduced in Java 1.4 (which we require for Teamprise).

The problem was in Window, Preferences, Java, Compiler.  Source compatibility was set to Java 1.3 and .class file compatibility was set to Java 1.2 - I corrected these preferences to allow Java 1.4 source and class files and now the assert keyword works just fine.

August 24, 2006
» Closures for Java

It looks like there's a very good chance that the Java language will finally get closures. This is, of course, a direct result of the influence of the programming techniques that Ruby (and Rails) has popularized. After all, Sun has considered the issue of closures before. Back when Microsoft had J++, Sun published a whitepaper declaring that a new J++ feature called delegates (a form of closures) was totally unnecessary. The Sun party line has always been that inner classes provide all the benefits of closures without adding additional language constructs.

Of course, J++ is now gone, Microsoft has .NET (with delegates), Ruby on Rails has exploded in popularity, and Java programmers are starting to envy their counterparts who can do in one line of code what takes them 5. Maybe inner classes aren't the answer to everything. Heck, maybe even objects aren't a panacea.

I really hope this makes it into the language. My top three wishes for the Java language are closures, type inference, and eliminating checked exceptions, so it's great to know that at least one of those has a chance at being reality. Of course, as the closures feature is slated for Java 7, it will still be many years before most Java programmers will get to use closures in day to day work (how many shops are still on Java 1.3?).

» Java: Advantages of Interfaces

Occasionally I hear the claim that creating an interface is only justified if there are multiple implementations of the interface. Developers will sometimes claim that an interface with only one implementation is a violation of YAGNI or is an example of an unnecessarily complex (i.e. overengineered) design. While it certainly is possible to misuse and abuse interfaces, claims like these show a misunderstanding of some of the most important reasons for using an interface.

Interfaces are commonly used to provide polymorphic behavior, and this is of course a valid use. It's also the way that interfaces are usually taught, so this is the scenario that many developers associate with interface use. However there are lots of other uses for interfaces, including some that involve only having a single implementation.

Contracts

If you asked me to define what an interface is, I'd reply that an interface defines a contract. Of course, a class defines a contract as well, so I should refine that definition. An interface defines only a contract - nothing more. I often use interfaces solely for the purpose of clarifying an existing implicit contract between two classes and making it explicit.

Why is it important to think in terms of contracts? It's all about coupling. Explicitly defining the contracts in play during class interactions forces you to think hard about how coupled together classes are. Thinking in terms of contracts often leads to refactorings that can greatly improve the design of code, which leads me to...

Separation of Concerns

Here's an experiment. Randomly choose one of the biggest classes in the codebase that you work on (remembering that this article is primarily about Java and Java-like languages). For many projects this would be a class that's more than a few thousand lines long. I'm going to make a claim that the majority of the time, this class is suffering from either a) lack of separation of concerns, b) duplicated code, or c) both. If the class contains many blocks of code with a striking resemblance to each other, it's probably a victim of the copy-and-paste coding technique, and a little bit of refactoring might go a long way towards cleaning that up. On the other hand, if the class contains lots of dissimilar code it's probably a "spaghetti" class and could use some separation of concerns.

Interfaces are great for separating out concerns in a class like this. By identifying each concern, writing an interface that defines that concern, and then altering the class to code to the interface, you can greatly reduce the size and complexity of the class. (Arguably, you may also be increasing the overall complexity of the system - it's always a tradeoff). I would almost always rather see a set of small classes with interfaces that define the contracts between them rather than one huge mangled class.

API Publishing

Interfaces are great to use when publishing APIs. By publishing an interface and keeping the implementation undocumented and internal, you can achieve benefits for both the API producer and consumer. Producers gain the advantage of having a clear delineation between what is API and what isn't, and consumers won't be tempted to depend on "implementation details".

Interfaces are a reification of the "what not how" principle of design. By publishing only the "what" as public API, you are free to make internal structural changes to the "how" without causing any client breakages.
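As a rough sketch of what this looks like in practice, the example below publishes an interface plus a small factory and keeps the implementing class internal. All three names (`TemperatureConverter`, `LinearTemperatureConverter`, `Converters`) are hypothetical and chosen only for illustration; in a real library the implementation would typically live in a separate, undocumented package.

```java
// Published API: the "what". Clients compile against this type only.
interface TemperatureConverter {
    double toFahrenheit(double celsius);
}

// Internal implementation: the "how". It can be renamed, split, or replaced
// without breaking clients, because they never refer to it by name.
final class LinearTemperatureConverter implements TemperatureConverter {
    @Override
    public double toFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }
}

// Published factory: the only place that knows which implementation is used.
final class Converters {
    private Converters() {
        // no instances; static factory methods only
    }

    static TemperatureConverter standard() {
        return new LinearTemperatureConverter();
    }
}
```

Client code would call `Converters.standard().toFahrenheit(100.0)` and never mention the implementing class, which is what leaves the producer free to change it later.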

Of course, the argument is often made that abstract classes are better for APIs than interfaces because more changes can be made to an abstract class without breaking existing clients. There is certainly merit to this argument, but I think it is mostly true when interfaces have been misused. Interfaces should be short and focused. There are many techniques for evolving an interface based API, most of which involve using composition and adapters to allow for both new and old interfaces to coexist peacefully.
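One of those techniques, sketched here with made-up names, is to leave the old interface untouched, introduce a new interface next to it, and bridge the two with an adapter. `MessageLog`, `LeveledMessageLog`, and `LeveledMessageLogAdapter` are all hypothetical; this is one possible shape, not a prescription.

```java
// Original published interface (hypothetical); existing clients and
// implementations depend on it, so it is left unchanged.
interface MessageLog {
    void log(String message);
}

// Newer, separate interface (hypothetical) that adds a severity level.
interface LeveledMessageLog {
    void log(String level, String message);
}

// Adapter: lets any existing MessageLog be used where the new interface is expected.
final class LeveledMessageLogAdapter implements LeveledMessageLog {
    private final MessageLog delegate;

    LeveledMessageLogAdapter(MessageLog delegate) {
        this.delegate = delegate;
    }

    @Override
    public void log(String level, String message) {
        // Fold the new argument into the old contract instead of changing the old interface.
        delegate.log("[" + level + "] " + message);
    }
}
```

Old implementations keep compiling as they are, and new code can be written against `LeveledMessageLog` while still reusing them through the adapter.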

Use Interfaces

Interfaces carry few costs (including having little or no performance costs) and have many advantages that go beyond simplistic polymorphism use cases. Interfaces have the ability to break up complex designs and make clear the dependencies between objects. There are many important use cases I haven't even touched on at all, like using interfaces to make objects more testable and using interfaces to increase configurability of systems.

I like to think of interface use as a tool for increasing the clarity of my designs, and it's a tool I'm glad to have in my toolbox.



August 15, 2006
» Java: Dynamic Proxies and InvocationTargetException

Recently I was fixing a bug in some code I'm responsible for, and the bug was interesting and general enough to share the details of.

A common approach in Java is to use dynamic proxies to provide decorator-style behavior. Doing this allows you to add additional behavior "around" an object without the object itself or its callers being aware of the decorating. The only requirement to use this built-in dynamic proxying is that the object must be accessed through an interface (third-party bytecode generation products like cglib do not have this restriction). For an example of how this technique is used, see my entry about Java active objects.

The key part of creating a dynamic proxy is to implement the InvocationHandler interface. The dynamic proxy object (which is generated by Java library code) calls the invoke method of this interface to dispatch method invocations at runtime.

An extremely common pattern is to implement the InvocationHandler interface something like this:


import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;

class MyHandler implements InvocationHandler {
    private Object delegate;

    public MyHandler(Object delegate) {
        this.delegate = delegate;
    }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // wrapping behavior could go here, before the real method is invoked ....
        Object value = method.invoke(delegate, args);
        // ... or here, after the real method returns
        return value;
    }
}

The idea here is that the dynamic proxy is providing some AOP-style behavior (resource management, logging, security, etc), but the real work is being done by a delegate object. The delegate object directly implements the interface that is being proxied, and all of the interface methods are forwarded to the delegate by the InvocationHandler (using Java reflection).

There is a subtle bug with the above code, and any InvocationHandler written as above should be treated as suspect.

If you read the documentation for the java.lang.reflect.Method.invoke method you'll see that it can throw an InvocationTargetException. This occurs when the method being reflectively invoked throws any exception.

Part of the reason the InvocationTargetException class exists is because of Java's checked exceptions. Since reflectively calling a method could result in a checked exception being thrown but not handled (since the signature of Method.invoke does not declare it), all exceptions thrown by the target are wrapped in a checked InvocationTargetException, which is declared in the signature of Method.invoke.

Normally when calling Method.invoke() the InvocationTargetException must be explicitly handled, since it is a checked exception. However, in the above case we're calling it from inside an InvocationHandler's invoke method, whose signature declares that it throws Throwable. Because of this, it is very easy to write an InvocationHandler that throws an InvocationTargetException out of its invoke method (which the above code will do).

Now if you read the documentation for the InvocationHandler.invoke method, you'll see that it describes how Java dynamic proxies respond to any Throwable thrown out of the InvocationHandler. In particular, if the exception is either a checked exception that is declared by the proxied interface, or is an unchecked exception, it will be propagated directly to the caller of the proxied method. However, if the exception is checked and is not declared by the proxied interface, it will first be wrapped in an UndeclaredThrowableException. This is analogous in some ways to how Method.invoke wraps all exceptions in an InvocationTargetException. Again, the reason has a lot to do with the checked exception system in Java.

Remembering that InvocationTargetException is a checked exception, what this all boils down to is that any InvocationHandler written as above does not explicitly handle the InvocationTargetException from Method.invoke() and will end up propagating an UndeclaredThrowableException to client code. The client code calling the proxied method is hardly ever going to expect this exception.

Given that most of the time, the goal of dynamic proxying is to provide transparent proxying of a service, this situation is hardly going to result in transparency. When client code invokes a proxied method, and the "real" implementation throws any exception (checked or unchecked), that exception should be propagated to the calling client code. Any other implementation will result in client code that needs to be aware of the proxying, which means losing one of the main advantages of using dynamic proxies in the first place.

Here's the correct implementation of the InvocationHandler.invoke() method from the above code:

public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    try {
        Object value = method.invoke(delegate, args);
        return value;
    }
    catch (InvocationTargetException ex) {
        throw ex.getCause();
    }
}

This implementation has all the right properties: any exception (checked or unchecked) thrown by the "real" method will be directly propagated to callers of the dynamic proxy. Re-throwing the InvocationTargetException's cause will not repopulate that exception's stack trace, so the client code will be able to see the real cause of the exception including the original stack trace.
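To see the difference between the two handlers from the caller's side, here is a small self-contained sketch. The `Service` interface, `FailingService`, and the `ProxyDemo` helper are hypothetical names made up for this demonstration; the two handlers mirror the buggy and the corrected invoke methods shown above.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical service interface; run() declares no checked exceptions.
interface Service {
    void run();
}

// Hypothetical implementation that always fails with an unchecked exception.
class FailingService implements Service {
    public void run() {
        throw new IllegalStateException("boom");
    }
}

class ProxyDemo {
    // Handler in the style of the first example: InvocationTargetException escapes.
    static Service naiveProxy(Service delegate) {
        InvocationHandler handler = (proxy, method, args) -> method.invoke(delegate, args);
        return newProxy(handler);
    }

    // Handler in the style of the corrected example: the real cause is rethrown.
    static Service unwrappingProxy(Service delegate) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(delegate, args);
            } catch (InvocationTargetException ex) {
                throw ex.getCause();
            }
        };
        return newProxy(handler);
    }

    private static Service newProxy(InvocationHandler handler) {
        return (Service) Proxy.newProxyInstance(
                Service.class.getClassLoader(), new Class<?>[] { Service.class }, handler);
    }

    // Reports the simple class name of whatever the call throws, or "none".
    static String thrownBy(Service service) {
        try {
            service.run();
            return "none";
        } catch (RuntimeException ex) {
            return ex.getClass().getSimpleName();
        }
    }
}
```

Calling `ProxyDemo.thrownBy(ProxyDemo.naiveProxy(new FailingService()))` should report an UndeclaredThrowableException, while the same call through `unwrappingProxy` reports the IllegalStateException the service actually threw. Note that the naive handler masks even an unchecked exception, because Method.invoke wraps everything the target throws.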

If the thrown exception will propagate all the way to the top of the call stack, the distinction between the above two methods doesn't really matter much. That is, in the case where the exception is totally unexpected by client code and will be caught by a top-level exception handler, the second approach isn't technically needed. However, client code often explicitly handles exceptions and performs some sort of recovery. The only way to support client code that does that is to properly handle the InvocationTargetException as in the second code example.

Admittedly the first code example is buggy. However, a lot of this complexity could have been avoided were it not for the Java language's checked exception design. This feature is a source of controversy among Java programmers - some love checked exceptions while others have grown to hate them. I admit that I can see both sides of the argument, but I fall firmly on the side of the fence that says checked exceptions were a nice experiment but have proven to be a failure.

June 30, 2006
» Eclipse 3.2 is out

Eclipse 3.2 is final as of today (downloads, torrents) (updated torrent link - thanks Martin). This is the once-a-year major new release of the Eclipse SDK (read the New and Noteworthy document for some highlights of what's been added and improved). Something new this year is that 9 other top-level Eclipse projects are being released simultaneously along with the SDK. This "release train" is codenamed Callisto.

I've been using early versions of the 3.2 SDK for months now, and just like in years past, it's a solid release. Lots of new features and better performance. Of course, we've tested our Eclipse plugin product (Teamprise) and it's working great with the new Eclipse release.

» Pack200

Perhaps you’ve heard the term Pack200 before but haven’t had a chance to become familiar with it. Or maybe you already know that Pack200 is related to deployment of Java applications, but aren’t sure how it could be used with your application. Compared to other new features of Java 5, Pack200 hasn’t gotten as much attention. In this entry I’ll answer two questions: first, What is Pack200?, and second, Why would I want to use it?

Pack200 was released as part of the Java 5 platform, and is essentially a technology for achieving much better compression ratios of deployable Java code. Java code has traditionally been packaged and deployed as JAR (Java Archive) files, which are nothing more than standard zip files with the extension .jar. Pack200 can result in radically higher compression ratios of Java bytecode when compared to traditional JAR packaging.

The name Pack200 is derived from two sources, and to understand why the name was chosen you have to know a little bit of the history of the technology. William Pugh (best known as the developer of FindBugs) released a paper detailing a number of advanced techniques for compressing Java class files. These techniques were used by Sun to decrease the size of the JRE and the JDK downloads, starting sometime around the Java 1.4.1 release. William Pugh’s ideas and the format Sun used were referred to as Pack. Around the same time period, Java Web Start / JNLP technology was becoming relatively popular and Java applets were also starting to become more popular once again. A JSR (Java Specification Request) was created called JSR 200. The JSR was created to specify a “dense download” format for Java bytecode, based on the technology presented by William Pugh in his paper and already in use internally at Sun. This technology was eventually called Pack200 (from the JSR number) and became public and supported with Java 5.

The target use case for JSR 200 was to enable more optimal web deployment of Java applications, specifically in the case of Java Web Start and applet applications. The motivation there was to reduce the download / update time for the client and the bandwidth usage for the server. Java 5 includes hooks in the JNLP bits