A Django site.
May 27, 2007
» Cost of broken builds

Somehow my blog editor Performancing managed to publish my blog instead of making it draft. I am sorry for the incomplete blog entry. I have removed it.

April 14, 2007
» Continuous integration kills large projects

Continuous Integration is an widely accepted approach in software development. Martin Fowler describes has written een great article about it.

However, for a large project it does not work. I will explain why not. Suppose that we have a big project of 100 software developers. Each of them is an excellent worker, and only 1% of their deliveries causes the daily build to fail. With 200 working days per year, a developer causes only 2 daily builds per year to fail. Can you beat that?

Now although this is an excellent achievement, with 100 developers this will cause about 200 daily builds to fail, which is... every day the build fails!

So every day, the build manager has to figure out what is wrong. And since each of the developers are excellent workers, the finding the true cause of the build failure is hardly ever a trivial challenge. So with the help of the developers that have delivered, it might well take the whole day before the problem is fixed. Then, the build has to be distributed across all development teams, all development sites and incorporated in the workspaces of the developers. There will be little time left for integrating the new build with the developer's changes in his private workspace. So... continuous integration leads to continuous build failures!

In addition to the continuous build failures, when a developer must stay in sync with the latest build he has to integrate the latest build with his own changes every day. If this takes 1 hour, he will spend 4 hours per week on synchronization, which is at least 10%, day-in-day-out. They probably will get frustrated about going through their old changes every day, just because "build management provides a new build". So... continuous integration lead to frustration!

But when developers get frustrated, they will get sloppy at their work. And when they are sloppy, they are more likely to deliver changes that cause build failures. Let's assume it increases from once per half year to once per quarter. That is 400 build failures per year, or 2 build failures per day. So build managers are confronted with more problems causing the build to fail, which probably requires more work to uncover and fix. One day will not be enough. So... continuous integration stops to be continuous!

What can we do to overcome this problem?

I think we can go in two directions:

  1. Reduce the scope of the continuous integration to smaller teams
  2. Replace continuous integration by iterative integration

You can reduce the scope of the integration by dividing the large project into smaller teams, for example sub-system teams. Each team then performs continuous integration in a local scope, independent of the build of other teams. Only when it succeeds, the changes are promoted to a higher level of integration, for example system integration. Then when the system integration fails you have a choice. Either get the "guilty" developers involved in fixing the system integration, thus extracting them from their sub-system team temporarily, or force the "guilty" sub-system team to incorporate the changes of the (failing) system integration build in their sub-system build.

The other approach is to go to a weekly integration, where all developers deliver to. It will be a "slow" variant of the continuous integration approach. With less frequent integration, the amount of integration problems will increase. But instead of some hours, you have some days to fix them. And developers lose less time on synchronization with the build since they only have to do it once per week. I have seen this work, but only for a relatively quick build. For a build that takes several hours to run, the cycles for fixing the build are too long and even weekly integration will get stuck.

So my conclusion is that for large projects, it is better to divide the project into smaller teams, each with their own continuous build. It requires a more thorough (layered) integration strategy, and propagation of changes will be slower than with continuous integration. Continuous integration does not work for large projects.

Updated 14/4/2007:
Added labels.