Next Thursday, November 18th, there will be an event organised by Bits&Chips; about managing complex projects. Sure, it will be mostly in Dutch but it will be interesting anyway.
The challenge for software projects is to increase the productivity significantly; not by 10% or 50% but with 10 to 100 times or more. Will current process improvement initiatives achieve that? I doubt it. In my opinion, we need to reduce the size of projects instead of increasing it, we need to replace manual work and communication by machine work.
In the article Writing a book is like developing software, Roy Osherove compares writing a book with writing software. Now let's assume that we would write books like we develop software.
Most books are written by a single author. Take for instance The Lord of the Rings by J.R.R. Tolkien. Suppose we would take up the assignment to write a sequel of this bestseller, but develop it like a software product. Suppose some of the requirements are
- It must appear on the market in English, Spanish and Russian simultaneously
- It must appear before Christmas 2007
- It must equal or exceed the success of the current books
To reduce intercontinental communication across timezones, we will assign different parts of the book to each team. Each team will write in its native language to assure the highest quality of writing. When one team finishes a part (or a preliminary version of it), it will be sent to the other teams for verification and translation.
The verification process will compare the story with the requirements. Which requirements? How do you make "exceed a previous success" SMART, if you cannot sell the book before finishing it? This is a job for marketing. They have to verify if the book will sell good enough.
We were talking about assigning parts to different teams. Which parts? If we divide the book into chapters, we can assign a chapter to a team. Which chapters will we have if we don't have the story yet? But the story is to-be-developed by the team. To divide into chapters we need the overall plot, in abstract form. But, who is responsible for the overall plot?
Okay, division into chapters seems to be a bad idea. Let's divide into characters (persons in the story), and assign different characters to each team. May be the cultural background of each team may even be an advantage for writing and for the overall result. But how can you write a story of characters in separate teams, if there is not plot that connects those characters? Characters need to interact in the story. Furthermore, different teams will be writing simultaneously on the same part of the story. So we need to integrate parallel updates, and we need to review the different story elements before integrating them.
Another issue is that during the story, we build up a history. If for example two characters meet in the beginning of the book and become friends, they cannot be complete strangers and enemies lateron.
Sigh! We need a story architecture that defines the development of characters and the development of time (history) across the whole book. It also defines the characteristics of the characters, the landscapes they visit, timing constraints (e.g. you cannot be at two different places in the same timeframe).
And we need configuration management to assure that the correct versions are reviewed, verified and translated. Especially translation is a dangerous one: if you translate an old Spanish section to Russian, the English and Spanish teams will not discover it because they cannot read Russian, except the people who translate from Russian. But why would a Spanish translator be bothered with reading a section that was originally in Spanish? So, CM must assure correct version and status handling.
I can go on and on, but probably you have lost faith that a book written this way will (1) be ready before Christmas 2007 and (2) exceed the success of The Lord of Rings. Yet, many companies develop their (software) products in multisite projects, with teams in America (US, Canada), Europe (UK, France, Germany, the Netherlands), Asia (India, China, Singapore), and without proper configuration management - except branched version storage and multisite replication.
Is multisite book-writing a good idea? And multisite software development?
Continuous Integration is an widely accepted approach in software development. Martin Fowler describes has written een great article about it.
However, for a large project it does not work. I will explain why not. Suppose that we have a big project of 100 software developers. Each of them is an excellent worker, and only 1% of their deliveries causes the daily build to fail. With 200 working days per year, a developer causes only 2 daily builds per year to fail. Can you beat that?
Now although this is an excellent achievement, with 100 developers this will cause about 200 daily builds to fail, which is... every day the build fails!
So every day, the build manager has to figure out what is wrong. And since each of the developers are excellent workers, the finding the true cause of the build failure is hardly ever a trivial challenge. So with the help of the developers that have delivered, it might well take the whole day before the problem is fixed. Then, the build has to be distributed across all development teams, all development sites and incorporated in the workspaces of the developers. There will be little time left for integrating the new build with the developer's changes in his private workspace. So... continuous integration leads to continuous build failures!
In addition to the continuous build failures, when a developer must stay in sync with the latest build he has to integrate the latest build with his own changes every day. If this takes 1 hour, he will spend 4 hours per week on synchronization, which is at least 10%, day-in-day-out. They probably will get frustrated about going through their old changes every day, just because "build management provides a new build". So... continuous integration lead to frustration!
But when developers get frustrated, they will get sloppy at their work. And when they are sloppy, they are more likely to deliver changes that cause build failures. Let's assume it increases from once per half year to once per quarter. That is 400 build failures per year, or 2 build failures per day. So build managers are confronted with more problems causing the build to fail, which probably requires more work to uncover and fix. One day will not be enough. So... continuous integration stops to be continuous!
What can we do to overcome this problem?
I think we can go in two directions:
- Reduce the scope of the continuous integration to smaller teams
- Replace continuous integration by iterative integration
You can reduce the scope of the integration by dividing the large project into smaller teams, for example sub-system teams. Each team then performs continuous integration in a local scope, independent of the build of other teams. Only when it succeeds, the changes are promoted to a higher level of integration, for example system integration. Then when the system integration fails you have a choice. Either get the "guilty" developers involved in fixing the system integration, thus extracting them from their sub-system team temporarily, or force the "guilty" sub-system team to incorporate the changes of the (failing) system integration build in their sub-system build.
The other approach is to go to a weekly integration, where all developers deliver to. It will be a "slow" variant of the continuous integration approach. With less frequent integration, the amount of integration problems will increase. But instead of some hours, you have some days to fix them. And developers lose less time on synchronization with the build since they only have to do it once per week. I have seen this work, but only for a relatively quick build. For a build that takes several hours to run, the cycles for fixing the build are too long and even weekly integration will get stuck.
So my conclusion is that for large projects, it is better to divide the project into smaller teams, each with their own continuous build. It requires a more thorough (layered) integration strategy, and propagation of changes will be slower than with continuous integration. Continuous integration does not work for large projects.
Updated 14/4/2007:Added labels.






