July 3, 2008
» The Simplest Thing That Could Possibly Work

One of the principles of Agile, mostly related to design and architecture, is “The Simplest Thing That Could Possibly Work.” This is sometimes taken as a license for cowboy coding, but that is not the intention. A better way to express it would probably be something like “The Simplest Solution That Could Possibly Satisfy Your Requirements.” For instance, if you have a requirement to create the back end for a web site like amazon.com, then while a Perl/CGI solution on a single-core machine could possibly “work,” it doesn’t work from the point of view of high availability, fast response time, or reliability.

From Oversimplification To Rube Goldberg
On the one hand, there is a wide spectrum of complexity of construction ranging from doing nothing to Rube Goldberg level complexity. On the other hand, there is the set of solutions that work, meaning that they meet all of the requirements. TSTTCPW refers to the solution which works and which is lowest in complexity.
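The selection rule described above can be sketched in a few lines of Python. This is only an illustration: the `Solution` class, the complexity scores, and the requirement names are all made up for the example, and real complexity is rarely a single number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Solution:
    name: str
    complexity: int        # hypothetical relative score; lower is simpler
    satisfies: frozenset   # names of the requirements this solution meets


def simplest_that_works(solutions, requirements):
    """Return the lowest-complexity solution that meets every requirement."""
    required = frozenset(requirements)
    working = [s for s in solutions if required <= s.satisfies]
    if not working:
        return None  # nothing meets the requirements yet
    return min(working, key=lambda s: s.complexity)


# Illustrative candidates only; scores and requirement names are assumptions.
full = frozenset({"serves pages", "high availability", "fast response"})
candidates = [
    Solution("do nothing", 0, frozenset()),
    Solution("Perl/CGI on one machine", 2, frozenset({"serves pages"})),
    Solution("replicated service", 6, full),
    Solution("Rube Goldberg machine", 10, full),
]

choice = simplest_that_works(candidates, full)
print(choice.name)  # replicated service
```

Note that the answer changes with the requirements: if the only requirement were "serves pages," the same function would pick the Perl/CGI option.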

Part of being simple means simple to read, maintain, use, design, understand, and implement balanced against the time it takes to get the job done. Spending too much time to create the ultimate in simplicity starts to get you into a different kind of trouble.

As somebody that struggles to apply this principle on a regular basis, I was happy to stumble upon an example of this principle which can be captured in a picture and kept in mind as I am working on a new design. Perhaps you will find it to be useful food for thought as well.

A Bridge Too Far
There’s a construction project that you’ve probably heard of which is affectionately called the “Big Dig.” Part of this project was the construction of the “Leonard P. Zakim Bunker Hill Bridge,” aka the “Zakim bridge.” This part suspension bridge, part cantilever bridge is an enormous, one-of-a-kind architectural marvel. It supports five lanes of traffic in each direction for a total of ten lanes. It was built at a cost of approximately $11M per lane.

Running parallel to the Zakim (to the left in the photo) is the Leverett Circle Connector Bridge. It serves a total of four lanes of traffic. It was built at a cost of approximately $5M per lane.

Part of the requirements for the Zakim bridge were clearly “create a stunning new Boston landmark.” On the other hand, the Leverett Bridge is a very simple but also very strong bridge. It could have been made even more simply, but not without a safety risk and/or a shorter lifespan. In other words, it is “The Simplest Thing That Could Possibly Work.”

Next: The Faberge Egg

TOC: Zero to Hyper Agile in 90 Days or Less

[Note: revised 7/3/08 to reflect comments on reddit. Clearly the original post didn't work. :) ]

» Reinvest in Your Engine by Improving The Work Environment

There are really only five ways to increase the profitability of a business based on software development: reduce costs via outsourcing, reduce headcount, reduce other expenses, increase productivity or increase revenues. Reducing expenses can only go so far. The most expensive part of software development is the people. Thus, one of the most successful ways to increase profits is to increase the productivity of the software development team.

The Agile Workplace
At Litle & Co., developers like the fact that Agile provides the additional challenge of solving business problems instead of just technical problems which requires thinking at a higher level. Developers at Litle report that they have a higher level of job satisfaction than in previous companies that were not using Agile because they see the results of their software development efforts installed into production every month. Also, they like the fact that there is much less work which amounts to “implement this spec.”

Your development infrastructure is really no different than the general company infrastructure which includes your cube or office, the carpet, the artwork on the walls, the company cafeteria, your phone, your computer, and the company location. These are all part of your work environment. If you have a computer that is 5 years old, your work environment is not as good as if you have a computer that is only 2 years old. If you are writing in C rather than C++, C# or Java, your work environment is sub-optimal.

The closer that your development infrastructure is to the ideal environment for your circumstances, the more productive your team will be. This principle extends to all aspects of the development environment, from development language, to build system, to build farm, to issue tracking system, to the process that you follow.

Next: Your Development Process is Part of Your Work Environment

» Your Development Process is Part of Your Work Environment

Your development process (regardless of how it is implemented), is also part of your work environment. If as a result of your development process you regularly end up redoing work because problems weren’t discovered until just before the release, or projects get cancelled or shelved, then this is also likely to reduce productivity and job satisfaction. As this process improves, so does your work environment. The smoother it operates, the more pleasant your working environment will be.

There are many problems which you may think of as being unrelated to your development process. For instance, broken builds. Broken builds are simply the result of somebody making an idiotic mistake, right? Perhaps that’s true some of the time, but most of the time it is due to the complexity of integrating many changes made by many people for software that has many interdependencies.

To be sure, a “perfect” process does not guarantee happiness, success, or the absence of problems. You still have to debug complicated problems, port to new platforms, deal with unforeseen circumstances, etc. However, the state of your process impacts the efficiency with which your effort is applied.

If your process is perfect and completely frictionless, then 100% of your effort will be applied to the work that creates value. If it is rife with problems, it may mean that only 50% (or less!) of your effort will be applied to work that creates value. If there are problems with the process, then you are already expending effort which is essentially wasted. You would be better off investing some of that effort in removing the problems permanently instead of losing it to friction on a regular basis.
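To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. The numbers (a 50-week year, 40 hours per week, four weeks spent fixing the process, an improvement from 50% to 80% efficiency) are purely hypothetical, and the model assumes that no value is delivered while the fix is under way.

```python
def value_delivered(weeks, hours_per_week, efficiency_pct):
    """Hours of effort that actually reach value-creating work."""
    return weeks * hours_per_week * efficiency_pct / 100


def value_after_fixing(weeks, hours_per_week, fix_weeks, improved_pct):
    """Spend the first `fix_weeks` removing friction (delivering nothing),
    then work at the improved efficiency for the remaining weeks."""
    remaining = max(weeks - fix_weeks, 0)
    return value_delivered(remaining, hours_per_week, improved_pct)


# Hypothetical figures, for illustration only.
status_quo = value_delivered(weeks=50, hours_per_week=40, efficiency_pct=50)
with_fix = value_after_fixing(weeks=50, hours_per_week=40,
                              fix_weeks=4, improved_pct=80)

print(status_quo)  # 1000.0
print(with_fix)    # 1472.0
```

Under these assumptions the up-front investment pays for itself well within the year; with different numbers it may not, which is why it is worth running the arithmetic for your own situation.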

Next: Quick Summary of The Benefits of Adopting Agile

June 24, 2008
» Sustainable Pace: Supply vs Demand

In a traditional project, the demand for resources from the four major aspects of software development see-saws dramatically over the course of a project. These aspects are project management and planning, architecture and design, development, and QA. You need resources on hand to serve the peak demand level, but during periods of low demand those resources will either be idle or used for activities which have a lower ROI.

A common circumstance is that there are insufficient resources on hand for the peak demand level and so people end up working in “crunch mode.” During crunch time, people tend to make more mistakes. Agile levels demand out over time and removes this see-saw effect which simplifies resource planning and removes the need for crunch time.


In the figure above, the straight green lines represent the resource load in an Agile project and the zig-zagging purple lines represent the resource load in a traditional project.

With traditional development, delays during development compress most of the testing to the end of the process which then requires taking shortcuts due to schedule pressure. I used to think that one way of compensating for insufficient QA resources was to delay the release until QA finishes. On the surface it seems to make sense. But only if the folks writing code sit on their hands while QA does their work. Ok, so you have multiple projects and the developers work on another project. But then they finish that. Now QA starts on the second project and the developers move to the third. The problem is still there.

On the other hand, as a result of the need for increased QA resources during testing, you may have two other problems. If you have enough QA resources to handle the pressure of the endgame, you may have too many QA resources during the rest of your development cycle. Alternatively, you may bring on additional QA resources on a short-term basis to compensate. Both of these options are obviously undesirable.

There’s a natural balance between the amount of effort required for developing something and the amount of effort required to QA it. No matter what you do, if you have the wrong ratio of development resources to QA resources, it will cause problems. If development creates more than QA can absorb, you will create a backlog of QA work that will always grow.
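A tiny simulation shows why the backlog grows without bound when the ratio is wrong. The function name and the unit of "work items per iteration" are assumptions made for the sake of the example.

```python
def qa_backlog(dev_output, qa_capacity, iterations):
    """Return the size of the untested backlog after each iteration.

    dev_output:  work items development produces per iteration
    qa_capacity: work items QA can absorb per iteration
    """
    backlog = 0
    history = []
    for _ in range(iterations):
        backlog += dev_output                 # development adds new work
        backlog -= min(backlog, qa_capacity)  # QA absorbs what it can
        history.append(backlog)
    return history


print(qa_backlog(dev_output=10, qa_capacity=8, iterations=5))  # [2, 4, 6, 8, 10]
print(qa_backlog(dev_output=8, qa_capacity=10, iterations=5))  # [0, 0, 0, 0, 0]
```

In the first case the backlog grows by the same amount every iteration; in the second, QA has spare capacity, which is the opposite imbalance.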

There are six options for dealing with a QA backlog: do less QA than needed and thus shift the burden of finding problems that you could have found onto your customers, increase your QA capacity, decrease your development capacity, have development idle, have development help with QA or allow the backlog to grow. The larger your testing backlog, the longer it will take to ship it and the greater your opportunity cost.

The imbalance may be in either direction. After you transition to Agile development, you may find that you have more QA resources than are needed. In that case, you have the option of having QA take on some of the work currently done by developers. See The Role of QA in an Agile Project for more on that topic.

This natural balance holds between all four aspects of software development. Depending on your organization, there may be an imbalance between supply and demand at any stage in the pipeline. Wherever there is an imbalance you have the same six options as described above. For example, you may end up with project plans that are never used, developers idle because the design isn’t ready yet, etc. To the extent that some of the resources are actually the same people you can use that fact to manage this problem.

When using short iterations, resource imbalances are easier to detect and correct. Having balanced resources means that all development activities are done consistently and on a regular basis and there is no need to take the shortcuts that are typical of traditional development.

Next: The Usability of Short Iterations

TOC: Zero to Hyper Agile in 90 Days or Less

» Do You Need a Standup Meeting?

Stand-up meetings are a great way to reduce delays in communicating important information. Another benefit of stand-up meetings is the elimination of time-wasting status and progress meetings.

Stand-up meetings are most closely associated with Scrum, where they are called “Daily Scrum Meetings,” but they have become popular independent of any particular methodology, which is a good indicator of suitability for mainstream use.

A stand-up meeting is simple to implement. There are just a handful of guidelines:

  • Limit the time to fifteen minutes.
  • Pick a regular time for the team to meet, preferably in the morning.
  • Start on-time regardless of who is absent.
  • Each person answers these three questions:
      • What have you accomplished since the last meeting?
      • What are you working on next?
      • What impediments do you have?
  • All discussion and problem solving is deferred until the end of the stand-up meeting.
  • Follow up as needed in smaller groups.

Although it is called a stand-up meeting and standing is encouraged, the time limit is the most important part and standing is optional.

The point of a stand-up meeting is to improve communication and to discover and resolve impediments, not to have a meeting just for the sake of having a meeting. If the team feels that other practices make the stand-up meeting redundant, then by all means reduce their frequency or even discontinue use until such time as it appears to be necessary again.

To help make this decision, let’s take a look at the expense side of stand-up meetings. First, people have to get to the meeting. And then they have to get back to their computers. Scrum discusses how to minimize this time, but practically speaking, there is more overhead than just the ideal 15-minute meeting. If you are at a larger company, somebody has to book the room and let people know where it is. Let’s call the cost of the meeting 20 minutes per person. If you have 12 people in a stand-up meeting, that’s 4 person-hours per day. Assuming an eight-hour day, that’s the equivalent of half of a person. Those meetings had really better be worth it!
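The arithmetic is easy to rerun for your own team. In this small Python sketch, the function name and the eight-hour workday default are assumptions; the 12 people and 20 minutes come from the estimate above.

```python
def standup_cost(people, minutes_per_person, workday_hours=8):
    """Return (person-hours per day, equivalent full-time people)."""
    person_hours = people * minutes_per_person / 60
    return person_hours, person_hours / workday_hours


hours, full_time_equivalent = standup_cost(people=12, minutes_per_person=20)
print(hours, full_time_equivalent)  # 4.0 0.5
```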

Now let’s take a hard look at the stand-up meeting itself. One of the basic ideas of Agile (and Lean) is continual self improvement. If the value of the meeting exceeds the cost, then there’s no problem with the meetings, especially if they are eliminating other meetings. If the stand-up meeting is the only remaining meeting, that seems like a good thing. However, continuous improvement means we’re never satisfied. Now that you are down to just the one meeting, you should still ask the question: “is it providing more value than the cost? Is there a better way?”

What is the purpose of a stand-up meeting? To quickly find out if people are getting their assigned work done and if not why not. If it is more efficient to do that via e-mail, IM, an issue tracking system, or other means, then use those means. Someone might say “but seeing folks face to face is worthwhile.” Ok, so why not just do that then? Go out to lunch together or something like that.

Or perhaps the stand-up meeting is needed because otherwise folks wouldn’t complete their work, or people wouldn’t speak up when they run into an impediment. In that case the stand-up meeting isn’t a solution at all; it is a crutch. For instance, perhaps somebody isn’t completing their work because they don’t like it, but the constant peer pressure of the stand-up meeting is goading them into completing their work anyway. So then the real problem is lack of job satisfaction or low morale or something along those lines. Until you fix that problem, the stand-up meeting is just acting as a band-aid.

The real measure of project status and health is having an increment of shippable value at the end of every iteration. A stand-up will only expose problems that are on people’s minds, but the forcing function of the increment of shippable value is where you will get the true picture of how things are going. A one-month iteration interval is good, but if you can get it down to 2 weeks or even 1 week, that may do far more to expose real problems than a stand-up will.

Next: How Agile Helped Litle & Co. Get to #1 on the Inc. 500

June 16, 2008
» Preparing for the Transition to Agile

Once, when I was just starting to snowboard, I was at Sugarloaf for the weekend and they had very little cover and very few trails open. But then Saturday night, they got 33” of powder. A friend and I came to a trail that was closed. It looked like a great trail; endless powder with no tracks. The problem was that it had been rocks and grass the day before and there was no base underneath, so it was just the same as riding on rocks and grass. It was not a pleasant experience. Adopting Agile without understanding it and without creating a proper ecosystem for it is destined for a similar fate.

Adopting Agile development requires breaking down mental barriers and building up new skill sets. There is nothing particularly hard about actually doing any of the Agile practices, but many of them are counterintuitive and do take a bit of practice to get used to. That said, don’t underestimate the amount of effort required. The effort required is at least on the order of taking a team which is very used to writing in C++ and moving to Java. There’s nothing particularly difficult about such a transition, but there are many subtleties which must be learned and it takes a while to build up the same base of experience.

Self Education
Before getting too far along, make sure that you have done your homework. Read other books on Agile, and find other folks in your organization that have done Agile development before. Go to conferences, join the local Agile user group, become a certified Scrum Master; do whatever you need to do to find people that you can lean on when you need it.

Scope
Determine the best scope of the adoption. As with most things, it is best to think big, but start small. Is there a small project with no more than 12 people that is amenable to piloting something new? There are two advantages in starting small: minimizing disruption and leveraging what the pilot group learns about doing Agile in your environment when transitioning other groups.

Scouting
Agile development has certain perceptions related to it. One of the most prevalent perceptions is that it is “for small groups.” That was certainly my perception when I first started hearing about it. Another perception is that small iterations aren’t a good thing because customers don’t want frequent releases, there’s more overhead involved, the quality will be lower that way, and it makes marketing’s life more difficult.

If you just advocate Agile without knowing the landscape, you run the risk of alienating the people whose support you need in order to go Agile. Find out how receptive your organization is to going Agile. Think about who is in a position to help or hinder its adoption. Those are the key stakeholders. You will need to find out where they stand, what they like about the idea of going Agile, and what their objections are. This information will come in handy later in the adoption process.

Prepare Your Organization
Once you have a basic lay of the land, see what you can do to raise people’s awareness and understanding of the advantages and potential pitfalls of Agile. Do a presentation for folks that are interested, invite in somebody from the Agile community to do a presentation or workshop. Recommend books and websites that you found helpful.

Transition Development and QA Together
The most important component of reducing the rework period that comes from long iterations is improving your testing process. For many if not most organizations, this is the hardest goal to achieve in practice.

If you don’t have any automation at all, it is a good bet that there is an ingrained belief that automated testing is either a bad idea, doesn’t work as well as manual testing, is too expensive, or that the tools are too expensive. As a result, it may be that there are no QA automation experts in the building and possibly nobody with scripting skills in the QA group. The best course of action in this case is to concentrate on introducing the idea of QA automation.

If it is clear that there is a bias against automated testing that is too strong to overcome any time soon, another tactic is to have the development organization champion automation with an eye towards handing it over to QA once the idea catches on. A good place to start is with unit tests. It should be clear from the start that your goal is to have QA own test automation. Developers will write good tests, but they are too optimistic by nature. Developers start from the assumption that “it will work.” QA people start from the assumption that “it doesn’t work and I’ll prove it to you.” Pessimism is a good trait for a person creating tests.

Keep Your Healthy Skepticism
Think carefully about the value of each practice that you plan to adopt and make absolutely sure that it is appropriate for your exact circumstances before you adopt it. You shouldn’t be adopting practices simply for the sake of saying that you are adopting Agile practices. Every practice you adopt should be because now that you know about it you simply can’t imagine getting by without it.

Don’t Throw Out the Baby With the Bath Water

I’ve seen many first-time Agile projects fail because they threw out everything they already knew about developing software. Most of the individual steps of developing software with an Agile methodology are the same as traditional software development. You still have to talk to customers, decide what you are going to do, write code, write tests, do testing, etc. Agile development is simply a different framework for those steps. You have a business to run, and you don’t really need to introduce large and sudden risk factors. Before you decide to chuck everything that you know and start from scratch, spend some time developing your knowledge and understanding of Agile. Look closely at your existing practices and see which ones will fit well within Agile, and which ones may cause problems. Create a game plan and start out gradually.

Next: Agile Adoption Stage 2: Establishing a Natural Rhythm

May 16, 2008
» Writing Unit Tests as Programmer's Warm-Up

"Flow" is the most productive state of a programmer's mind. You can induce the flow by starting your day with writing unit tests.

All programmers know this state of "flow", when your productivity is at its highest. The design decisions are coming naturally, the written code is perfect and a week's work can be done in a couple of hours. And it feels really good. I think it's just the sense of creative fulfillment. Some call it creative Zen.

The problem is, the flow does not come on demand. It usually requires some kind of warming-up. What I have noticed is that it is much easier to get into the flow after you have been doing something that you like, something rewarding but not laborious. Yes, meetings are the ultimate killers of the flow. If you want to be productive, don't go to meetings. After trying many activities, I found one that helps to get into the flow. It works great for me and may work for you, especially if you are into eXtreme Programming, a.k.a. XP. This activity is writing unit tests.

It works the following way: Start your day by finding classes or methods that are not covered by tests. There are always some that are not covered. Write tests for some of them. As you add tests, you may notice small deficiencies in the code. Refactor them. You may even find bugs while writing tests.
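As a small illustration of what such a warm-up might look like, here is a sketch using Python's standard `unittest` module. The function `normalize_username` is a hypothetical stand-in for whatever untested code you happen to find in your own code base.

```python
import unittest


def normalize_username(raw):
    """Hypothetical existing function that previously had no tests."""
    return raw.strip().lower()


class NormalizeUsernameTest(unittest.TestCase):
    def test_strips_surrounding_whitespace(self):
        self.assertEqual(normalize_username("  alice "), "alice")

    def test_lowercases(self):
        self.assertEqual(normalize_username("Alice"), "alice")

    def test_empty_string_stays_empty(self):
        self.assertEqual(normalize_username(""), "")
```

Saved in a file, these tests can be run with `python -m unittest`. Writing the edge-case test (the empty string here) is often the point at which a deficiency or a bug shows up.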

Even if you didn't feel like programming at all in the beginning, within 30 minutes you will notice your interest returning. The test coverage improves, and refactoring is already helping to improve the design. And you have just found a couple of subtle bugs (you will for sure) thanks to the new tests. All this feels really good and you are already coding. You are in the flow.

This "programmer's warm-up" helps to get into the flow even when it seems that creativity has gone forever. Try it; quite possibly it will work for you. And don't forget the nice side effect in the form of increased test coverage.

Helpful Resources on Unit Test and Test Coverage

Some of these tools are not free, but, believe me, they are worth every cent.

April 26, 2008
» Apply Elegant Architecture to Your Dev Team: Part II

Previous: "Apply Elegant Architecture to Your Dev Team: Part I"

When you are creating new software, what do you think about? Don’t you think about how it will scale to meet your needs as they grow with high availability? Even if you don’t achieve that on the first try, you are still thinking about it and striving to achieve it. You know that to do it, you will need to make the right technology and architectural choices. Over time, you may need to change some of those choices to keep pace with competitors. Even if your current needs are modest, software development architecture has evolved technologies and patterns to allow software to scale from a single user to hundreds or even thousands of users on multiple platforms at multiple locations with 99.999% uptime.

If you consider your development organization in the same way, how would you apply the same thinking? What technologies are you using? What is the architecture of your organization? Will it scale from its current size to double its size? Will it scale seamlessly to include new teams in new locations? What will happen if you acquire a company? When creating software, you want to design it so that it is flexible and adapts to new circumstances. The same should be true for your development organization.

Another way to look at it is how a particular process would fare if each of the resources available were available at seemingly random times for random periods of time. This exactly describes the world of Open Source Software (OSS). At any given time you have no idea who will be contributing on what or how valuable the contribution will be. You don’t know where the contribution will come from and in many cases you don’t even know who the contributor actually is. This is an extreme example of a development situation. Even though your situation is probably not as extreme as this, by using techniques from the OSS world you will be better positioned to handle unexpected events when they inevitably occur.

Problems are an inevitable and regular part of life. Examples include illness, job change, human error, flight delays, system failure, unanticipated market changes, and natural disasters. The ability to cope with these problems is one measure of the robustness of the organization.

Part of robustness is that as things scale up or down, the impact is minimal. Tools and processes should be selected and designed to work well together whether they are used by 1 person or 10,000. It should always feel like each individual is part of a team which is no bigger than 12 people. If practices are not scalable from 1 to 10,000 then people can develop bad habits that resist scaling. If you develop habits that exist in a scalable framework, then it is more likely that scaling can and will happen as and when needed.

Next: How Agile Solves Problems

» Advanced Multi-Stage Continuous Integration

In parts 1 through 3 of this series, I described how Multi-Stage Continuous Integration can be used at a team and multiple-team level. In this post, I will describe how Multi-Stage CI can also encompass other development stages.

The Development Hierarchy
So far, we’ve split the mainline up into a hierarchy of branches. Each developer has their own workspace and is part of a team, each team has its own branch, and each team branch is based on and merges back to an integration branch, which may be the mainline. Sometimes work is organized at a feature level instead of at a team level. For simplicity, I’ll just refer to teams. Each part of the hierarchy corresponds to a stage of development. This hierarchy enables multiple parallel development pipelines where changes move from a low level of maturity to a higher level of maturity and the work in the parallel pipelines is continuously integrated.

We’ve covered three stages of development: individual developer coding, team integration, and full project integration. But there are many more stages in a typical project lifecycle, even when you are doing Agile development. Traditionally, the stages of the lifecycle occur in several different places including project plans, the SCM system, and the issue tracking system. This adds development stages such as: code reviewed, system tested, UAT passed, demoed, etc. Each of these stages can be added as a branch to your branch hierarchy.

When using Multi-Stage CI, unlike the typical use of branches, changes do not linger on a branch any longer than required to pass through the stage that branch is associated with. The difference between any given branch and its parent in the hierarchy cycles rapidly between almost nothing and nothing. The goal is to push changes through the hierarchy as rapidly as possible.


When working within a development hierarchy, the impact of any given change is limited in scope. If there are 4 members on a team, then only 3 other people are affected when a change is merged into the team branch. Because changes are made on the team branch only when the developer has things working in their own version, changes at the team level are made less frequently and have a lower probability of destabilizing the build. Once the team branch is considered stable, changes are then merged into an integration branch which consolidates the work for multiple teams. Again, changes are made less frequently and have a lower probability of destabilizing the build. Thus, the graph at the integration level shows fewer changes and the changes are smaller. As you proceed towards the root of the hierarchy, changes are less frequent and less likely to destabilize the build until finally you reach the root where you have code that is always shippable.

Getting Started
Getting to Multi-Stage CI takes time, but is well worth the investment. The first step is to implement Continuous Integration somewhere in your project. It really doesn’t matter where. For a recommendation on an excellent book on Continuous Integration, see the bibliography. The next step is to implement team or feature based CI. Once you have that working, consider automating the process. For instance, you can set things up such that once CI passes for a stage, it automatically merges the changes to the next level in the hierarchy. This keeps changes moving quickly and paves the way for easily adding additional development stages.
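The automatic promotion step can be sketched as follows. This is a toy model in Python, not the interface of any particular SCM or CI tool: the `Branch` class, the `promote` function, and the `ci_passes` check are all hypothetical stand-ins for your real branches, merge command, and build-and-test run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Branch:
    name: str
    parent: Optional["Branch"] = None
    changes: List[str] = field(default_factory=list)


def promote(branch: Branch, ci_passes: Callable[[Branch], bool]) -> List[str]:
    """If CI passes on `branch`, merge its new changes into its parent.

    Returns the changes that were merged (empty if CI failed or the
    branch is the root of the hierarchy).
    """
    if branch.parent is None or not ci_passes(branch):
        return []
    new = [c for c in branch.changes if c not in branch.parent.changes]
    branch.parent.changes.extend(new)
    return new


def ci_passes(branch: Branch) -> bool:
    # Stand-in for a real build and test run: any change whose name
    # starts with "broken-" fails the build.
    return not any(c.startswith("broken-") for c in branch.changes)


# A three-level hierarchy: team branch -> integration branch -> mainline.
mainline = Branch("mainline")
integration = Branch("integration", parent=mainline)
team_a = Branch("team-a", parent=integration)

team_a.changes.append("fix-login")
for branch in (team_a, integration):
    promote(branch, ci_passes)

print(mainline.changes)  # ['fix-login']
```

A change that breaks the team build stays on the team branch: `promote` returns an empty list and nothing reaches the integration branch until the team's CI is green again.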

I’ve seen Multi-Stage Continuous Integration successfully implemented in many shops and every time the developers say something like: “I never realized how many problems were a result of doing mainline development until they disappeared.”

Next: It is Better to Find Customer Reported Problems As Soon As Possible

March 27, 2008
» Apply Elegant Architecture to Your Dev Team: Part I

There is another way to think of software development which allows you to leverage your skills as a developer and apply them in a new way. The process of developing software is, in effect, an algorithm implemented with various technologies. You can think of your team, the technologies you use, and your development methodology as a piece of software.

What do you do with software? You improve it. You add new functionality, you increase its performance and usability and you remove bugs. Also, software itself is a document: source code. It is a description of how to perform a set of tasks. You improve the software by changing the source code. Let’s call the combination of your team, tools, and techniques: “the process” and the source code for the process “the process document.”

Not only can you think of your development process as software, you can think of your whole development organization and the people in it as a combination of hardware and software with various communication links. I don’t recommend that you take this advice literally and think this way on a regular basis, or treat people as interchangeable cogs in the machine! However, by thinking about it this way you can leverage your technical design skills to think about how to organize and optimize your development organization and development process. You can leverage well-known design patterns. For instance, consider the communication aspect.

In the world of information transfer there is bandwidth, latency, and connectivity. The best environment for communication is high bandwidth with low latency that is always connected. The worst environment for communication is low bandwidth with high latency that is infrequently connected.

This gives a well-known context for discussing human interaction. On one end of the spectrum is self-communication. If you are responsible for two interdependent modules, then when you make a change to one, you instantly know you need to make a change to the other. You won’t misunderstand yourself or need to have a back and forth conversation to really understand. You just know. On the other end of the spectrum you have two people from different cultures revising a document via e-mail who live exactly half way around the world from each other. In between you have pair-programming, collocation, people working together but sitting on different floors of the same building, folks with a great deal of physical separation in the same time zone, a different time zone, etc.

Then there are different forms of information interchange such as video link, phone, e-mail, IM, wiki, document, etc. This way of thinking can guide your decisions about where to seat people, the value of having high bandwidth links, etc.

Next: Apply Elegant Architecture to Your Dev Team: Part II

March 26, 2008
» Is Your Dev Team Having Performance Problems? Try Niagra!

Have you ever thought “why do the companies I work at have so many problems with developing software?” Well, I can tell you from interacting with literally thousands of software development organizations that most of them have problems with reliably and predictably producing high quality software. What I hear over and over again is “if only we could hire the right people” or “if only people could be more disciplined.” I also hear, over and over again, about various tweaks to traditional development that people believe would fix things, except that “other people just don't get it.”

Well, there are only so many people out there to hire, so unless we find some magical place to hire more of “the right people” from, we are going to have to figure out how to use the people we have. And I hate to break it to you but you just aren’t going to get more discipline out of people. Whatever discipline exists today, that’s all you are going to get. So what’s left? Changes to the way we do things.

Let’s say there was some combination of techniques which would mean that most development groups could use traditional development to reliably and predictably produce high quality results. Let’s call it the “Niagra” method. Furthermore, you (or whoever wants to make this claim) get to define the Niagra method. It works every time. Just use Niagra and performance improves overnight (when used as directed).

If the Niagra method existed, it would have become widespread by now. It would have become the #1 prescribed solution to project performance problems. If it existed and produced the claimed results, people would start referring to it, and it would spread rapidly. That’s how C++, Java, and HTML became mainstream. People rapidly adopted them because they provided benefits that anybody could realize. In fact, there are many proposed Niagra methods, but none of them have become mainstream because they only work under special conditions.

This raises an obvious point. If traditional development is so bad and is such a big failure, then why has the software industry done so incredibly well? The software industry is an incredible success, that’s absolutely true. It is incredible how much of our daily lives runs on software and how much software has positively influenced our quality of life. The benefits of software, when it does finally ship and it does finally have a high enough level of quality, are worth waiting for. Otherwise, of course, there would no longer be a software industry.

There have been changes to the software development process which have produced incremental improvements and have become widespread. Examples are: the use of software development tools such as source control and issue tracking, breaking down large tasks into small tasks, nightly builds, one-step build process, moving from procedural programming to object-oriented programming and many others. The important point here is that each of these improvements has been widely adopted and has made things somewhat better, but none of them has addressed the root cause. The process is still unpredictable and unreliable.

This is akin to big-O analysis of algorithms. If you have an algorithm which takes 100*n^2 operations, then getting that down to 50*n^2 is a good thing, but changing the algorithm to (for instance) be 200*n is much better. To date, changes to the software process have been of the first variety: shrinking the constant rather than changing the order of growth.
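To make the comparison concrete, here is a minimal sketch in Python that uses the three operation counts from the paragraph above. The function names and the `crossover` helper are my own illustration, not part of any library.

```python
def ops_original(n: int) -> int:
    """Operation count of the original algorithm: 100 * n^2."""
    return 100 * n * n


def ops_tuned(n: int) -> int:
    """Same algorithm with the constant halved: 50 * n^2."""
    return 50 * n * n


def ops_new_algorithm(n: int) -> int:
    """A different algorithm with a worse constant but linear growth: 200 * n."""
    return 200 * n


def crossover(f, g, limit: int = 1000):
    """Return the smallest n >= 1 where f(n) < g(n), or None if none below limit."""
    for n in range(1, limit):
        if f(n) < g(n):
            return n
    return None


# Print the three counts side by side for a few problem sizes.
for n in (1, 10, 100, 1000):
    print(n, ops_original(n), ops_tuned(n), ops_new_algorithm(n))

# The linear algorithm overtakes the tuned quadratic one very early.
print(crossover(ops_new_algorithm, ops_tuned))  # 5
```

Halving the constant buys a factor of two forever. The linear version loses for tiny inputs, wins from n = 5 onward, and at n = 1000 needs 250 times fewer operations than the tuned quadratic one.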

Perhaps Agile development is the answer. But isn’t Agile “yet another fad” which will blow over any day now? After all, we have had CMM, ISO 9001, TQM, Critical Chain, and many other “next big thing” ideas in the past. What makes Agile any different? Previous ideas have all been centered around the belief that the framework underlying traditional development is fine and that it is the implementation, discipline, people, or management of the people that is the problem. No, it is the underlying framework that is the root problem.

Certainly, people issues are important. Getting a team gelled and working well together with the right combination of skills is essential. But the success rate of these teams isn’t exactly stellar either! Yes, it is better, but they have similar problems with predictability and quality. A root problem remains. I do not claim that solving this problem means that suddenly teams that have insufficient skills and hate each other will start pumping out high quality software on a regular basis. But I do claim that Agile development has the ability to dramatically increase the potential of any team. And to truly succeed you have to do both: create a good team, and use Agile development.

Let's take a deeper look at the fundamental problems with traditional development.

Next: Traditional Development is a Chinese Finger Trap

March 25, 2008
» Is Your Software Development Organization Mainstream?

Have you ever wondered how your organization compares to other organizations when it comes to the process of developing software? Do you feel like there is an area that really needs improvement, but everybody else says "that's the way everybody does it?" While looking at what is involved in mainstream Agile adoption, I realized that it would be useful to create a list of the current mainstream software development practices for comparison purposes and I thought other folks might find it useful as well.

I define a practice as being mainstream if it is a practice that most people would agree that most people do. It may not be that they do that exact practice, they may do something equivalent or further along the spectrum of what constitutes a good practice, but they at least go to the level described.

The word "practice" is the key word here. This is about what people can be observed as actually practicing in their day to day real work. Things which are described by policy or documented as the correct way of doing something but avoided and worked around don't qualify. A mainstream practice is something that is in common usage that people do because they believe the value is worth the effort and not doing it is sort of like saying “electricity and hot and cold running water aren’t for me, I much prefer candles and fetching water from the well.”

While each of the following individual practices is considered a mainstream practice, that does not mean that it is mainstream to be doing all of the following practices. That is, in some of the areas that these practices cover, any particular development organization may be operating at a lower level than these practices, but for any particular practice, most development organizations operate at or above the level of these practices.

Overall

Basic flow – the common development activities are: talk to customers and/or do market analysis, document market needs via requirements, create a design, implement the design, write tests, execute the tests, do find/fix until ready to ship, ship.

Preparation

Requirements documented using Microsoft Word or Excel – while there are actually quite a few off-the-shelf requirements tools available, most people do not use them. The most common method for documenting requirements is via Word and Excel.

Recording of defects in Bugzilla – there are very few organizations which aren’t using at least Bugzilla to track bugs and Bugzilla is definitely the most popular choice. And recording of defects is pretty much as far as it goes. Enhancements, RFEs, requirements are tracked separately.

Basic initial project planning – this is the simple act of picking a bunch of work to be done, estimating it, dividing it up among the folks that are available to do the work and determining a target ship date. Some teams use MS-Excel, some teams use MS-Project. This does not mean sophisticated project planning. The extent of re-planning is a simple “are all tasks done yet” with the occasional feature creep and axing of important functionality at the last minute.

Basic design – prior to coding features or defect fixes that will take more than two days to do or are highly impactful, a design document is created, circulated, discussed, and updated.

Development

All source code is stored in a source control system – this is not to be confused with “all work is done in a source control system.” It is actually surprisingly common to see the source control system used for archiving of work done rather than as a tool for facilitating the software development process.

Source control and issue tracking integration via a hand-entered bug number in the check-in comment, without enforcement – while there are certainly more sophisticated integrations, most people at least type in the bug number when checking in changes so that they can later find the changes associated with that bug.

Defects have a defined workflow that is defined in and enforced by the bug tracking system – even if it is as simple as Open, Assigned, Closed.

Mainline development – all developers on a project check-in to the mainline (aka the trunk).

Refactoring – while the term “refactoring” has come to mean just about any change to the software, the idea that code should be periodically cleaned up and simplified is pretty much universal at this point.

Nightly build – the entire product is built every night. This does not necessarily mean that tests are also run automatically or that there is an automatic report of the results, just that there is a nightly build.

Quality

Mostly manual testing – this one is the hardest to understand. The process of testing software is one of the most backwards parts of software development.

Unit tests – while the overall testing of software is still mostly manual, unit testing has caught on rapidly in a very short span of time.

Using defect find/fix rate trend to determine release candidacy – this is not actually a particularly good practice, but it is what most people do.

Separation of developer, test, and production environments – you might think this is so obvious it isn’t even worth mentioning, but this principle is violated often enough that it is worth pointing out that it is actually a mainstream practice. I mention it to emphasize that if you aren’t doing this, you really need to.

Releasing

Basing official builds on well-defined configurations from source control – the official process for producing production builds includes the step of creating an official configuration in the SCM system and using that configuration to do the build.

Releasing only official builds – all builds that go into production are produced using the official process.

Major and minor releases – creating a major release every 6 to 12 months and minor and/or patch releases on a quarterly basis.

What do you think? Are these right on target or way off base? Do you think there are any missing areas? Do you think the mainstream practice level is higher or lower for some of these? Let me know what you think!

Next: "There is no Bug. It is not the Bug That Bends, it is Only Yourself."

February 14, 2008
» Presenting on Continuous Integration at The Silicon Valley Chapter of the ACCU



Today I gave a presentation on best practices for Continuous Integration and ways to avoid broken builds at the Silicon Valley Chapter of the ACCU.

The whole event went quite well. Walter Vannini, the lead of the chapter, turned out to be a very nice guy. The venue was Symantec's office in Mountain View, with a large projector screen. Hooking up the laptop and a mike took a couple of minutes.

The crowd was very technical, and so were the questions and the discussion. Maybe it's a pattern, but the C/C++ bunch is usually more laid back and to the point.

There was also a dude from the laptop for every child project, and we had an interesting discussion on the humanitarian effect of free access to information. He said they are about to start shipping.

All in all, it was a nice gathering and it was my pleasure to present there.

January 27, 2008
» Scaling Agile and Stand-up Meetings

In the "ask away" section Matt poses the following question:

We're doing Scrum on a 10-12 person team (depending on what the DBAs are working on at the moment) and one of the more interesting problems I've been talking about with people is how to scale the process. I have a friend on a 200+ person project and they do Scrum-of-Scrums every day so one person from each "lower level" scrum team goes to another scrum meeting and so on up to the meeting with all the main project heads. This sometimes ends up taking 2 hours out of people's days if they have to go to a bunch of the scrum-of-scrums. What are your feelings on scaling Agile processes up to larger teams? Do you sacrifice somebody to be the "meeting person"? To me communication has always been the most important part of any Agile process, how do you keep that in large teams?

Scaling Agile
At a high level, it seems like your question is: “how do you scale Scrum?” In my experience, scaling is mostly independent of methodology. There are some Agile practices which can limit scalability, but luckily those practices are not required to get the primary benefits of Agile development. The practices which limit scaling are: relying on collocation and using 3x5 cards to the exclusion of other methods for storing and distributing project information. Other than that, what did the 200+ person project do to scale prior to Scrum? Whatever it was, there is no reason that I can see to prevent them from scaling using those methods. Speaking specifically about Scrum, the Scrum-of-Scrums is the method recommended by Ken Schwaber for scaling project information flow.

Another area which is often a problem with scaling in general is the frequent integration required by the short iterations. For that you may consider Multi-Stage Continuous Integration.

Stand Up Meetings
As for the stand-up meetings, those are mandatory in Scrum. I don’t happen to agree with that requirement, but let’s hold that thought for a moment.

First, let’s take a look at the expense side of the Scrum meetings. People have to get to the meeting. You have to wait until everybody is there. And then you have to get back to your computer. Scrum discusses how to minimize this time, but practically speaking, there is more overhead than just the ideal 15 minute meeting. If you are at a larger company, somebody has to book the room and let people know where it is. Let’s call the cost of the meeting 20 minutes per person. If you have 12 people in a Scrum, that’s 4 person hours per day. Assuming an 8 hour work day, that’s the equivalent of half of a person. Those Scrum meetings had really better be worth it!
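The arithmetic is simple enough to plug your own numbers into. Here is a minimal sketch; the function names are made up for illustration and the 8 hour work day is an assumption, so adjust both to your situation.

```python
def daily_meeting_cost_hours(people: int, minutes_per_person: float) -> float:
    """Total person-hours spent per day on the stand-up, including overhead."""
    return people * minutes_per_person / 60


def full_time_equivalents(cost_hours: float, workday_hours: float = 8.0) -> float:
    """Express the daily cost as a fraction of one person's work day.

    The 8 hour default is an assumption, not a universal constant.
    """
    return cost_hours / workday_hours


# The numbers from the paragraph above: 12 people at 20 minutes each.
cost = daily_meeting_cost_hours(people=12, minutes_per_person=20)
print(cost)                         # 4.0 person hours per day
print(full_time_equivalents(cost))  # 0.5 of a person
```

Try it with the real door-to-door time for your own team rather than the ideal 15 minutes.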

If they are worth it, then there is actually no problem. And if somebody going to the Scrum-of-Scrums has a personal dislike of meetings, even when they provide value, then perhaps somebody else should be doing the Scrum-of-Scrums.

Now let’s take a hard look at the stand-up meeting itself. One of the basic ideas of Agile (and Lean) is continual self improvement. If the value of the meeting exceeds the cost, then there’s no problem with the meetings, especially if they are eliminating other meetings. If the Scrum meeting is the only remaining status meeting, that seems like a good thing. However, continuous improvement means we’re never satisfied. Now that you are down to just the one meeting, you should still ask the question: “is it providing more value than the cost? Is there a better way?”

What is the purpose of the Scrum? To quickly find out if people are getting their assigned work done and if not why not. Isn’t it more efficient to do that via e-mail, IM, an issue tracking system, or other means? Someone might say “but the socialization aspect is worthwhile.” Ok, so why not separate the two? By all means let's have some social activities, but that's not a good reason for having a status meeting.

Or perhaps the Scrum is needed because otherwise folks wouldn’t complete their work, or people wouldn’t speak up when they run into an impediment. In that case the Scrum isn’t a solution at all. For instance, perhaps somebody isn’t completing their work because they don’t like it, but the constant peer pressure of the standup meeting is goading them into completing their work anyway. So then the real problem is lack of job satisfaction or low morale or something along those lines. Until you fix that problem, the standup is just acting as a band-aid for an underlying management problem. If you find that having standup meetings from time to time helps to expose management problems, then by all means have the meeting(s), find the problem(s), but don't have the meetings just for the sake of having them!

Short Iterations
One more thought on this topic: the real measure of project status and health is having an increment of shippable value at the end of every iteration. A standup will only expose problems that are on people’s minds, but the forcing function of the increment of shippable value is where you will get the true picture of how things are going. A one month iteration interval is good, but if you can get it down to 2 weeks or even 1 week, that will do far more to expose real problems than a standup will.

Speaking of short iterations, IMHO that’s really the number one thing which makes Agile scale better than traditional development. It is much easier to manage and coordinate large multi-team projects when the scope of everything is constantly focused on one month iterations (sprints). Of course, it may seem more difficult at times, but that is often due to the fact that problems are much more apparent and can’t “hide out” like they do in long iterations.

In summary, experiment and see what works for your particular situation, but always be on the lookout for opportunities for process improvement. Don’t do things by rote; demand that all activities provide real value.

Have a question on Agile or Software Development? Please visit the "ask away" section.

January 17, 2008
» Multi-Stage Continuous Integration Part III

In Part I and Part II I discussed the problems associated with Continuous Integration when combined with mainline development. In this part I’ll discuss a solution: Multi-Stage Continuous Integration.

Have Some Self Integrity

As an individual developer, you practice two simple best practices which you probably don’t even think about any more.


First, while you make frequent changes to your workspace, which means its stability goes up and down rapidly, you only check in your changes when you feel that you've done enough testing of those changes to sufficiently reduce the risk of impacting the stability of the mainline. Second, you only update your workspace when you are at a point that you don’t mind taking whatever the impact is from other people’s changes. These two simple practices act as buffers which protect other people from the chaos of your workspace while you prepare to check in and protect you from problems that other people may have introduced during that same period. They also produce a variant version of the software for each developer, all of which must be integrated together. The longer you put off this integration, the more painful it will eventually be. Thus the need for Continuous Integration.

When you as an individual are working on a change, you are often changing several files and a change in one file often requires a corresponding change in another file. While it may seem like a bit of a trivial case, you can think of this process as self-integration. The reason that you work on those files on your own instead of having several people work on them is that the tightly coupled nature of the changes requires a single person.

It's All for One and One for All
The next level of coupling is at the team level. There are many reasons why a set of changes may be tightly coupled; for instance, there may be a large feature that can be worked on by more than one person. As a team works on a feature, each individual needs to integrate their changes with the changes made by the other people on their team. For the same reasons that an individual works in temporary isolation, it makes sense for teams to work in temporary isolation. When a team is in the process of integrating the work of its team members, it does not need to be disrupted by the changes from other teams and conversely, it would be better for the team not to disrupt other teams until they have integrated their own work. But just as is the case with the individual, there should be continuous integration at the team level, and then also between the team and the mainline.

Multi-Stage Continuous Integration
So, how can we take advantage of the fact that some changes are at an individual level and others are at a team level while still practicing Continuous Integration? By implementing Multi-Stage Continuous Integration of course! For now, I’ll just cover the basics, but in a future post I’ll go into more detail and cover advanced topics as well.

For Multi-Stage CI, each team gets its own branch. I know, you cringe at the thought of per-team branching and merging, but that’s probably because you are thinking of branches that contain long-lived changes. We’re not going to do that here.

There are two phases that the team goes through, and the idea is to go through each of them as rapidly as is practical. The first phase is the same as before. Each developer works on their own task. As they make changes, CI is done against that team’s branch. If it succeeds, great. If it does not succeed, then that developer (possibly with help from her teammates) fixes the branch. When there is a problem, only that team is affected, not the whole development effort.

On a frequent basis, the team will decide to go to the second phase: integration with the mainline. In this phase, the team does the same thing that an individual would do in the case of mainline development. The team’s branch must have all changes from the mainline merged in (the equivalent of a workspace update), there must be a successful build and all tests must pass. Keep in mind that integrating with the mainline will be easier than usual because only pre-integrated features will be in it, not features in process. Then, the team’s changes are merged into the mainline which will trigger a mainline CI. If that passes, then the team goes back to the first phase where individual developers work on their own tasks. Otherwise, the team works on getting the mainline working again, just as though they were an individual working on mainline.
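The two phases can be sketched as a toy model in Python. Everything here is hypothetical and for illustration only: a branch is just a set of change names, and `toy_ci` stands in for a real build and test run. It is not how any particular SCM or CI tool works.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class Branch:
    """Hypothetical model: a branch is a named set of change identifiers."""
    name: str
    changes: Set[str] = field(default_factory=set)


# A CI check is any callable that reports whether a set of changes
# builds and passes its tests.
CiCheck = Callable[[Set[str]], bool]


def phase_one_check_in(team: Branch, change: str, ci: CiCheck) -> bool:
    """Phase 1: a developer checks in and CI runs against the team branch only."""
    team.changes.add(change)
    return ci(team.changes)


def phase_two_integrate(team: Branch, mainline: Branch, ci: CiCheck) -> bool:
    """Phase 2: the team integrates with the mainline, as an individual would."""
    # Merge mainline into the team branch (the equivalent of a workspace update).
    team.changes |= mainline.changes
    # Build and test on the team branch before touching the mainline.
    if not ci(team.changes):
        return False  # mainline untouched; the team fixes its own branch
    # Merge the team's changes into the mainline, which triggers mainline CI.
    mainline.changes |= team.changes
    return ci(mainline.changes)


def toy_ci(changes: Set[str]) -> bool:
    """Stand-in for build plus tests: fails if a known-bad change is present."""
    return "broken-change" not in changes


mainline = Branch("mainline", {"release-1.0"})
team_a = Branch("team-a", set(mainline.changes))

ok_feature = phase_one_check_in(team_a, "feature-x", toy_ci)     # passes
ok_broken = phase_one_check_in(team_a, "broken-change", toy_ci)  # fails
mainline_untouched = mainline.changes == {"release-1.0"}         # other teams unaffected

team_a.changes.discard("broken-change")  # the team fixes its own branch
integrated = phase_two_integrate(team_a, mainline, toy_ci)
```

The point to notice is that the failed check-in only ever breaks `team_a`. The mainline does not see anything until the team branch is green with the latest mainline changes merged in.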


Multi-Stage CI allows for a high degree of integration to occur in parallel while vastly reducing the scope of integration problems. It takes individual best practices that we take for granted and applies them to the team level.

With Agile, you don’t have the luxury of doing big-bang integration at the end of your development cycle, but when you are scaling Agile beyond a single small collocated team, mainline CI doesn’t mix well with short iterations. Multi-Stage CI solves both problems.

Per-team CI is one form of Multi-Stage CI. In the next part of this series, I’ll introduce some more opportunities for Multi-Stage CI.

Next: Advanced Multi-Stage Continuous Integration

» Reducing the Risk of Producing a Hotfix

Let's say there is a problem reported in the field which requires a hotfix. Well, if you are doing traditional software development, you can't exactly ask the customer to wait until you finish your current release. And if you were to put out a fix for them using your main development process, that would probably take too long too. So, you have a hotfix development process. By definition, this is a development process which is not used for regular development and it is not used very often (so one would hope).

As a result, you have one process for regular development and one process for hotfixes because your regular development process takes just too darn long. That means that when you most need for things to go smoothly you are going to use the process which is the least practiced and probably also something that only a handful of folks know how to do or are allowed to do. What's wrong with this picture?

The solution is to get the path from deciding to make a change to being able to deliver that same change as short as reasonably possible. If the "overhead" from start to finish for a hotfix is 4 hours, then any development task should take you no more than the overhead plus however long it takes you to write the code and the associated tests. All development tasks should follow this same streamlined process, not by cutting out truly necessary steps, but by ruthlessly removing steps that add no real value and automating as many of the remaining steps as possible.

Once you have a sufficiently small overhead, then you really only need one development process which you use for both regular development and hotfixes! Since it is the same process, everybody is practiced in your hotfix process and everybody has experience with doing it. This will reduce risk and increase confidence in the results.

In practice, there will probably be at least two differences. It is unlikely that anyone needing a hotfix will want to take the risk associated with taking any changes other than the hotfix, no matter how ready for release you say those other changes are. So, you will need to develop the fix against the release that the customer is using. You will probably also deliver the fix using a slightly different path than the usual. However, the closer you get to just these two differences, the better off you will be.

There is a shortcut you can take on the way to getting to a single process. You can refactor your regular process such that you act like every development task is a hotfix, and then do the rest of the process in a second stage. For instance, if you normally do a subset of your full test cycle for hotfixes, then always do that subset first during regular development.

For some software projects, getting the full test cycle down to a short time frame will be impossible or impractical. In this case, refactoring to have the first stage be the same as the hotfix process is still recommended and keeps you focused on keeping the second stage as small as possible.
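Here is a minimal sketch of that two-stage shape in Python. The test names and the `run_process` function are hypothetical; the only idea taken from the text is that the hotfix subset always runs first and regular development continues on to a second stage.

```python
from typing import Callable, Dict, List, Optional

# A check is any callable that returns True on success.
TestFn = Callable[[], bool]


def run_stage(tests: Dict[str, TestFn]) -> List[str]:
    """Run every test in a stage and return the names of the failures."""
    return [name for name, test in tests.items() if not test()]


def run_process(
    hotfix_subset: Dict[str, TestFn],
    remaining: Dict[str, TestFn],
    is_hotfix: bool,
) -> Dict[str, Optional[List[str]]]:
    """Stage one is identical for hotfixes and regular work.

    Stage two only runs for regular development, and only if stage one passed.
    """
    stage_one = run_stage(hotfix_subset)
    if is_hotfix or stage_one:
        return {"stage_one": stage_one, "stage_two": None}
    return {"stage_one": stage_one, "stage_two": run_stage(remaining)}


# Made-up checks: one of the slower, second-stage checks is failing.
hotfix_subset = {"smoke": lambda: True, "core_regression": lambda: True}
remaining = {"full_regression": lambda: True, "performance": lambda: False}

hotfix_result = run_process(hotfix_subset, remaining, is_hotfix=True)
regular_result = run_process(hotfix_subset, remaining, is_hotfix=False)
print(hotfix_result)   # {'stage_one': [], 'stage_two': None}
print(regular_result)  # {'stage_one': [], 'stage_two': ['performance']}
```

Because every regular task goes through stage one, the hotfix path gets exercised many times a day instead of a few times a year.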

January 5, 2008
» Multi-Stage Continuous Integration Part I

I’m a big fan of continuous integration. I’ve used it and recommended it for many years. But it has a dark side as well. When it is combined with the practice of mainline development, especially for a large development effort, it can turn into “Continuous Noise.” In this case, you get notified every 10 minutes or so (depending on how much building and testing is going on) that the build and/or test suite is still failing.


In the diagram, the stability of the mainline is graphed over time. In this example, the graph starts when the product is released and the mainline is 100% stable. It builds without errors and all tests pass. As new development starts, developers make changes that break the build or cause tests to fail. Because they are all working against the mainline, it is likely that just as one developer fixes the problems they caused, another developer is checking in more changes that will create one or more new problems.

As the stability of mainline goes up and down, everybody that is working against mainline is affected. If the mainline doesn't build or there are tests that don't pass, then everybody has to wait until the problem is fixed.

It could be argued that the problem in this case is that the developer that broke the build should have updated their workspace, done a full build, and then run the full test suite after that. Basically, a full integration build and test. In practice this doesn’t work very well.

For any meaningful project, a full integration build and test will take time. Even if you somehow get that time down to 10 minutes, a lot can happen in 10 minutes, such as other people checking in their changes and invalidating your results (whether you realize it or not). It is more likely that this full cycle will take on the order of an hour or more. While you can certainly do something else during that hour, it is inconvenient to wait and necessitates task switching which is a well-known productivity killer.
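To get a feel for how much can happen in 10 minutes, here is a rough back-of-the-envelope sketch. It assumes check-ins arrive independently at a steady rate over an 8 hour day (a Poisson model, which is my simplification), and the team size and check-in rate below are made-up numbers, not measurements.

```python
import math


def expected_check_ins(
    developers: int,
    check_ins_per_dev_per_day: float,
    build_minutes: float,
    workday_minutes: float = 480.0,
) -> float:
    """Expected number of other check-ins landing while your build and test runs."""
    rate_per_minute = developers * check_ins_per_dev_per_day / workday_minutes
    return rate_per_minute * build_minutes


def probability_of_interference(expected: float) -> float:
    """Chance that at least one check-in lands during the run (Poisson assumption)."""
    return 1.0 - math.exp(-expected)


# Hypothetical shop: 50 developers, 4 check-ins each per day, 10 minute cycle.
expected = expected_check_ins(developers=50, check_ins_per_dev_per_day=4, build_minutes=10)
print(round(expected, 2))
print(round(probability_of_interference(expected), 3))
```

With these made-up numbers you would expect roughly four other check-ins to land during a single 10 minute cycle, so your result is almost certainly stale by the time you see it. Plug in your own numbers.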

Even though developer integration build and test has its problems, many shops have implemented it. While this practice is not ideal, it is an idea that is headed in the right direction.

Next: Multi-Stage CI Part II

» Multi-Stage Continuous Integration Part II

In Part I of this topic, I said that while Continuous Integration (CI) is a great practice, it can turn into “Continuous Noise” when combined with mainline development. Also, when the mainline is broken, everybody is affected until it is stabilized. I mentioned the idea of developer integration builds to solve this problem and pointed out that this can lead to task-switching and still doesn’t remove the problem of mainline instability.

In typical mainline development, in order for you to integrate your work with anybody else’s work, they first need to make their changes available by merging their changes with the mainline. Now, everybody is forced to integrate these changes. If the process of integrating your changes with another person or team’s changes requires multiple go-rounds, then it is much worse. Now, everybody else must continually integrate with two groups of changes as you and the other team go back and forth merging your changes into the mainline until they work together.

This creates a huge opportunity for duplication of effort. If you are working on module A which interacts with B and C, and B doesn’t currently work with C, then you are blocked until B and C are integrated. You might decide to take a look at it yourself, not knowing that somebody else is already working on it. Sure, you might be aware of who is responsible for B and C, but perhaps they are in a meeting right now and you have no good way of finding out. In any case, you are blocked. It may be that you knew ahead of time that there was a problem with the mainline. In that case, you just don’t take updates and concentrate instead on your own work. In effect, your ability to easily take changes from the mainline is “down” until the mainline is stable again.

A typical coping mechanism in this case is to serialize integration and ask everybody who is not involved in the current integration effort to hold off on merging in their changes until the current integration work has finished. This turns parallel development into serial development and kills productivity.

Next: Multi-Stage CI Part III

December 29, 2007
» The Role of Defect Management in Agile Development

There are some who recommend against using a defect tracking system. Instead, they recommend that when a bug is found, it be fixed immediately. While that is certainly one way of preventing an ever growing inventory of defects, the tracking of an inventory of defects is one of the smallest benefits of a defect tracking system. Overall, a defect tracking system serves as a facilitator. It simplifies the collection of defect reports from all sources. It isn’t just the developers responsible for fixing the defects that find problems. Customers, developers working on dependent systems, and testers also find defects. Even if you have a policy of fixing defects as soon as they are found, it isn’t always logistically possible to do so. For instance, if you are currently working on fixing a defect and in the process of doing so you find another one, you don’t want to lose track of it. Thus, a defect tracking system coordinates the collection of defect reports in a standard way and collects them in a well known location, ensuring that important information about the functioning of your system is not lost.

A defect tracking system also manages the progress of work through the development life cycle from reporting, to triaging, to assignment, to test development, to completion, to test, to integration, to delivery. It simplifies the answering of customer questions such as “what is fixed in this release” and “what release will the fix appear in.” A defect tracking system also allows for the collection of metrics which aids in the spotting of trends. I have heard from multiple sources that metrics collected from a defect tracking system are worthless because developers will just game the system. That may be true in an unhealthy environment where the metrics are tied to compensation. However, in an environment where developers are actively participating in the improvement of the process, they will want this information in order to help to find and fix problems, including the root cause of individual problems.

November 11, 2007