August 6, 2008
» Four Rules for Simple Codelines

Some of you may be aware of Kent Beck's Four Rules of Simple Code that state simple code:

  1. Correctly runs (and passes) all the tests
  2. Contains no duplication (OnceAndOnlyOnce and The DRY Principle)
  3. Clearly expresses all the ideas/intentions we needed to express (reveals all intent and intends all it reveals)
  4. Minimizes the number of classes and methods (has no superfluous parts)
(I've seen some boil this down into some of the same rules for writing clear prose: correct, consistent, clear, and concise.)

Lately I've been noticing some parallels to the above and rules for what I would call "simple codelines" and I think there may be a similar way of expressing them...

Simple codelines:
  1. Correctly build, run (and pass) all the tests
  2. Contain no duplicate work/work-products
  3. Transparently contain all the changes we needed to make (and none of the ones we didn't)
  4. Minimize the number and length of sub-branches and unsynchronized work/changes

To elaborate further...

Correctly build, run (and pass) all the tests

This is of course the most obvious and basic of necessities for any codeline. If the codeline (or the "build") is broken, then integration is basically blocked, and starting new work/changes for the codeline is hindered.

Contain no duplicate work/work-products

The same work and work-products should be done OnceAndOnlyOnce! Sometimes effort is spent more than once to introduce the same change/functionality. This is sometimes because of miscoordination, or simply because two developers did not realize that what they were working on required each of them to do some of the same things (which perhaps should have been accomplished in smaller chunks).

Other times, rather than modify or refactor a common file, some will simply copy-and-paste the contents of one or more files (or directories/folders) because they don't want to have to worry about reconciling what would otherwise be merges of concurrent changes to the common files.

This is akin to neglecting to refactor at the "physical" level (of files and folders) as opposed to the "logical" level of classes and methods. It adds more complexity and (over time) inconsistency to the set of artifacts and versions that make up the codeline, and also eventually adds to the time it takes to merge, build, and test any integrated changes.

If content is being added to the codeline, we want that content to have to be added only once, without any duplicate or redundant human effort.
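One way to spot this kind of copy-and-paste duplication at the "physical" level is to group files by a hash of their contents. The sketch below is only an illustration: the function name `find_duplicate_contents` and the sample paths are hypothetical, and it catches only byte-for-byte copies, not copies that have since drifted apart.

```python
import hashlib
from collections import defaultdict


def find_duplicate_contents(files):
    """Group file paths by a SHA-256 digest of their contents.

    `files` maps a path to that file's bytes. Returns a sorted list of
    path groups whose contents are byte-for-byte identical.
    """
    by_digest = defaultdict(list)
    for path, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(path)
    return sorted(sorted(paths) for paths in by_digest.values() if len(paths) > 1)


# Hypothetical snapshot of a codeline: path -> file contents.
codeline = {
    "common/util.py": b"def parse(x):\n    return int(x)\n",
    "featureA/util_copy.py": b"def parse(x):\n    return int(x)\n",
    "featureB/report.py": b"print('report')\n",
}
duplicates = find_duplicate_contents(codeline)
print(duplicates)
```

Each group it reports is a candidate for refactoring back into a single shared file.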

Transparently contain all the changes we needed to make (and none of the ones we didn't)

The above is sometimes the cause of much undesirable additional effort that is imposed for the sake of attaining traceability and ensuring process compliance/enforcement. Here, I mean to focus on the ends rather than the means, and I say transparency rather than traceability for that very reason.

If people are working in a task-based and test-driven manner, it should be simple to report what changes have been made since a previous commit and that only intended tasks were worked-on and integrated.

If a codeline is truly simple, then it should be very simple and easy to reveal all the changes that went into it without adding a lot of overhead and constraints to development. It should be easy to tell which changes/tasks have been integrated and what functionality and tests they correspond to. One very simple and basic means of tying checkins (or "commits") to backlog-tasks and their tests can be found here; others are mentioned in this article.
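As a rough illustration of that kind of transparency, suppose every commit message names the backlog task it belongs to. The `TASK-<number>` convention, the function name `changes_by_task`, and the sample commits below are all assumptions made for this sketch, not something prescribed by any particular tool.

```python
import re

TASK_ID = re.compile(r"\bTASK-\d+\b")  # hypothetical task-ID convention


def changes_by_task(commits):
    """Map each task ID to the commit IDs that mention it.

    `commits` is a list of (commit_id, message) pairs. Commits whose
    message names no task are collected under the key None so that
    unintended or untracked changes stand out.
    """
    report = {}
    for commit_id, message in commits:
        # dict.fromkeys drops repeated IDs within one message, keeping order.
        task_ids = list(dict.fromkeys(TASK_ID.findall(message))) or [None]
        for task_id in task_ids:
            report.setdefault(task_id, []).append(commit_id)
    return report


# Hypothetical commits made since the previous baseline.
commits_since_baseline = [
    ("c1", "TASK-12: add login form and its tests"),
    ("c2", "TASK-12: handle empty password"),
    ("c3", "tidy whitespace"),
    ("c4", "TASK-15: export report as CSV"),
]
report = changes_by_task(commits_since_baseline)
untracked = report.get(None, [])
print(report)
print("commits with no task:", untracked)
```

Anything that lands under `None` is a change that went in without a corresponding task, which is exactly what the third rule asks us to be able to see.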

Minimize the number and length of sub-branches and unsynchronized work/changes

Branching can be a boon when used properly and sparingly. It can also add a heck of a lot of complexity and redundancy for maintaining two or more evolving variants of the project. The additional effort to track and merge and build many of the same fixes and enhancements in multiple configurations can be staggering.

Sometimes such branches are useful or even necessary (and can help with what Lean calls nested synchronization and harmonic cadences). But they should be as few and as short-lived as possible, preferably living no longer than the time it takes to complete a fine-grained task or to integrate several fine-grained tasks.

Even when there are no sub-codelines of a branch, there can still be un-integrated (unsynchronized) work-in-progress in the form of long-lived or large-grained tasks with changes that have not yet been checked-in or synced-up with the codeline. Keeping tasks short-lived and fine-grained (e.g., on the order of minutes & hours instead of hours & days) helps ensure the codeline is continuously integrated and synchronized with all the work that is taking place.

Another (possibly less obvious) form of unsynchronized work is when there is a discrepancy between the latest version of code checked-in to the codeline, and the latest version of code that constitutes the "last good build." Developers' lives are "simpler" when the latest version of the codeline (the "tip") is the version they need to use to base new work off of, and to update their existing workspace (a.k.a. "sandbox").

When the latest "good" version of the codeline is not the same as (i.e., is less recent than) the latest version, it can be less obvious to developers which version to use, and less likely that they select it correctly. Some use "floating tags" or "floating labels" for this purpose, where they "move" the LAST_GOOD_BUILD tag from its previous set of versions to the current set of versions for a newly passed/promoted build. Sometimes the developers always use this "tag" and never use the "tip" (except when they have to merge their changes to the codeline, of course).

Even with floating tags however, it is still simpler and more desirable when the last good version IS the latest version. Even if the latest version is known to be "broken", the lag between "latest" and "last good" version of a codeline can be a source of waste and complexity in the effort required to build, verify and promote a version to be "good" (and can introduce more complexity when having to merge to "latest" if your work has only been synchronized with "last good").
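To make the "latest" versus "last good" lag concrete, here is a toy model of a codeline with a single floating tag. It is a sketch under simplifying assumptions: the class `Codeline` and its methods are invented for illustration and do not correspond to any real version-control tool's API.

```python
class Codeline:
    """A toy codeline: an ordered list of versions plus one floating tag."""

    def __init__(self):
        self.versions = []           # oldest first; the last entry is the "tip"
        self.last_good_build = None  # index that the floating tag points at

    def commit(self, version):
        self.versions.append(version)

    def promote_tip(self):
        """Move the floating tag to the tip (after it builds and passes its tests)."""
        if not self.versions:
            raise ValueError("nothing to promote on an empty codeline")
        self.last_good_build = len(self.versions) - 1

    def lag(self):
        """Count the versions that are newer than the last good build."""
        if self.last_good_build is None:
            return len(self.versions)
        return len(self.versions) - 1 - self.last_good_build


line = Codeline()
line.commit("v1")
line.promote_tip()       # LAST_GOOD_BUILD now points at v1
line.commit("v2")
line.commit("v3")        # tip is v3, but the tag is still on v1
lag_before = line.lag()  # versions not yet verified as "good"
line.promote_tip()       # tag moves to v3
lag_after = line.lag()
print(lag_before, lag_after)
```

The simplest codeline is the one where `lag()` stays at zero, so that "tip" and "last good build" name the same version.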

Plus, this lag-time often leads many a development shop to separate merging (and integration & test) responsibilities between development and so-called integrators/build-meisters, where the most developers can do is sync-up their work with the "last good build" and then "submit" that work to a manually initiated build, rather than being directly responsible for ensuring the task is "done done" by being fully integrated and passing all its tests.

Such separation often leads to territorial disputes between roles and build/merge responsibilities. This in turn often leads to adversarial (rather than cooperative and collaborative) relationships and isolated, compartmentalized (rather than shared) knowledge for the execution and success of those responsibilities.

So there we have it! Four rules of simple codelines.

Simple Codelines should:
  1. Correctly build, run (and pass) all the tests
  2. Contain no duplicate work/work-products
  3. Transparently contain all the changes we needed to make (and none of the ones we didn't)
  4. Minimize the number and length of sub-branches and unsynchronized work/changes

Sometimes there are legitimate reasons why some of the rules need to be bent, and there are important SCM patterns to know about in order to do it successfully. But any time you do that, it makes your codeline less simple. So you want those scenarios to be few and far between, and to keep striving for the goal of simplicity. (Other SCM patterns, such as Mainline, can help you refactor your codelines/branches to be more simple.)

May 29, 2007
» Five R's of Agile SCM Baselines

Almost a year ago I posted an entry on the "5 C's of Agile SCM Codelines". This time I'm posting on the 5 R's of Agile SCM Baselines. I think these are as follows (note that most of these are not unique to Agile/Lean):

  • Repeatable -- The steps to create the corresponding Build (Configuration) from its sources should be repeatable: for any baselined configuration, I should be able to build it the same way, over and over again.

  • Reproducible -- The corresponding Build (Configuration) should be reproducible: for any baselined configuration, I should be able to reproduce it whenever desired.

  • Reportable -- The corresponding Build (Configuration) should be reportable: for any baselined configuration, I should be able to report all needed details about its content: what files & versions are in it, which changes/requests are in it, who made which change to what (& when), how it was built and with what tools & options, etc.

  • Releasable -- The corresponding Build (Configuration) should be releasable: if I have "baselined" it, then almost by definition, it means it should be of 'releasable quality' to the next downstream consumer. (This implies it should be correct + consistent + complete to the extent agreed upon with its stakeholders.)

  • Repairable -- The corresponding Build (Configuration) should be readily repairable: if, for any reason, it is discovered to have some kind of problem, then I must be able to readily repair it either by retracting it and replacing it with its predecessor, and/or by removing/repairing the offending content and releasing it as a new (corrected) baseline.

I use the term "reportable" instead of "traceable" here for two reasons: 1) 'traceable' doesn't begin with 'R', and 2) 'traceable' brings to mind many negative associations with manual tracing, rather than simply providing the necessary transparency and ability to trace (without necessarily implying doing all that tracing, much less doing it all manually).
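One way to picture "reportable" is as a small manifest recorded alongside each baseline, from which the details can be produced on demand rather than traced by hand. The sketch below is purely illustrative: the class names `Change` and `BaselineManifest`, their fields, and the sample data are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    task: str      # the change/request this work belongs to
    author: str    # who made the change
    date: str      # when it was made
    files: tuple   # which files it touched


@dataclass
class BaselineManifest:
    name: str
    file_versions: dict   # path -> version included in the baseline
    build_tool: str       # how it was built
    build_options: tuple  # and with what options
    changes: list = field(default_factory=list)

    def who_changed(self, path):
        """Return (author, date, task) for every change that touched `path`."""
        return [(c.author, c.date, c.task) for c in self.changes if path in c.files]


# Hypothetical baseline with two recorded changes.
manifest = BaselineManifest(
    name="REL_1_0",
    file_versions={"src/login.py": "7", "src/report.py": "3"},
    build_tool="make",
    build_options=("all",),
    changes=[
        Change("TASK-12", "alice", "2007-05-20", ("src/login.py",)),
        Change("TASK-15", "bob", "2007-05-22", ("src/report.py", "src/login.py")),
    ],
)
history = manifest.who_changed("src/login.py")
print(history)
```

The point is that the information is captured once, as a by-product of building the baseline, and any report is then a query over it.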

Note also that "repairable" may not imply that the cost of repair is low. Ideally, the repair can be done as quickly as possible by the development organization, but getting it to the consumer(s) may be both costly and time-consuming. So, rather than "low cost", being "repairable" speaks more to maintainability, and the ability to quickly understand the system and what must be done to repair it. We want to repair it with minimal interruption of flow, and with a minimum amount of overhead.

Why is any of that particularly "agile"? Most of it isn't, but the 'take' on reportability certainly is, and the notion of "releasable" may seem agile to those who feel the codeline should (ideally) be in a readily releasable state. CMers would say that "releasability" was always part of what a baseline requires (and they'd be right).

What do you think? Did I miss any other important R's? Or should something else be used instead? Would an additional R-word or two not listed above help differentiate between "Agile" CM versus more traditional CM?

Here are all the other R-words I considered:
    Recoverable Reliable Reversible Retractable Retainable Realizable Relocatable Remediable Repealable Replicable Revocable Relapsable Rebuildable Recapturable Reconfigurable Reconstitutable Reconstructible Recordable Recyclable Referable Retrievable Reusable Restorable Renewable Replaceable Representable Respectable Responsible Removable Reachable Readable Receivable Reclaimable Recognizable Recommendable Reconcilable Recreatable Remissible Rectifiable Recuperable Redeemable Reducible
If anyone cares, I looked it up at Dictionary.com by searching for "r*ble".

February 21, 2007
» Recursive Make Reconsidered

In an earlier blog-entry reviewing the book Code Craft, I mentioned the classic paper by Peter Miller entitled "Recursive Make Considered Harmful" ...

Anyway, I recently ran across a bunch of webpages that examined or revisited the issue. I thought several of them were worth sharing, so here they are:


December 18, 2006
» Product-Line CM in CACM

The current issue of Communications of the ACM is focused on Software Product-Lines for software engineering. It has a number of interesting articles on software product-lines and product-families for large-scale reuse.

It even has a few articles related to CM of product-lines, particularly change-management and variability-management:




December 3, 2006
» The Buildmeister's Guide

I received a copy of the book The Buildmeister's Guide: How to design and implement the right software build and release process for your environment, by Kevin A. Lee, who runs www.buildmeister.com (the book is also available on Amazon.com).

I really liked Kevin's earlier book on ClearCase, Ant and CruiseControl: The Java Developer's Guide to Accelerating and Automating the Build Process. Even though it was specific to ClearCase it had a lot of really good information in general about build/release process automation. The Buildmeister's Guide "builds" on that (no pun intended) and covers build automation tools (such as CruiseControl and BuildForge) as well as Version Control in general (including tool selection and branching/merging policies). It also covers more than just Java, and has sections on other language & environment factors like .NET and C++.

All in all, it looks like a very good, and short (~110 pages) guide for beginning and intermediate build-meisters to learn a whole lot more about effective practices, resources and tools for software building and releasing.

October 16, 2006
» Scaling Agility: Seamless Agility across the Enterprise

David Anderson writes about the recent Agile2006 conference in his blog-entry Thoughts for Agile2006:

Scaling Agile. The BIG issue for this year is scaling agile across a whole organization. I see this as having three parts - program or multi-project management and the rollup of schedules and resource plans to a Director or VP level; architecture and enterprise level modeling of a domain and data center; and finally configuration management including build, integration, branch and merge strategies, and work-in-progress batching and related communication.

I've been dealing with this topic a LOT lately in my own organization as part of efforts to spread and adapt Agile methods across a large distributed enterprise working with large systems and teams. I've been researching and collecting lots of resources, including some earlier blog-entries on Agile CMMI and Dancing Elephants and Agile Adoption across the industry.

My perception is that the "seams" of the enterprise where Agility is hardest to introduce are the close collaboration and alignment required across organizational (lifecycle discipline) boundaries and geographic boundaries (and I find the former to be more difficult to surmount than the latter).

If I try to categorize them as different areas or aspects that each require the ability to be agile, I come up with something like:
  • Process - Adapting Agile to the Organization (making processes responsive to change)

  • Product - Agile Systems Engineering/Architecture (making the requirements & architecture be responsive to change)

  • Project - Agile Program Management & Governance (making the project be responsive to change)

  • People - Distributed Agile Development (collaborating across multiple sites, teams, and timezones)

  • Organization - Agile Metrics/Reporting, Governance, and Organizational Design

  • Environment - Agile CM, deployment, operation/support, etc.

I'll be blogging separately with lists of resources I've found for several of the above.