May 16, 2008
» The fastest way to insert 100K records

I’m doing some performance tests today with our three different database backends: MySQL, Firebird and SQL Server. My goal is to find the fastest way to insert a large number of records into a table, taking into account the different backends. My test table is the following in the three databases:


CREATE TABLE testtable (
iobjid BIGINT NOT NULL,
ifield0 BIGINT,
ifield1 BIGINT);

iobjid is the primary key and the other two fields are also indexed.

So, let’s go with the first loop:

IDbConnection conn = // grab a connection somehow
conn.Open();
try
{
    IDbCommand command = conn.CreateCommand();
    command.CommandText = "delete from testtable";
    command.ExecuteNonQuery();
    int initTime = Environment.TickCount;

    IDbTransaction t = conn.BeginTransaction(
        IsolationLevel.RepeatableRead);
    command.Transaction = t;
    // start at 0 so that exactly 100,000 rows are inserted
    for (int i = 0; i < 100000; i++)
    {
        // build a new SQL string with literal values on every iteration
        command.CommandText = string.Format(
            "INSERT INTO testtable "+
            " (iobjid, ifield0, ifield1)"+
            " VALUES ( {0}, {1}, {2} )",
            i, 300000-i, 50000 + i);
        command.ExecuteNonQuery();
    }
    t.Commit();
    Console.WriteLine("{0} ms", Environment.TickCount - initTime);
}
finally
{
    conn.Close();
}

How long does it take to insert 100K records on my old laptop?
· Firebird 2.0.1 (embedded) -> 38s
· SQL Server 2005 -> 28s
· MySQL 5.0.1 -> 40s

I’ve repeated the test with all the possible IsolationLevel values and didn’t find any difference.

Insert with params

My second test tries to get a better result using parameters on the commands. Here is the code:

IDbConnection conn = //get your connection
conn.Open();
try
{
    IDbCommand command = conn.CreateCommand();
    command.CommandText = "delete from testtable";
    command.ExecuteNonQuery();
    int initTime = Environment.TickCount;
    IDbTransaction t = conn.BeginTransaction(
        IsolationLevel.RepeatableRead);
    command.Transaction = t;
    // sqlserver and firebird use ‘@’ but mysql uses ‘?’
    string indexParamName =
        GetParametersNamePrefix() + "pk";
    string field0ParamName =
        GetParametersNamePrefix() + "field0";
    string field1ParamName =
        GetParametersNamePrefix() + "field1";
    // the SQL text is set once; only the parameter values change
    command.CommandText = string.Format(
        "INSERT INTO testtable "+
        " (iobjid, ifield0, ifield1) "+
        "VALUES ( {0}, {1}, {2} )",
        indexParamName, field0ParamName, field1ParamName);
    IDbDataParameter paramIndex = command.CreateParameter();
    paramIndex.ParameterName = indexParamName;
    command.Parameters.Add(paramIndex);
    IDbDataParameter paramField0 = command.CreateParameter();
    paramField0.ParameterName = field0ParamName;
    command.Parameters.Add(paramField0);
    IDbDataParameter paramField1 = command.CreateParameter();
    paramField1.ParameterName = field1ParamName;
    command.Parameters.Add(paramField1);
    for (int i = 0; i < 100000; i++)
    {
        paramIndex.Value = i;
        paramField0.Value = 300000 - i;
        paramField1.Value = 50000 + i;
        command.ExecuteNonQuery();
    }
    t.Commit();
    Console.WriteLine("{0} ms",
        Environment.TickCount - initTime);
}
finally
{
    conn.Close();
}
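
The listing above calls GetParametersNamePrefix() without showing it. Here is a minimal sketch of what such a helper might look like, assuming the provider can be recognized from the connection's type name (for instance MySqlConnection, FbConnection or SqlConnection). The class and method names are hypothetical and this is not the code used in the test:

```csharp
using System;

public static class ParameterNames
{
    // Hypothetical helper: picks the parameter marker from the name of the
    // ADO.NET connection type, e.g. conn.GetType().Name.
    public static string GetPrefix(string connectionTypeName)
    {
        if (connectionTypeName == null)
            throw new ArgumentNullException("connectionTypeName");

        // As the comment in the listing says: MySQL uses '?',
        // SQL Server and Firebird use '@'.
        return connectionTypeName.StartsWith(
                "MySql", StringComparison.OrdinalIgnoreCase)
            ? "?"
            : "@";
    }
}
```

It could be called as ParameterNames.GetPrefix(conn.GetType().Name) right after opening the connection.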

How long does it take now?
· Firebird -> 19s
· SQL Server -> 20s
· MySQL -> 40s

So, it seems MySQL is not affected by parameters, but the other two really get a performance boost!

One insert to rule them all

Let’s now try a last option: what about inserting all the values in a single operation? Unfortunately neither SQL Server nor Firebird supports multiple rows in the values part of an insert. I know they can use some sort of union clause to do something similar, but performance is not better.

So, let’s try with MySQL:

IDbConnection conn = // grab your conn
conn.Open();
try
{
    IDbCommand command = conn.CreateCommand();
    command.CommandText = "delete from testtable";
    command.ExecuteNonQuery();
    int initTime = Environment.TickCount;

    IDbTransaction t = conn.BeginTransaction(
        IsolationLevel.RepeatableRead);
    command.Transaction = t;
    StringBuilder builder = new StringBuilder();
    // first row, together with the INSERT header
    builder.Append(string.Format(
        "INSERT INTO testtable "+
        " (iobjid, ifield0, ifield1) "+
        "VALUES ( {0}, {1}, {2} )",
        0, 300000, 50000));
    // remaining rows are appended as extra value tuples
    for (int i = 1; i < 100000; i++)
    {
        builder.Append(string.Format(
            ", ( {0}, {1}, {2} )",
            i, 300000-i, 50000 + i));
    }
    command.CommandText = builder.ToString();
    command.ExecuteNonQuery();
    t.Commit();
    Console.WriteLine("{0} ms",
        Environment.TickCount - initTime);
}
finally
{
    conn.Close();
}

And the winner is... MySQL takes only 9 seconds to insert the 100K records, but only using the multi-value insert operation.

Enjoy!
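
A single statement carrying all 100K rows can run into the server's max_allowed_packet limit on MySQL, so a common variation is to send the rows in fixed-size batches. The sketch below only builds the SQL text for each batch, using the same values as the test. The class name, method name and batch size are assumptions made for illustration, and this variation was not part of the timings above:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class MultiRowInsert
{
    // Builds one multi-row INSERT statement per batch for rows [0, totalRows).
    public static List<string> BuildBatches(int totalRows, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException("batchSize");

        List<string> statements = new List<string>();
        for (int start = 0; start < totalRows; start += batchSize)
        {
            int end = Math.Min(start + batchSize, totalRows);
            StringBuilder builder = new StringBuilder(
                "INSERT INTO testtable (iobjid, ifield0, ifield1) VALUES ");
            for (int i = start; i < end; i++)
            {
                // separate value tuples with a comma, except before the first
                if (i > start)
                    builder.Append(", ");
                builder.AppendFormat("({0}, {1}, {2})", i, 300000 - i, 50000 + i);
            }
            statements.Add(builder.ToString());
        }
        return statements;
    }
}
```

Each returned string would be assigned to command.CommandText and executed with command.ExecuteNonQuery() inside the same transaction, for example with MultiRowInsert.BuildBatches(100000, 1000).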

May 5, 2008
» Plastic smart branches are out!

I’m very proud to announce the first release of smart branch support in Plastic. The release is still not official (so don’t use it for production before contacting us) and it can be directly downloaded here, no registration needed.



What are smart branches exactly? They’re the evolution of Plastic’s basic branching functionality, answering a common user demand: they remember their starting points, so when you switch back to a branch you don’t have to remember them yourself. While this is useful for a number of well-known branching patterns, it will really help our branch per task practitioners when they have to recover old branches.

A smart branch is, conceptually, very close to a stream, but we preferred to stick to the more traditional name of branch instead.

With smart branches Plastic remembers which one is the starting point of a branch at any given point in time. The branch properties dialog (also new in the BL101 release) will show you which one is the current starting point of the branch and will also let you modify it creating a new one (which is very helpful during rebase operations, for instance).

The new properties dialog also lets users modify the branch name and especially its comments, which is something customers have been asking for since 2.0 was released.



Changesets are also given more visibility with the new release: they’ve been present in Plastic since day one, but now branches can be created not only “from a label or baseline” but also from a given changeset, which eases maintenance.
Plastic’s branch inheritance mechanism is flexible enough to define many different branching strategies, and it can now be tuned more easily than before by using smart branches.

A smart branch is just a regular Plastic branch with a link to a starting point. A starting point can be:

  • another branch (in which case it will inherit from the LAST on this branch, implementing dynamically updated inheritance)
  • a given changeset on a branch (which specifies fixed inheritance from a well defined starting point)
  • a label (which is the regular case normally used as best practice for branching).

    A new changeset is created after a new branch base is set so that users can easily find a checkpoint to be used later on, if needed, to recover this specific configuration.



    If you look at the figure above you’ll see that if a developer chooses to go to changeset 99 the branch /main/task001 will use label00 as basis, but label01 if cset 100 is selected.

    Changes are also introduced in the selector definition so that now rules like the following will be allowed:


    rep "codicetest"
    path "/"
    smartbranch "/main/task001"


    And all the required branch inheritance details will be set.

    The release BL101 makes all this new functionality available from the GUI and our next release will also introduce support for smart branches from the Branch Explorer.

    We expect smart branches to make Plastic branching easier to use for both new and existing users and also introduce more advanced branching scenarios when needed.

    Enjoy!
    April 30, 2008
    » SCM trends at DDJ

    The folks at DDJ have just interviewed me about SCM Trends.
    I tried to give my own view about the future of SCM and even talked about how threading and concurrency have an impact on the version control field or, even better, how I believe SCM can help there. (I have to admit concurrency has been one of my favourite topics since I first read Ben Ari's book long, long ago; there's now a new edition available, and Jeffrey Richter's superb book has also been renewed this year.)
    I've also focused on why C# was an important decision in the early Plastic development, and more specifically why Mono was actually the key which really opened the C# door for us!

    April 17, 2008
    » new blog skin

    Our blog skin has been changed today to match the website's look, and I started to browse old entries.

    I found a couple of funny videos showing how to control our 3D version tree with a gamepad and a wii remote.

    Just for fun:



    April 9, 2008
    » The break up, a Clearcase bad love affair

    This is the true story of a break up: a seasoned SCM manager tells us why he finally broke up with his beloved system: ClearCase.

    It probably sounds just like a funny joke, but please read on carefully and find the hard facts behind his disappointment...

    Here it goes:

    After 12 years working and playing with clearcase I'm fed up with it.
    The beautiful lady I fell in love with after my first months in a software development and R&D company became a real pain for me. And a new mother-in-law, IBM, didn't make it any better.

    At the time we met we were doing development for digital media on these nice systems made by Silicon graphics.
    It was the time when a system at work was way more powerful, and much more expensive, compared to the silly 486 PC running some version of DOS I had at home. So I went to work with the idea that what I did really mattered for me and the company. After all it was before the bubble.

    What I particularly liked about clearcase was the fact that it was completely invisible for the developer.



    You set a view and you could start working with any tool you wanted, the sources were right there where you needed them. It was a fast and advanced environment. It had directory versioning, so you could change the structure of your code without trouble. You had advanced branching and merging, you had merge tracking and the branching strategy we used at the time was branching per task. It was possible to do this efficiently in clearcase since it was so good at branching and merging.

    And it had all this 12 years ago!

    And the stunning fact is that while the world evolves, clearcase chooses the completely wrong direction.

    It tried with the ugliest implementation ever conceived for a SCM environment, UCM, Unified Change Management, trying to impose a ridiculous overloaded process on top of the old base. She completely ignored her old lover boy, the developer, while trying to impress those people who hated development. It broke down the whole idea of branching per task.


    It gave up on having a consistent user interface on all the platforms it supports: using clearcase on the different flavors of UNIX, Microsoft Windows or Linux really is a different thing. The clearcase GUI on UNIX and Linux is really bad. And why does the command line have so many more features than the GUI? And why is Apple ignored?

    The lack of a standard way to do a recursive check-out or check-in, the error messages that seem completely unrelated to the real error, the merge and compare tools that fail when line endings change in text files used in a cross-platform environment, the limit on the number of characters per line (in an XML file, for example) that makes the merger fail: these are all things I could handle.

    But the real showstopper comes with the integrations provided for 3rd party tools like Eclipse, WSAD, Visual Studio, etc. These integrations are in fact only good for check-in/check-out; all other clearcase operations are a pain in these environments. Simply switching views in an IDE to a view on another branch in the same project is almost impossible, effectively making the entire branching power useless.

    While in the old days clearcase was invisible to the developer, a silent and helpful companion, nowadays it makes a developer's work difficult, slow and cumbersome. Some people even argue that going back to the stone age of SCM practice, by using tools like CVS or Subversion, is better than using clearcase.

    And right at the time when I really needed to feel young again, fresh and excited about software development, a beautiful, open product appeared on the scene, made by developers for developers. All the focus is again on how development and integration can be done in an efficient, fast and powerful way: a consistent look-and-feel on all the platforms it supports (it even runs on Apple, a brand ClearCase always ignored), an easy to use branching and merging environment, good integrations with your IDE so that switching workspaces to another branch is an easy and fast task, merge tracking and correct directory versioning.
    In fact, it has all the good things of clearcase and a solution for most of its problems.

    Please go and check-out plastic SCM 2.0 as fast as you can, you won't be disappointed.

    April 7, 2008
    » Advanced selectors, part III

    After the previous posts describing selectors in detail, it's now time to enter the last selector frontier: multi repository selectors.

    As you know plastic can manage multiple repositories. You can map each one of your projects inside a plastic repository, or go for more advanced practices like component oriented development.

    Repositories can be totally independent from each other, but there're also situations in which they can be tightly related. For instance you can have shared libraries you’ve developed and you reuse between different projects. If this is the case, it can be useful to have one repository for each project, and one repository for the libraries.

    But then, how can developers use the code from the libraries and the project at the same time?

    Let's take a look at a very simple repository like the one in the figure. It has just a couple of files and an empty directory. Probably none of your projects looks so simple. Suppose this is the repository named “proj00”.



    Then you have another repository containing your library code. It looks like the one in the following figure. This repository is named “lib_repos”.



    Now we need to make the lib_repos repository available to the “proj00” developers.
    Please note we’ve created an empty “lib” directory which will be used as mount point in “proj00” to plug “lib_repos”.

    Take a look at the following selector:


    repository "lib_repos" mount "/lib"
    path "/"
    branch "/main"
    checkout "/main"
    repository "proj00"
    path "/"
    branch "/main"
    checkout "/main"

    Remember how selector rules work: from top to bottom. Here we’re telling plastic: take everything from lib_repos at the main branch, but mount it at /lib. Later on it will need to resolve the /lib path, and it will be resolved using the next repository rule (proj00).

    If you run the following ls (with a format modifier to show the repository information) you’ll see the following.

    >cm ls
    br:/main#1@rep:proj00@local:8084 .
    br:/main#0@rep:proj00@local:8084 file00.txt
    br:/main#0@rep:proj00@local:8084 file01.txt
    br:/main#1@rep:lib_repos@local:8084 lib


    Note:
    We used the following PLASTIC_LS_FORMAT environment variable:
    LS_FORMAT="{3}@{8,-26} {4,-5} {5}"

    Right, the lib directory is being loaded from the lib_repos repository.
    Of course if you move inside the directory you’ll see that everything inside comes from the same repos. Look at the following plastic screenshot showing the repository details of the files and directories.

    Implementing a real working environment
    The selector above showed how to go for main branch development on a multi-repository scenario. But you’ll probably need to implement a whole branching strategy. If so, then consider the following selector:

    repository "lib_repos" mount "/lib"
    path "/"
    label "lib_00"
    repository "proj00"
    path "/"
    branch "/main/task001" label "BL010"
    checkout "/main/task001"


    This resembles a branch per task pattern on a multi-rep scenario.

    Please note we’re now mounting “lib_repos” as read-only because we’re just specifying a label and not a check out rule.

    The way to use the mounted repository will vary depending on your project’s needs. It can happen that a different dev group completely manages “lib_repos”; then it is OK to mount it read-only, because developers at “proj00” will only use it as a “library”. Lib_repos will go through its own release cycle and the team at proj00 will only have to take care of changing the label of the mounted repository when a new release is available and approved for their project.

    It can also happen that your team is responsible for both repositories. You’ve decided to split them because they’re clearly different components but there is only one release cycle for them.

    Then it makes sense to follow a combined “branch and merge cycle” for the two repositories. If you’re working on task001 then it can happen you need to change code at both lib_repos and proj00. You’ll probably be using a selector like:


    repository "lib_repos" mount "/lib"
    path "/"
    branch "/main/task001" label "BL010"
    checkout "/main/task001"
    repository "proj00"
    path "/"
    branch "/main/task001" label "BL010"
    checkout "/main/task001"


    Please note that there are two /main/task001 branches, one in each repos, and the same goes for the labels. You can use a naming convention (using the same name, as I’m doing here) to enforce their relationship.

    Going really advanced, configuring what you mount
    So far we’ve seen typical mount scenarios. But, what if you need to mount inside the “/lib” directory at “proj00” something which is not at the root of “lib_repos”?

    Then we’ll use the power of plastic branch inheritance to actually get the desired result.

    Suppose we want to mount inside “lib” the content of the “/bin” directory. So, we want to use “/bin” as the root of the lib_repos repository. To make things easier we’ll assume there’s a “release” subdirectory inside “bin”. Check the following figure.



    Let’s go to the lib_repos repository and create a branch named “/main/mount-point”.
    Then let’s use regular rm and mv commands to correctly configure the “mount-point” branch as we need.


    >cm co .
    Checking out . ... Done

    >cm rm doc src
    Item doc has been removed.
    Item src has been removed.

    >cm co bin
    Checking out bin ... Done

    >cm mv bin\release .
    bin\release has been moved to .

    >cm ls
    0 07/04/08 dir br:/main/mount-point#CO CO .
    0 07/04/08 dir br:/main/mount-point#CO CO bin
    0 07/04/08 dir br:/main#0 release

    >cm ci bin
    Checking in bin ... Done
    Created changeset
    cs:3@rep:lib_repos@repserver:CONRAD:8084

    >cm rm bin
    Item bin has been removed.

    >cm ls
    0 07/04/08 dir br:/main/mount-point#CO CO .
    0 07/04/08 dir br:/main#0 release

    >cm ci .
    Checking in . ... Done
    Created changeset
    cs:4@rep:lib_repos@repserver:CONRAD:8084


    Then we can set the following selector:


    repository "lib_repos" mount "/lib"
    path "/?"
    branch "/main"
    checkout "/main"
    path "/" norecursive
    branch "/main/mount-point"
    repository "proj00"
    path "/"
    branch "/main"
    checkout "/main"


    Please note the following:

    We’re using the /main/mount-point branch just as a way to “refactor” the directory structure, but all the contents will be loaded from the main branch of the “lib_repos” repository. Of course instead of the main branch we could be using a different one.

    The purpose of the /main/mount-point branch is not to be merged back into “/main” but just to hold a project reorganization. In fact we can even prevent it from being merged by denying the merge permission.



    Let’s set the following selector:


    repository "lib_repos" mount "/lib"
    path "/?"
    branch "/main/task001" label "BL010"
    checkout "/main/task001"
    path "/" norecursive
    branch "/main/mount-point"
    repository "proj00"
    path "/"
    branch "/main/task001" label "BL010"
    checkout "/main/task001"


    And a little explanation:

    Whenever you make changes to your code inside the “lib” directory, you’ll be placing the changes directly inside “/main/task001”, but using the directory reorganization from “/main/mount-point”.

    When you merge back /main/task001 in lib_repos, you’ll be only getting the changes made on the task, and not the entire reorganization made inside “mount-point”.
    That’s why plastic branch inheritance is so powerful and allows many different scenarios to be implemented.

    Future work
    We’re currently working on the design of new selector rules to allow different repositories to be mounted on different locations directly. Some people will find the “mount-point” branch solution helpful, but others will prefer to be able to do something equivalent just using selector rules.

    We’re introducing new selector rules to be able to specify which one is the root item to be used in a mount point, for instance. This way you could specify that /bin is now the root at lib_repos.

    Also we’re working on creating “workspace selector directories”, which are local directories not under source control but managed by the tool (created by the update process) and able to hold controlled code…

    So, we’re open to suggestions… feel free to contact me by email if you have ideas about possible selector evolution.

    April 4, 2008
    » Integration strategies

    Previously we were discussing the future of continuous integration, according to Duvall’s award-winning book, and possible alternatives.

    Today I’ll be focusing precisely on this topic: different alternatives to handle the integration phase.

    Some will prefer to stick with the main-line development style, while others will gravitate toward a more controlled (and maybe less agile if we go religious) approach.

    Beyond the selected branching strategy, there will also be an integration strategy.
    Let’s start with main-line development. This is probably the best-known and most widespread technique in version control. How does it work? Simple, all check-ins go to the main branch, as you can see in the following figure:

    Of course it has its own advantages and disadvantages.

    Main-line pros

  • It is simple to set up and easy to use
  • Every version control system out there is able to handle it (at least main-line is supported by everyone)

    Main-line cons
  • Traditional use leads to continuous project instability.
  • People are updated to the latest version after each commit, so they can be easily infected by the “shooting a moving target” disease.
  • To prevent the previous problems developers are forced to check in only when their code is fully tested. It can potentially lead to code being outside version control for long periods. Besides, developers start using the version control only as a delivery mechanism; it’s not a tool for development anymore, since they can’t check in every five minutes just to create check-points unless they’re totally sure the code won’t break the build... Then you lose the ability to use the version control to know why your code was working before the last minor change you made...


    Continuous Integration standard practices try to solve all the above problems with two principles:
  • Commit as often as possible, even several times a day, (please note: several times a day is less than you’d be checking-in code if you had your own branch to submit even intermediate non-working code, just to help you during development).
  • Always make sure you don’t break the build. You’ll need a strong test suite to really help you check that your changes, together with the previously submitted ones, don’t break anything. If you follow test driven development, then you’re in a good position to achieve this goal.

    What you get in return is a project that evolves very fast, which is great. The problem I’ve usually found is that teams have trouble preventing the build from being broken, and they normally prefer to trade fast evolution for stability. This is not always true, of course.

    Some alternatives
    The alternative I’ll be talking about is the branch per task pattern. Some aliases are: Activity Branching, Task Branching, Side Branching or Transient Branching. It is probably my favorite pattern because of its flexibility, ease of use and its associated benefits. It also lays the foundation for new trends in version control like stream management.

    What does branch per task look like? Take a look at the following figure:

    Note you have a main-line, which has been labeled as “BL00” and then there’s a task branch there named task001.

    There are three important considerations here already:
  • Each task starts from a well-known baseline: you remove the “shooting a moving target” problem from day one.
  • The mainline will only receive stable changes. So you really enforce the “never break the build” principle.
  • You create an explicit link between tasks in your favorite project management system (think about Jira, DevTrack, VersionOne, OnTime, Mantis, Bugzilla or just your internal mechanism!). There’s something really important here: a developer works on a branch, which he knows is related to a given task, because it is named after the task! So developers are totally aware of the exact planning item they’re working on. Which means both project managers (or scrum masters or whoever you have to report to) and developers finally speak the same language!

    In fact you’re getting rid of the main-line style problems. The drawback here is that it is a bit more complex to understand (just a little bit!) and not every version control out there supports it (that’s why we developed Plastic!).

    Rapidly you’ll be creating more branches to implement more changes, as you can see in the next image:



    But the point here is not only when or how you create your branches, but when, how and who integrates them back into the main-line.
    I’ll be talking about two different approaches.

    Running mini-bigbangs
    You’ve decided to go for branch per task and then you have your colleagues creating branches for several days. When should you integrate them all back into the main branch?

    I’d normally say: after no longer than a week. In fact if you let the time between integrations span longer than a week you’ll normally be hitting one of the biggest version control problems: big-bang integration!

    Big bang integration is a very well-known and documented problem, one of the “roots of all evil” in software development, but it is still out there waiting for new victims.

    How does it work? You’re working on a, let’s say, 6 months project. Then you plan 2 milestones and split your team in sub-teams. They work on separate features and they’ll be integrating their work together one week before each milestone. Sounds familiar? Well, I hope it doesn’t because it is a great recipe for disaster!

    If you follow this approach, your first “integration” will probably last much longer than a week, and needless to say the second one won’t be better... You’ll get an amount of software which never worked together before, and you need to be sure it works... in a week! Crazy!

    That’s why it is such a good idea to reduce time between integrations. I’d say integration frequency has to grow with the amount of work your team can perform. I mean, if you’re running a small group of five developers, maybe it is OK to run one integration a week, but as soon as you get bigger, you may have to run them more frequently, even more than once a day!

    The whole point here is avoiding integration problems. It is exactly the same rule introduced by continuous integration: if something is error prone... do it often to reduce the risk!

    And remember the integration problem shouldn’t be actually merging the files and directories. If it is: switch to another version control! The problem is that even when code compiles correctly, it can break a lot of tests or just hide an unexpected number of critical bugs. As I mentioned, the problem shouldn’t be merging. We were working with a company once which was running weekly integrations with CVS. They needed several hours to just merge the code together. Then they switched to plastic and they’re using the “spare” time to run a whole test suite. That’s the point of integration: being sure your code works as expected, not being worried about how to merge it.

    That said, let’s take a look at what a short merge iteration looks like:



    I named them “mini-bigbangs” because they’re actually big bang integrations: you take a number of separate developed code changes and merge them back together. The key is that they’re short enough so that the real “big-bang” diseases don’t show up.

    Why still use this approach then, and not go directly to pure continuous integration on the main-line? Well, having your own branch for each task you develop still sounds like a very good idea: it gives you a great place to create changes, prevents mainline corruption and keeps all the other advantages of branch per task.

    You’ll then continue working and your development will look like the following figure:


    Until you run again another integration and:


    Is it now clearer? Remember your test suite is also a cornerstone for this integration approach. You must enforce a subset of the whole test suite to be run upon task completion (you can set up a build server to do that, polling for new finished tasks, downloading, compiling and unit testing them once they’re marked as finished).

    There are some warning lights to watch here: I’ve been running this kind of integration very successfully on different projects, but there can come a time when, for some reason, integrations become a real pain. In my case it actually happened because of the test suite: it grew so large that checking each task upon integration (something you have to do!) took too long. Then integrations started to grow longer and longer, and they became a real pain, as I mentioned before.

    Please note that the real problem isn’t in the version control field but on the testing ground: maybe the tests were too fragile or took too long, and they have to be somehow fixed. But anyway, let’s try to figure out some alternatives.

    The first one was running staggered integrations: the developer running the “mini-big-bang” decided to group tasks together, integrate them into different intermediate integration branches, test them in parallel, and then merge them all together into the main branch.



    Remember the whole point here is not to speed up merging, which is already very fast, but to be able to run the biggest number of tests in parallel while merging branches.

    Merging branch per task and CI
    If you have a situation with a very large test suite and you need to prevent integrations from lasting too long, or you want to avoid the integrator’s role (in the purest agile approach), or you’re after an even faster release cycle, you can combine the branch per task technique with the continuous integration approaches.



    With this approach each developer merges his own branch back into the main-line. Remember he must first (as it’s shown on the second branch) merge down from the main branch to take the latest changes, then run the tests (using a CI server, for instance) and only when they all pass merge the changes up (it will be just a “copy merge”) to main.
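
    The order of operations matters here, so it can help to see it written as a small gate. This is only a sketch: the three delegates are hypothetical placeholders for whatever your version control and CI server actually do, not Plastic commands.

```csharp
using System;

public static class TaskIntegration
{
    // Returns true only if the task branch ended up merged into main.
    public static bool TryIntegrate(
        Func<bool> mergeDownFromMain,  // bring the latest main changes into the task branch
        Func<bool> runTests,           // run the test suite on the task branch
        Action mergeUpToMain)          // copy-merge the task branch into main
    {
        if (!mergeDownFromMain())
            return false;   // conflicts must be solved on the task branch first

        if (!runTests())
            return false;   // main stays untouched while tests fail

        mergeUpToMain();
        return true;
    }
}
```

    Because main is only touched in the last step, a failed merge down or a red test run leaves it exactly as it was.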

    Then a build can be triggered on the main branch, even a long, big one, and if everything goes right, a new release can be created.



    Continuous integration with build branches
    There are other options available. Suppose each developer merges back into a specific “build branch” like the following figure shows:



    The build branch can be used to run integrations during a work-day, and then over night an automated nightly build process can run. If everything works as expected, the build branch with all the changes made during the last day can be automatically merged back into the main branch, a new release created and used as baseline for development the next day.



    Wrapping up
    As you can see there are several alternatives when running integrations, from the simplest to the most sophisticated ones. I’d always follow the KISS principle, and try to keep things as simple as possible. Of course this is not always possible, and then you have to figure out which is the best alternative for your team.

    Do you want to try yourself?
    If you want to try it yourself... why don't you download plastic here? There's also a Linux version available here.

  • » Hiding branches using security

    How can I set up plastic security so that developers can only see their own branches plus the main one and the maintenance?
    I was asked this a couple of days ago and now I’ll try to give a detailed answer and explain, step by step, how to set it up with plastic.
    Secure your rep server
    The first step will be securing the repository server. Do you know what the rep server is? Well, it is just the server which is handling all the repositories, and as you know, repositories are the ones which actually contain the data (branches, revisions, items, labels...)
    Why is it such a good idea to secure the rep server? Plastic implements a whole security hierarchy. Everything inherits from the repserver, then the repository, then the actual objects inside each rep. Take a look at the following diagram for more information:



    The first thing we’ll do is to allow access to the repserver only to our allowed users. In our case I’ll be granting access to management, development and the special user named OWNER.
    To set up repository server permissions from the GUI go to the repository view, right click on a rep and go to repository server permissions....



    Please note both the owner and the management group have all the permissions granted, but developers will have the view permission not set. Check the following figure:



    This way the developers won’t be able to view any object they haven’t created, but we’ll grant access to the right branches, items and labels so they can work correctly. The permission granted to the owner ensures they can work on the branches they create, as the initial question asked.
    Note: check you set up permissions with a user belonging to the right administrative group, management in my case.
    Grant access to the code!
    Right now no developer can access the code because all items inherit from the repository, and the repository from the repserver, and the repserver doesn’t have view access set for developers.
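
    To picture the inheritance chain, here is a minimal sketch. It assumes a simplified model where a permission is effective if it is granted on the object itself or on any of its ancestors; the class, the "change" permission and the group names used as keys are illustrative assumptions, and the real plastic security model is richer than this.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class SecuredObject:
    name: str
    parent: Optional["SecuredObject"] = None
    # group name -> permissions granted explicitly on this object
    granted: Dict[str, Set[str]] = field(default_factory=dict)

    def is_allowed(self, group: str, permission: str) -> bool:
        """Walk up item -> repository -> repserver looking for a grant."""
        node: Optional["SecuredObject"] = self
        while node is not None:
            if permission in node.granted.get(group, set()):
                return True
            node = node.parent
        return False


# The repserver grants "view" to management but not to development.
repserver = SecuredObject(
    "repserver",
    granted={"management": {"view", "change"}, "development": {"change"}},
)
repository = SecuredObject("codice", parent=repserver)
root_item = SecuredObject("/", parent=repository)
```

    With this set up, root_item.is_allowed("development", "view") is false until the view permission is granted on the root item itself, which is exactly the next step.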
    So, as an administrator go to your item’s view, right click on the root item, and go to item’s permissions as the following figure shows:



    Once you’re there make sure developers get the view permission set!



    Securing the branches
    Now it’s the turn for the branches. In the example we want all developers to view the main branch and the branches they create. The second requirement is already achieved due to the owner permissions at the repserver level.
    To actually allow developers to view the main branch and the maintenance (and any other branches you want to make available) go to branch permissions and enable the view permission.



    Check everything works
    Check the following figure. The plastic instance on the left is run as a privileged user, while the one on the right corresponds to a developer. Note that the developer can only list the main and maintenance branches and the one he has created, but not the one created by the admin user (/main/task001pablo) in the sample.



    Wrapping up, don’t forget the labels
    If you remember the security hierarchy introduced above, you’ll notice that labels won’t have view permissions either. So after creating a new one, make sure you grant the right access.



    The plastic security mechanism gives users huge flexibility. The purpose of setting permissions doesn’t always have to be preventing unwanted access; it can also be enforcing certain development policies. I’ll be talking about it in a future post.

    April 3, 2008
    » Obtaining reports about repository activity

    What if you want to know which revisions have been created in the last few days? Who made the changes, and where? Or what if you're looking for changes made to a certain component to locate a specific modification you'd like to review?

    It is very easy now with the customizable views in Plastic 2.0.

    Take a look at the following figure (click to enlarge):



    I've customized the changeset view to actually display revisions. And I'm using the query system to retrieve the revisions I've created since April 1st at our main repository.

    I'm then using the filter box to focus on the revisions at the directory 01nerva.

    The query is very simple:

    find revisions where date >= '4/1/2008' and owner = 'pablo' on repository 'codice'

    Of course I could go even further and try to focus on the revisions created on a given branch, or bigger than a certain changeset... and so on.

    The following query will locate the revisions created at two repositories after a given date on a branch different than the main one.

    find revisions where date >= '4/1/2008' and owner = 'pablo' and branch != '/main' on repositories 'codice','pnunit'
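
    If you write these reports often, a small helper can put the query text together. This is just a sketch: build_find_query is a name I made up, and it only composes strings shaped like the two queries shown here, it doesn't talk to plastic at all.

```python
from typing import Sequence


def build_find_query(conditions: Sequence[str], repositories: Sequence[str]) -> str:
    """Compose a 'find revisions' query like the ones shown in this post."""
    where = " and ".join(conditions)
    quoted = ",".join(f"'{name}'" for name in repositories)
    # One repository uses the singular keyword, several use the plural one.
    keyword = "repository" if len(repositories) == 1 else "repositories"
    return f"find revisions where {where} on {keyword} {quoted}"
```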

    As you see, the query system is very useful to create activity reports, locate certain changes, inspect modifications and so on.

    Hope it helps!

    April 1, 2008
    » Plastic selectors: welcome to the dark side

    I could have titled this blog post "the most complete trip through workspace selectors ever", which is probably more accurate, but too boring.

    I will try to explain, step by step, how the key component inside the plastic server works, because almost all the other concepts in plastic have something to do with it.

    First things first: what's a selector? A selector is just a piece of text, associated to a workspace, whose only mission is to tell the server what has to be downloaded into a workspace and how checkouts have to be performed. It actually "selects" (that's why it is called a selector) a set of revisions from the repository to be downloaded into the workspace.

    Maybe you've been using plastic for months and you've never heard about selectors... if so, how is this possible? Well, the GUI folks at codice did a pretty good job trying to make them as transparent as possible. So in fact, even if you're using the standard version, each time you perform a "switch to branch" or "switch to label" operation, you're setting a selector.

    If you right click the name of your workspace, a pop up menu like the one in the following figure will show up, and then if you click on "set selector" you'll see your current selector.



    A typical selector (the default one when you create your first plastic workspace) looks like:


    repository "default"
    path "/"
    branch "/main"
    checkout "/main"


    Which means, rule by rule:
  • repository "default" -> I'll work on a repository named "default"
  • path "/" -> apply the following rules for all the items whose path matches "/" (the root, so all the elements) inside the previous repository
  • branch "/main" -> take the latest revision on the branch "/main" for each item processed. Why "the latest"? Well, by default, if you don't specify something else (like a changeset rule or a label or a revision number), the branch rule takes "the last on the branch".
  • checkout "/main" -> which means: if you ever have to check out something, place the check out on the main branch, please.

    Does it sound confusing? Ok, let's try to check how it works step by step. But first let's think a little bit about two concepts I've just introduced: what's an item? and... what's a revision?



    An item represents a file or a directory, but it is very important to note that it doesn't actually resemble a real element on a file system. It is a bit difficult to understand at the beginning. An item tells plastic "something" exists, but it doesn't tell plastic any other information.

    The real information associated to an item goes inside the revisions. Each item has one or more revisions, and revisions, as you know, are stored inside branches.

    As programmers we can think of items as "classes" and "revisions" as their instances. Is it clearer now?

    And now you might think: ok, and items store the names of the files and directories... wrong! It is a bit more complicated than that, and it gives a lot of flexibility.

    Names of files and directories are stored as data of directory revisions. I mean, what's the data of a given file? Ok, this one is clear, the file's content, right? Then, what's the data of a given directory? The files and directories it contains? So whenever you create an item, its name isn't associated to the item itself, but introduced as data inside the directory containing the file or directory. That's exactly why you've to check out the parent directory to add a file or a directory, as you've probably realized while you were using plastic.

    There's something very important here, which introduces an important indirection and is critical to introduce selectors: directory's data contain the names of the items it contains, not the names of the revisions. So, a directory knows which items it contains, not the exact revisions, which will be decided at "runtime" by the specific selector rules.

    Let's explore a simple sample now. Whenever you create a new plastic repository, the following objects will be created by default: a root item (a special item which is the root directory of your new repository, so you can actually add contents inside it), a main branch (the branch "/main", which you're probably familiar with) and a revision of the root item inside the main branch. This revision is the one which lets you start working with your newly created repository.

    If you set a selector like the one described above, and you update it, the content of the new repository will be downloaded.

    On selector resolution, plastic will always try to find the root item for each specified repository. All repositories have a root item, if not, something very wrong happened to the repository (it's broken!) and the system will abort the operation. In the previous selector, the repository is "default".

    Once the root item is located it will try to locate a rule that matches, by path, with the current path. The current path is the root item, "/" so the first rule matches.

    Try to set the following selector to work with your newly created repository and check what happens:


    repository "default"
    path "/dumb"
    branch "/main"
    checkout "/main"


    Then plastic will complain with the following message:

    Can't load the root item. Probably, the workspace selector contains errors

    Why? Well, it tries to load a revision for the root item, then it finds none of the rules match the "/" path because they need the path to be, at least "/dumb", and then the root directory can't be loaded.

    If we use the right selector (with a rule path "/") plastic will be able to locate a rule, which is telling: "load the last revision at branch main". There's only one revision at this moment at branch main, which is the root revision (the first, the last, and the only one existing for the item right now), so it will be loaded.

    Then plastic will get the data of the root revision. It is empty, so the selector is solved.

    What if we now add a new file inside the workspace? To do so (the GUI handles it automatically, but from the command line you have to manually check out the root dir) the root directory will be checked out, a new revision will be created in check out status, and then a new item for the new file (let's say file01.txt) will be created. An entry for file01.txt will be created on the checked out revision of the root directory, pointing to the newly created item. Then a revision (in check out status) will be created for file01.txt. If you now check in both the directory and the file, you'll get something like the following figure.



    There are two interesting conclusions you must understand after reaching this point: first, how the information is stored in directory revisions, and second, how a directory points to its content by item, and not by revision.
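
    For the programmers out there, both conclusions can be put into a few lines of code. This is a toy model and not how the server is implemented: the class names, the numeric item ids and the add_file helper are all assumptions made for the example. Note that an item has no name at all; the name only lives inside the data of the directory revision.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Revision:
    branch: str
    revno: int
    data: object  # file content, or for a directory: {name: item id}


@dataclass
class Item:
    item_id: int
    revisions: List[Revision] = field(default_factory=list)


class Repository:
    def __init__(self) -> None:
        self.items: Dict[int, Item] = {}
        # A new repository gets a root item with an empty revision on /main.
        root = self._new_item()
        root.revisions.append(Revision("/main", 0, {}))
        self.root_id = root.item_id

    def _new_item(self) -> Item:
        item = Item(item_id=len(self.items) + 1)
        self.items[item.item_id] = item
        return item

    def latest(self, item_id: int, branch: str) -> Revision:
        revs = [r for r in self.items[item_id].revisions if r.branch == branch]
        return max(revs, key=lambda r: r.revno)

    def add_file(self, name: str, content: str, branch: str = "/main") -> Item:
        """Check out the root, add an entry for a new item, check both in."""
        previous = self.latest(self.root_id, branch)
        new_item = self._new_item()
        # The directory entry points to the ITEM, never to a revision.
        entries = dict(previous.data)
        entries[name] = new_item.item_id
        self.items[self.root_id].revisions.append(
            Revision(branch, previous.revno + 1, entries)
        )
        new_item.revisions.append(Revision(branch, 0, content))
        return new_item
```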

    Let's play a little bit with plastic, and create a couple of new revisions for file01.txt. We do so by checking out the file, adding more content, checking it in, checking it out again, adding more content and checking it back in.

    We get something like the following figure.



    Let's now play a little bit with the selector.

    Set the following selector and check what happens:


    repository "default"
    path "/"
    branch "/main" revno "0"
    checkout "/main"


    Once your workspace is updated you'll see that... it's empty!!
    Where has file01.txt gone? Look carefully at your selector again. You're telling it: get the revision "0" at branch main for all the items whose path matches "/". Then you could think, "ok, but then I should get the first revision of file01.txt!!". And this isn't true, because the first thing the selector does is solve the root item: once it is located, it will load its revision 0, and if you look at the figure carefully, you'll notice the first revision of the root item on main is... empty! There's no content to be downloaded, and that's why your workspace is empty now.

    Then, how can you load the revision 0 of file01.txt? There are several options, let's check the first one.

    repository "default"
    path "/file01.txt"
    br "/main" revno "0"
    path "/"
    branch "/main"


    You're actually specifying how to load file01.txt.
    Plastic evaluates the rules from top to bottom: it will first try to use the first rule, and if it finds a match, it won't go on to the second for this item. So, how does it work with this example?
  • plastic locates the root item and tries to load a revision for it. It evaluates the first rule, but the path rule doesn't match the "/", so it goes to the second one.
  • The second rule tells plastic how to load the root item, so it loads the latest revision on main for the item, which is revision 1.
  • plastic gets the content of the revision 1 at main for the root item, which contains one file: file01.txt. Then it goes back to the first selector rule.
  • the first rule matches the path, because it is actually "/file01.txt" so it applies the load rule, which is telling plastic to get the revision 0 at branch "/main".
  • there are no other pending items, so the selector is fully loaded
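
    The walk through above can be turned into a short sketch. It is a simplified model I wrote for illustration (the Rule type, the data layout and the function names are assumptions, not plastic code), and it also assumes that when a rule matches the path but finds no revision, the next rule gets a chance.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    path: str
    branch: str
    revno: Optional[int] = None  # None means "latest on the branch"


# item id -> list of (branch, revno, data); directory data is {name: item id}
Revisions = Dict[int, List[Tuple[str, int, object]]]


def path_matches(rule_path: str, item_path: str) -> bool:
    """A rule applies to its own path and to everything inside it."""
    if rule_path == "/":
        return True
    return item_path == rule_path or item_path.startswith(rule_path + "/")


def pick_revision(rule: Rule, candidates):
    on_branch = [r for r in candidates if r[0] == rule.branch]
    if rule.revno is not None:
        on_branch = [r for r in on_branch if r[1] == rule.revno]
    return max(on_branch, key=lambda r: r[1]) if on_branch else None


def resolve(rules: List[Rule], revisions: Revisions, root_id: int = 1):
    """Return {path: (branch, revno)} for everything the selector loads."""
    loaded = {}
    pending = [("/", root_id)]
    while pending:
        path, item_id = pending.pop()
        chosen = None
        for rule in rules:  # top to bottom, the first hit wins
            if path_matches(rule.path, path):
                chosen = pick_revision(rule, revisions[item_id])
                if chosen is not None:
                    break
        if chosen is None:
            if item_id == root_id:
                raise LookupError("Can't load the root item")
            continue
        branch, revno, data = chosen
        loaded[path] = (branch, revno)
        if isinstance(data, dict):  # a directory: queue the items it contains
            for name, child_id in data.items():
                pending.append((path.rstrip("/") + "/" + name, child_id))
    return loaded


# The sample repository: root with two revisions, file01.txt with three.
SAMPLE: Revisions = {
    1: [("/main", 0, {}), ("/main", 1, {"file01.txt": 2})],
    2: [("/main", 0, "v0"), ("/main", 1, "v1"), ("/main", 2, "v2")],
}
```

    Calling resolve([Rule("/file01.txt", "/main", 0), Rule("/", "/main")], SAMPLE) loads the root at revision 1 and file01.txt at revision 0, following the same steps listed above.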


    Let's explore a second (and a bit more complex) possibility:

    repository "default"
    path "/?"
    br "/main" revno "0"
    path "/" norecursive
    branch "/main"


    Study the previous selector in detail.

    Path rules understand two different wildcards right now: * and ?. * stands for 0 or more characters and ? for 1 or more. So, we're telling plastic: for each path starting with / and containing at least one more character, go to the first rule; otherwise go to the second.

    The second rule, which is the one loading the root item now, uses a new keyword: norecursive which means: use this rule only for paths exactly matching the rule, not files or directories "inside" the path in the rule.
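
    A possible way to picture the matching is to translate a rule path into a regular expression. The sketch below follows the wildcard meaning exactly as described in this post (* for 0 or more characters, ? for 1 or more) plus the norecursive keyword; compile_rule_path is a made-up helper and the real matcher inside the server may well work differently.

```python
import re


def compile_rule_path(pattern: str, norecursive: bool = False) -> "re.Pattern[str]":
    """Build a regex for a selector path rule (illustrative only)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # 0 or more characters
        elif ch == "?":
            parts.append(".+")   # 1 or more characters, as described above
        else:
            parts.append(re.escape(ch))
    body = "".join(parts)
    if not norecursive:
        # By default a rule also applies to everything "inside" its path.
        body += ".*" if pattern == "/" else "(/.*)?"
    return re.compile(r"\A" + body + r"\Z")


first_rule = compile_rule_path("/?")
second_rule = compile_rule_path("/", norecursive=True)
```

    Here first_rule accepts "/file01.txt" but not "/", while second_rule accepts only "/", so the root item falls through to the second rule.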

    Are you familiar with changesets? A changeset resembles an atomic commit and is created each time you check in one or several files or directories together. A unique number is generated and assigned to all the revisions checked in together. This number is the changeset.
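
    In code, the idea is as simple as a shared counter. The class below is only a sketch with names of my own; it shows that every revision checked in together receives the same number.

```python
import itertools
from typing import List, Sequence, Tuple


class ChangesetLog:
    def __init__(self, first: int = 0) -> None:
        self._numbers = itertools.count(first)
        self.revisions: List[Tuple[int, str]] = []  # (changeset, path)

    def checkin(self, paths: Sequence[str]) -> int:
        """Check in several paths together under one changeset number."""
        changeset = next(self._numbers)
        self.revisions.extend((changeset, p) for p in paths)
        return changeset
```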

    Selector rules can also be used to work with changesets. Suppose you want to load the revisions at changeset "0", which is the one created by default with the initial root directory revision; you'd set a selector like:

    repository "default"
    path "/"
    branch "/main" changeset "0"


    Specifying different numbers you'll be able to jump to different changesets.


    Well, that's enough for today, in the next chapter I'll be talking about labels, and how they work with selectors. If you have any questions, feel free to ask...
  • March 31, 2008
    » Selectors: welcome to the dark side, part II

    Welcome back to the plastic dark side! Today we'll be moving deeper in branching concepts, and trying to explain how labels and branches work.

    In the previous post I introduced several important concepts like how selector rules are evaluated by plastic one by one, from top to bottom, and how the path specifier works.

    Let's now go to the "label" selector rule modifier. But first, what's a label?

    A label is simply an object inside a plastic repository. Just that. When you create a label from the command line or from the GUI, you're just creating a new object inside the repository.

    Labels are useful once they're applied to revisions as you can see in the following version tree. In the tree you see that revision 20 has only one label (BL051) but revision 19 has a lot of them (which in this sample means it belongs to a number of baselines... it wasn't changed for a long time).



    So to label a revision in plastic you first need to have a label, then apply the label to a revision. From the GUI the only option available is to label the entire workspace content, which means applying the label to the revisions you currently have loaded. Why? Because users normally use labels to group together a set of revisions at specific moments, like releasing a new version. And more often than not it is the whole workspace that they want to label.

    Let's take a look at the sample in the following figure.



    You see we've a directory with a couple of entries, one of them labelled with label "BL001". So, what would be downloaded to our workspace if we set the following selector?


    rep "default"
    path "/"
    label "BL001"



    We're telling plastic:
    download everything marked with BL001. So, what do you expect to be downloaded?

    Nothing!

    You'll get an error saying you can't load the root item.

    Why? Well, remember how plastic resolves selectors: first it gets the root item and tries to find a revision for it using the selector rules. So it will try to find a revision for the root item labelled with BL001 and... yes, it can't... end of the story.

    So, either you label the root item too, or you provide a rule to load the root.


    rep "default"
    path "/"
    label "BL001"
    path "/"
    branch "/main"



    The previous selector solves the problem.
    It will go down to the second rule to find a revision for the root item, so problem solved.
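
    Here is a small sketch of both selectors at work. The repository content is invented for the example (a.txt has a revision labelled BL001, b.txt has none), every rule is assumed to have path "/", and the functions are my own simplified model of the resolution, not the server code.

```python
def pick(rule, revisions):
    """Return the revision a single rule selects for an item, or None."""
    kind, value = rule
    if kind == "label":
        hits = [r for r in revisions if value in r["labels"]]
    else:  # "branch": the latest revision on that branch
        hits = [r for r in revisions if r["branch"] == value]
    return max(hits, key=lambda r: r["revno"]) if hits else None


def load(rules, repo, item="root", path="/"):
    """Resolve an item and, if it is a directory, everything inside it."""
    rev = None
    for rule in rules:  # top to bottom, go to the next rule when nothing is found
        rev = pick(rule, repo[item])
        if rev is not None:
            break
    if rev is None:
        if item == "root":
            raise LookupError("Can't load the root item")
        return {}
    loaded = {path: rev["revno"]}
    for name, child in rev.get("entries", {}).items():
        loaded.update(load(rules, repo, child, path.rstrip("/") + "/" + name))
    return loaded


# Hypothetical content: only one revision of a.txt carries the label.
REPO = {
    "root": [
        {"branch": "/main", "revno": 0, "labels": set(), "entries": {}},
        {"branch": "/main", "revno": 1, "labels": set(),
         "entries": {"a.txt": "item_a", "b.txt": "item_b"}},
    ],
    "item_a": [
        {"branch": "/main", "revno": 0, "labels": {"BL001"}},
        {"branch": "/main", "revno": 1, "labels": set()},
    ],
    "item_b": [{"branch": "/main", "revno": 0, "labels": set()}],
}
```

    With only the label rule, load raises the root item error; adding the branch rule loads the root from main, the labelled revision of a.txt and the latest revision of b.txt.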

    Now you understand why it is so dangerous to label separate revisions instead of the whole workspace tree, unless you know what you're doing.

    Labels are very useful because they also introduce a lot of flexibility. For instance, in the following example, if you specify a selector like the previous one, you'll download to the workspace a tree like the one at the bottom of the figure.



    Enter the world of branches
    So far we've played with revisions, changesets and labels. Let's start talking about branches.

    In plastic a branch is just a revision container. When you create a new branch you're just creating a new empty object in a repository. You'll have to work with it to really put some content on it. Note this behaviour differs from systems like Subversion, CVS, SourceSafe, TeamSystem or Perforce, where branch creation actually means copying revisions to the new branch.

    In plastic, you create a new branch, and it is totally empty.

    How can we use it? Suppose we've just created a main branch /task001. Let's do it from the command line:


    $ cm mkbr br:/task001


    Note that from the GUI you normally create only child branches.

    I'll discuss child branches in detail later on.

    Ok, now you have task001, but, what can you do with it?

    You're almost an expert in selectors, do you have any idea?

    What if we set a selector like the following:


    rep "default"
    path "/"
    branch "/task001"
    co "/task001"


    Yes, you're right...

    The server will complain that it can't load the root item.

    So we've to find out a way to make it reach the task001 branch...

    What about the following?


    rep "default"
    path "/"
    branch "/task001"
    path "/"
    branch "/main"



    Ok, it will work.

    But we're loading the revisions from main... How can I actually create a revision on the new branch?


    rep "default"
    path "/"
    branch "/task001"
    co "/task001"
    path "/"
    branch "/main"
    co "/task001"


    Watch the selector above. Especially the second rule.

    It is telling plastic: get the revisions from main, but if you've to checkout something... place the checkout at "/task001". So, as soon as you checkout a file or directory... it will go to task001, and you'll start using your newly created branch!

    Please note if your second rule were something like path "/" branch "/main" co "/main" the revision wouldn't ever go to the task001... so the first rule would be useless.
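
    The effect of the co part can be sketched for a single item. Each rule is reduced to a (branch, checkout branch) pair, and I'm assuming the first revision on a new branch gets number 0; both the helper names and that numbering are assumptions for the sake of the example.

```python
def resolve(rules, revisions):
    """Return (loaded revision, checkout branch) using the first rule that hits."""
    for branch, checkout_branch in rules:
        on_branch = [r for r in revisions if r[0] == branch]
        if on_branch:
            return max(on_branch, key=lambda r: r[1]), checkout_branch
    raise LookupError("no rule can load the item")


def checkout_and_checkin(rules, revisions):
    """Create the new revision on the branch named by the co of the rule used."""
    _, target = resolve(rules, revisions)
    existing = [r[1] for r in revisions if r[0] == target]
    # Assumption: numbering starts at 0 on a branch with no revisions yet.
    new_revision = (target, max(existing) + 1 if existing else 0)
    revisions.append(new_revision)
    return new_revision


good_rules = [("/task001", "/task001"), ("/main", "/task001")]
bad_rules = [("/task001", "/task001"), ("/main", "/main")]
```

    With good_rules the first check in lands on /task001 and the first rule picks it up from then on; with bad_rules every new revision keeps going to /main.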

    So, now you've your first checkout on a separate branch... If you run a ls from the command line, or if you check the item's view on the GUI, you'll see the revision is at a separate branch.

    Two important things here:
  • now you have a better understanding of selectors
  • now you understand how branching works in plastic: that's exactly the reason why only changed revisions go to branches while the rest is kept in the parent branch...

    Branches and labels combined
    With what we've learned so far... how would you implement a branch per task pattern? Easy?

    Yes, just create a new branch for each new task, and then set up a selector like the one above. And you're done.

    "Ok" - you may ask - "but what if I need to start working from a specific baseline?".

    Let's go.

    Suppose you want to work on task001, but your starting point must be a well-known stable baseline labelled BL059.

    How would you set up your selector?

    Remember: you need to take everything from BL059, unless you've something on your task branch, which will be retrieved first...

    Take a look at the following selector.


    rep "default"
    path "/"
    branch "/task001"
    co "/task001"
    path "/"
    label "BL059"
    co "/task001"


    Done, right?

    Then, what are child branches for??

    If you think they aren't needed... you've become a real selector hacker!! Congratulations! :-)

    Let's go again to the selector dealing with branches /main and /task001


    rep "default"
    path "/"
    branch "/task001"
    co "/task001"
    path "/"
    branch "/main"
    co "/task001"


    Here you are!

    After some weeks using them for branch per task at the beginning of the project (long, long ago), we came up with the following:

    what if we make /task001 inherit from /main somehow? It is what we're trying to do with the selector anyway...

    And then child branches were born!

    If you create task001 as a child of main you're saying: if task001 has content, take the revisions from it, otherwise take them from main, which is exactly the same as the previous selector.
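
    To see the equivalence, this little sketch expands a child branch name into the explicit rules it stands for. The two functions are hypothetical helpers of mine that only produce text; they illustrate the idea and say nothing about how the server handles inheritance internally.

```python
from typing import List


def inheritance_chain(branch: str) -> List[str]:
    """'/main/task001' -> ['/main/task001', '/main'] (child first, then parents)."""
    parts = branch.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]


def expand_child_branch(branch: str) -> List[str]:
    """Write the explicit rules a child branch stands for."""
    lines: List[str] = []
    for source in inheritance_chain(branch):
        # Load from the child first, then from each parent; always check out
        # to the child branch.
        lines += ['path "/"', f'branch "{source}"', f'co "{branch}"']
    return lines
```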

    But child branches make your life easier. To create a child branch you can go to the GUI or type


    $ cm mkbr br:/main/task001


    Then the selector to use it would be:


    rep "default"
    path "/"
    br "/main/task001"
    co "/main/task001"


    Which actually saves you two lines!

    And that's basically the reason why child branches were born: to save some lines writing selectors!! (Ok, please note that by the time child branches were introduced, selectors were still written in XML format; this happened in the plastic prehistory, before any customer had even heard about plastic... and saving some lines was very important to the lazy developers!!).

    Child branches and labels

    "Ok" - you say - "but what if I want to use a baseline, which is the regular way of working anyway..."

    And you're right!

    To use a baseline you put the label rule inside the branch rule, like in the following selector:


    rep "default"
    path "/"
    br "/main/task001" label "BL051;LAST"
    co "/main/task001"


    For shortness the label rule can be simplified:


    rep "default"
    path "/"
    br "/main/task001" label "BL051"
    co "/main/task001"


    The good thing here is that it is very simple to write multi-branch selectors:


    rep "default"
    path "/"
    br "/main/release50/bug-fix490" label "stable040;BL050;LAST"
    co "/main/release50/bug-fix490"


    The branch per task rule

    And now admire one of the least known plastic selector rules:


    rep "default"
    path "/"
    branchpertask "/main/task001" baseline "BL009"



    Which is equivalent to:


    rep "default"
    path "/"
    branch "/main/task001" label "BL009"
    checkout "/main/task001"


    But even shorter!
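
    The equivalence is mechanical enough to write as a text transformation. The function below is a sketch of mine (expand_branchpertask is not a plastic command) that rewrites the short rule into the two longer lines shown above and leaves any other line alone.

```python
import re
from typing import List

_BRANCH_PER_TASK = re.compile(r'^branchpertask\s+"([^"]+)"\s+baseline\s+"([^"]+)"$')


def expand_branchpertask(line: str) -> List[str]:
    """Rewrite a branchpertask rule as the equivalent branch/checkout lines."""
    match = _BRANCH_PER_TASK.match(line.strip())
    if match is None:
        return [line]  # not a branchpertask rule, keep it untouched
    branch, baseline = match.groups()
    return [f'branch "{branch}" label "{baseline}"', f'checkout "{branch}"']
```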

    Wrapping up!

    Well, now you're familiar with all the core selector concepts, and you're ready to jump to our next big topic: multi-repository selectors!!

  • March 28, 2008
    » Continuous integration future?

    A few days ago I was re-reading the book "Continuous integration" by Paul Duvall. I find it a really interesting read, especially when you use agile practices.

    The book dates from mid 2007, so it is quite new, and there's a chapter at the end of it which really surprised me. It is titled "the future of continuous integration", and it focuses on two interesting questions:

  • How can broken builds be prevented?
  • How can builds get faster?

    The first question is not a concern for us internally, but the second one is probably one of the toughest problems we've faced here at Codice. Can they be solved with version control?

    The author starts examining the first question: can broken builds be prevented? And if so, how? Well, he states something that really shocked me:

    Imagine if the only activity the developer needs to perform is to “commit” her code to the version control system. Before the repository accepts the code, it runs an integration build on a separate machine.
    Only if the integration build is successful will it commit the code to the repository. This can significantly reduce broken integration builds and may reduce the need to perform manual integration builds.


    Then he draws a nice graphic representing an "automated queued integration chain". He introduces something like a "two-phase" commit, so the code doesn't reach the mainline until the tests pass...



    I don't know if I'm missing something, because I find the answer too obvious, something all plastic users know by heart now... commit to a separate branch, rebase it from the mainline, run the tests, and only merge up (which would be a "copy up") if the tests pass... Branching is the answer, isn't it?

    I mean, I couldn't understand such a "futuristic" set up with a two-phase commit scenario, if this is precisely what you already have with systems with good branch support.

    I understand that when he states "the only activity the developer needs to perform is to "commit"", his problem is not actually checking the changes in, but having a place where the code can reside in some sort of intermediate status so that, while the tests run, the developer can continue working.

    Again, I must be missing something here, because otherwise I only see one reason to find it a "future improvement": the author is always thinking of "mainline development" (you know, only working with the main branch, or just a few more at most, and directly checking in changes into this mainline). Because if you're used to patterns like "branch per task", then you don't have this problem anymore. You're used to delivering your changes to the version control system and continuing to work on something else without ever breaking the mainline.



    He continues with:

    An alternative approach to preventing broken builds is to provide the capability for a developer to run an integration build using the integration build machine and his local changes (that haven’t been committed to the version control repository) along with any other changes committed to the version control repository.


    Of course it is! That's why branch per task is a better alternative than mainline development for almost every development scenario I've been involved in!

    The problem behind all these statements has a name: the most well-known version control tools out there (including glorified Subversion, which is the tool the book focuses on) have (did I say have? I wanted to say have) big problems dealing with branches. They don't always fail at creating a large number of branches (which is what every SVN or CVS user tells me whenever I mention plastic can handle thousands of branches... "mine too" they say); the problem is handling them after a few months (on the "test day" everything works great, doesn't it?), merging them, checking what has been modified on the branch, tracking branch evolution, and so on. And, believe it or not (and that's why we wrote plastic in the first place!) all of these well-known-widely-available-sometimes-for-free tools lack proper visualization methods, proper merge tools (ok, there are third party ones sometimes) and sometimes even basic features to deal with branches like true renaming and merge tracking.

    I guess that's the reason why after 200 pages of decent reading, I've found such an obvious chapter, describing as a "future innovation" some well-known and widely used SCM best practices. I'd rather recommend going to the now classic Software Configuration Management Patterns, which I still find the best SCM book ever written.


    The question about how to speed up test execution remained unsolved...

  • March 25, 2008
    » Branch explorer tour

    It's very likely you've drawn a diagram like this one at least once... (and I'll bet probably you do it very often).



    It is just a typical branch diagram, showing the relationships between branches, merges and labels (changesets are not normally displayed when you draw the diagram manually :-P). It clearly shows when a merge happened (from a project point of view) or when a label (and hence a baseline) was applied.

    The folks at Microsoft's Team System group are dreaming about releasing something similar in the future.


    The good news is that Plastic already has it! :-)

    The branch explorer released with Plastic 2.0 is all about representing branch evolution... the kind of stuff you'd usually draw on paper... but now rendered on your screen.



    So, what I'll be showing is a quick tour through the branch explorer, and how an entire branching and merging cycle can run from it.

    Watch the entire tour here!

    March 24, 2008
    » Custom file types

    Plastic SCM handles two different file types: binaries and text files. By default it tries to identify a newly added file using an internal algorithm and a built-in list of known extensions. But sometimes a file that should be binary is identified as text or vice versa.

    When a file type is interpreted by Plastic SCM as binary, it is not possible to show the differences in text mode. This is not a problem because it is possible to change the type of a revision. When you change the type of a revision, future revisions of the item will be of the same type.

    But now, with Plastic SCM 2.0, users can associate file extensions to file types.

    For example, if you want to specify that files with the extension .cpx are text revisions, you must add a line to the "filetypes.conf" file, which is placed together with your client.conf file (if you're running on Windows, chances are it is located in the directory C:\Documents and Settings\Local Settings\Program Data\plastic).

    For Linux users, the file is located in /home/XXX/.local/shared/plastic

    This file will look like:

      
    # PlasticSCM custom file types.
    # Syntax: <extension>:<type>
    # Type can be 'txt' or 'bin'.
    # Examples:
    # .cpp:txt
    # .jpg:bin
    .cpx:txt
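
    To make the format concrete, here is a minimal Python sketch of how such a file could be read. It is not Plastic's own parser: the function name parse_filetypes and the sample content are assumptions for illustration, and the only rules it applies are the ones stated in the file's header comments (one <extension>:<type> entry per line, type either 'txt' or 'bin', lines starting with '#' ignored).

```python
def parse_filetypes(text):
    """Parse '<extension>:<type>' lines into a dict, skipping comments and blanks."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment or empty line
        extension, sep, file_type = line.partition(":")
        extension, file_type = extension.strip(), file_type.strip()
        if not sep or file_type not in ("txt", "bin"):
            raise ValueError(f"invalid filetypes.conf line: {raw!r}")
        mapping[extension] = file_type
    return mapping


# Hypothetical file content, following the documented syntax.
sample = """\
# PlasticSCM custom file types.
# .cpp:txt
.cpx:txt
.dat:bin
"""
print(parse_filetypes(sample))
```

    A line that does not follow the extension:type form is rejected with an error instead of being silently skipped, which makes typos in the file easier to spot.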

    » Going distributed with Plastic

    One of the most important new features in Plastic 2.0 is the distributed system. Today I'm going to talk about the set up I'm using to work from my laptop, disconnected from any network, and how I use the distributed system to synchronize changes back and forth.

    That's exactly what I find great about the distributed system: it is not only useful for big development teams working at several geographical locations, it can also help developers working on their laptops... even if they belong to really small teams!

    First have a look at the following figure. It is a deployment diagram of the different servers we're using here at the office. Of course all these repositories could be on a single server, but we've set up a multi-server scenario to test Plastic capabilities on a daily basis (remember: eat your own dog food).




    As you can see in the picture, we have three repository servers, two of them running on Linux/Mono and the third one on Windows. The Windows server (mordor) is also playing the workspace server role, so it is the one we have configured as workspace server on our computers.
    So my typical workspace selector usually looks like the following:


    rep "cmuser" mount "/06cmuser"
    path "/"
    label "BL091"

    rep "importers" mount "/05importers"
    path "/"
    label "BL091"

    rep "licensetools" mount "/04licensetools"
    path "/"
    label "BL091"

    rep "pnunit" mount "/03pnunit"
    path "/"
    smartbranch "/main/SCM2937"

    rep "nervathirdparty" mount "/02nervathirdparty"
    path "/"
    label "BL091"

    rep "codice"
    path "/"
    smartbranch "/main/SCM3278"

    Yes, 6 repositories mounted on the same workspace!

    If you look carefully you'll notice a rule you're probably not used to: the smartbranch selector rule. It already works in Plastic 2.0, although it has not been unveiled yet. I'll write more about it in a future post, but now it is time to focus on the distributed system :-P

    When I work at home I usually connect to our office VPN, and then work with my laptop in the same way I do at the office. Nothing changes... but from time to time the network goes down... which is not a problem if you use Visual Studio (disconnected support) or you're just working on the same task (you can continue making changes, find the changed files when the connection comes back and check out everything). But what if you don't have an internet connection and you want to create new branches or switch from one branch to another?

    Well, that's why I've installed a Plastic server on my laptop and I use the distributed system to continue working even when the network goes down.

    I've set up a server on my box (beardtongue) at port 6060. We're currently using user/password authentication mode, so I've set up the same authentication mode on my server, although I could use a different one and replication would still work (check the distributed system manual).



    Once you have it set up, remember to make your client point to the right server. To do this you can manually edit your client.conf file (somewhere under Documents and Settings if you're a Windows user) or run plastic --configure.

    The repositories I actually need to use on a daily basis are codice (where all the Plastic code resides), thirdparty (libraries and so on) and pnunit (our test system). So after setting up my server I created three empty repositories named "codice", "thirdparty" and "pnunit" on my server:

    $ cm mkrep localhost:6060 codice
    $ cm mkrep localhost:6060 pnunit
    $ cm mkrep localhost:6060 thirdparty


    Once they were created, I replicated the main branch from the three repositories into my local ones:

    $ cm replicate br:/main@rep:codice@venus:9090
    rep:codice@localhost:6060
    $ cm replicate br:/main@rep:pnunit@juno:9092
    rep:pnunit@localhost:6060
    $ cm replicate br:/main@rep:thirdparty@venus:9090
    rep:thirdparty@localhost:6060
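
    Since the three commands follow the same pattern, they can be generated from a small table. The following Python sketch only builds the command lines, using the source and destination spec syntax shown above; the helper name replicate_command and the dictionary of servers are assumptions made for the example, and nothing is executed.

```python
def replicate_command(branch, repository, source_server, target_server,
                      target_repository=None):
    """Build a 'cm replicate' command line as a list of arguments.

    The destination repository defaults to the same name as the source one.
    """
    target_repository = target_repository or repository
    source = f"br:{branch}@rep:{repository}@{source_server}"
    target = f"rep:{target_repository}@{target_server}"
    return ["cm", "replicate", source, target]


# Repository name -> server hosting it (taken from the setup described above).
sources = {
    "codice": "venus:9090",
    "pnunit": "juno:9092",
    "thirdparty": "venus:9090",
}

for repo, server in sources.items():
    print(" ".join(replicate_command("/main", repo, server, "localhost:6060")))
```

    Returning a list of arguments rather than a single string means the result could be handed to subprocess.run without any shell quoting concerns.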

    In my case, replicating the main branch takes some time the first time.
    The reason is that it is actually copying all the revisions, changesets, labels and data from the main branch to my local rep, and we have more than 1 GB on the main branch alone across all the reps.

    Once replication is finished I have the whole history of the main branches of the three repositories on my laptop server.

    There is another way to run replication and it involves using what we've called replication packages. Suppose for some reason my laptop can't access one of the servers, let's say venus. Then I could go to a machine with the right access and run

    $ cm replicate br:/main@rep:codice@venus:9090
    --package=replication.pack

    This means replication will create a package with all the replication data for branch br:/main.

    Once it is finished, I could copy it to a USB drive and move it to my laptop. Once there I can run

    $ cm replicate rep:codice@localhost:6060
    --import=replication.pack

    To import the data into my repository.

    So far I've set up a server on my laptop, and managed to have an entire copy of the main branches of the repositories I need to work with on a daily basis.

    Then I'd create a workspace and set a selector like the following (of course things are much simpler if you only use a single repository):


    rep "pnunit" mount "/03pnunit"
    path "/"
    label "BL091"

    rep "nervathirdparty" mount "/02nervathirdparty"
    path "/"
    label "BL091"

    rep "codice"
    path "/"
    label "BL091"

    And set up a workspace to be able to compile and debug BL091.

    Ok, but the whole point of the distributed system is not only being able to work in read-only mode, but also being able to make modifications.

    Suppose I have to work on task 4312. Then I'd create a branch issue-4312 (from the GUI or from the command line, depending on the way you like to work) and set the following selector:



    rep "pnunit" mount "/03pnunit"
    path "/"
    label "BL091"

    rep "nervathirdparty" mount "/02nervathirdparty"
    path "/"
    label "BL091"

    rep "codice"
    path "/"
    branch "/main/issue-4312" label "BL091" co "/main/issue-4312"


    I could now do as many check out and check in cycles as needed, and they will all go into my laptop's server.

    At a certain point in time I go back to the office and want to "push" the branch into the real server. In this case I've created a branch for repository codice, which resides on venus:9090, so I'd do something like:

    $ cm replicate
    br:/main/issue-4312@rep:codice@beardtongue:6060
    rep:codice@venus:9090

    This will replicate all the changes from the branch issue-4312 created on my laptop into the team's main server.
    Of course I could also update my copies of the main branches by running the replicate commands from the team's real servers again, which will now be much faster than the first time.

    But what if I modify the main branch on my laptop? Well, it is not a problem. It depends entirely on the way you want to work: whether you want everyone to modify the main branch or not is up to you. If you do so, what if someone modifies the same revision on the main server in the meantime? Well, whether you replicate from your laptop to the server or the other way around, Plastic will detect that a revision has been modified on both sides. It will create a "fetch-branch" containing the files and folders modified in parallel, and you'll be able to synchronize changes back using a simple branch merge operation...
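
    The idea of "modified on both sides" can be illustrated with a few lines of Python. This is only a conceptual sketch and not how Plastic implements replication: the function name, the item paths and the revision numbers are all made up, and each side is modelled simply as a mapping from item to its latest revision, compared against the state at the last replication.

```python
def find_parallel_changes(base, local, remote):
    """Return the items whose revision differs from the common base on BOTH sides."""
    conflicts = []
    for item, base_rev in base.items():
        local_rev = local.get(item, base_rev)
        remote_rev = remote.get(item, base_rev)
        if local_rev != base_rev and remote_rev != base_rev:
            conflicts.append(item)
    return sorted(conflicts)


# Hypothetical state of the main branch: item -> latest revision number.
base = {"src/a.cs": 10, "src/b.cs": 4, "doc/readme.txt": 2}    # last replication
local = {"src/a.cs": 11, "src/b.cs": 4, "doc/readme.txt": 3}   # laptop server
remote = {"src/a.cs": 12, "src/b.cs": 5, "doc/readme.txt": 2}  # team server

print(find_parallel_changes(base, local, remote))  # -> ['src/a.cs']
```

    Only src/a.cs changed on both servers, so it is the only item that would need a merge; the other two items changed on one side only and can be copied across as they are.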

    Which one is my preferred way of working? Well, it will depend on the scenario. For "roaming developers" I prefer them to work on separate branches (we do it all the time using the branch per task pattern anyway) and use the main branch as read-only. Then they push changes to the server and get the main branch updated when there's a new release.

    So, this is just an easy way to get the benefits of the new Plastic distributed system: use it to code from a laptop even when you don't have internet access to your server.

    What's next? Well, as you might notice :-P all the distributed system functionality is available only through the command line, so expect an update to Plastic 2.0 as soon as the GUI folks around here wrap it with some nice and easy to use windows...

    » How to configure MySQL backend in Plastic SCM 2.0

    It's very simple to set up: you only need to create (or edit) a file named 'db.conf' in the server installation directory.

    Its content must be like the following:


    <dbconfig>

    <providername>mysql</providername>

    <connectionstring>Server=_SERVER_;User ID=_USER_;Password=_PASSWORD_;Database={0};Pooling=true</connectionstring>

    <databasepath></databasepath>

    </dbconfig>

    replacing the parameters _SERVER_, _USER_ and _PASSWORD_ with the appropriate ones according to the server configuration that you want to use. Thus, a valid 'db.conf' file in our development environment would be:


    <dbconfig>

    <providername>mysql</providername>

    <connectionstring>Server=venus;User ID=myuser;Password=mypwd;Database={0};Pooling=true</connectionstring>

    <databasepath></databasepath>

    </dbconfig>

    Finally, we must set the MySQL configuration parameter max_allowed_packet to support up to 10MB. If you require more information about how to configure this parameter, you can take a look at this article.
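
    As a minimal sketch, assuming the option is set in the [mysqld] section of the MySQL option file (usually my.cnf on Linux or my.ini on Windows; the exact location depends on your installation), the entry would look like this:

```ini
[mysqld]
max_allowed_packet=10M
```

    The MySQL server has to be restarted for a change in the option file to take effect.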

    » Logging to the Event Log

    By default Plastic server logs both activity and errors to a file (loader.log.txt) located at the server's installation directory.

    Since Plastic uses log4net as its logging mechanism, logging messages and output are very easy to customize.

    The following configuration shows how to set up the log (by editing loader.log.conf in the server's directory) to output errors to the Windows Event Log.


    <log4net>

    <appender name="ConsoleAppender" type="log4net.Appender.ConsoleAppender">
    <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%message%newline" />
    </layout>
    </appender>

    <appender name="FileAppender" type="log4net.Appender.FileAppender">
    <file value="loader.log.txt" />
    <appendToFile value="true" />
    <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%message%newline" />
    </layout>
    </appender>

    <appender name="EventLogAppender" type="log4net.Appender.EventLogAppender" >
    <threshold value="ERROR" />
    <applicationName value="Plastic Server" />
    <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%message%newline" />
    </layout>
    </appender>


    <logger name="UpdatePerf">