July 3, 2008
» July '08 DevDiv TFS Dogfood Statistics

I missed the June Dogfood statistics - sorry about that.  This report represents the change since my last dogfood report in May.  The big thing that you will observe is that downloads have dropped dramatically (from a peak of ~150,000,000 a week to ~50,000,000).  The reason is that we installed a TFS proxy on our corp net and had the majority of users configure their clients to use it.  The proxy is 2 machines configured behind an NLB load balancer.  We had to make this configuration change because, during peak hours, the server was seeing over 1,000 downloads per second.  It simply couldn't service that many requests, so the request queue would fill up and the server would start returning "server unavailable" errors to the clients.  Adding a proxy allowed us to offload the download volume and keep the request queue from overflowing.  We used an NLB proxy "cluster" to avoid having the same request queue overflow problem on the proxy.
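To put those download numbers in perspective, here is a rough back-of-the-envelope sketch in Python.  It only uses the approximate figures quoted above (weekly totals and the peak-hour rate) and assumes the weekly totals are spread over a full 7 days, so treat the results as ballpark averages rather than measurements.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def average_per_second(weekly_count: int) -> float:
    """Average request rate implied by a 7-day total."""
    return weekly_count / SECONDS_PER_WEEK


# Approximate figures quoted in the post.
peak_week = 150_000_000   # weekly downloads at the peak, before the proxy
after_proxy = 50_000_000  # weekly downloads after most clients moved to the proxy
peak_rate = 1_000         # downloads per second seen during peak hours

avg_before = average_per_second(peak_week)
avg_after = average_per_second(after_proxy)
burst_factor = peak_rate / avg_before

print(f"average before proxy: {avg_before:.0f} downloads/s")
print(f"average after proxy:  {avg_after:.0f} downloads/s")
print(f"peak-hour rate vs. weekly average: {burst_factor:.1f}x")
```

The point of the comparison is that the weekly average hides the problem: the peak-hour rate is several times the average, and it is the peak that fills the request queue.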

The other "big event" in the past month was a move of our server from our data center in Tukwila to our new data center in Quincy, WA.  I wish I could say that went smoothly.  Transferring over 8 terabytes of data several hundred miles and building out new server infrastructure for a mission critical server is a daunting task.  We hit quite a few bumps along the way and my backside is still sore from the beatings (admittedly deserved) I took over it.  Fortunately, we shouldn't have to do such a thing again soon.  One of my big learnings from the process, though, was that we need a better way to simulate our production environment in a non-production test environment.  We really needed to test all of the configuration changes we were making on a reasonably accurate simulation before trying them on the live environment.  The problem is that cloning the hardware would cost close to $250K - mostly in the cost of the SAN.  Anyway, we've embarked on a process of creating such a test environment (even if it doesn't match the hardware exactly).  Hopefully this will smooth any further large scale deployment changes we make down the road.  Preventing almost 2,500 people from getting their work done is not a recipe for a long and healthy career :)

 

image image

image

Users

  • Recent users: 2,409 (up 451)
  • Users with assigned work items: 4,293 (up 567)
  • Version control users: 4,345 (up 525)

Work Items

  • Work Items: 446,048 (up 33,117)
  • Areas & Iterations: 10,536 (up 452)
  • Work item versions: 3,713,236 (up 257,532)
  • Attached files: 258,580 (up 14,454)
  • Queries: 27,944 (up 2,291)

Version control

  • Files/Folders: 312,965,192/75,535,960 (up 43,182,945/up 11,300,347)
  • Total compressed file size: 2,607,236 MB (up 389,270 MB)
  • Checkins: 484,546 (up 48,613)
  • Shelvesets: 40,028 (up 8,611)
  • Merge history: 756,402,342 (up 104,599,145)
  • Pending changes: 39,586,207 (up 15,367,172)
  • Workspaces: 11,415 (up 2,494)
  • Local copies: 2,948,671,753 (was 2,214,366,807)

Builds

  • Builds: 6,524 (up 369)

Commands (last 7 days)

  • Work Item queries: 564,970 (up 106,437)
  • Work Item updates: 29,854 (down 8,574)
  • Work Item opens: 156,578 (down 56,465)
  • Gets: 792,700 (up 470,385)
  • Downloads: 52,063,240 (down 44,994,791)
  • Checkins: 7,247 (up 515)
  • Uploads: 159,837 (up 49,135)
  • Shelves: 4,238 (up 1,134)

Brian

May 12, 2008
» May '08 DevDiv TFS Dogfood Statistics

Today seems to be blogging day.  Here's post #3 of 4 or 5 that are coming today.  Sorry for the deluge but it's been a couple of weeks since I blogged.

I think we are finally nearing the end of the full scale roll out of TFS to the Developer Division.  Almost everyone working on the next version of VS/.NET is now on TFS.  There are some other projects that have not switched yet but I expect most will before too long.  The biggest sign of this continued growth is in # of Recent users (up 145).  That's 145 more regular users this month than last month.

The other number that is staggering (at least to me) is the # of local copies.  There are over 2.2 BILLION rows in that table.  Wow!  That's a lot of data.  Last I checked, the DevDiv TFS database had gotten to around 8 terabytes.

I've started including a new section in this report for Builds.  Lots of teams are now using TFS for continuous integration and other buddy build systems, making the numbers start to seem significant.

Here are the graphs to show trends.  A big part of why downloads are not continuing to grow is increased usage of proxies.  We have found that too many downloads can overwhelm the application tier and block other operations.  We started to hit those problems at around 100,000,000 downloads a week (but only during peak hours).  Soon we will be deploying a change that allows us to "force" clients to use a proxy.  This is a server setting that causes the client to use its Active Directory location to select the appropriate proxy.  At that point, the downloads will drop dramatically.

image

image

image

Users

  • Recent users: 1,958 (up 145)
  • Users with assigned work items: 3,726 (up 91)
  • Version control users: 3,820 (up 140)

Work Items

  • Work Items: 412,931 (up 24,819)
  • Areas & Iterations: 10,084 (up 379)
  • Work item versions: 3,455,704 (up 173,764)
  • Attached files: 244,126 (up 11,853)
  • Queries: 25,653 (up 1,733)

Version control

  • Files: 269,782,247 (up 32,956,473)
  • Folders: 64,235,613 (up 7,849,640)
  • Total compressed file size: 2,217,966 MB (up 127,435 MB)
  • Checkins: 435,933 (up 24,762)
  • Shelvesets: 31,417 (up 3,424)
  • Merge history: 651,803,197 (up 78,266,752)
  • Pending changes: 24,219,035 (up 7,381,865)
  • Workspaces: 8,921 (up 346)
  • Local copies: 2,214,366,807 (was 2,004,549,728)

Builds

  • Builds: 6,155 (up 532)

Commands (last 7 days)

  • Work Item queries: 458,533 (up 81,627)
  • Work Item updates: 38,428 (up 19,384)
  • Work Item opens: 213,043 (up 104,217)
  • Gets: 322,315 (up 73,578)
  • Downloads: 97,058,031 (down 21,524,757)
  • Checkins: 6,732 (up 1,014)
  • Uploads: 110,702 (down 1,614)
  • Shelves: 3,104 (down 3)

Brian

April 15, 2008
» April '08 DevDiv TFS Dogfood Statistics

Due to my sabbatical, I missed the March Dogfood statistics.  In my absence, adoption has continued apace.  The team has been very busy making sure the server is behaving well and applying fixes when it is not.

Looking at the graphs below, you can see that several of the statistics have really started to grow at dramatically higher rates in recent months - File downloads, Files, Workspaces.  In fact, the only reason you see file downloads decreasing is that we continue to move more high load users (like the build lab, checkin validation, etc) to use proxies for downloads rather than the main server.

We continue to drive improvements to handle the additional load.  Unfortunately, we've had to cut off their incorporation into TFS 2008 SP1.  Many of the improvements made it, but at some point we had to draw a line and we did that a month or two ago.  The additional improvements will, of course, make it into the following TFS release.  However, I believe you'll find (if you also have a very large server) that the improvements we've included in SP1 will give you some very nice performance gains.

image

image

image

Here are the detailed numbers.  If you look at them closely, you will find that they don't match the difference between what I reported last time and this time.  That's because (although I didn't report it), I actually did take a snapshot a month or so ago and these differences are against that snapshot.

Users

  • Recent users: 1,813 (up 116)
  • Users with assigned work items: 3,635 (up 123)
  • Version control users: 3,680 (up 226)

Work Items

  • Work Items: 388,112 (up 20,245)
  • Areas & Iterations: 9,705 (up 216)
  • Work item versions: 3,281,940 (up 185,805)
  • Attached files: 232,273 (up 10,325)
  • Queries: 23,920 (up 1,287)

Version control

  • Files: 236,825,774 (up 30,000,636)
  • Folders: 56,385,973 (up 6,807,315)
  • Total compressed file size: 2,090,531 MB (up 163,906 MB)
  • Checkins: 411,171 (up 36,479)
  • Shelvesets: 27,993 (up 5,320)
  • Merge history: 573,536,445 (up 73,745,308)
  • Pending changes: 16,837,170 (up 6,244,260)
  • Workspaces: 8,575 (up 1,595)
  • Local copies: 2,004,549,728 (up 539,146,465)

Commands (last 7 days)

  • Work Item queries: 376,906 (up 133,484)
  • Work Item updates: 19,044 (down 8,035)
  • Work Item opens: 108,826 (up 46,398)
  • Gets: 248,737 (down 412,995)
  • Downloads: 118,582,788 (up 16,758,498)
  • Checkins: 5,718 (up 489)
  • Uploads: 112,316 (down 54,245)
  • Shelves: 3,107 (up 962)
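Because the April deltas are measured against an unreported snapshot rather than against the February report, the snapshot values can be recovered by subtraction.  The following Python sketch does that for three of the counters; the numbers are copied from the April and February reports, and the "snapshot" is simply the current value minus the reported delta.

```python
# (April value, delta reported in April, value in the February report)
april = {
    "Recent users": (1_813, 116, 1_442),
    "Work Items": (388_112, 20_245, 357_964),
    "Checkins": (411_171, 36_479, 359_635),
}


def implied_snapshot(current: int, delta: int) -> int:
    """Value at the unreported snapshot: current value minus the reported change."""
    return current - delta


results = {}
for name, (current, delta, february) in april.items():
    snapshot = implied_snapshot(current, delta)
    results[name] = (snapshot, snapshot - february)
    print(f"{name}: snapshot ~{snapshot:,}, growth from Feb to snapshot {snapshot - february:,}")
```

Adding the two growth figures together (February to snapshot, plus snapshot to April) gives the full change between the two published reports.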

Thanks,

Brian

February 9, 2008
» Feb '08 DevDiv TFS Dogfood Statistics

If you follow my dogfood statistics, you’ll notice that some of the numbers at the bottom are up quite a bit.  The +200 recent user bump represents the increase in usage as we approach the completion of the roll out of TFS to the entire division.  At this point I'm expecting the roll out to be done within 4 - 6 weeks.  There’s been a corresponding increase in Local copies.  We’ve also seen significant jumps in merge history as we do more and more of our large tree merges in TFS.  The biggest jump is in downloads (almost 2X).  This is due to the build lab ramping up.  Yesterday, they rolled out config changes to download all files from a TFS Proxy and we should see that number fall substantially in next month’s report.

We continue to refine the perf and scale of TFS.  I think not a week goes by these days when we don't push some update (or several) onto the production server to address our latest perf and scale issues.  Most of the time it runs quite well but every so often just the right combination of huge (~million file) operations hit a problem and cause the overall server responsiveness to suffer.  We're making good progress but it's no consolation when you are staring at the screen waiting for your one file to checkin just because someone else is checking in 400,000 deletes or some such.  All of the fixes we make continue to flow into our TFS 2008 SP1 work so you should be able to get them later this year.

Unlike TFS 2005 SP1 and TFS 2008 RTM, I don't think the average user out there will see significant improvements from the additional performance work we're doing in TFS 2008 SP1.  Perhaps some small improvements but, at this point, most of the fixes we make primarily address contention and/or I/O issues in really large scale operations.  I know some of our customers do this but the majority do not.

Here are the statistics for this month...

image

image

image

Users

  • Recent users: 1,442 (up 200)
  • Users with assigned work items: 3,434 (up 64)
  • Version control users: 3,289 (up 174)

Work Items

  • Work Items: 357,964 (up 12,459)
  • Areas & Iterations: 9,006 (up 722)
  • Work item versions: 3,013,626 (up 104,498)
  • Attached files: 216,046 (up 7,082)
  • Queries: 22,080 (up 660)

Version control

  • Files/Folders: 189,532,166/44,405,661 (up 21,960,278/up 5,009,562)
  • Total compressed file size: 1,850,092 MB (up 127,273 MB)
  • Checkins: 359,635 (up 20,577)
  • Shelvesets: 20,471 (up 645)
  • Merge history: 460,256,443 (up 49,113,668)
  • Pending changes: 10,989,165 (up 4,017,996)
  • Workspaces: 6,169 (up 153)
  • Local copies: 1,268,350,414 (up 233,085,191)

Commands (last 7 days)

  • Work Item queries: 201,039 (up 77,416)
  • Work Item updates: 24,648 (up 8,949)
  • Work Item opens: 57,411 (up 13,488)
  • Gets: 501,590 (up 12,165)
  • Downloads: 145,120,659 (up 70,546,272)
  • Checkins: 4,417 (up 201)
  • Uploads: 204,064 (up 106,420)
  • Shelves: 2,152 (up 796)

Brian

December 6, 2007
» December '07 DevDiv Dogfood Statistics

The massive spike that I've been predicting for a long while now has started.  In the last month the momentum towards moving the entire division over to TFS has really picked up.  A significant fraction of the branches for the development of the next version of Visual Studio/.NET Framework have been created.  Overall, I expect this ramp up phase will last another 2-3 months - right now a lot of planning is happening; development is slowly ramping up.

Demonstrating this change, you can see an increase of over 32 million in the number of files and 154 million in number of local copies.

Just this week, we are doing training for all of the people in the division who have not yet started using TFS.  I expect the number of recent users will grow every month for the next few months.

Other preparations continue as well.  One of our biggest challenges has been getting the central build lab moved over - both due to the number of tools/scripts and due to the load they put on the system.  Right now we're working on getting their nightly sync times down so that builds complete in a reasonable amount of time.

We've also been struggling with some out of memory problems on the server.  I don't think we thoroughly understand the problem yet.  However, we've learned a few things.  The version control file cache on the server has gotten to 5 million files and the algorithm to manage it has gotten to be slow and very memory intensive.  I think we are going to need to move to an algorithm that does not require scanning the file system for age to manage the cache size.
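As an illustration of what "not scanning the file system for age" could look like, here is a minimal Python sketch of a least-recently-used index kept in memory.  This is not the TFS implementation; the class name, the byte budget and the eviction policy are all assumptions made for the example.  The idea is only that recency is tracked as files are accessed, so deciding what to evict never requires walking the cache directory.

```python
from collections import OrderedDict


class FileCacheIndex:
    """Hypothetical in-memory index of cached files, ordered oldest-access first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._entries = OrderedDict()  # path -> size in bytes

    def touch(self, path, size):
        """Record an access and return the paths that should be deleted from disk."""
        if path in self._entries:
            # Re-access: remove the old entry so it is re-added as most recent.
            self.total_bytes -= self._entries.pop(path)
        self._entries[path] = size
        self.total_bytes += size

        evicted = []
        # Evict from the least recently used end until we fit the budget.
        # The entry just touched is never evicted, even if it alone is too big.
        while self.total_bytes > self.max_bytes and len(self._entries) > 1:
            old_path, old_size = self._entries.popitem(last=False)
            self.total_bytes -= old_size
            evicted.append(old_path)
        return evicted


cache = FileCacheIndex(max_bytes=100)
cache.touch("a.cs", 60)
cache.touch("b.cs", 30)
cache.touch("a.cs", 60)            # a.cs becomes the most recently used entry
to_delete = cache.touch("c.cs", 40)
print(to_delete)
```

Each access costs constant time, and memory grows with the number of cached files rather than with repeated directory scans.  A real server would also need to persist or rebuild this index on restart, which the sketch ignores.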

Overall, it still seems to be going reasonably well.  Here are the chart and detailed statistics...

image

image

image

 

Users

  • Recent users: 1,160 (up 63)
  • Users with assigned work items: 3,252 (up 65)
  • Version control users: 2,991 (up 87)

Work Items

  • Work Items: 305,958 (up 7,348)
  • Areas & Iterations: 7,921 (up 115)
  • Work item versions: 2,606,046 (up 62,686)
  • Attached files: 120,835 (up 4,300)
  • Queries: 21,026 (up 484)

Version control

  • Files/Folders: 148,258,991/34,914,899 (up 32,180,349/up 6,996,437)
  • Total compressed file size: 1,639,701 MB (up 139,400 MB)
  • Checkins: 320,961 (up 15,718)
  • Shelvesets: 18,165 (up 1,620)
  • Merge history: 364,731,019 (up 90,041,592)
  • Pending changes: 5,393,525 (down 3,689,144)
  • Workspaces: 5,275 (up 463)
  • Local copies: 862,271,941 (up 154,381,520)

Commands (last 7 days)

  • Work Item queries: 126,851 (down 59,525)
  • Work Item updates: 18,102 (up 3,348)
  • Work Item opens: 41,178 (up 3,244)
  • Gets: 550,374 (up 345,131)
  • Downloads: 67,865,017 (up 25,596,147)
  • Checkins: 11,305 (up 6,156)
  • Uploads: 953,711 (up 149,148)
  • Shelves: 1,544 (up 46)

Brian

November 6, 2007
» November '07 DevDiv Dogfood Statistics

It's been a while since I wrote about the DevDiv TFS statistics.  Sorry about that, I guess it's just been a really busy summer.  Usage continues to climb steadily and we are just now beginning the rollout to the rest of DevDiv.  The next version of VS/.NET will be built entirely using TFS - no more usage of the older internal tools.  It's been exciting and challenging getting ready for that.

The numbers you'll see below (while much larger than what I published in Aug) are actually quite a bit smaller than what they might have been.  We have been doing quite a lot of server clean up in preparation for the broader rollout - deleting old workspaces, destroying unused source branches, deleting old shelvesets, etc.  You'll see the effects in some of the graphs below.

We continue to make product improvements based on dogfooding.  The vast majority of them over the past 2 years have gone into TFS 2008.  We just recently stopped putting them in (because TFS 2008 is almost done) and have started queuing them for TFS 2008 SP1.  Just in the last week we've made a few nice improvements for working with really large trees.  As an example... I've mentioned before that the build lab gets all of the source (about 3 million files) onto about 75 different machines every night.  We found that the initial part of the get operation was taking about 230 seconds to compute what files were needed.  After profiling, we found an inefficiency in permission checking, and fixing it enabled us to reduce that time to about 100 seconds - a nice improvement.  This particular one won't make a big difference to most people, most of the time but it's good to keep finding the bottlenecks and removing them.
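For a sense of scale, the arithmetic behind that improvement can be sketched in a few lines of Python.  The 230 second, 100 second and 75 machine figures come from the paragraph above; the nightly total assumes every build machine pays the "compute what files are needed" cost exactly once per night, which is an assumption for illustration.

```python
before_s = 230   # seconds to compute the needed files before the fix
after_s = 100    # seconds after the permission-checking fix
machines = 75    # approximate number of build lab machines syncing each night

saved_s = before_s - after_s
reduction = saved_s / before_s          # fraction of the original time removed
speedup = before_s / after_s            # how many times faster the step is
nightly_machine_hours = machines * saved_s / 3600

print(f"saved per get: {saved_s} s ({reduction:.0%} less time, {speedup:.1f}x faster)")
print(f"saved per night across the lab: about {nightly_machine_hours:.1f} machine-hours")
```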

I expect many of the graphs to take big jumps in the next couple of months.

image

image

image

 

The deltas in these numbers are actually changes from about 1 month ago.

Users

  • Recent users: 1,097 (up 39)
  • Users with assigned work items: 3,187 (up 160)
  • Version control users: 2,904 (up 136)

Work Items

  • Work Items: 298,610 (up 15,401)
  • Areas & Iterations: 7,806 (up 105)
  • Work item versions: 2,543,360 (up 146,757)
  • Attached files: 116,535 (up 8,964)
  • Queries: 20,542 (up 840)

Version control

  • Files/Folders: 116,078,642/27,918,462 (up 8,278,542/up 3,724,381)
  • Total compressed file size: 1,500 GB (up 163 GB)
  • Checkins: 305,243 (up 32,182)
  • Shelvesets: 16,545 (up 2,317)
  • Merge history: 274,689,427 (up 23,355,241)
  • Pending changes: 9,082,669 (up 2,766,941)
  • Workspaces: 4,812 (down 1,938)
  • Local copies: 707,890,421 (down 85,555,293)

Commands (last 7 days)

  • Work Item queries: 186,376 (down 29,118)
  • Work Item updates: 14,754 (down 8,414)
  • Work Item opens: 37,934 (down 18,617)
  • Gets: 205,243 (down 140,891)
  • Downloads: 42,268,870 (down 1,638,417)
  • Checkins: 5,149 (up 1,333)
  • Uploads: 804,563 (up 674,032)
  • Shelves: 1,498 (up 147)

Brian

September 15, 2007
» Update on Microsoft TFS adoption

Here's an update on overall TFS adoption at Microsoft.  In a bit I expect to blog the latest DevDiv server stats.

Overall adoption continues to grow rapidly.  We passed some notable milestones this month:

  • We have 1,160 Team Projects in production, passing the 1,000 mark
  • We topped 1 million work items (across all TFS instances) with 1,023,088

Adoption in Office continues strong.  Windows adoption is growing but we have hit some pain along the way.  They are adopting the same preliminary Rosario build that Office is using and have run into quite a number of installation issues with the Team Explorer client.  We've been working through those with them.  DevDiv continues preparations for a final and complete switch over to TFS - I'm expecting sometime by around the end of the year.

We now have 24 TFS servers in production.  3 are running a preliminary Rosario build.  17 are running TFS 2008 Beta 2 (see when we say "Go-live" we mean it :)).  And 4 are still running TFS 2005 SP1 but will be upgraded to Beta 2 soon.

Here are some interesting graphs to see overall trends...

 

Brian

August 7, 2007
» August DevDiv Dogfood Statistics

I think I missed posting dogfood stats last month - sorry about that.  It's just been so busy with all of the recent releases going on, it's been difficult to find time to do it.

The “big change” in the past couple of months is that the Visual Studio central build lab is really making progress in enabling support for TFS (rather than using the mirrored legacy system).  Some of the substantial increases in activity are due to the load they put on the system.  For example, 5.7 million of the 36 million downloads are from the build lab.  They’ve also been pushing the system in some ways that it had not previously been pushed.  This has led to a handful of bug fixes and performance improvements that have been patched on the dogfood server recently - of course, all of these improvements are being incorporated into TFS 2008.  I'm expecting the work to move the build lab over to be done within about a month or so.

Other than the build lab load, normal user load continues to stay moderately flat.  I’m expecting that will ramp up over the next few months as we prepare for the next version of Visual Studio, when we will shut down the legacy system and everyone in the division will be using TFS.

Notable values…

  • We’ve finally reached 100 million files (actually over 120 million if you count folders too).
  • The LocalVersion table is at about 700 million rows.  The 1 billion mark isn’t too far away.
  • We passed 250,000 work items.
  • We recently passed 40 million requests per week for the first time.

 

And the detailed stats...

Users

  • Recent users: 1,015 (down 15)
  • Users with assigned work items: 3,073 (up 157)
  • Version control users: 2,649 (up 231)

Work Items

  • Work Items: 268,635 (up 23,239)
  • Areas & Iterations: 7,575 (up 72)
  • Work item versions: 2,249,745 (up 205,077)
  • Attached files: 98,989 (up 11,251)
  • Queries: 18,936 (up 1,065)

Version control

  • Files/Folders: 99,559,163/21,692,272 (up 15,613,488/up 3,850,160)
  • Total compressed file size: 1,170 GB (up 181 GB)
  • Checkins: 253,993 (up 22,629)
  • Shelvesets: 12,844 (up 1,745)
  • Merge history: 232,689,548 (up 38,096,682)
  • Pending changes: 3,934,204 (up 1,533,477)
  • Workspaces: 6,224 (up 719)
  • Local copies: 695,274,358 (up 90,790,224)

Commands (last 7 days)

  • Work Item queries: 318,036 (down 18,800)
  • Work Item updates: 31,651 (up 3,495)
  • Work Item opens: 225,279 (down 115,081)
  • Gets: 586,788 (up 521,611)
  • Downloads: 35,919,563 (up 16,712,610)
  • Checkins: 5,275 (up 3,011)
  • Uploads: 134,830 (down 225,997)
  • Shelves: 1,313 (up 381)

Brian

July 24, 2007
» Update on adoption of TFS at Microsoft

Our internal adoption team produced an update on our status this week and I thought I'd share the results with you.  We're now up to 21 TFS instances in production, hosting a total of 734 projects and about 5,600 users.  2 of those instances are running an Orcas Beta 2 build, 3 are running a very early build of Rosario, and the other 16 are running TFS 2005.  We plan on upgrading the remaining TFS 2005 instances to Orcas (TFS 2008) in August.

I know I've talked before about Office's adoption of TFS for project management.  Their use has really taken off over the past few months.  I may or may not have mentioned that Windows has also decided to adopt TFS for project management/feature tracking.  They are just dipping their toes in right now but I expect that by the fall sometime they will be in full swing.  Here's a high level picture of how adoption compares in the major divisions at Microsoft.  It leaves out the many smaller groups who are also adopting.  Much to my chagrin the Microsoft IT organization has now passed DevDiv in number of users :).  Their databases/projects are a lot smaller but they are really driving adoption.  This fall I expect DevDiv adoption to pick up again (and possibly pass them).

 

As you can see, the number of active users continues to grow steadily.

And the number of projects is growing at a good clip as well.

 

Brian

June 20, 2007
» June DevDiv Dogfood Statistics

Orcas Beta 2 has been deployed on our dogfood server for about a month now and has been running quite well with very few patches.  We've been focusing primarily on cleaning up the event log and making sure we fix any bugs generating event log entries, making appropriate eventlog entries clear and actionable and removing spurious ones.

I think I've mentioned before that increased adoption of TFS within DevDiv has been stalled for the past several months as the division has been totally focused on getting Orcas finished.  It's performance review time of the year at Microsoft and as part of that process Soma has put on his review commitments (and I on mine) that all future development in DevDiv will be based on TFS - we'll be essentially shutting down the internal tools we've used in parallel with TFS over the next 6 to 9 months.  This is an exciting step for me!

Also Office's adoption of TFS continues apace and we went live with a TFS server for Windows about a week ago.  They have built a custom process template and plan on using TFS for their project management in the next version of Windows.  They've only just started rolling it out so I'll talk more about their usage in a few months as they really get going with it.  The SQL division continues to use TFS and is planning on upgrading to the Orcas release in July.  Also, yesterday, I talked to the CodePlex team about upgrading to Orcas once Beta 2 ships to help address some of the scale issues they face as the number of projects they are managing grows rapidly.

Progress on Orcas Beta 2 is coming along well.  We are in what we call "ask mode".  That means that the # of fixes going in now is very low and every fix is being extremely carefully reviewed - 2 code reviews and at least 2 "triage" reviews to assess the severity of the issue and appropriateness of the fix.  We do this to reduce the chance of introducing a regression as we fix the last few significant issues before we ship the Beta.

Beta 2 is going to be a great release for us.  TFS will have a "go-live" license, so we'll support (and encourage) anyone putting it in a production environment.  We will also support migrating the data in all Beta 2 installations forward to the final release and beyond.

On to the statistics...

The notable recent thresholds include:

  • We finally passed 100,000,000 files and folders
  • We passed 2,000,000 work item versions

Here's the recent trend data:

 

Users

  • Recent users: 1,030 (down 3)
  • Users with assigned work items: 2,916 (up 147)
  • Version control users: 2,418 (up 94)

Work Items

  • Work Items: 245,396 (up 19,046)
  • Areas & Iterations: 7,503 (up 109)
  • Work item versions: 2,044,668 (up 159,296)
  • Attached files: 87,738 (up 8,125)
  • Queries: 17,871 (up 709)

Version control

  • Files/Folders: 83,945,675/17,842,112 (up 2,700,770/up 606,822)
  • Total compressed file size: 989,643 MB (up 107,300 MB)
  • Checkins: 231,364 (up 12,847)
  • Shelvesets: 11,099 (up 904)
  • Merge history: 194,592,866 (up 7,183,846)
  • Pending changes: 2,400,727 (down 248,552)
  • Workspaces: 5,505 (up 327)
  • Local copies: 604,484,134 (up 38,314,931)

Commands (last 7 days)

  • Work Item queries: 336,836 (up 178,131)
  • Work Item updates: 28,156 (down 8,095)
  • Work Item opens: 340,360 (up 255,757)
  • Gets: 65,177 (down 4,694)
  • Downloads: 19,206,953 (down 4,818,821)
  • Checkins: 2,264 (down 1,093)
  • Uploads: 360,827 (up 193,973)
  • Shelves: 932 (down 212)

Brian

May 16, 2007
» May DevDiv Dogfood Statistics

It's been a pretty uneventful month on our Dogfood server.  I almost decided not to post about it but I figured what the heck.  The most exciting thing is that we have our Orcas Beta 2 dogfood server upgrade happening this weekend.  This should be an indication to you that we think our Orcas Beta 2 server is approaching ready.  There will be some more bug fixes before we are done with it but we are feeling pretty good about it.  This upgrade is also the first to use the revamped setup and customer ready upgrade experience that will be available in Orcas Beta 2.  It's been a good exercise because we've found some bugs in it.

Along with the upgrade to Orcas Beta 2, we are switching the TFS data tier to a clustered SQL Server configuration for reliability.  We are also doing it because we had some problems in this config with SP1, and one way to make sure that never happens again is to use that config ourselves.

I'm not expecting any major changes in performance or stability but I'll let you know how it went in a week or so - once it's up there and we have used it enough to have useful info.

On the statistics front - overall, the division is heads down bug fixing and preparing for the Orcas Beta 2 release.  As a result overall activity/growth on the dogfood server is down a bit.  I suspect it will stay down for a few more months as that's just the phase of the product cycle we are in.  The notable stats from below are:

  • We are approaching 100,000,000 files on the server (about 98M now).
  • We are approaching 250,000 work items

Users

  • Recent users: 1,031 (down 16)
  • Users with assigned work items: 2,770 (up 27)
  • Version control users: 2,324 (up 75)

Work Items

  • Work Items: 226,340 (up 17,004)
  • Areas & Iterations: 7,394 (up 8)
  • Work item versions: 1,885,212 (up 162,804)
  • Attached files: 79,608 (up 6,493)
  • Queries: 17,161 (up 696)

Version control

  • Files/Folders: 81,244,653/17,235,264 (up 3,586,001/up 847,285)
  • Total compressed file size: 882,322 MB (up 102,236 MB)
  • Checkins: 218,507 (up 13,951)
  • Shelvesets: 10,186 (up 830)
  • Merge history: 187,408,524 (up 7,462,762)
  • Pending changes: 2,649,435 (up 24,067)
  • Workspaces: 5,197 (up 413)
  • Local copies: 568,033,369 (up 60,357,685)

Commands (last 7 days)

  • Work Item queries: 158,655 (down 372,453)
  • Work Item updates: 36,207 (down 180,070)
  • Work Item opens: 84,591 (down 189,131)
  • Gets: 69,890 (down 102,098)
  • Downloads: 24,024,219 (up 5,881,445)
  • Checkins: 3,356 (down 7,191)
  • Uploads: 166,944 (down 95,431)
  • Shelves: 1,151 (up 71)

Until next time...

Brian

April 18, 2007
» Dogfood I/O Analysis

I've been falling so far behind on everything I'm supposed to do I just can't stand it.  Today is my day to try to catch up on blogging.  I promised I'd follow up on the dogfood I/O analysis from the Orcas upgrade.  Well, I got the results a week or two ago and I just can't find anything particularly useful.  The problem is that the data we have from the various samples we've taken over the last year are all different enough that it's hard to compare apples to apples.  We haven't used the same methodology.  We've reconfigured the drives and repartitioned tables, etc.  To a first approximation I've given up trying to extract useful before and after I/O data.  Instead we are going to run another trace to see which sprocs now have the highest I/O demand and focus on just making those better and worry less about quantifying the improvement.

In case you care, here's some data we got from the last set of I/O analysis - at least you can see in the absolute what kind of I/O load we are seeing.  As I've said before, Version Control is really where all of our I/O load is, so that's where we focused.  Also, you'll note we've broken the Version Control tables across multiple volumes due to the high load.  Here's how the tables map to drives:

  • G: tbl_LocalVersion
  • J: all the remaining tables in version control.
  • K: tbl_Version

If you can extract useful info from this, please let me know :)  Once we get the break down by sproc, I'll share that.

Thanks,

Brian

» April DevDiv Dogfood Statistics

It’s now been over a month since the DevDiv dogfood server upgrade to Orcas bits.  We’ve continued to poke and prod at the server and make patches as we identify issues.  The rate has slowed down from that first hectic week of several patches a day to about 1 a week or so now.  Overall performance is much better and reliability has stayed very good (7 day availability is at 100% right now).  We’ve identified a bunch of great bugs for which the fixes are going into Beta 1.  Last weekend, we did the “final” upgrade of the Office TFS server with the Rosario Internal Release bits for Office’s “go live” on Monday.  Just seems like the excitement never stops.

We passed a couple of milestones in the past month:

  • We passed the 200,000 work item point.  It’s not huge by historical standards but it’s pretty big and is a cool milestone.
  • The local version table passed the 500,000,000 row mark.  Yes, half a billion rows!  By any measure, that’s a lot of data :)

 

Current Statistics

Users

  • Recent users: 1,047 (down 85)
  • Users with assigned work items: 2,743 (up 172)
  • Version control users: 2,249 (up 98)

Work Items

  • Work Items: 209,336 (up 16,151)
  • Areas & Iterations: 7,386 (up 119)
  • Work item versions: 1,722,408 (up 141,465)
  • Attached files: 73,115 (up 6,330)
  • Queries: 16,465 (up 785)

Version control

  • Files/Folders: 77,658,652/16,387,979 (up 5,100,133/up 1,788,298)
  • Total compressed file size: 780.1 GB (up 125.4 GB)
  • Checkins: 204,556 (up 19,673)
  • Shelvesets: 9,356 (up 978)
  • Merge history: 179,945,762 (up 10,961,392)
  • Pending changes: 2,625,368 (up 1,581,548)
  • Workspaces: 4,784 (up 435)
  • Local copies: 507,675,684 (up 66,868,508)

Commands (last 7 days)

  • Work Item queries: 531,108 (up 308,847)
  • Work Item updates: 216,277 (up 183,991)
  • Work Item opens: 273,722 (up 200,940)
  • Gets: 171,988 (down 322,075)
  • Downloads: 18,142,774 (down 5,166,454)
  • Checkins: 10,547 (up 7,032)
  • Uploads: 262,375 (down 56,531)
  • Shelves: 1,080 (down 42)

You'll notice that gets are way down this month.  That's because we finally fixed a tool someone wrote that was spamming the server with Get requests once every few seconds 24 hours a day, 7 days a week.  That brings me to an interesting observation about TFS.  It's kind of amazing to me how many internal tools people have written against TFS.  I think it comes from the power of having a completely open and public API.  Here's a list of the tools that I see have run against the dogfood server in the past week.  I've removed from the list everything that shipped in the box with TFS - so these are just tools that someone else has written.  Some of them are listed a couple of times due to different versions.  There are about 80 in the list, sorted in descending order of the load they put on the server.

Thanks,

Brian

March 20, 2007
» Orcas Dogfood Upgrade - I/O Statistics Delayed

We got some I/O statistics on Friday night.  Unfortunately it was not what I was looking for.  The version control drives were omitted from the collection and there was much less sproc analysis than I expected.  The ops team is going back to replan what analysis they are going to do and will try again.  I'll let you know when we have an ETA for the new analysis.

Brian

March 16, 2007
» Orcas Dogfood Upgrade - CPU Utilization

I think we've got enough data now that we can put a stake in the ground about where we stand on CPU utilization improvements.  We've still got a bit more tuning and improvements to make but it's probably within 10% of where it will turn out.

We've made less progress investigating the regressions this week than I expected - too many other things going on.  Given that, I expect it will be another couple of weeks before we put it to bed.  That said, we did identify a significant issue in one of the usage patterns of QueryItems.  Although it was not a regression to start with, I expect it to go green once we apply the patch.  We have also fixed GetBuildUri.  It didn't show up in the last post because there were no occurrences in the sample that I used to generate it but previous samplings showed a significant regression.  Some progress - but not as much as I'd hoped.

On to the CPU utilization...

Because no two time periods are quite the same, any comparison is a little like apples to oranges.  The technique I have used is to average the CPU utilization from the week before the dogfood upgrade and from this week.  I then took this week's CPU utilization and "normalized" it.  That means dividing it by the average number of requests per hour this week and multiplying by the average number of requests per hour in the earlier week.  This is the best attempt I can think of to make oranges look like apples.  So looking at this for the data tier (which as you will recall has always been our bottleneck), we get:

Effective CPU utilization this week:

20.85% * 134,454 / 180,020 = 15.57%

The previous week's average CPU utilization was 28.82%.

So comparing them:

15.57%/28.82% = 0.5404

In other words, overall Orcas uses about 46% fewer CPU cycles on the data tier to do the same amount of work as TFS 2005.  We're pretty psyched about that.

Doing the same analysis for the application tier yields an effective CPU utilization of 14.85% compared to 24.90%, meaning the application tier uses about 40% less CPU to do the same amount of work.
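For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the normalization described above.  The function name is made up for illustration; the inputs are the figures quoted in this post, where 180,020 is this week's average requests per hour and 134,454 is the earlier week's.

```python
def normalized_utilization(cpu_now, req_per_hour_now, req_per_hour_before):
    """Scale this week's average CPU % to the earlier week's request rate.

    Hypothetical helper: divide by this week's requests per hour and
    multiply by the earlier week's requests per hour.
    """
    return cpu_now * req_per_hour_before / req_per_hour_now


# Data tier: figures quoted in the post.
dt_effective = normalized_utilization(20.85, 180_020, 134_454)
dt_ratio = dt_effective / 28.82  # compare against the pre-upgrade average

# Application tier: the post quotes the already-normalized value.
at_ratio = 14.85 / 24.90

print(f"DT effective utilization: {dt_effective:.2f}%")
print(f"DT CPU reduction: {1 - dt_ratio:.0%}")
print(f"AT CPU reduction: {1 - at_ratio:.0%}")
```

This assumes CPU cost scales roughly linearly with request volume, which is the same simplification the normalization itself makes.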

You'll remember that in our configuration (and in our general recommendation) the application tier has half the number of cores that our data tier has (4 for the AT and 8 for the DT).  And still the AT CPU utilization is less than the DT CPU utilization.  I had been a bit worried that all of the improvements in DT efficiency would mean we needed to change our guidance and start recommending balanced AT/DT pairs but given what I see now, we are good to stick with our current guidance.

We are expecting to get the I/O analysis tonight so I'll write about that as soon as I can.  It may very well be mid next week before I get to it because I'm traveling to San Francisco to give a talk at SD West on Monday.

Thanks,

Brian

March 13, 2007
» March DevDiv Dogfood Statistics

Time for the monthly installment again :)  I've been swamped lately and not able to make progress on my "Managing Quality" series.  I'll get back to that shortly.

This is the first monthly dogfood statistics update since the upgrade of the DevDiv dogfood server on 3/3.  So, it’s going to be a bit longer than usual.

Monday was a bit hectic and we had one outage for a couple of hours late Wed afternoon.  Other than that, the upgrade seems to have gone relatively smoothly.  We’re still assessing the overall performance impact on the server.  Yesterday and today, the IT team is taking I/O traces so that we can do a before and after analysis of I/O load and patterns and see if there are any remaining issues to look at.  It’s going to be another week or two before we can say anything too definitive about changes in CPU utilization – load is random enough that it’s going to take that long to establish a clear pattern in the data.

But… there is quite a lot we can say.  Overall performance seems to be much better.  At the bottom you’ll find an update on before & after comparisons (with a more complete request list) of aggregate performance by request type.  We’re still investigating the degradations and a few that didn’t improve as much as we thought they would.  Generally we’re resolving around 3 of them per day.  I expect that we’ll be done in the next week or so.

Trends

Current Statistics

As you look at the size data below, you’ll notice that the number of files and folders is down.  This is because we did a bunch of data clean up and scrubbing in the upgrade process.  One of the things that jumps out at me in this data is the number of Get operations in the last week (705,682).  The highest I’ve ever seen before in a monthly report was about 250,000.  If we hadn’t done the upgrade, that may have created undue load on the server.  In the data at the bottom you’ll notice that Get is almost 10 times faster on average.

Users

  • Recent users: 1,127 (up 42)
  • Users with assigned work items: 2,566 (down 122)
  • Version control users: 2,145 (up 130)

Work Items

  • Work Items: 191,955 (up 17,617)
  • Areas & Iterations: 7,259 (up 155)
  • Work item versions: 1,574,453 (up 173,557)
  • Attached files: 66,472 (up 6,923)
  • Queries: 15,651 (up 1,181)

Version control

  • Files/Folders: 72,558,448/14,599,432 (down 4,461,032/1,414,759)
  • Total compressed file size: 654 GB (up 140 GB)
  • Checkins: 184,431 (up 18,683)
  • Shelvesets: 8,328 (up 733)
  • Merge history: 168,748,294 (up 15,376,432)
  • Pending changes: 1,367,233 (up 393,512)
  • Workspaces: 4,335 (up 251)
  • Local copies: 436,857,001 (up 66,472,972)

Builds & Tests

  • Builds: 2,374 (up 392)
  • Test runs: 1,655 (up 53)
  • Test results: 168,709 (up 2,801)

Commands (last 7 days)

  • Work Item queries: 223,579 (up 51,233)
  • Work Item updates: 32,260 (up 2,424)
  • Work Item opens: 72,209 (down 3,558)
  • Gets: 705,682 (up 610,570)
  • Downloads: 24,170,464 (up 4,109,528)
  • Checkins: 3,702 (down 383)
  • Uploads: 358,536 (up 162,532)
  • Shelves: 1,159 (up 56)

Before & After - Average Request Duration

In this chart, I show average request execution time before and after the upgrade.  The improvement column is (New – Old)/min(New,Old).  Red indicates a regression of 20% or more and green indicates an improvement of 20% or more.  There are 26 reds and 66 greens.  Some of the reds will stay red – either because we intentionally made the trade-off to make one thing slower in order to make others faster (as is the case with UpdateWorkspace) or because the method is rarely used and the regression is insignificant enough (in the absolute) that it isn’t worth the time.  Perhaps it's bad marketing to put the regressions first :)  But then, I'm a "glass half empty" kind of guy when it comes to performance and quality.
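To make the improvement column concrete, here is a small Python sketch of the formula and the red/green thresholds.  The request names and durations are invented for illustration, and I'm assuming the sign convention that a positive value (new duration longer than old) is a regression.

```python
def improvement(old, new):
    """(New - Old) / min(New, Old) for two average durations."""
    return (new - old) / min(new, old)


def classify(old, new, threshold=0.20):
    """Return 'red' for a regression of 20% or more, 'green' for an
    improvement of 20% or more, and 'neutral' otherwise.

    Assumption: positive change means the request got slower.
    """
    change = improvement(old, new)
    if change >= threshold:
        return "red"
    if change <= -threshold:
        return "green"
    return "neutral"


# Hypothetical (old_ms, new_ms) averages, not taken from the real chart.
samples = {
    "RequestA": (1000.0, 100.0),  # 10x faster
    "RequestB": (100.0, 130.0),   # 30% slower
    "RequestC": (50.0, 55.0),     # within the 20% band
}

for name, (old, new) in samples.items():
    print(name, improvement(old, new), classify(old, new))
```

Dividing by the smaller of the two values keeps the scale symmetric: a request that gets 10 times faster scores -9 and one that gets 10 times slower scores +9.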

Conclusion

Over the next week or two I'll follow up with updates based on additional bug fixes, perf counter trends and I/O analysis.  Let me know if there's anything else you'd like to hear about.  All this goodness is coming your way pretty soon in the Orcas Beta 1 release.

Brian

March 9, 2007
» Dogfood Server Upgrade - End of Week 1

The first week of the Orcas Dogfood server upgrade will end today.  It's been a fantastic (if hectic) week.  After the initial spate of issues we hit Monday, it quieted down pretty quickly.  We got the significant issues fixed on Tuesday and have been making small performance patches all week.  We're down to the last half dozen or so issues to investigate and will finish that up at lower priority over the next few weeks.

Before & After

We now have enough data to start to do some semi-meaningful before and after comparisons.  I picked a set of server requests to compare.  I chose them based on a few criteria:

  • Aggregate cost of the request (over the period of several days) is in the top 10
  • The average execution time of the request is in the top 10
  • The numbers looked suspicious to me in some way :)
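The first two criteria are mechanical, so here is a rough Python sketch of how such a selection could be done.  The record format, function name and sample data are all hypothetical; this is not the actual tooling we used, and the "looked suspicious" criterion stays a manual step.

```python
from collections import defaultdict


def pick_requests(records, top_n=10):
    """Return request names in the top N by aggregate cost or by average time.

    `records` is a hypothetical iterable of (request_name, duration_ms) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, duration in records:
        totals[name] += duration
        counts[name] += 1

    averages = {name: totals[name] / counts[name] for name in totals}
    by_total = sorted(totals, key=totals.get, reverse=True)[:top_n]
    by_average = sorted(averages, key=averages.get, reverse=True)[:top_n]
    return set(by_total) | set(by_average)


# Invented sample: Get is cheap per call but frequent, Merge is rare but slow.
records = [
    ("Get", 400.0), ("Get", 400.0), ("Get", 400.0),
    ("Merge", 900.0),
    ("QueryItems", 30.0), ("QueryItems", 20.0),
]

print(sorted(pick_requests(records, top_n=1)))
```

With `top_n=1`, Get is picked for aggregate cost and Merge for average duration, which shows why both criteria are needed.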

Given that, here's some comparative results.  These results show average duration of the request during the month of February compared to the average time since the upgrade (with some tinkering to account for patches that we've made).

As you can see there are some very healthy improvements.  The most concerning regressions are Upload and Download.  We will be investigating those shortly.  We have been inclined to disbelieve those results as we really didn't change that code in Orcas but I think we have enough data now to show that something is afoot.  We believe ReadIdentityFromSource is suffering from some ActiveDirectory latency issues but we don't know for sure yet.  ReadIdentity is showing a huge regression in multiples but pretty small in absolute value.  It's going to require some poking around to understand.

SQL CPU Utilization

We expected to see a substantial reduction in CPU utilization based on the changes we've made but we haven't.  The standard deviation has dropped considerably (with no more large spikes) but the average doesn't seem to have gone down much.  We need a bit more trend data and need to do some investigation.  I expect we'll learn more about this over the next couple of weeks.

SQL I/O

We're going to be starting our detailed I/O analysis in the next couple of days (now that most of the biggest perf issues have been investigated and addressed).  I'll share that with you next week.  However, I've done some preliminary looking at the I/O perf counters on the data tier and the results are interesting.  I'm seeing a dramatic drop in reads on both the data drive and the TempDB drive (2X or more).  However, I'm seeing increases in writes to both.  The increases in writes to the data drive are small and the increases for the TempDB drive are modest.  I think we'll know a lot more after the detailed analysis.

Conclusion

Overall, things are going really well and I'm psyched about it.  It's been a lot of fun the last few days hammering out all of the issues that are hard to find outside a high-scale production environment.  I'm planning on producing my March dogfood statistics next Tue or Wed, so keep your eyes open for that.

Until next time,

Brian

March 6, 2007
» DevDiv Dogfood Status Update

The deployment of Orcas to the DevDiv dogfood server finished ahead of schedule this weekend.  We completed the procedure by Saturday night thanks to hard work from the IT team and key people on the TFS team assisting.  We spent Sunday shaking out a couple of issues with the system and went live for real early Monday.  As of right now, 742 people have already used the server since the upgrade.

Yesterday was a bit of a tumultuous day.  As lots of people started using the system, we found a variety of issues.  About a third of them were problems with the upgrade process - steps we forgot to include, data that didn't get updated, etc. - and about two thirds were bugs and/or performance issues.  There's just nothing quite like getting that many people beating on software all at once.  As of this morning all but a few of the issues have been resolved.

Other than a couple of bad query plans we have tracked down, we've heard some positive feedback about the performance improvements.  Over the next week or so we are going to need to update tools, workspaces, etc to take advantage of some of the new capabilities and do some further tuning.  I expect we'll hit a few new issues today but I'm hopeful we are on the downward slope of the problem pile.  The good news (for you) is that all of these fixes will go into our Beta 1 release so it's just that many fewer issues you have to deal with :)

I'll do a more formal report in about a week but I wanted to give you a status update.

Brian

March 3, 2007
» Some Cool Posts

I've seen some cool posts on our product over the past few days that I wanted to bring to your attention.

I also wanted to mention that this weekend is a big weekend for us.  This is the weekend that we upgrade the DevDiv TFS server to an Orcas build.  We've spent the last month preparing for it - testing, perf testing, load testing, stress testing, running trial upgrades, ad-hoc testing of upgraded copies of the production data, and more...  It's taken a lot of prep work and I'm holding my breath waiting to see how it goes.  The database is over 2 terabytes now so just about anything takes quite a while to do (backups, for example, are about 7 hours now).

Once the system is back up and running and we've been able to collect a solid week's worth of data on the operation of the server, I'll publish my March Dogfood statistics update and talk about the changes that we see.  I'm really excited about it and can't wait to try out the upgraded server.  This is a key step for our team in our progression towards our Beta 1 release.  We want to make sure we get any fixes we find in our internal production environment in before we ship Beta 1.

Until next week...

Brian

February 8, 2007
» February DevDiv Dogfood Statistics

Overall, it's been a pretty uneventful month for the dogfood server.  The biggest milestone is that we are now consistently over 1,000 active users on the system.  Files & Folders are approaching the 100M mark and should hit it within the next few months for sure.

I've added a new statistic this month - Merge history.  That's how many individual file merges have been performed.  At 153 million, you can see we do quite a lot of merging :)

In the next few weeks we will be upgrading the DevDiv server to an Orcas build.  This is the first time and marks a pretty big milestone for us.  The team has been working hard over the past couple of weeks getting ready for it - testing builds, preparing upgrade scripts, doing pre-production runs, bug bashing, etc.  I'm really excited and eager to see this happen soon.

 

Here's the graphs and stats:

Users

  • Recent users: 1,085 (up 105)
  • Users with assigned work items: 2,688 (up 112)
  • Version control users: 2,015 (up 140)

Work Items

  • Work Items: 174,338 (up 15,679)
  • Areas & Iterations: 7,104 (up 30)
  • Work item versions: 1,400,896 (up 117,983)
  • Attached files: 59,549 (up 6,028)
  • Queries: 14,470 (up 922)

Version control

  • Files/Folders: 77,019,480/16,014,191 (up 2,918,367/up 600,564)
  • Total compressed file size: 514 GB (up 86.3 GB)
  • Checkins: 165,748 (up 12,395)
  • Shelvesets: 7,595 (up 619)
  • Merge history: 153,371,862 (new statistic)
  • Pending changes: 973,721 (up 57,280)
  • Workspaces: 4,084 (up 363)
  • Local copies: 370,384,029 (up 46,810,898)

Commands (last 7 days)

  • Work Item queries: 172,346 (up 22,408)
  • Work Item updates: 29,836 (up 5,791)
  • Work Item opens: 75,767 (up 3,782)
  • Gets: 95,112 (up 34,483)
  • Downloads: 20,060,936 (down 1,114,730)
  • Checkins: 4,085 (up 1,467)
  • Uploads: 196,004 (up 53,086)
  • Shelves: 1,103 (up 174)