Slashdot Mirror


Microsoft Introduces GVFS (Git Virtual File System) (microsoft.com)

Saeed Noursalehi, principal program manager at Microsoft, writes on a blog post: We've been working hard on a solution that allows the Git client to scale to repos of any size. Today, we're introducing GVFS (Git Virtual File System), which virtualizes the file system beneath your repo and makes it appear as though all the files in your repo are present, but in reality only downloads a file the first time it is opened. GVFS also actively manages how much of the repo Git has to consider in operations like checkout and status, since any file that has not been hydrated can be safely ignored. And because we do this all at the file system level, your IDEs and build tools don't need to change at all! In a repo that is this large, no developer builds the entire source tree. Instead, they typically download the build outputs from the most recent official build, and only build a small portion of the sources related to the area they are modifying. Therefore, even though there are over 3 million files in the repo, a typical developer will only need to download and use about 50-100K of those files. With GVFS, this means that they now have a Git experience that is much more manageable: clone now takes a few minutes instead of 12+ hours, checkout takes 30 seconds instead of 2-3 hours, and status takes 4-5 seconds instead of 10 minutes. And we're working on making those numbers even better.

5 of 213 comments (clear)

  1. Did they just turn git into svn? by lucasnate1 · · Score: 5, Insightful

    The whole point of git is that you have identical copy on your machine. Why take away git's biggest advantage?

    1. Re: Did they just turn git into svn? by tangent · · Score: 5, Interesting

      > Why take away git's biggest advantage?

      Because "clone now takes a few minutes instead of 12+ hours, checkout takes 30 seconds instead of 2-3 hours, and status takes 4-5 seconds instead of 10 minutes."

      That is problem is not unique to Git. JÃrg Sonnenberger tried importing the NetBSD repository into Fossil, and "the rebuild step which (re)creates the internal meta data cache took 10h on a fast machine." There are ways to make Fossil skip the rebuild on clone, which results in a suboptimal DB, but it still takes hours to clone. NetBSD's project history goes back something like a quarter century; it's going to take time to pull and organize all that.

      DVCSes are great when you can afford their associated costs â" namely, the very advantages you refer to â" but for very large repos, those costs can be very high.

      Do you really need every single version going back a quarter century? And if you do, do you need it 5 minutes after the initial clone?

      One idea that's come up on the Fossil mailing list is to do a shallow clone initially, then trickle the back history in over time. I'd like a DVCS that gave me the past 30 days of history at the tip of every open branch, then over the next day or so back-filled the rest.

  2. Re:Meh... by Transcendent · · Score: 5, Interesting

    Microsoft's repos *are* that large. That's why they implemented this.

    Microsoft Office's repository is over 1 TB in size. Yes, terabyte. For *office*. They absolutely cannot (could not, I suppose now) use Git on it.

  3. Re:Ah nostalgia by Anonymous Coward · · Score: 5, Informative

    In practice what it meant was the source code was constantly changing under your feet, and binaries were constantly stale or in a mystery state because you didn't know what they were compiled against.

    Then you had a piss-poor release engineer who didn't understand how to construct config specs based on a stable baseline, label & promote stable builds regularly, and use clearmake properly, or manage dependencies and allow you to do a clean, fast local build.

    I love git, and I work with it daily, and the monorepo craze baffles the shit out of me, to be honest. But I used and supported ClearCase for 14 years at a large financial services company, and I can assure you that the problems you're complaining about are not limitations of the tool - they are limitations of your team's release engineers. ClearCase has many failings, but the issues you're describing simply reflect poor implementation and design choices.

    And because this was IBM software it was unusably slow across WANs, memory hungry and enjoyed triggering random blue screens.

    It stemmed from fundamental concepts cribbed from Apollo's DSEE environment. HP's acquisition of Apollo prompted what would then become the ClearCase team to leave Apollo/HP and form Pure, then they combined with Atria to form PureAtria, then Rational acquired PureAtria, and then IBM acquired Rational -- so ClearCase was a thing long before it was IBM software, and the features you're griping about were extant long before the IBM acquisition. The IBM era mostly saw them continue to focus on jamming ClearCase into their "Application Lifecycle Management" toolset, Rational Team Concert, wrapping everything in a ghastly blue Eclipse RCP client, and making it more of a pain in the ass to use.

    Dynamic views as you're talking about were not - and never were - intended for use across WANs, their Admin & Deploy guides specifically stated that it required a fast connection to a local server. If you wanted WAN connectivity, you either used RTC (Rational Team Client) to pull web views, or you used snapshot views, or you ponied up for MultiSite licenses and set up a sync scheme so that each site could have local copy on a VOB & View server they had a fast connection to.

    Again - poor implementation by your release team. It's like complaining that a hammer makes a giant hole in the drywall when you put screws in with it - it doesn't mean there's a problem with the hammer, it means there's a problem with the operator. If you use the tool in a way it's not intended to be used, then don't be surprised when it does a shitty job.

  4. Re:Meh... by Anonymous Coward · · Score: 5, Funny

    all right, you've clearly nominated yourself to untangling a 1TB repository. get on it bud.