Git Adoption Soaring; Are There Good Migration Strategies?
Got To Get Me A Git writes "Distributed version control systems (DVCS) seem to be the next big thing for open source software development. Many projects have already adopted a DVCS and many others are in the process of migrating. There are a lot of major advantages to using a DVCS, but the task of migrating from one system to another appears to be a formidable challenge. The Perl Foundation's recent switch to Git took over a year to execute. The GNOME project is planning its own migration strategy right now after discovering that a significant majority of the project's developers favor Git. Perhaps some of the projects that are working on transitions from other mainstream version control systems can pool their resources and collaborate to make some standardized tools and migration best practices documentation. Does such a thing already exist? Are any folks out there in the Slashsphere working on migrating their own project or company to a DVCS? I'd appreciate some feedback from other readers about what works and what doesn't."
Why Git Is Better Than X.com/ YouTube - Tech Talk: Linus Torvalds on git (yeah, I'm a convert)
In my day job we are migrating to something totally new...clear case.
(shit).
http://michaelsmith.id.au
In the UK and to a lesser extent here in Australia a "git" is akin to a moron.
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
If the system you are migrating from manages trees, you should be fine. CVS migration is pretty easy and I understand that Perforce works quite well too (in both directions!). Most of the migration tools are listed in the GIT FAQ.
The places where people are likely to have trouble is migrating from tools that don't understand that there's more than one file. For example, RCS and SCCS both support branches, but in a completely different way to git (branches are per-file, they're not for the whole repository). This means that during conversion, something useful has to happen with them, but the right answer isn't clear to a program. If versions 1.1, 1.1.1.1, 1.1.1.2 and 1.2 of file "foo" exist, then versions 1.1.1.2 and 1.2 are on different branches and either may be the older revision. It's not clear if revision 1.43.1.3 of file "bar" is the same "branch" as "foo" 1.1.1.2 or not. Because RCS and SCCS deal with single files only, it's not possible to find an answer to these questions in the history files at all - if there is an answer, it's just a convention of the user. Essentially what's happening here is that the git import process requires information which isn't represented in the files you're converting from. For what it's worth, migrating from SCCS or RCS to CVS has a similar problem.
Personally, I've migrated from CVS to git for findutils (well, Jim Meyering did the actual migration; he migrated coreutils too). I haven't regretted migrating at all. It took me a long time of using git locally before I was comfortable migrating the master repo, though. As a git beginner the thing I found most worrying was that I found it hard to envisage the effect of the git command I was typing. The thing it took me a long time to figure out is that with a distributed version control system, it's safe to screw up your local copy, as long as you don't push the result.
Free software tends to soar.
"It is our choices, Harry, that show what we truly are, far more than our abilities." -- Prof. Dumbledore
I have migrated to svn many repos from older stuff, like SCCS and VSS. Migration strategies are important, and to decide about them you need to answer a few questions. First of all, ask yourself if git or DVCS is the best option for you, your project and your company. Just don't be led by hype. It may be that a centralized VCS like svn may be a better option. There are tools to make them perform as DVCS, like the plugins git-svn and bzr-svn. Second, ask yourself how much of the project history is needed, if any is needed at all. That may save you lots of time, disk, chaos and entropy. When migrating, it is very important to tidy up the repo. Purge unnecessary files, binaries, archives, branches, etc. I have seen people who use VCSs like a trashcan. Bad practices may sink the repository performance. After migrating, make sure users know how to use the repo and that understand the basic VCS concepts, either generic or specific for the VCS of choice. Try to remove practices and concepts from the older VCS. As you mentioned it, best practices are very important, and they are not easily found on literature. There is more I could say, but I guess by now it's ok.
Git is Free and does not cost thousands of dollars. Two pretty good reasons.
I always though bzr had the edge on git in terms of being a better DVCS. Is there a reason why the article seems to think that git is the default?
No such thing as 'better' here.
Bazaar was the runner-up DVCS, and rightfully so, but it has both advantages and disadvantages. Git is faster and currently more popular. Bazaar has an easier interface, better GUIs, is more easily extensible (Python), and runs better on non-Linux platforms.
So which you prefer is a matter of what you are looking for.
It's four good reasons.
1. You can use git for any purpose. You have to pay serious coin before you can use Bitkeeper for any purpose.
2. You have the freedom to see how git works, down to every last line of code. I can't comment on whether Bitkeeper also includes this level of freedom.
3. You can make any damn changes you want to Git, without prior approval.
4. You can pass on all these freedoms, and the freedom to use your change, to anybody you want. It was precisely the fact you can't do that with Bitkeeper that led to it being dropped by the Linux developers and replaced with a coded-from-scratch replacement.
Does my bum look big in this?
I have watched Linus talk about git on google tech talks, and am inspired to use it.
Unfortunately I think I need a tool like TortoiseSVN for git because I am a git.
Another reason is that the bitkeeper license prohibits users to work on competing products.
I think it's more popular for one of the same reasons that Bitkeeper initially became popular - it's being used by Linus for the kernel. Getting Linus to use one of your tools is one of the best marketing coups you can land. Outside of this, Bitmover is a small company and it's hard to see how they would have gotten the kind of exposure that they did with the kernel. That said, they seem to be surviving just fine today.
The other reason that it's popular is because it's free. This is fine for open source projects. In the commercial land, managers tend to underestimate the importance of good revision control tools and processes, and the importance of tools which make it easy to build and enforce those processes. Bitkeeper (and some of it's competitors) go to a lot of effort to provide both tools and processes. Git is not so good at this. Other tools that are not good at this include Clearcase (although UCM is an attempt, albeit a controversial one) and CVS.
And I wouldn't say that Bitkeeper and Git are the same. The underlying design concept - distributed version control, changesets, and the benefits that flow out of this eg proper merge tracking and a greater degree of determinism - are the same. Bitkeeper has much better GUI tools, and it's a lot more user-friendly; the command interface is coherent and consistent, and the commands are simpler and easier to remember, options that do similar things are the same across different commands. For example, the "-r" flag always refers to a changeset number, in any command that accepts this parameter. I used BK on a project with between 20 and 45 users; it never once corrupted the repository and there wasn't a single time when the server went down. There were a few times when things were weird when new users unfamiliar with the tool broke their repos, but that stopped after a couple of weeks. The real benefit is that it makes it very easy to see who broke what, and how - whether it was during development or during a merge.
Git isn't friendly or forgiving at all, and you need to really know what you are doing. There are operations that are very dangerous, like the rebase operation; BK does have an equivalent but it incorporates some basic measures to stop someone from messing up the repository they are pushing to.
Additionally, Git will break things in unexpected ways. Try pushing a change into another Git repository, then navigate to that repository and run "git status" - git does not auto-checkout changes in the destination repository during a push. It's the user's responsibility to detect this and deal with it. I find that design approach - the idea that the user is expected to spot and deal with the internal behaviour of the tool - to be pretty bad.
Linus says that anyone who thinks Git is hard to use is an idiot. Idiocy is not the problem here. The developers in the organization I work in do not want to have to know or care about how the internals of the tool work. They want to cut their code, merge it and integrate it as quickly and as effectively as possible. BK easily beats Git on this measure. On the other hand, Git is far and away the superior open-source revision control tool. Anyone who thinks that Subversion is better just doesn't get it.
Right, that was always the weakness of git, and although it's improved I still have problems with its usability (or lack of it). For all the dumping Linus does on Subversion/Perforce and its ilk, they are easy to understand and it's basically always clear what you're doing. I haven't used git for a while, but last time I did it was like a box of sharp knives. Although hard to mess up the remote copy, messing up your local copy was much easier.
I'm trying to use git as much as possible --- I'm still pretty crappy at doing anything even slightly complicated with it, but even with minimal skills it's brilliant at keeping track of changes to local directories.
The only problem is that I'd really, really like a decent Eclipse git plugin. I'm used to using Subclipse for SVN, which is fantastic: I can point at a file or directory, say 'Synchronise with repository', and then get a graphical diff of every change and the ability to quickly and easily revert or commit changes on a per-change, per-file, per-group-of-file basis, etc. (And you can do this with any revision, which makes backing out one specific change very easy.) Doing the same with git's command line tools seems terribly clunky by comparison, especially when I'm struggling to remember the syntax, and the fundamentally unfamiliar workflow.
I do use the Eclipse git plugin at git.or.cz, but it's still very crude. The file decoration is invaluable, which lets me see at a glance which files are new/changed/pristine in the Eclipse project view, but actually trying to *do* anything with it is deeply unpleasant --- no synchronise view, no graphical diff, and some weird behaviour like if you point at a file, say 'commit this', you get a dialogue prompting you to commit *all* files. Which is not what I want. And there's lots of UI clunkiness all round, due to simple immaturity.
I've had some luck with giggle, but the UI is pretty bad, and some changes (I forget what; new files, perhaps) don't show up in it, which is a bit awkward. I've had a play with some other GUI frontends but they're all pretty nasty by comparison with Subclipse. Still, the git plugin is getting better with time --- I'm just hoping that Synchronise shows up soon...
It's not surprising that developers increasingly favour distributed version control. It's a much more flexible way to work - and if you need to continue working in a centralized way, you can still do that.
If you're migrating to Git, there are a few things you need to account for. Do you want to migrate the history, or make a clean break using the current head of the tree ? Migrating the history is usually hard if it is in a commercial tool, since commercial companies quite obviously do not design their tools to make it easier for you to abandon them (some provide export tools). If you choose not to migrate the history, you have the issue of having to keep the old revision control system around to debug/fix/review older releases.
You need to be clear about your decision. Import some example code into Git (perhaps from your existing repository), put it on a server and ask your developers to play with it to make sure they're happy with the feel and that they're confident they can work with it. Make sure you practice doing things like merging conflicts and merging older maintenance branches. If you do this right, most of your developers should be fairly enthusiastic about the switch when it comes. Make sure you account for Windows users - Git isn't so hot over on Windows and I don't think there is an official stable Windows release yet.
Other than this, there's really very little to it.
I used to use cvs, subversion and perforce. After switching to git, it feels a lot more powerful, at the cost of more things that can go wrong.
My workflow with subversion was:
- regular update: update, check/fix conflicts, continue work
- commit: update, pick files I want to commit with TortoiseSVN, verify the changes in the diff view, write log message, commit, continue work
On GIT:
- regular update: stash my changes, change to master branch, pull, check for errors or dirty files (mostly endian problems), switch to work branch, rebase from master, check for errors or dirty files, unstash my changes, check for errors or dirty files, continue work
- commit: update, stage the files I want to commit, commit them, verify the changes, push
At several stages some obscure thing could go wrong that I needed to look up in the manual or on the internet, or needed to ask someone who used it for longer. That doesn't mean I think GIT is bad, I just feel it takes more time to be fully productive with compared to older systems. And I miss a few minor things from svn, like keyword expansion or properties.
I've just gotten fluent with SVN versioning after using it for a few years now. I understand that part of what bothers me about it is what bothers Linus Torwalds aswell and had him write Git and I can follow his reasoning in his famous Google speech on Git *and* I understand the hype that ensues around Git and welcome it. That said, until Git has a neat set of stable mature GUI tools to use with it, I'm sticking with Subversion, TortoiseSVN and/or Subclipse. A mature working toolkit that I know how to handle and that works more or less flawlessly.
Even a toolset like that would've costed thousands ten years ago. That goes to show how far we have come in some areas of the software field.
We suffer more in our imagination than in reality. - Seneca
It's worse than that. The bitkeeper author at one point tried to extend that as a ban on anyone who works for a company that has a competing product with bitkeeper.
Try pushing a change into another Git repository, then navigate to that repository and run "git status" - git does not auto-checkout changes in the destination repository during a push. It's the user's responsibility to detect this and deal with it. I find that design approach - the idea that the user is expected to spot and deal with the internal behaviour of the tool - to be pretty bad.
At first, I found this pretty unintuitive, however, I now feel that automatically checking out the changes would be a Bad Thing(tm).
Most of the time, repositories you push to will be bare repositories, with no working tree. If there is a working tree present, then presumably someone is actively working on it. Imagine you are implementing some new feature in your local copy, which is all very experimental, and perhaps not even compiling yet. Now imagine someone pushes a major refactorisation directly onto your working tree, which does not merge cleanly. I would find this exceedingly annoying. [You have not yet committed any changes, and so the push does merge cleanly with the HEAD, but just not the working tree]
In addition, I think it would be quite distracting if bits of code changed behind my back, when I wasn't expecting it. Especially if I happened to be editing the file in question when it got updated.
Even Microsoft don't use Visual Source Sink for their own code. (They use Perforce by preference.)
http://rocknerd.co.uk
I'm working on the OpenJDK source tree through Mercurial. I couldn't be more satisfied. The tools are well structured, very easy to use, stable, fast and well documented. I don't miss any feature. Could anybody, who tried both and prefers Git, list some advantages of it over Mercurial? To me it just seems like a Git done right without the hype and too complex UI.
You're quite right. It seems to me that Git isn't intended to provide access to the latest versions of individual files. Git, like all DVCS's I know of, is essentially a version-control plugin for your filesystem. The filesystem itself provides the current version you're working with, and so it's only a matter of providing an http server over the directory or something like that. Which is exactly what GitWeb and Trac-Git provide, only they're better.
huh ? mercurial has no issue with revisions numbers, revisions are indexed with a hash just like git . The sequencial numbering is available as a convenient alternative for referring to a rev, that's all.
hg has both a revision number and a changeset id. The revision number is human readable and useful within a particular repo. The changeset id is unique across repos, the same as git does.
Do you have any better hostages?
The main difference between Bazaar and Git is that Bazaar only tracks files while Git tracks changes. Git therefore is technically able to merge changes where one developer has moved code from one file to another while another developer made modifications to that code. On the other hand Git can have difficulties keeping track of things if you rename and change a file in one commit.
I switched from CVS to GNU-arch (tla). When GNU-arch was discontinued I switched to Bazaar. The command-line interface of Bazaar is easy to use. For example the symmetry of "bzr push" and "bzr pull" is really nice.
Funny the OP only mentioned two projects and one is only planning. Sometimes /. can be a bigger hype machine than money grabbing corps. that we all love to bash.
Many of us develop on Windows, pushing out code to Linux platforms. Git just isn't an option. Poor support for IDEs, especially Eclipse.
Bazaar has been working great for me. Used Mercurial with success as well.
Check out GitForDarcsUsers. Also, you may be interested in why ghc switched to git.
The video is from almost two years ago. Back then, Git and Mercurial really were the main options.
Neither worked on my 18 yr old CVS repo (that was populated with 7 yr old RCS files). What I did find was fromcvs. I found a couple of bugs, with the author fixed very quickly. It is also fast. My 3.5G CVS repo was converted in about an hour. Both of the others took 10+ hours (and didn't produce usable output). The biggest reason I love it: it allows incremental updates from CVS to GIT. You can run it any number of times and it imports the new stuff. You do need to leave the git repo you are importing into alone (no commits other than the import commits).
I still have more testing to do before we go live, but it's looking very, very nice.
At work we're trying to get all our our repos moved to Git. We moved off of CVS to SVN a year or so (which was a huge improvement), but now that all of the non-programmers in the office are used to using TortoiseSVN, lack of a good windows GUI for Git has been a bit of a roadblock.
The msysgit folks started work on Tortoise-inspired GitCheetah GUI, but that project basically fizzled out. Lots of people wanted a Windows GUI, but no one had both the resources and drive to step up and do it.
Then, exactly two months ago, Frank Li started working on TortoiseGit. From what I can tell, this is a fork of TortoiseSVN with most of the Subversion guts pulled out and replaced with git commands. TortoiseGit is not done yet: 'git add' has some issues, Submodules don't seem to work at all, etc..., but development on the tool is in high gear and the primary developer is going the extra mile to help users.
If you're looking to deploy tools right now, gitk is a bit more powerful than the log in TortoiseGit, but might be more confusing for naive users.
coding is life
The thing I don't understand about any distributed VCS is how (for example) others could pull stuff from my repository if I don't have a static IP. Also note that I don't mean "don't have a static IP" only in the usual sense of "having a dynamic IP at home": I also mean in the sense that my development machine is my laptop and I often work at coffee shops and other places and so I'm often behind NATs.
If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
I curse more when I use git than when I use Windoz (and those are the only times I curse). Git's design is really that bad (from a user perspective).
Git is fully distributed (with no "authoritative" source), but it doesn't give you any tools understand/manage the distribution of files. If you have a work group with more than a few people, you are constantly asking what repo (and what access method to it), what branch, and what (bombastically long) revision. It's fine for 1-2 people, but then any version control system is fine for a small enough group.
The documentation helps little. When you do "git help merge", you don't @#$@# care that this is the front end to multiple merge methods. You just want the stupid thing to work. If it's a special case, then you'll look for an advanced technique; but you are stuck reading through all this crap trying to figure out what really matters. No offense to the people working on git docs. It used to be awful, now it just sucks. The problem is more in the user interface design than the docs.
There are over 100 git commands, and a command can do radically different things depending the the switches and target syntax. It's more confusing than any other revision control system that I have worked with.
I use git because I have to, not because I want to (like Windoz). After using it for months, I still routinely get stuck trying to figure out the right mix of commands, arguments, and target syntax needed to get common things done.
Git can do some (nice) things that subversion can't, but it creates so many problems that you haven't gained anything.
I've heard good things about mecurial and bazaar. I wouldn't recommend git to anyone I liked (but it's perfect for perl :-).
I'm using monotone. What I like about it is its utter paranoia about data loss. Even data loss because of lost bits on a hard disk. You just can't symc a broken repository with a working one; bit failures will not spread. That said, it's designed for use in development against a test suite -- there's a procedure whereby automated tests can be used to "certify" particular revisions, so that an actual user of the software under development can just automatically download the last certified revision and use that.
The complexity of git robs it of quite a bit of the value of it's features. For God only knows what reason, a 5-6 person project that i'm working on is using git instead of subversion, and only the person who setup the project actually has any idea how to use git. The rest of us are just cruising along, not really having any idea of what we are doing with it, and are stopped completely whenever it does anything weird.
It's awesome to have the whole thing where it merges all the changes in a same file together, fairly intelligently, but even the GUI version for Windows has no functional interface for how to deal with conflicts (which should be easily done as a "which bit of code is the proper piece to use here?" instead of jamming diffs into a file. Also, the Windows and Linux versions of GIT have several problems interoperating with each other.
In short, Git appears to have been designed entirely with features in mind, and not one bit of usability for anyone other than Linus himself. It is a nightmare for people who only have the need for version control and a handful of people working together. It reminds me very, very much of early Linux, before anyone else besides Linus had been hacking on it.
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
The main advantage of git over subversion for such uses is that git doesn't require a seperate repository, the repository sits right there in the .git/ directory in your projects directory. Doesn't sound like much, but its great convenience factor, since all you have to do to start using git is type 'git init' and you are done, you don't need to create a repository, you don't need to important your existing files and most importantly you can leave your working directory as is. With SVN this same process can get quite annoying, since you basically have to delete your working direcotry and replace it with a SVN checkout and then verify that all your files actually made it into the repository. With git its a single command and you don't have to think about anything, so its much easier and you can just version control any directory you like.