Git Adoption Soaring; Are There Good Migration Strategies?
Got To Get Me A Git writes "Distributed version control systems (DVCS) seem to be the next big thing for open source software development. Many projects have already adopted a DVCS and many others are in the process of migrating. There are a lot of major advantages to using a DVCS, but the task of migrating from one system to another appears to be a formidable challenge. The Perl Foundation's recent switch to Git took over a year to execute. The GNOME project is planning its own migration strategy right now after discovering that a significant majority of the project's developers favor Git. Perhaps some of the projects that are working on transitions from other mainstream version control systems can pool their resources and collaborate to make some standardized tools and migration best practices documentation. Does such a thing already exist? Are any folks out there in the Slashsphere working on migrating their own project or company to a DVCS? I'd appreciate some feedback from other readers about what works and what doesn't."
whygitisbetterthanx.com claims that Mercurial doesn't have cheap branching -- the only advantage the author sees git having over hg, leaving GitHub aside. I'm surprised by this statement because I use hg branches every day. The things he describes can all be done straightforwardly with hg, so I'm asking: can anybody in the know tell me if and how git branches are in any way more powerful than hg branches?
FTR I love hg, and I see no reason to switch to git, even though the whole bandwagon movement seems to have jumped on the git train.
I think it's more popular for one of the same reasons that Bitkeeper initially became popular - it's being used by Linus for the kernel. Getting Linus to use one of your tools is one of the best marketing coups you can land. Outside of this, Bitmover is a small company and it's hard to see how they would have gotten the kind of exposure that they did with the kernel. That said, they seem to be surviving just fine today.
The other reason that it's popular is because it's free. This is fine for open source projects. In the commercial land, managers tend to underestimate the importance of good revision control tools and processes, and the importance of tools which make it easy to build and enforce those processes. Bitkeeper (and some of its competitors) go to a lot of effort to provide both tools and processes. Git is not so good at this. Other tools that are not good at this include Clearcase (although UCM is an attempt, albeit a controversial one) and CVS.
And I wouldn't say that Bitkeeper and Git are the same. The underlying design concept - distributed version control, changesets, and the benefits that flow out of this, e.g. proper merge tracking and a greater degree of determinism - is the same. Bitkeeper has much better GUI tools, and it's a lot more user-friendly; the command interface is coherent and consistent, the commands are simpler and easier to remember, and options that do similar things are the same across different commands. For example, the "-r" flag always refers to a changeset number, in any command that accepts this parameter. I used BK on a project with between 20 and 45 users; it never once corrupted the repository and there wasn't a single time when the server went down. There were a few times when things were weird when new users unfamiliar with the tool broke their repos, but that stopped after a couple of weeks. The real benefit is that it makes it very easy to see who broke what, and how - whether it was during development or during a merge.
Git isn't friendly or forgiving at all, and you need to really know what you are doing. There are operations that are very dangerous, like the rebase operation; BK does have an equivalent but it incorporates some basic measures to stop someone from messing up the repository they are pushing to.
Additionally, Git will break things in unexpected ways. Try pushing a change into another Git repository, then navigate to that repository and run "git status" - git does not auto-checkout changes in the destination repository during a push. It's the user's responsibility to detect this and deal with it. I find that design approach - the idea that the user is expected to spot and deal with the internal behaviour of the tool - to be pretty bad.
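For what it's worth, the usual way around this is to never push into a repository that has a working tree: push to a bare repository, and let anything that needs checked-out files clone or pull from it. A rough sketch, assuming a throwaway temp directory and made-up names (central.git, dev, deploy); it is not meant as the one true setup:

```shell
#!/bin/sh
# Sketch only: central.git, dev and deploy are hypothetical names for illustration.
set -eu
demo=$(mktemp -d)

# A bare repository has no working tree, so a push cannot leave one stale.
git init --quiet --bare "$demo/central.git"

# Developer clone: commit locally, then push to the bare repository.
# (Cloning an empty repository prints a warning on stderr; it is harmless here.)
git clone --quiet "$demo/central.git" "$demo/dev" 2>/dev/null
cd "$demo/dev"
echo "first version" > notes.txt
git add notes.txt
git -c user.name="Demo" -c user.email="demo@example.com" commit --quiet -m "Add notes"
git push --quiet origin HEAD

# Anything that needs the files on disk clones (or later pulls) from the bare repository.
git clone --quiet "$demo/central.git" "$demo/deploy"
cat "$demo/deploy/notes.txt"
```

The deploy copy only changes when someone runs a pull there, so there is never a working tree that silently disagrees with its own history.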
Linus says that anyone who thinks Git is hard to use is an idiot. Idiocy is not the problem here. The developers in the organization I work in do not want to have to know or care about how the internals of the tool work. They want to cut their code, merge it and integrate it as quickly and as effectively as possible. BK easily beats Git on this measure. On the other hand, Git is far and away the superior open-source revision control tool. Anyone who thinks that Subversion is better just doesn't get it.
Right, that was always the weakness of git, and although it's improved I still have problems with its usability (or lack of it). For all the dumping Linus does on Subversion/Perforce and its ilk, they are easy to understand and it's basically always clear what you're doing. I haven't used git for a while, but last time I did it was like a box of sharp knives. Although hard to mess up the remote copy, messing up your local copy was much easier.
I'm trying to use git as much as possible --- I'm still pretty crappy at doing anything even slightly complicated with it, but even with minimal skills it's brilliant at keeping track of changes to local directories.
The only problem is that I'd really, really like a decent Eclipse git plugin. I'm used to using Subclipse for SVN, which is fantastic: I can point at a file or directory, say 'Synchronise with repository', and then get a graphical diff of every change and the ability to quickly and easily revert or commit changes on a per-change, per-file, per-group-of-file basis, etc. (And you can do this with any revision, which makes backing out one specific change very easy.) Doing the same with git's command line tools seems terribly clunky by comparison, especially when I'm struggling to remember the syntax, and the fundamentally unfamiliar workflow.
I do use the Eclipse git plugin at git.or.cz, but it's still very crude. The file decoration is invaluable, which lets me see at a glance which files are new/changed/pristine in the Eclipse project view, but actually trying to *do* anything with it is deeply unpleasant --- no synchronise view, no graphical diff, and some weird behaviour like if you point at a file, say 'commit this', you get a dialogue prompting you to commit *all* files. Which is not what I want. And there's lots of UI clunkiness all round, due to simple immaturity.
I've had some luck with giggle, but the UI is pretty bad, and some changes (I forget what; new files, perhaps) don't show up in it, which is a bit awkward. I've had a play with some other GUI frontends but they're all pretty nasty by comparison with Subclipse. Still, the git plugin is getting better with time --- I'm just hoping that Synchronise shows up soon...
I used to use cvs, subversion and perforce. After switching to git, it feels a lot more powerful, at the cost of more things that can go wrong.
My workflow with subversion was:
- regular update: update, check/fix conflicts, continue work
- commit: update, pick files I want to commit with TortoiseSVN, verify the changes in the diff view, write log message, commit, continue work
On GIT:
- regular update: stash my changes, change to master branch, pull, check for errors or dirty files (mostly endian problems), switch to work branch, rebase from master, check for errors or dirty files, unstash my changes, check for errors or dirty files, continue work
- commit: update, stage the files I want to commit, commit them, verify the changes, push
At several stages some obscure thing could go wrong that I needed to look up in the manual or on the internet, or needed to ask someone who used it for longer. That doesn't mean I think GIT is bad, I just feel it takes more time to be fully productive with compared to older systems. And I miss a few minor things from svn, like keyword expansion or properties.
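The "regular update" routine above can be wrapped in a small shell function so there are fewer steps to get wrong. This is only a sketch: the function name sync_work_branch is made up, the branch names are passed in rather than assumed, and it uses "git pull --ff-only" on the assumption that you never commit directly on the main branch.

```shell
#!/bin/sh
# Sketch of the stash / pull / rebase / unstash routine described above.
# sync_work_branch is a hypothetical helper, not part of git.
set -eu

sync_work_branch() {
    main_branch=$1
    work_branch=$2
    stashed=no

    # Only stash (and later pop) if tracked files actually have changes.
    if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
        git stash push --quiet
        stashed=yes
    fi

    # Bring the main branch up to date, refusing anything but a fast-forward.
    git checkout --quiet "$main_branch"
    git pull --quiet --ff-only

    # Replay the work branch on top of the updated main branch.
    git checkout --quiet "$work_branch"
    git rebase --quiet "$main_branch"

    # Restore the uncommitted changes that were set aside at the start.
    if [ "$stashed" = yes ]; then
        git stash pop --quiet
    fi
}
```

Usage would be something like `sync_work_branch master work` from inside the clone. With `set -e`, a conflict in the rebase or the stash pop stops the function at that step, and you still have to resolve it by hand.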
Does bzr provide any attempt to sanitize incremental revision numbers? I know that both Mercurial and SVK have issues where you need to figure out that "my r9342 is your r8929". Git avoids this issue entirely since my repository's 92a560f20e72e4296c782d3fbb4706e6946d6209 is always going to be your 92a560f20e72e4296c782d3fbb4706e6946d6209, assuming you have the same commit of course ;)
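To make the "same commit, same ID" point concrete: a commit ID is a hash over the commit's contents (tree, parents, author, committer, dates, message), so two unrelated repositories given identical inputs end up with the same ID. A small sketch, assuming made-up identity values and a fixed timestamp so that every input matches:

```shell
#!/bin/sh
# Sketch only: the name, email, timestamp and file contents are arbitrary.
set -eu
demo=$(mktemp -d)

# Pin every piece of metadata that goes into a commit.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
export GIT_AUTHOR_DATE="1230768000 +0000" GIT_COMMITTER_DATE="1230768000 +0000"

# Create a fresh repository with one identical commit and print its ID.
make_commit() {
    git init --quiet "$1"
    (
        cd "$1"
        echo "hello" > file.txt
        git add file.txt
        git commit --quiet -m "Add file"
        git rev-parse HEAD
    )
}

id_a=$(make_commit "$demo/a")
id_b=$(make_commit "$demo/b")
echo "repo a: $id_a"
echo "repo b: $id_b"
```

The two printed IDs match even though the repositories never talked to each other, which is why there is no "my r9342 is your r8929" translation step.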
That is a fair point - git isn't good for looking at isolated parts or individual files in a repository. But I see it really as a matter of optimizing for the common case. Normally, I need to see the whole repository. Normally I don't need to just look at one file. Git will checkout an entire repository along with all the history faster than SVN will, in the tests that I did.
BTW if I just want to look at one file in Git I use the web interface. That gets around the problem by querying into the main repository.
VSS is probably one of the worst VCSs ever conceived, worse even than SVN. You clearly have a very limited experience of real-world team oriented software development.
Git works great for small projects, no doubt. Not so much for very large projects. You obviously don't work in a corporate or government environment with large projects. Our projects are broken into submodules (independent SVN repositories currently) and that's what I was talking about. Some of our submodules are 20 times bigger than the Linux kernel and there is no way to subdivide them more than that. Our source base really is that big.
In fact, I would argue that your suggestion to create many submodules is a weakness of Git. Lots of submodules just makes things even more complicated.
I'm working on the OpenJDK source tree through Mercurial. I couldn't be more satisfied. The tools are well structured, very easy to use, stable, fast and well documented. I don't miss any feature. Could anybody, who tried both and prefers Git, list some advantages of it over Mercurial? To me it just seems like a Git done right without the hype and too complex UI.
I feel your pain. I'm in the same boat. You can't work with CC effectively without 20-30 helper scripts. Hijacked/checked-out files are a major pain. Dynamic views are a great feature yet are completely useless.
Though that still doesn't mean you can't use Git as a local tool.
I used RCS before (the ci and co commands) to preserve the history of my modifications locally. Now, due to various circumstances, I've moved to using Git locally and it works quite well.
After "ct update" (alias ct=cleartool), you go to the directory (in my case, on a Linux server) where you plan to work and do "git init" and "git add" for the affected files. I'm the type of person who likes to commit a dozen times a day, and Git helps greatly in not imposing that habit on others.
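Roughly, the local-only routine looks like this. It is a sketch: the temp directory stands in for wherever "ct update" put the files, and the file name and identity values are made up.

```shell
#!/bin/sh
# Sketch only: workdir stands in for a directory populated by "ct update".
set -eu
workdir=$(mktemp -d)
cd "$workdir"
echo "int main(void) { return 0; }" > driver.c

# Start tracking just the files you are about to touch.
git init --quiet
git add driver.c
git -c user.name="Me" -c user.email="me@example.com" \
    commit --quiet -m "Snapshot before my changes"

# Edit, review and commit as often as you like; nothing leaves this machine.
echo "/* local experiment */" >> driver.c
git diff --stat
git -c user.name="Me" -c user.email="me@example.com" \
    commit --quiet -am "Local experiment"
git log --oneline
```

Only the final state ever needs to be checked in to ClearCase, so the dozen intermediate commits stay private.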
Though I've been using Git for about a year now, I'm pretty much a n00b. Outside of the obvious - git init/add/commit/diff/pull/push/update + gitk - I know very little. That's why it is also very hard for me to understand the usual complaint about Git that it is very arcane. Yes, the documentation is very poor and still can't catch up with all the features, yet you rarely run into the need for some esoteric function or syntax. Basic commands are pretty much "intuitive".
All hope abandon ye who enter here.
For problem two: this isn't a real problem with git, but rather with your organization. Multiple projects don't belong in the same repository, it's as simple as that.
I have been wanting to start with Git, but I find it too hard to know what should go into different repositories and what should be in the same.
First example: I might be writing a book in book/ and keep all images in a subdirectory book/images/. I think it is not far-fetched that I might want to work on only the images without downloading all the other, possibly huge, subdirectories.
Second example: Say I write a scientific article for which I compute a lot of numerical data. Then I write a second article, which builds upon the same data. Should the two articles go into the same repository, so that I can easily pull and compile everything at once with all dependencies in place, or should they be kept separately, so that I can work on the first article without dragging the other one along?
Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
Funny the OP only mentioned two projects and one is only planning. Sometimes /. can be a bigger hype machine than money grabbing corps. that we all love to bash.
Neither worked on my 18 yr old CVS repo (that was populated with 7 yr old RCS files). What I did find was fromcvs. I found a couple of bugs, which the author fixed very quickly. It is also fast. My 3.5G CVS repo was converted in about an hour. Both of the others took 10+ hours (and didn't produce usable output). The biggest reason I love it: it allows incremental updates from CVS to GIT. You can run it any number of times and it imports the new stuff. You do need to leave the git repo you are importing into alone (no commits other than the import commits).
I still have more testing to do before we go live, but it's looking very, very nice.
The thing I don't understand about any distributed VCS is how (for example) others could pull stuff from my repository if I don't have a static IP. Also note that I don't mean "don't have a static IP" only in the usual sense of "having a dynamic IP at home": I also mean in the sense that my development machine is my laptop and I often work at coffee shops and other places and so I'm often behind NATs.
If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
Except if one is simply reviewing a specific file or files - for a code review, debugging, or copying pieces to another project. I do this all the time when helping others on their projects. I don't need (or want) the whole hot-mess...
It must have been something you assimilated. . . .
Where do people get ideas like this? I use CC effectively with one trivial Perl script. It converts "my feature is on this branch off this label" descriptions into config specs -- raw config specs are too complicated to handle, so you need a layer above them which matches your CM process. Yes, IBM/Rational should explain that to their customers. Or maybe make UCM not suck.
Then you're not branching, like you're supposed to do, and a hijacked file is the *least* of your problems. You cannot use CC as if it was CVS; a dynamic view is not a sandbox if you set it up to silently show other people's possibly incomplete changes.
Use them correctly for a few years, then report back.
Where do people get ideas like this? I use CC effectively with one trivial Perl script. It converts "my feature is on this branch off this label" descriptions into config specs -- raw config specs are too complicated to handle, so you need a layer above them which matches your CM process. Yes, IBM/Rational should explain that to their customers. Or maybe make UCM not suck.
What about normal diff? CC still doesn't allow using an external diff program. And "ct diff" insists on two files - it can't diff a hijacked file against the original.
What about normal recursive diff for two branches?
What about a patch generator? So that you can back up your unchecked-in changes.
What about a change log? A recursive change log showing changes for all files in a directory?
How about converting change history into a set of patches? To allow easier investigation of regressions.
The moronism with R/O files? All extracted/"ct get -to" files are marked R/O.
And this is off the top of my head. For all of that I have scripts. And with the scripts, I'd say, CC isn't half bad.
But to the point of original question, with Git I would not need any of the scripts.
Then you're not branching, like you're supposed to do, and a hijacked file is the *least* of your problems. You cannot use CC as if it was CVS; a dynamic view is not a sandbox if you set it up to silently show other people's possibly incomplete changes.
We do branching and hijacked files are not a problem per se. It is just that a good half of the CC tools, when given a hijacked file as a parameter, simply say "f-off, this is a view-private file."
In some situations checked-out files are even worse since CC treats checked out files like files on a special branch. Consequently half of CC tools accept the file as parameter, yet show dick but no information about the file.
Git doesn't draw any distinction between working files and files in the repo. At any time you can do whatever you like with any accessible file/revision.
Use them correctly for a few years, then report back.
Care to elaborate on "correct" usage pattern then?
People tried them in the company a few years ago and pretty much abandoned them. They are still accessible, yet generally unused. Our CC admins would be happy to know the "correct" usage for them.
You can't index a dynamic view - because it contains all possible vobs and all possible files. And I do not want to deal with the 150K files of the whole project; I need only the 3.5K files belonging to my part.
You can't compile in a dynamic view - because even if only a dozen people compile simultaneously, the CC server simply dies under load.
Heck, a simple "ls" spits a bunch of errors on screen every time, because a dynamic view can't properly show a branch, but shows all files on all branches (readdir() lists all of them). And if a file happens not to be on the branch of the dynamic view, stat()ing it gives you an error.
If you can't do development with them, what else can you do with dynamic views?
In the past I used dynamic views solely for semi-automatically (with a script) porting trivial fixes into many branches. For more than that, dynamic views are useless.
Please, reveal the secret to me: how do I use a dynamic view "correctly"? Many people in my company would be happy to know it too.
I currently use Subversion to keep track of my private projects. Nobody else has access to my repository. It's solely so I can track changes I make to my own software. That said, is there an advantage to using git? I like having a central repository because I can start working on some changes to code, and if I don't like them, revert back. I also don't really care much about tagging or branching. Every now and then I'll use a tag if I want to take a project in a drastically new direction, so I can easily go back to the previous "good" version if I want to. That's about it.
So, is there a reason to switch from Subversion? I'm not tied to it at all; I just want to use the tool that's best for me.
Lots to talk about here!
The complexity of git robs it of quite a bit of the value of its features. For God only knows what reason, a 5-6 person project that I'm working on is using git instead of subversion, and only the person who set up the project actually has any idea how to use git.
It sounds like the first person set up the project, and now expects everyone else to just "make it work", even if they're not programmers and don't have a good understanding of Git. Fair 'nuff.
Now I don't know your situation, but if you're actually in a work situation, the lead programmer (or user, if you're not storing code in this repository) should be giving you guys some kind of help or crash course in using Git. The Git model is quite a bit different than SVN, and it has taken me some time to wrap my head around it -- kind of like learning a functional programming language after working with imperative languages for several years.
It's awesome to have the whole thing where it merges all the changes in the same file together, fairly intelligently, but even the GUI version for Windows has no functional interface for how to deal with conflicts (which should be easily done as a "which bit of code is the proper piece to use here?" instead of jamming diffs into a file).
Which Windows GUI tool(s) are you using? Right now I can think of several -- gitk, git-gui, qgit, git extensions, CheetahGit, TortoiseGit, ...
I think that part of the problem right now is that there is no definitive Git GUI for Windows. Even if the TortoiseGit project gets more mature, users of TortoiseSVN or TortoiseCVS will have to learn a new version-control paradigm and understand some new terms before they'll be able to successfully use TortoiseGit.
Also, the Windows and Linux versions of GIT have several problems interoperating with each other.
Are you referring to line-ending problems? If so, take a look at the "core.autocrlf" configuration setting. If you're not talking about line endings, and you can't find any help online, I'd just go ahead and file a bug report or hop on the git mailing list.
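In case it saves someone a search, setting it looks roughly like this. The sketch uses a throwaway repository in a temp directory; which value is right depends on your platform and editors, so treat the choice below as an example rather than a recommendation.

```shell
#!/bin/sh
# Sketch only: a throwaway repository is created just to show the setting.
set -eu
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"

# Common choice on Windows: CRLF in the working tree, LF in the repository.
#   git config core.autocrlf true
# Common choice on Linux: leave files alone on checkout, convert CRLF to LF on commit.
git config core.autocrlf input

# Print the value that is now in effect for this repository.
git config core.autocrlf
```

Add `--global` to the `git config` call if you want the setting for every repository on the machine rather than just this one.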
In short, Git appears to have been designed entirely with features in mind, and not one bit of usability for anyone other than Linus himself.
Oh, I think most people would agree with that -- especially Linus. Of course, I think that this is partially a Plumbing vs. Porcelain issue: a number of geeks love to use a command-line shell, but most ordinary users feel much more comfortable with a GUI windowing environment. Many programmers really like the power they get from using Git on the command line, but some people want something a bit more user-friendly like Easy Git or a Git GUI.
It is a nightmare for people who only have the need for version control and a handful of people working together. It reminds me very, very much of early Linux, before anyone else besides Linus had been hacking on it.
Yes -- I can see that. The Git workflow is pretty different from that of a tool like SVN. Unless the team leader is willing to sit down with the group and work through examples -- and then also be ready to answer questions anytime during the workday for the first few weeks -- then it's going to be a really rough, potentially unproductive month. Even if they grumble about it, it's probably worth their time to train everyone up front.
You've probably seen this before, but for anyone who's moving from SVN to Git, there's a really good Intro to Git for SVN users.
Good luck!
coding is life
I'm a Bazaar fan. That isn't to say I'm not a Git fan, I just prefer Bazaar (by a small margin, for a handful of reasons).
That website makes a really good case, but I think they should remove "Bzr" from the "Cheap Local Branching" section. I could s/git/bazaar that entire section and it would still be almost correct.
Bazaar has a totally different view of branches, but it gives you all the same flexibility as Git. The only thing is that Bzr branches are full copies of the entire repository - so they aren't "cheap" by default. To mitigate this, you simply create a "repository" one directory level above the branch, and then all the branches share data and are very cheap and fast.
Well, Subclipse was started by a single person who wanted a decent svn plugin, and it took the guy more than a year to get it to a working level, so it takes time.
Git would be the perfect companion for Eclipse: it could replace both the local history and the remote version control system. But that means a proper integration of Git might be 10 times as complicated, because it has to replace both version control subsystems Eclipse has.
Also, git has several unresolved issues: how are you going to do a server repo browser, server-side version browsing, etc.?