The Rise of Git
snydeq writes "InfoWorld takes a look at the rise of Git, the use of which has increased sixfold in the past three years. Buoyed in large part by interest among the Ruby community and younger developers, Git has been gaining share for open source development largely because of its distributed architecture, analysts note. And the version control system stands to gain further traction on Subversion in the years ahead, as Eclipse is making Git its preferred version control system, a move inspired by developers and members."
This may be due in part to the way github integrates social networking and coding-- I'm unaware of anything similar for SVN, Perforce, Bazaar, Mercurial, etc...
yes. it's better than everything except mercurial, of course.
Yet another DVCS article that doesn't mention Bazaar at all. And cursorily swats away Mercurial as "not as large a community."
It seems like just about every argument in favour of Git could be equally applied to any other DVCS. On top of that, the only thing it has going for it is a larger community (and being the creation of Torvalds).
I've argued to wit's end that Bazaar is superior to Git in a multitude of ways (branches as separate file-system directories, optional ability to work in bound mode as with Subversion, revision numbers, explicit notion of a 'trunk' versus merged branches, explicit moves/renames rather than heuristics, commit metadata). Basically Bazaar has a much richer data structure than Git. The last point (commit metadata) is crucial: because Git lacks commit metadata, it is impossible to meaningfully use any other revision control system in conjunction with Git -- what a selfish decision.
Yet all I ever hear is "Git is better than all the other revision control systems because [generic reasons why DVCSes are better than centralised ones]." Such is the case with Scott Chacon's site Why Git is Better Than X, which I wrote a rebuttal of at Why Git Ain't Better Than X.
Sometimes, after spending a whole day amongst non-geeks doing non-geeky things, I come here and read the names some of these technologies have been given.
Git? Ruby? Subversion? Eclipse?
I get this distinct hillbilly feeling after reading some of the names the open source community has come up with for their projects of late. Mental images of tie-clad programmers in a rusted pickup truck, waving corded mice over their heads while techno music plays, that kind of thing. Then I hit page reload, and the feeling fades... until I think of Richard Stallman.
#fuckbeta #iamslashdot #dicemustdie
The lack of decent tools is going to slow adoption of git, particularly in corporations. I've yet to see a tool that can handle even a simple daily workflow, so I've stuck to the cli, gitk and git-gui, which are all clunky. egit has definitely improved, but it still feels out of place and I believe it is missing features (does it even support autocrlf yet?)
Corporate projects will almost certainly have a centralized repository, and while git can deal with this, it's possible to paint yourself into a corner where it's painful to recover.
For reference, I've been a daily git user for ~16 months both at the company I work for and as a committer at Eclipse.
I'm not aware of any _code_ sets which span 50GB, and it seems unlikely that you could get to that size without a lot of machine-generated content. Such content wouldn't be ideal for git to manage, since git depends a lot on the capabilities of diff, and machine-generated content might not diff as effectively as human-generated code. You can hardly fault a tool for doing poorly at a job it wasn't designed to do.
Is the content you are managing that you describe as "50GB+" actually human-generated _code_? Or is it _data_? There is a big difference to git.
On the other hand, git manages the complete Android source code. It isn't "50GB+", but it is still substantially larger than the Linux kernel--- and git does fine. However, Google breaks that code base up into 150+ sub-repositories, which is actually a quite sane thing to do. I haven't tried to place Android into a single git repository, so I can't say how well git would deal with something that large. But it wouldn't be the best way to use git, anyway.
So I think your negative review of git is uninteresting, to say the least.
b.g.
Exactly who the fuck has 50GB in one source code tree?
--
BMO
Those who store data, images, other binaries like built executables and other artifacts alongside the code.
You can argue that you shouldn't do that, but there are times when it's difficult to avoid, and if you need to be able to keep versions, it can be done easily with something like SVN.
I think GIT has its advantages, but to reject all predecessors and raise it up as the only way to go is foolish.
These posts express my own personal views, not those of my employer
My main issue with Git was its Unicode filename support under Windows. Quite frankly it's broken. You add and commit your "Unicode Filenames" fine in Linux or Windows, but if you ever check them out under Windows they are renamed to a weird character set and require being re-added and checked in with their new path.
SVN doesn't have this problem (using TortoiseGit and mSysGit vs TortoiseSVN). I stopped using Git after I encountered this and have reverted to using Subversion until this has been resolved. If I had some time I might look into it, but seeing as it seems to be a known issue in mSysGit and the underlying cygwin(?) libraries I doubt I'd ever manage to resolve this myself without breaking something else.
Personally I like the Git Workflow, especially the commit-early-commit-often mentality that I never managed to get to in svn. It's just a shame that I can't use it seamlessly on Windows to complement my use on Linux.
Cheers, Chris
Mercurial is not really superior; it is a subset of about 80% of Git's functionality baked into a nicer command line set. Btw, Mercurial's strong side is really the relatively clean command line; outside of that, both systems are so close it is eerie.
and saturnial is even better of course although I hear that aluminiumal is doing well too ;-)
Everything I write is lies, read between the lines.
Try using a caret (^) in a recent SVN client. If you're in a working directory, it's a stand-in for the base repository URL, so svn+ssh://foo.bar.biz/svn/widget/trunk could be written as: ^/trunk
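To make the caret shorthand concrete, here is a minimal Python sketch of the expansion. It is only an illustration: the function name `expand_caret` is made up, and the repository root is passed in by hand, whereas the real client works it out from the working copy.

```python
def expand_caret(target: str, repo_root: str) -> str:
    """Expand a leading '^' into the repository root URL.

    Hypothetical helper for illustration only; the real svn client
    does this internally for targets given inside a working copy.
    """
    if target == "^" or target.startswith("^/"):
        return repo_root.rstrip("/") + target[1:]
    # Anything else (a full URL or a local path) is left untouched.
    return target


# Repository root taken from the example in the post above.
root = "svn+ssh://foo.bar.biz/svn/widget"
expanded = expand_caret("^/trunk", root)
print(expanded)
```

With the root from the post, `^/trunk` comes out as the full svn+ssh://foo.bar.biz/svn/widget/trunk URL.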
Maxim: People cannot follow directions.
Increases in truth directly with the length of time spent explaining them
The idea behind Git is that it is distributed and doesn't need a master repository, so I guess it didn't make sense to have revision numbers when it was created (for the Linux kernel). When two people make separate revisions at the same time on their local repositories, a linear revision number would conflict.
However, I've never actually used any Git project/repository which didn't have a master repository. That goes for both the local repositories for my own projects in my Dropbox folder and the professional repositories I've used (Android and the various repositories at the company I work at). And especially at work, it has been annoying that we didn't have revision numbers.
I wish Git would get a new feature added: the ability to assign a repository as the "master" repository, and in turn the ability for the master repository to assign revision numbers. If people are wondering how that would work, considering people make commits on their local repository and then push them to the master, causing possible conflicts, the answer is that revision numbers wouldn't get assigned until the commits hit the master branch, and they would be split up for merges:
    5
   / \
4.1   4.2
   \ /
    3
(or something similar to the above)
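A rough Python sketch of the numbering idea above, purely to show one way it could work. This is not something Git provides; the function name, the commit ids, and the rule used here (number each commit by its distance from the first commit, and add a ".k" suffix when several commits share a number) are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_revision_numbers(history, first=1):
    """Assign human-friendly revision numbers to commits on the master.

    `history` maps each commit id to the list of its parent ids, in the
    order the (hypothetical) master repository received the commits, so
    parents always appear before their children.
    """
    level = {}
    by_level = defaultdict(list)
    for commit, parents in history.items():
        # A commit sits one level above its deepest parent.
        level[commit] = max((level[p] for p in parents), default=first - 1) + 1
        by_level[level[commit]].append(commit)

    numbers = {}
    for lvl, commits in by_level.items():
        if len(commits) == 1:
            numbers[commits[0]] = str(lvl)
        else:
            # Parallel commits share a level and get 4.1, 4.2, ...
            for k, commit in enumerate(commits, start=1):
                numbers[commit] = f"{lvl}.{k}"
    return numbers


# Made-up commit ids forming the diamond from the diagram above.
history = {
    "a1f": [],
    "b2e": ["a1f"],
    "c3d": ["b2e"],
    "d4c": ["c3d"],
    "e5b": ["c3d"],
    "f6a": ["d4c", "e5b"],
}
numbers = assign_revision_numbers(history)
for commit, number in numbers.items():
    print(number, commit)
```

One obvious weakness of this sketch: the numbers are not stable. If a later push adds a second commit at a level that previously had one, "4" would turn into "4.1", so a real implementation would have to freeze numbers once they are handed out.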
Lots of people who use an alternative VCS like Mercurial, Bazaar, etc., bitch about Git because of the lack of revision numbers. For those who are unfamiliar, each commit in Git has a SHA1 hash which is used as an identifier instead of a revision number. Unfortunately, these hashes are very unwieldy to communicate to others. At work we always use the name and date-time instead, but that has problems, as it doesn't convey the branch in the instances when it matters.
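For anyone wondering where those identifiers come from, here is a small Python sketch of how Git derives an object id. It hashes a file's contents (a "blob") rather than a commit, because that is the simplest case; commit ids are computed the same way over a "commit" header plus the commit's text. The function names are made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 id Git gives a blob holding `content`.

    Git hashes a short header ("blob <size>" followed by a NUL byte)
    together with the content itself.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def abbreviate(object_id: str, length: int = 7) -> str:
    """Shorten an id to a prefix, the way people usually quote hashes."""
    return object_id[:length]


full_id = git_blob_id(b"hello world\n")
print(full_id)              # 40 hex characters
print(abbreviate(full_id))  # short form that is easier to pass around
```

The short prefix is what most people actually say out loud or paste into chat, which helps a little with the unwieldiness, though it still carries no ordering or branch information.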
Just because the U.S. is a republic does not mean it is not a democracy. Democracy/republic are not mutually exclusive.
With git, you have no option but to pull the entire repository, all of its data, and all of its history. As the command name aptly describes, you have your own local clone of the whole thing. As such, with larger projects, it becomes necessary to break the repository up into smaller, more manageable submodules. With subversion, or some other version control system where you 'check out' rather than 'clone', it becomes possible to simply pull the current version of just the directory you want to work on. In essence each folder is automatically made a submodule.
Both strategies have their advantages and disadvantages. Every programmer is going to have their own style of work, which will be better suited towards one VCS or another. Claiming git is the perfect VCS for all occasions, as the OP did, is simply naive.
I had a small role in getting the PostgreSQL project to convert from CVS to git. There's a good summary of what happened at Lessons from PostgreSQL's Git transition. With a pretty conservative development community, the bar for converting from CVS to git was set pretty high: the entire CVS repository had to come through, such that every single release ever tagged could be checked out and get exactly the same files as checking it out of CVS (a little binary diff tool was used to confirm). With around 15 years of history in there, that took some upstream fixes to the cvs2git tool to finally accomplish; it took just over a year to work out all the details to everyone's satisfaction. My checked out copy of the current repo is 272MB right now, so neither small nor giant.
I would say that everyone who works regularly on the code is at least a little bit more productive than they used to be, with the older CVS experts having seen the least such improvement. But some people are a whole lot more productive. I'd put myself in that category--my patch contribution rate is way up now that it's so much easier to pop out a branch to hack on some small thing and then submit the result for review.
And the conversion seems to have improved the uptake of new developers getting involved in working on the code. Having to deal with CVS was a major drag for younger developers in particular, and Subversion is equally foreign to most of them now. As suggested in the article, anyone under 25 will only touch a corporate style CVS or Subversion repo if dragged kicking and screaming into it. As more of that generation rises through IT, old style repos will continue to get shredded at a good rate every year. It could have been any of the DVCS systems that ended up in this position, but git was the one that got the right balance of feature, innovation rate, and publicity. Now that it's got such a wide user base, too, I don't see any of the other VCS software options competing with it successfully in the near future.
Meh, call me when there's tortoisegit, and by then it will be too late.
You missed the call, it was a while ago. I considered TortoiseGit mature enough to use around V1.3, which was January of 2010. The upward spike in downloads shown on their page, which really took off around V1.2, shows quite a few people agree.
Well, you should post with your real name if you are going to make such an encompassing statement, instead of anonymously. I'm kinda wondering what repo management tools you are using that can handle 50GB+ data sets that you are trying to compare against something like git?
From my experience handling large data sets is less a function of the repo and more a function of disk bandwidth and memory. Putting, say, a million files into a repo (any repo) is not a big deal but managing it will definitely be dependent on the operations you are trying to do, memory, and storage.
I've found, in general, for the DragonFly project which manages ~500MB and ~1GB repos, that regardless of the repo if you want operations to be efficient you need to have a high speed caching layer helping out the filesystem. For DragonFly, of course, that means having a SSD in the system and using the swapcache feature to cache filesystem data and meta-data on the SSD. Then repo operations run fast regardless of the repo system used.
A nominally priced SSD can cache 100G of data fairly cheaply, so handling a large repo isn't a big issue. In our case we have two machines which keep about a dozen repos from different projects synchronized, in order to make them available to our developers, as well as perform incremental translations from CVS to GIT for pkgsrc (which is an ultra-nasty script). The scripts run twice a day and have a run-time of around an hour with the SSD caching layer. Without the SSD caching layer those scripts will take 6+ hours of time to run (12 hours a day of run time if I run them twice a day). The SSD makes a huge difference in manageability.
In terms of trying to manage fewer larger files, such as images... large numbers of binary files are best managed outside the repo infrastructure. Sure, a few here and there (such as a web site's icons) can easily be managed inside a repo, but trying to manage large amounts of bulk data in a repo generally just results in a lot of unnecessary pain. It's better to manage bulk data in a filesystem capable of performing snapshots.
Similarly for backups... repos aren't good mechanisms for making backups. You want something more closely integrated with the filesystem (and the filesystem's snapshot capabilities presuming you are using a filesystem with snapshot capabilities) to do LAN and off-site backups. Not a repo.
So the question here is: Are you complaining about the amount of time it takes to do an operation due to being disk-bound, or are you talking about bugs in the repo system causing the program(s) to crash or eat too much memory? I haven't had any significant memory issues with git myself though I can definitely see needing a 64-bit VM space if the repo becomes large enough for certain operations.
-Matt
I wasn't talking about the command line. The command line is probably roughly equal, annoying wise that is..
I'm talking about the GUI tools that are currently available. They suck, and doing tasks like cherry picking files is a pain in the butt. Of course, the fact that there's a term called "Cherry Pick Commit" that has nothing to do with "Cherry picking" files for commit.. might be part of the reason... You are right, though.. not having to check in all files in one command is nice.
My beefs with GIT include some of yours. The huge amount of time to download. With SVN, you just download the latest.. but with Git, you have to download the entire change tree locally. Also, I find the git terminology to be (what seems to me) deliberately obscure. Terminology that has been in use for decades is changed for no apparent reason, other than to say "Hey, we're different". This leads to making mistakes when you confuse similar terminology between systems that do different things.
My other major beef is that, while it's nice to be able to do version control disconnected, I dislike having my check ins local.. version control is also a "save my ass", and if my laptop takes a trip down a flight of stairs, anything that's not pushed is lost as well.
If you need web hosting, you could do worse than here
I'm talking about the GUI tools that are currently available. They suck, and doing tasks like cherry picking files is a pain in the butt. Of course, the fact that there's a term called "Cherry Pick Commit" that has nothing to do with "Cherry picking" files for commit.. might be part of the reason... You are right, though.. not having to check in all files in one command is nice.
So I can't speak to GUI tools on anything but Windows, but there's a TortoiseGit that functions nearly identically to TortoiseSVN. It even (at least mostly) hides the index from you.
My other major beef is that, while it's nice to be able to do version control disconnected, I dislike having my check ins local.. version control is also a "save my ass", and if my laptop takes a trip down a flight of stairs, anything that's not pushed is lost as well.
That sort of gets back to the "it's easy to forget to push" problem. If you're not subject to this problem, then I disagree that there's much of a difference: if I lose work because I deliberately didn't push, that's because I don't have repository access, and then I'd have "lost" that work under Subversion anyway because I wouldn't have done it in the first place.
As for remembering to push, there is a problem there. Tortoise is nice because on the "yes, you've committed" dialog there's a nice "push" button staring you in the face, so it's pretty easy to remember there, especially if you get in the habit of pushing after every commit.
For the command line, I haven't found a perfect solution... I think I want to write a shell alias that will run git as normal, but if I said "git commit" will print out "don't forget to push!" when it's done. I haven't gotten to that yet.
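The reminder idea above could be sketched roughly like this. The poster had a shell alias in mind; this is the same idea written as a small Python wrapper instead, and the names (`should_remind`, `main`, the reminder text) are made up for illustration. It only looks at the first argument, so something like `git -C some/dir commit` would slip past it.

```python
import subprocess

REMINDER = "don't forget to push!"


def should_remind(git_args, exit_code):
    """Remind only after a `git commit` that actually succeeded."""
    return exit_code == 0 and len(git_args) > 0 and git_args[0] == "commit"


def main(git_args):
    """Run git with the given arguments, then print the reminder if needed."""
    exit_code = subprocess.call(["git", *git_args])
    if should_remind(git_args, exit_code):
        print(REMINDER)
    return exit_code


# To use this as a script, finish the file with something like:
#     import sys
#     sys.exit(main(sys.argv[1:]))
# and point a shell alias at it.
```

Passing git's exit code straight through means scripts that call the wrapper still see failures exactly as they would with plain git.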
And one of the two biggest repository tangles I've had to unravel had at its root the fact that I forgot to push from one copy of a repository, developed in another, and then tried to sync everything up. That took some time to even figure out what happened, and rather longer to figure out the best way to fix it.
That said, I've also had a time when I've left dirty copies of files sitting around in a Subversion working copy for months without noticing, and that caused a problem too.
TLDR I do think that this is a drawback of Git, but for me it's so drastically outweighed by being able to work disconnected that it almost doesn't register.
Yes, but at the same time I only recall a few minor instances where I ever wanted to extract just a portion of a CVS archive, and the only reason was because, at the time, the system I was running on wasn't all that fast.
These days extracting a repo, even a large one, doesn't take all that much time, nor is disk space that big an issue. I just extract the whole thing (git, cvs, whatever) and then pick out what I want.
It only takes ~3 seconds or so to switch branches on a checked out repo of around ~100,000 files, and certainly less than ~10 seconds to do an initial checkout of such a repo. Not to mention the fact that 2TB hard drives are $100 these days so there's no real excuse to be tight on disk space.
When I first started using git I did worry somewhat about disk space. I quickly came to the conclusion that a few extra gigabytes didn't matter in today's world of cheap multi-terabyte hard drives. I typically have 4-5 copies of the DragonFly source base broken out, each with its own copy of the .git repo. A simple git pull is all I need to synchronize whatever directory I've decided to work in (since I'm often reviewing other developers' branches I have multiple independent copies). That's how little I care these days.
That said, it *is* possible to tell git to hard-link or otherwise share repo files in order to reduce the size of the .git/ subdirectory in the checkout directories. We do this on our developer box (where each account is given its own private repo which syncs against the DragonFly master repo). I don't bother optimizing my own personal copies though.
And one final thing to note... if the filesystem can de-duplicate data, having a lot of copies lying around is even less of an issue. I've never had to depend on de-dup... it's kinda hard to actually run a 2TB drive that isn't being used to archive media files out of space... but it does work particularly well on backup machines.
-Matt
Mercurial's most touted advantage is that it's easier to learn, but this is a joke. If you develop, you interact with the version control system all day. A tiny advantage in learning it faster is nothing compared to not being able to do what you want afterwards, or having to redo something because the version control works against you instead of with you.
I work at a company that has used Git professionally. The people on my team aren't dumb, but they have fucked up with Git dozens of times. What I quoted is an okay argument at a personal level. For an organization, however, there is something to be said for having easy-to-use tools.
I am not making the argument that either Mercurial or Git is better; I am making the argument that tools which are easier to use will lead to fewer fuck-ups in an organization.
see "git clone --depth"
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
I find SmartGit more useful for day-to-day stuff.
I have TortoiseGit installed (and it works) but I never use it. Having the correct icons show up in Explorer is nice though.
Hivemind harvest in progress..
Tom Lord, developer of rival Arch must be spitting blood at the success of Git.
I followed Arch's development back in 2004 and quickly lost interest. The last crazy thing I remember was Tom trying to build a home-brew LISP derivative *into* his version control system. It was going to revolutionise everything. He even wrote a long manifesto-cum-design document in three parts. At that point I gave up and moved to Subversion. I just wanted a modern version control system that worked.
In Eclipse you now also have top tier support for Git through the EGit plugin. This sits on top of a pure Java implementation of git called JGit (i.e. no need for msysgit). It works pretty well and in the manner you would expect if you've ever used a VCS with Eclipse before. JGit also powers Gerrit, which is a git server and web app that slots a code review & approval system into the workflow.
I checked out the full repository of an open source project I have been tinkering with (libgdx) in both SVN and Git. The SVN checkout was MUCH larger than the Git repository on my hard drive (I think 33% more, but I can't remember).
I think the point being made was that, in Subversion, you can check out just a small part of the repository if you want to do so, rather than the whole thing. I'm not aware of that possibility in Git.
Mercurial has 95% of Git's functionality and is far easier to use. The extra features are simply not worth the headache.
Git's Windows support is atrocious. The installation process is an easy indication of that. Mercurial is packed with "just works" moments.
Well, I certainly was not expecting you to use RCS as a comparison point. RCS is utterly horrible when dealing with large data sets. Any modification to a file requires rewriting the entire rcs file and doing something like, oh, tagging, requires rewriting every single file in the repo. Every single one.
RCS is a very filesystem-heavy repo management system. Updates, checkouts, pretty much everything you do *except* single-file log displays are expensive. Such operations have to scan or access nearly every file in the repo and at least stat every file in the checked out tree. For large repos with hundreds of thousands of files RCS/CVS is nasty as hell.
Nor can you reliably mirror or replicate an RCS or CVS repo. Neither rsync nor cvsup is capable of reliably replicating a live, heavily used RCS/CVS repo. I've tried many times... I have to mirror the NetBSD CVS repo to get their pkgsrc into a git mirror and it takes a complex script to try to detect a point where the entire CVS repo is quiescent. Even with the quiescence check my script *still* has to do a full cvs checkout and an actual diff -r between the checked out CVS repo and the checked out git repo to catch occasional failures.
In short RCS/CVS is a mess. GIT is not a mess. With git you just use git-daemon and git:// URLs and you can get massive, reliable replication of the repo.
The only other issue involved here seems to be one of machine resources. But in today's world machine resources are cheap. Even a large 50G+ repo trivially fits on a sub-$100 2TB hard drive, and it takes only a moderately-sized SSD caching layer (~100G) to make the repo operations efficient. That's cheap enough that every developer can keep multiple full repos on their workstations.
In many respects the GIT concept has grown into its own by virtue of the greatly improved storage resources available on today's machines. In the 80's and the early 90's a centralized repo would have been far more important simply by virtue of the relative disk space required. In 2011 the relative disk space required for even a large repo is tiny.
-Matt