The Rise of Git

Bazaar by mgiuca · 2011-07-26 16:16 · Score: 5, Insightful

Yet another DVCS article that doesn't mention Bazaar at all. And cursorily swats away Mercurial as "not as large a community."

It seems like just about every argument in favour of Git could be equally applied to any other DVCS. On top of that, the only thing it has going for it is a larger community (and being the creation of Torvalds).

I've argued until I'm at my wit's end that Bazaar is superior to Git in a multitude of ways (branches as separate file-system directories, optional ability to work in bound mode as with Subversion, revision numbers, explicit notion of a 'trunk' versus merged branches, explicit moves/renames rather than heuristics, commit metadata). Basically Bazaar has a much richer data structure than Git. The last point (commit metadata) is crucial: because Git lacks commit metadata, it is impossible to meaningfully use any other revision control system in conjunction with Git -- what a selfish decision.

Yet all I ever hear is "Git is better than all the other revision control systems because [generic reasons why DVCSes are better than centralised ones]." Such is the case with Scott Chacon's site Why Git is Better Than X, which I wrote a rebuttal of at Why Git Ain't Better Than X.

Re:Bazaar by nschubach · 2011-07-26 16:27 · Score: 3, Informative

The main problem I've had with Bazaar is the lack of tool options. Of course, that's not really Bazaar's fault...
With Mercurial, I have tortoisehg for Windows and a very nice plugin for Eclipse. Bazaar's Eclipse integration has been rather lacking and the Windows tool chains have been slowly filling in, but it still needs time to level the field. (I'd work in Linux at work if it was an option... but it's not.) I still use Mercurial on my Linux laptop for local version management, though, mainly because it's what I use at work and there's no jumping between different keywords and methodologies.

--
Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
Re:Bazaar by euroq · 2011-07-26 18:39 · Score: 3, Interesting

You cite branches in separate folders as a superior feature? Subversion has that crap and it's messy.
High five brah! I hated that shit when I had to use SVN.

You also mention "revision numbers" as a feature. Who gives a crap about revision numbers?
As a person who has professionally used Git (and likes it) in many repositories, both large and small, I miss revision numbers. When I e-mail my colleagues about commits (i.e. revisions in another syntax), it is annoying to communicate it: I have to copy and paste the entire commit metadata (name, author, date-time, and SHA) in order to reflect it properly. I would much rather say "15.1" or "newfeaturebranch.3".
It also has been a problem when using the "Blame..." tool to track down why something was changed, as you have to do a double-lookup to find the SHA of the changed line and then the commit of the SHA. Well, my bad because really you'd have to do the same thing with a revision number, but trust me... I'd rather be working with revision numbers than SHAs because it would be easy to know that this line was changed because of "newfeaturebranch" rather than e93f02a09f9fe092a039a923.
Obviously, we don't need revision numbers... but boy would I love to have them.
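For what it's worth, Git and Mercurial both accept any unique prefix of a commit hash in place of the full identifier, which softens (but does not remove) the copy-and-paste problem; a prefix still says nothing about which branch a commit came from. The Python sketch below only illustrates the idea: `resolve_prefix` and `shortest_unique_prefix` are hypothetical helpers, not code from either tool, the IDs are made up and shortened, and the 4-character floor is an assumption chosen to resemble Git's minimum abbreviation.

```python
from __future__ import annotations


def resolve_prefix(prefix: str, commit_ids: list[str]) -> str:
    """Return the one commit ID that starts with `prefix`; raise if none or several match."""
    matches = [c for c in commit_ids if c.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"{prefix!r} matches {len(matches)} commits")
    return matches[0]


def shortest_unique_prefix(commit_id: str, commit_ids: list[str], min_len: int = 4) -> str:
    """Shortest prefix of `commit_id` (at least `min_len` chars) that no other ID shares."""
    for n in range(min_len, len(commit_id) + 1):
        prefix = commit_id[:n]
        if sum(c.startswith(prefix) for c in commit_ids) == 1:
            return prefix
    return commit_id  # fall back to the full ID


# Hypothetical, shortened IDs for illustration only.
ids = ["e93f02a09f9f", "e93f1b7c0d2e", "483b3ced11aa"]
print(shortest_unique_prefix("e93f02a09f9f", ids))  # e93f0
print(resolve_prefix("483b", ids))                  # 483b3ced11aa
```

Note that a prefix that is unique today can become ambiguous as the history grows, which is one reason tools print a few more characters than strictly necessary.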

--
Just because the U.S. is a republic does not mean it is not a democracy. Democracy/republic are not mutually exclusive.

Re:It's because by bgat · 2011-07-26 16:30 · Score: 3, Informative

I'm not aware of any _code_ sets which span 50GB, and it seems unlikely that you could get to that size without a lot of machine-generated content. Such content wouldn't be ideal for git to manage, since git depends a lot on the capabilities of diff, and machine-generated content might not diff as effectively as human-generated code. You can hardly fault a tool for doing poorly at a job it wasn't designed to do.

Is the content you are managing that you describe as "50GB+" actually human-generated _code_? Or is it _data_? There is a big difference to git.

On the other hand, git manages the complete Android source code. It isn't "50GB+", but it is still substantially larger than the Linux kernel, and git does fine. However, Google breaks that code base up into 150+ sub-repositories, which is actually a quite sane thing to do. I haven't tried to place Android into a single git repository, so I can't say how well git would deal with something that large. But it wouldn't be the best way to use git, anyway.

So I think your negative review of git is uninteresting, to say the least.

--
b.g.

Re:Github? by mgiuca · 2011-07-26 16:37 · Score: 4, Funny

Well, that's really what GitHub is: much like Facebook treats every "object" (status update, photo, event) as a commentable, likable object, so does GitHub for VCS objects such as commits.

It's quite funny to see a commit with a comment thread attached to it. I saw one that went viral and ended up with 88 comments including meme images.

Data, Images, Binary builds etc. by syousef · 2011-07-26 16:48 · Score: 4, Informative

Exactly who the fuck has 50GB in one source code tree?

--
BMO

Those who store data, images, other binaries like built executables and other artifacts alongside the code.

You can argue that you shouldn't do that, but there are times when it's difficult to avoid, and if you need to be able to keep versions, it can be done easily with something like SVN.

I think Git has its advantages, but to reject all predecessors and raise it up as the only way to go is foolish.

--
These posts express my own personal views, not those of my employer

Re:Data, Images, Binary builds etc. by siride · 2011-07-26 17:03 · Score: 3, Informative

I store binary data in my Git repos and it doesn't seem to balloon as badly these days as people make it out to be.
In fact, Git seems to be good enough at it that I use it to do application releases. It's faster for me to build an app, commit it to a special Git repo and push the new commit, than to send it via SFTP over the VPN (a few hundred KB vs dozens of MB). In that repo, I have hundreds of new versions, but it's only a few hundred megabytes. The equivalent in file storage would be easily ten times as much.
I don't doubt that other VCSes do better, but Git is not awful.
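Part of why a release repository like this stays small is that Git names every stored file version by a SHA-1 of its contents, so a binary that did not change between releases is stored once however many commits point at it (changed files are also delta-compressed when Git packs its objects). The Python sketch below reproduces the naming rule, a `blob <size>` header and a NUL byte followed by the raw bytes; `BlobStore` is a hypothetical toy that only shows the de-duplication, not Git's object database, and it does no compression.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """SHA-1 that Git uses to name a file's contents: 'blob <size>' + NUL byte + the raw bytes."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class BlobStore:
    """Hypothetical toy content-addressed store: identical contents are kept only once."""

    def __init__(self) -> None:
        self.objects = {}  # blob ID -> contents

    def add(self, data: bytes) -> str:
        oid = git_blob_id(data)
        self.objects.setdefault(oid, data)  # a second copy of the same bytes is a no-op
        return oid

    def stored_bytes(self) -> int:
        return sum(len(d) for d in self.objects.values())


store = BlobStore()
for _ in range(10):
    store.add(b"\x7fELF" + b"\x00" * 5000)  # the same unchanged binary in every release
print(store.stored_bytes())  # 5004
```

`git_blob_id(b"")` gives the well-known empty-blob ID `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, which is a quick way to check the header format.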
Re:Data, Images, Binary builds etc. by 0123456 · 2011-07-26 17:30 · Score: 3, Insightful

Storing large volumes of binary data in an archived fashion is a job for a filesystem, not a CVS. A CVS is not intended as a backup solution, nor should it be used as such.
So when I want to release a critical patch to an old version of our software I shouldn't just be able to extract everything from the repo, make the change and build the release installer, I should also have to find where any required binary files for that release were stored and copy them to my machine and hope that no-one deleted them in the meantime?
I really know very little about git, but from the numerous 'Git doesn't do X, Y, Z', 'But you shouldn't be doing X, Y, or Z!' posts here I don't see a reason why I would want to.
Re:Data, Images, Binary builds etc. by gbjbaanb · 2011-07-26 21:09 · Score: 3, Insightful

and you've just screwed the entire configuration process there.
The whole point of an SCM is that you put your sources in there so you can check them out and get the same set of sources from any point in history. The moment you say "oh that's too big, we'll put it somewhere else" is the day you lose control of that reproducibility.
Your images used in the app are part of the source. While there's a place for storing data elsewhere, it still has to be controlled in a way that you can get it back out again for a particular version.

Re:Mercurial by MemoryDragon · 2011-07-26 17:08 · Score: 3, Interesting

Mercurial is not really superior; it is a subset of about 80% of Git's functionality baked into a nicer command-line set. Btw, Mercurial's strong side really is the relatively clean command line; outside of that, both systems are so close it is eerie.

Re:It's because by ls671 · 2011-07-26 17:09 · Score: 3, Funny

and saturnial is even better of course although I hear that aluminiumal is doing well too ;-)

--
Everything I write is lies, read between the lines.

Git could use revision numbers by euroq · 2011-07-26 17:11 · Score: 3, Interesting

The idea behind Git is that it is distributed and doesn't need a master repository, so I guess it didn't make sense to have revision numbers when it was created (for the Linux kernel). When two people make separate commits at the same time in their local repositories, a linear revision number would conflict.

However, I've never actually used a Git project/repository that didn't have a master repository. That goes for both the local repositories for my own projects in my Dropbox folder and the professional repositories I've used (Android and the various repositories at the company I work at). Especially at work, it has been annoying not to have revision numbers.

I wish Git would get a new feature added: the ability to assign a repository as the "master" repository, and in turn the ability for the master repository to assign revision numbers. If people are wondering how that would work considering people make commits on their local repository and then push them to the master causing possible conflicts, the revision numbers wouldn't get assigned until they hit the master branch and they also split it up for merges:
   5
  / \
4.1 4.2
  \ /
   3
(or something similar to the above)
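One way the numbering in the diagram could work, assuming the master repository labels each commit with its generation (one more than its highest-numbered parent) and adds a .1, .2, ... suffix when several commits share a generation, in the order their pushes arrived. This is a hypothetical scheme, not a Git feature; `assign_revnos` and the commit names are made up for illustration.

```python
from collections import defaultdict


def assign_revnos(arrivals):
    """Label commits the way the hypothetical master repository might.

    `arrivals` lists (commit, [parents]) in the order the master received them;
    every parent must arrive before its children.
    """
    generation = {}
    by_generation = defaultdict(list)
    for commit, parents in arrivals:
        # A root commit is generation 1; otherwise one more than its highest parent.
        gen = 1 + max((generation[p] for p in parents), default=0)
        generation[commit] = gen
        by_generation[gen].append(commit)

    labels = {}
    for gen, commits in by_generation.items():
        if len(commits) == 1:
            labels[commits[0]] = str(gen)
        else:
            # Concurrent commits share a number and get .1, .2, ... in arrival order.
            for i, commit in enumerate(commits, start=1):
                labels[commit] = f"{gen}.{i}"
    return labels


# The diagram: two developers commit on top of 3, then the work is merged.
history = [("a", []), ("b", ["a"]), ("c", ["b"]),
           ("d", ["c"]), ("e", ["c"]), ("f", ["d", "e"])]
print(assign_revnos(history))
# {'a': '1', 'b': '2', 'c': '3', 'd': '4.1', 'e': '4.2', 'f': '5'}
```

Because this sketch recomputes labels over the whole history, a late push that lands on an existing generation would turn a plain 4 into 4.1; a real implementation would need to freeze labels once they have been handed out.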

Lots of people who use an alternative VCS like Mercurial, Bazaar, etc., bitch about Git because of the lack of revision numbers. For those who are unfamiliar: each commit in Git has a SHA-1 hash, which is used as an identifier instead of a revision number. Unfortunately, hashes are very unwieldy to communicate to others. At work we always use the name and date-time instead, but that has problems, as it doesn't convey the branch in cases where that matters.

Re:Git could use revision numbers by mfwitten · 2011-07-26 22:15 · Score: 3, Informative

Now, Git works around this mostly, because you can say 483b3ced^ to go to the previous revision (and actually SVN supports this too because you can say HEAD^). But it's not a full solution. What's the next revision? Git doesn't have a way of getting you that information.
That question doesn't make any sense, because in absolute terms, there are an indeterminate number of immediate children across every repository and across time.
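There are a number of things you could do to narrow the search, such as something like git rev-list --reverse --ancestry-path 483b3ced..master | head -n 1 (git log --reverse -1 would not do it, because the -1 limit is applied before the reversal and so selects master's tip). Ultimately you'd have to account for merge commits there, too; perhaps the --children flag for git rev-list and git log might be useful.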
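The point about "the next revision" shows up directly in how commits link together: each commit records only its parents, so children have to be found by inverting those links over whatever history a given repository happens to hold, and a commit can have several children or none. A minimal Python sketch, with a hypothetical `parents` map standing in for the commit graph:

```python
def children_of(parents):
    """Invert a {commit: [parents]} map into {commit: [children]}."""
    children = {commit: [] for commit in parents}
    for commit, commit_parents in parents.items():
        for p in commit_parents:
            children.setdefault(p, []).append(commit)
    return children


# Hypothetical history: B and C both branch from A, and D merges them.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
kids = children_of(parents)
print(kids["A"])  # ['B', 'C']: two candidates for "the next revision"
print(kids["D"])  # []: nothing after D, as far as this repository knows
```

Another clone that has fetched more history would compute a different answer for the same commit, which is why the question has no absolute answer.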
Re:Git could use revision numbers by minkie · 2011-07-27 00:31 · Score: 3, Interesting

Revision numbers are easier for humans to deal with. For example, here's a common flow I use every day:
$ hg pull
$ hg log | less    # see some change I'm curious about and note the change number
$ hg export 3742 | less
With change numbers, it's easy to remember 4 (or even 5) digits for the time it takes to type the export command. If I only had hashes, I'd have to copy-paste the string. Things like this matter less to people who only use GUIs. For command line folks, however, being able to easily read, pronounce, remember, and type change numbers is essential. Even if you're just talking with other people, it's a lot easier to say, "Oh, I see what happened, in change 2456, you did..." than to refer to hash strings.
I've used rcs, cvs, clearcase, perforce, dabbled in svn, dabbled in git, and am currently using hg. Of the centralized bunch, perforce is my favorite (not free, but reasonably priced and amazingly excellent tech support). I can't see anybody wanting to use svn for any new projects today. When it first came out, it was a significant improvement over cvs and people naturally flocked to it, but there are just so many better alternatives today.
Clearcase is an interesting beast. For sure, it's overpriced bloatware that's on life support, being kept alive mostly by big legacy customers with neanderthalic IT and Release Engineering departments who still believe IBM can do no wrong. But, it did have some interesting ideas. That every revision of every file exists simultaneously in the file system namespace is really powerful.
Between git and hg, I'd say they are fundamentally identical in capability, but I find the hg command set easier to get my head around. All the people who say, "X is the best possible vcs. I used to use cvs and when I switched to X my sex life improved overnight", fail to understand that "X is way better than cvs" is true for pretty much any value of X, and says nothing about the relative merits of the various X instances.

Re:It's because by wagnerrp · 2011-07-26 17:13 · Score: 3, Informative

With git, you have no option but to pull the entire repository, and all of its data, and all of its history. Aptly described by the command, you have your own local clone of the whole thing. As such, with larger projects, it becomes necessary to break the repository up into smaller, more manageable submodules. If using subversion, or some other version control system where you 'check out' rather than 'clone', it becomes possible to simply pull the current version of just the directory you want to work on. In essence each folder is automatically made a submodule.

Both strategies have their advantages and disadvantages. Every programmer is going to have their own style of work, which will be better suited towards one VCS or another. Claiming git is the perfect VCS for all occasions, as the OP did, is simply naive.

Re:Tower, GitHub, GitX client (Some Mac only) by syzler · 2011-07-26 17:42 · Score: 3, Interesting

Let's not forget that Xcode 4 uses Git by default and that Git is tightly integrated into the interface. Examples include:

* Xcode creates a git repository by default when creating a new project
* When saving a file, Xcode will place a "M" marker next to a file to indicate it needs to be committed
* Re-naming a file in Xcode will perform the rm and add operations automatically in Git
* Xcode allows you to view the current version and past versions side by side in the editor
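As the list notes, a rename in Git is just a removal plus an addition: Git does not record the move and instead rediscovers renames when it compares two trees, which is the "heuristics" objection raised near the top of the thread. The simplest piece of that, pairing a deleted path with an added path whose content is byte-for-byte identical, is sketched below in Python. Git additionally matches files that are merely similar, which this hypothetical `detect_exact_renames` helper does not attempt.

```python
import hashlib


def detect_exact_renames(old_tree, new_tree):
    """Pair deleted paths with added paths whose contents are byte-for-byte identical.

    Trees are {path: bytes}. Returns a list of (old_path, new_path) tuples.
    """
    deleted = {p: c for p, c in old_tree.items() if p not in new_tree}
    added = {p: c for p, c in new_tree.items() if p not in old_tree}

    # Index deleted files by a hash of their contents.
    by_hash = {}
    for path, content in sorted(deleted.items()):
        by_hash.setdefault(hashlib.sha1(content).hexdigest(), []).append(path)

    renames = []
    for path, content in sorted(added.items()):
        candidates = by_hash.get(hashlib.sha1(content).hexdigest())
        if candidates:
            renames.append((candidates.pop(0), path))  # each deleted file is used at most once
    return renames


old = {"main.c": b"int main(void) { return 0; }\n", "util.c": b"/* helpers */\n"}
new = {"src/main.c": b"int main(void) { return 0; }\n", "util.c": b"/* helpers */\n"}
print(detect_exact_renames(old, new))  # [('main.c', 'src/main.c')]
```

If the file is edited in the same commit that moves it, exact matching fails, which is exactly the case where similarity scoring (and the debate about heuristics) comes in.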

PostgreSQL CVS-git conversion by greg1104 · 2011-07-26 17:46 · Score: 4, Interesting

I had a small role in getting the PostgreSQL project to convert from CVS to git. There's a good summary of what happened at Lessons from PostgreSQL's Git transition. With a pretty conservative development community, the bar for converting from CVS to git was set pretty high: the entire CVS repository had to come through, such that every single release ever tagged could be checked out and get exactly the same files as checking it out of CVS (a little binary diff tool was used to confirm). With around 15 years of history in there, that took some upstream fixes to the cvs2git tool to finally accomplish; it took just over a year to work out all the details to everyone's satisfaction. My checked out copy of the current repo is 272MB right now, so neither small nor giant.
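The acceptance test described here (every tagged release checks out identically from CVS and from git) boils down to comparing two directory trees file by file. The post does not describe the tool that was actually used, so the following is only a hypothetical Python equivalent: it hashes every file under each checkout, skipping the `CVS` and `.git` metadata directories, and reports paths that are missing from one side or differ in content.

```python
import hashlib
import os

METADATA_DIRS = {"CVS", ".git"}  # per-directory CVS bookkeeping and git's repository


def tree_digest(root):
    """Map each file's path (relative to `root`) to a SHA-256 of its contents."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in METADATA_DIRS]  # prune in place
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digests[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
    return digests


def diff_trees(a, b):
    """Sorted relative paths that are missing from one tree or differ in content."""
    da, db = tree_digest(a), tree_digest(b)
    return sorted(p for p in da.keys() | db.keys() if da.get(p) != db.get(p))
```

Run once per tag against a CVS checkout and a git checkout of the same tag, an empty result from `diff_trees` means the two are identical.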

I would say that everyone who works regularly on the code is at least a little bit more productive than they used to be, with the older CVS experts having seen the least such improvement. But some people are a whole lot more productive. I'd put myself in that category--my patch contribution rate is way up now that it's so much easier to pop out a branch to hack on some small thing and then submit the result for review.

And the conversion seems to have improved the uptake of new developers getting involved in working on the code. Having to deal with CVS was a major drag for younger developers in particular, and Subversion is equally foreign to most of them now. As suggested in the article, anyone under 25 will only touch a corporate style CVS or Subversion repo if dragged kicking and screaming into it. As more of that generation rises through IT, old style repos will continue to get shredded at a good rate every year. It could have been any of the DVCS systems that ended up in this position, but git was the one that got the right balance of features, innovation rate, and publicity. Now that it's got such a wide user base, too, I don't see any of the other VCS software options competing with it successfully in the near future.

Re:It's because by m.dillon · 2011-07-26 18:39 · Score: 3, Informative

Yes, but at the same time I only recall a few minor instances where I ever wanted to extract just a portion of a CVS archive, and the only reason was because, at the time, the system I was running on wasn't all that fast.

These days extracting a repo, even a large one, doesn't take all that much time, nor is disk space that big an issue. I just extract the whole thing (git, cvs, whatever) and then pick out what I want.

It only takes ~3 seconds or so to switch branches on a checked out repo of around ~100,000 files, and certainly less than ~10 seconds to do an initial checkout of such a repo. Not to mention the fact that 2TB hard drives are $100 these days so there's no real excuse to be tight on disk space.

When I first started using git I did worry somewhat about disk space. I quickly came to the conclusion that a few extra gigabytes didn't matter in today's world of cheap multi-terabyte hard drives. I typically have 4-5 copies of the DragonFly source base broken out, each with its own copy of the .git repo. A simple git pull is all I need to synchronize whatever directory I've decided to work in (since I'm often reviewing other developers' branches I have multiple independent copies). That's how little I care these days.

That said, it *is* possible to tell git to use hard links or otherwise share repo files in order to reduce the size of the .git/ subdirectory in the checkout directories. We do this on our developer box (where each account is given its own private repo which syncs against the DragonFly master repo). I don't bother optimizing my own personal copies though.

And one final thing to note... if the filesystem can de-duplicate data, having a lot of copies lying around is even less of an issue. I've never had to depend on de-dup (it's kinda hard to actually run a 2TB drive out of space unless it's being used to archive media files), but it does work particularly well on backup machines.
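Both tricks rest on the fact that Git's object files never change once written, so identical files can safely share storage; for Git itself, cloning from a local path already hard-links the files under .git/objects where it can. A minimal Python sketch of the hard-link variant, assuming a filesystem that supports hard links and files that are never modified in place afterwards (editing one would change every linked copy): it replaces byte-identical duplicates under a directory with links to the first copy found and reports the bytes reclaimed. `hardlink_duplicates` is a hypothetical helper, not part of any tool mentioned here.

```python
import hashlib
import os


def hardlink_duplicates(root: str) -> int:
    """Replace byte-identical regular files under `root` with hard links to the first copy.

    Returns the number of bytes reclaimed. Only safe for files that are never edited in place.
    """
    first_seen = {}  # content digest -> path of the first file with that content
    reclaimed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            original = first_seen.setdefault(digest, path)
            if original == path or os.path.samefile(original, path):
                continue  # first occurrence, or already linked
            size = os.path.getsize(path)
            tmp = path + ".linktmp"
            os.link(original, tmp)  # create the new link beside the duplicate...
            os.replace(tmp, path)   # ...then atomically swap it in
            reclaimed += size
    return reclaimed
```

Running it a second time reclaims nothing, since every duplicate already shares an inode with its original.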

-Matt

Re:Mercurial by euroq · 2011-07-26 19:01 · Score: 4, Insightful

Mercurial's most touted advantage is that it's easier to learn, but this is a joke. If you develop, you interact with the version control system all day. A tiny advantage in learning it faster is nothing compared to not being able to do what you want afterwards, or having to redo something because the version control works against you instead of with you.

I work at a company that has used Git professionally. The people on my team aren't dumb, but they have fucked up with Git dozens of times. What I quoted is an okay argument at a personal level. However, as an organization, there is something to be said for having easy-to-use tools.

I am not making the argument that either Mercurial or Git is better; I am making the argument that tools that are easier to use will lead to fewer fuck-ups in an organization.

Re:It's because by buchner.johannes · 2011-07-26 19:54 · Score: 5, Informative

see "git clone --depth"
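`git clone --depth N` creates a shallow clone whose history is truncated to the most recent N commits, which answers much of the "you have to pull everything" objection when you only need recent code. As a rough model (an assumption, not Git's actual negotiation logic), the commits kept are those within N steps of the branch tip, walking through every parent of a merge; `shallow_commits` and the history below are hypothetical.

```python
from collections import deque


def shallow_commits(parents, tip, depth):
    """Commits within `depth` steps of `tip` (the tip itself counts as step 1).

    `parents` maps each commit to its list of parents; merges are followed
    through every parent. A rough model of a shallow clone, not Git's code.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    distance = {tip: 1}
    queue = deque([tip])
    while queue:  # breadth-first, so each commit gets its shortest distance
        commit = queue.popleft()
        if distance[commit] == depth:
            continue  # history beyond this point is cut off
        for p in parents[commit]:
            if p not in distance:
                distance[p] = distance[commit] + 1
                queue.append(p)
    return set(distance)


history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}  # hypothetical linear history
print(sorted(shallow_commits(history, "D", 2)))  # ['C', 'D']
```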

--
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.

Mercurial for the Win by cowwoc2001 · 2011-07-27 02:36 · Score: 3, Informative

Mercurial has 95% of Git's functionality and is far easier to use. The extra features are simply not worth the headache.

Git's Windows support is atrocious. The installation process is an easy indication of that. Mercurial is packed with "just works" moments.
