The Rise of Git

Github? by Sorthum · 2011-07-26 16:08 · Score: 2

This may be due in part to the way github integrates social networking and coding-- I'm unaware of anything similar for SVN, Perforce, Bazaar, Mercurial, etc...

Re:Github? by Jimbookis · 2011-07-26 16:17 · Score: 2

This may be due in part to the way github integrates social networking and coding-- I'm unaware of anything similar for SVN, Perforce, Bazaar, Mercurial, etc...
What's that you say? Social networking and Git?! Now there's an idea! I'll go set up a new site - I'll call it GitFace! Who's in with me on the IPO in 2 years?
Re:Github? by mgiuca · 2011-07-26 16:37 · Score: 4, Funny

Well that's really what GitHub is ... much like Facebook treats every "object" (status update, photo, event) as a commentable, likable object, so does GitHub for VCS objects such as commits.
It's quite funny to see a commit with a comment thread attached to it. I saw one that went viral and ended up with 88 comments including meme images.

Re:It's because by shadowrat · 2011-07-26 16:13 · Score: 2

yes. it's better than everything except mercurial, of course.

Bazaar by mgiuca · 2011-07-26 16:16 · Score: 5, Insightful

Yet another DVCS article that doesn't mention Bazaar at all. And cursorily swats away Mercurial as "not as large a community."

It seems like just about every argument in favour of Git could be equally applied to any other DVCS. On top of that, the only thing it has going for it is a larger community (and being the creation of Torvalds).

I've argued to wit's end that Bazaar is superior to Git in a multitude of ways (branches as separate file-system directories, optional ability to work in bound mode as with Subversion, revision numbers, explicit notion of a 'trunk' versus merged branches, explicit moves/renames rather than heuristics, commit metadata). Basically Bazaar has a much richer data structure than Git. The last point (commit metadata) is crucial: because Git lacks commit metadata, it is impossible to meaningfully use any other revision control system in conjunction with Git -- what a selfish decision.

Yet all I ever hear is "Git is better than all the other revision control systems because [generic reasons why DVCSes are better than centralised ones]." Such is the case with Scott Chacon's site Why Git is Better Than X, which I wrote a rebuttal of at Why Git Ain't Better Than X.

Re:Bazaar by nschubach · 2011-07-26 16:27 · Score: 3, Informative

The main problem I've had with Bazaar is the lack of tool options. Of course, that's not really Bazaar's fault...
With Mercurial, I have tortoisehg for Windows and a very nice plugin for Eclipse. Bazaar's Eclipse integration has been rather lacking and the Windows tool chains have been slowly filling in, but it still needs time to level the field. (I'd work in Linux at work if it was an option... but it's not.) I still use Mercurial on my Linux laptop for local version management though.. mainly because it's what I use at work and there's no jumping between different keywords and methodologies.

--
Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
Re:Bazaar by mgiuca · 2011-07-26 16:44 · Score: 2

From what I know about git-notes, they are a relatively new concept and aren't very well supported. Apparently they don't survive being pushed and pulled by default (that's worse than it may seem -- it isn't sufficient for me to ensure they are pushed; everybody who takes a copy of the repo has to as well).
Integrating one VCS persistently with another relies on storing foreign VCS data in the metadata of the primary system -- if the primary system loses the metadata you are hosed. So for now, using Bazaar (or anything else) as a persistent Git client doesn't work, despite the best efforts of the Bazaar folks (who wrote bzr-git). I hope that one day git-notes is mature enough to support this, but it seems like it would require a backwards-incompatible change to the Git system.
Re:Bazaar by mgiuca · 2011-07-26 17:14 · Score: 2

An important distinction with Subversion (which I think handles branches atrociously): In Subversion, branches are separate folders within the repository; a repository has a single linear history with branch directories inside of it. In Bazaar, branches are separate folders outside of the repository; each branch has its own history and branches are treated as separate objects. Sorry if I didn't make that clear.
As for revision numbers, on the contrary: they aren't really meaningful until you start doing non-linear history (merging). Once you start merging, it is very nice to be able to see "oh, this is revision 173 and its parent is revision 172, but it also had a merge from a branch revision 168.1.32," as opposed to Git where all you can say is "oh, this is revision 0f3bc3df and it has two parents: 83c7ac39 and e337cb8a; I don't know which one is the actual trunk history and which one is the branch commit."
Re:Bazaar by petermgreen · 2011-07-26 17:26 · Score: 2

Why? DVCS systems are great for bazaar style open source projects like Linux but I don't think they are appropriate for every case. At least with hg anyone who wants to work on the code has to download the entire history of the entire repository. That is fine if the codebase is relatively small and the users can find a fast connection for initial checkout. Not so great if you are trying to track the complete history of a large project including all the tooling needed to successfully build it.

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
Re:Bazaar by euroq · 2011-07-26 18:39 · Score: 3, Interesting

You cite branches in separate folders as a superior feature? Subversion has that crap and it's messy.
High five brah! I hated that shit when I had to use SVN.

You also mention "revision numbers" as a feature. Who gives a crap about revision numbers?
As a person who has professionally used Git (and likes it) in many repositories, both large and small, I miss revision numbers. When I e-mail my colleagues about commits (i.e. revisions in another syntax), it is annoying to communicate it: I have to copy and paste the entire commit metadata (name, author, date-time, and SHA) in order to reflect it properly. I would much rather say "15.1" or "newfeaturebranch.3".
It also has been a problem when using the "Blame..." tool to track down why something was changed, as you have to do a double-lookup to find the SHA of the changed line and then the commit of the SHA. Well, my bad because really you'd have to do the same thing with a revision number, but trust me... I'd rather be working with revision numbers than SHAs because it would be easy to know that this line was changed because of "newfeaturebranch" rather than e93f02a09f9fe092a039a923.
Obviously, we don't need revision numbers... but boy would I love to have them.

--
Just because the U.S. is a republic does not mean it is not a democracy. Democracy/republic are not mutually exclusive.
Re:Bazaar by mfwitten · 2011-07-26 21:45 · Score: 2

Why do you NEED to copy information other than the SHA1 in order to reflect the commit properly? Also, why can't you just use shared tags to make it easier to discuss particular commits?
Moreover, you don't need to type in the entire SHA1; any abbreviation will do as long as it is unique among the other SHA1s known in the repository.
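
A minimal sketch of the abbreviation rule described above: keep lengthening the prefix until no other known ID shares it. The IDs are hypothetical and shortened for readability, and the minimum length of 4 is an assumption about what Git accepts, not something stated in this thread.

```python
def shortest_unique_prefix(target, all_ids, minimum=4):
    """Return the shortest prefix of `target` that no other ID in `all_ids` shares."""
    others = [i for i in all_ids if i != target]
    for length in range(minimum, len(target) + 1):
        prefix = target[:length]
        if not any(other.startswith(prefix) for other in others):
            return prefix
    return target  # only reached if another ID starts with the whole of target


# Hypothetical object IDs (real SHA-1s are 40 hex digits).
ids = ["e93f02a09f9f", "e93f7bb1c2d4", "83c7ac39aa10"]

for commit_id in ids:
    print(commit_id, "->", shortest_unique_prefix(commit_id, ids))
```

The more objects a repository knows about, the longer the prefix tends to get, which is why an abbreviation that works today is not guaranteed to stay unique forever.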

Names and such by girlintraining · 2011-07-26 16:19 · Score: 2

Sometimes after spending a whole day amongst non-geeks, doing non-geeky things, I come here and read the names of some of the things these technologies are named.

Git? Ruby? Subversion? Eclipse?

I get this distinct hillbilly feeling after reading some of the names the open source community has come up with for their projects of late. Mental images of tie-clad programmers in a rusted pickup truck waving corded mice over their head while techno music plays kind of images. Then I hit page reload, and the feeling fades... until I think of Richard Stallman.

--
#fuckbeta #iamslashdot #dicemustdie

Lack of tooling by Luthair · 2011-07-26 16:27 · Score: 2

The lack of decent tools is going to slow adoption of git, particularly in corporations. I've yet to see a tool that can handle even a simple daily workflow so I've stuck to cli, gitk and git-gui which are all clunky. egit has definitely improved but it still feels out of place and I believe is missing features (does it even support autocrlf yet?)

Corporate projects will almost certainly have a centralized repository and while git can deal with this, it's possible to paint yourself into a corner where it's painful to recover.

For reference, I've been a daily git user for ~16 months both at the company I work for and as a committer at Eclipse.

Re:It's because by bgat · 2011-07-26 16:30 · Score: 3, Informative

I'm not aware of any _code_ sets which span 50GB, and it seems unlikely that you could get to that size without a lot of machine-generated content. Such content wouldn't be ideal for git to manage, since git depends a lot on the capabilities of diff--- and machine-generated content might not diff as effectively as human-generated code. You can hardly fault a tool for doing poorly at a job it wasn't designed to do.

Is the content you are managing that you describe as "50GB+" actually human-generated _code_? Or is it _data_? There is a big difference to git.

On the other hand, git manages the complete Android source code. It isn't "50GB+", but it is still substantially larger than the Linux kernel--- and git does fine. However, Google breaks that code base up into 150+ sub-repositories, which is actually a quite sane thing to do. I haven't tried to place Android into a single git repository, so I can't say how well git would deal with something that large. But it wouldn't be the best way to use git, anyway.

So I think your negative review of git is uninteresting, to say the least.

--
b.g.

Data, Images, Binary builds etc. by syousef · 2011-07-26 16:48 · Score: 4, Informative

Exactly who the fuck has 50GB in one source code tree?

--
BMO

Those who store data, images, other binaries like built executables and other artifacts alongside the code.

You can argue that you shouldn't do that, but there are times when it's difficult to avoid, and if you need to be able to keep versions, it can be done easily with something like SVN.

I think GIT has its advantages, but to reject all predecessors and raise it up as the only way to go is foolish.

--
These posts express my own personal views, not those of my employer

Re:Data, Images, Binary builds etc. by siride · 2011-07-26 17:03 · Score: 3, Informative

I store binary data in my Git repos and it doesn't seem to balloon as badly these days as people make it out to be.
In fact, Git seems to be good enough at it that I use it to do application releases. It's faster for me to build an app, commit it to a special Git repo and push the new commit, than to send it via SFTP over the VPN (a few hundred KB vs dozens of MB). In that repo, I have hundreds of new versions, but it's only a few hundred megabytes. The equivalent in file storage would be easily ten times as much.
I don't doubt that other VCSes do better, but Git is not awful.
Re:Data, Images, Binary builds etc. by 0123456 · 2011-07-26 17:30 · Score: 3, Insightful

Storing large volumes of binary data in an archived fashion is a job for a filesystem, not a VCS. A VCS is not intended as a backup solution, nor should it be used as such.
So when I want to release a critical patch to an old version of our software I shouldn't just be able to extract everything from the repo, make the change and build the release installer, I should also have to find where any required binary files for that release were stored and copy them to my machine and hope that no-one deleted them in the meantime?
I really know very little about git, but from the numerous 'Git doesn't do X, Y, Z', 'But you shouldn't be doing X, Y, or Z!' posts here I don't see a reason why I would want to.
Re:Data, Images, Binary builds etc. by gbjbaanb · 2011-07-26 21:09 · Score: 3, Insightful

and you've just screwed the entire configuration process there.
The whole point of a SCM is that you put your sources in there so you can check it out and get the same set of sources from any point in history. The moment you say "oh that's too big, we'll put it somewhere else" is the day you lose control of that reproducibility.
Your images used in the app are part of the source. While there's a place for storing data elsewhere, it still has to be controlled in a way that you can get it back out again for a particular version.

Re:I don't Git it.... by dwarfsoft · 2011-07-26 17:02 · Score: 2

My main issue with Git was its Unicode filename support under Windows. Quite frankly it's broken. You add and commit your "Unicode Filenames" fine in Linux or Windows, but if you ever check them out under Windows they are renamed to a weird character set and require being re-added and checked in with their new path.

SVN doesn't have this problem (using TortoiseGit and mSysGit vs TortoiseSVN). I stopped using Git after I encountered this and have reverted to using Subversion until this has been resolved. If I had some time I might look into it, but seeing as it seems to be a known issue in mSysGit and the underlying cygwin(?) libraries I doubt I'd ever manage to resolve this myself without breaking something else.

Personally I like the Git Workflow, especially the commit-early-commit-often mentality that I never managed to get to in svn. It's just a shame that I can't use it seamlessly on Windows to complement my use on Linux.

--
Cheers, Chris

Re:Mercurial by MemoryDragon · 2011-07-26 17:08 · Score: 3, Interesting

Mercurial is not really superior; it is a subset of about 80% of Git's functionality baked into a nicer command line set. Btw, Mercurial's strong side is really the relatively clean command line; outside of that, both systems are so close it is eerie.

Re:It's because by ls671 · 2011-07-26 17:09 · Score: 3, Funny

and saturnial is even better of course although I hear that aluminiumal is doing well too ;-)

--
Everything I write is lies, read between the lines.

Re:I don't Git it.... by Chibi+Merrow · 2011-07-26 17:10 · Score: 2

but to me the pure syntactic overhead of dealing with full repo URLs makes it a much bigger pain than it should be.

Try using caret (^) in a recent SVN client. If you're in a working directory, it's a stand-in for the base repository URL, so svn+ssh://foo.bar.biz/svn/widget/trunk could be written as: ^/trunk
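
A small sketch of how that shorthand expands, assuming the repository root in the example is svn+ssh://foo.bar.biz/svn/widget (inferred from the URL above). The function name is hypothetical; the real client does this internally.

```python
def expand_caret(target, repo_root):
    """Expand a '^/path' shorthand against the repository root URL.

    Anything that does not start with '^/' is returned unchanged.
    """
    if target.startswith("^/"):
        return repo_root.rstrip("/") + target[1:]
    return target


# Assumed repository root for the example in the comment above.
root = "svn+ssh://foo.bar.biz/svn/widget"

print(expand_caret("^/trunk", root))
print(expand_caret("^/branches/feature-x", root))
```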

--
Maxim: People cannot follow directions.
Increases in truth directly with the length of time spent explaining them

Git could use revision numbers by euroq · 2011-07-26 17:11 · Score: 3, Interesting

The idea behind Git is that it is distributed and doesn't need a master repository, so I guess it didn't make sense to have revision numbers when it was created (for the Linux kernel). When two people make separate revisions at the same time in their local repositories, a linear revision number would conflict.

However, I've never actually used any Git project/repository which didn't have a master repository. That goes both for local repositories for my own projects in my Dropbox folder and for professional repositories I've used (Android and the various repositories at the company I work at). And especially at work, it has been annoying that we didn't have revision numbers.

I wish Git would get a new feature added: the ability to assign a repository as the "master" repository, and in turn the ability for the master repository to assign revision numbers. If people are wondering how that would work, considering people make commits on their local repository and then push them to the master causing possible conflicts: the revision numbers wouldn't get assigned until the commits hit the master branch, and they would also be split up for merges:
   5
  / \
4.1 4.2
  \ /
   3
(or something similar to the above)
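
A rough sketch of the numbering half of that wish, under loud assumptions: `MasterRepo` and `receive_push` are hypothetical names, not Git features, and the dotted 4.1/4.2 split for merges is left out. It only shows that numbers can be handed out on arrival at the master, so local commits never collide.

```python
class MasterRepo:
    """Hypothetical central repository that numbers commits in arrival order."""

    def __init__(self):
        self.revno_by_sha = {}

    def receive_push(self, shas):
        """Assign the next free number to each commit not seen before.

        `shas` is the list of pushed commit IDs, oldest first.
        Returns the revision numbers for the pushed commits.
        """
        for sha in shas:
            if sha not in self.revno_by_sha:
                self.revno_by_sha[sha] = len(self.revno_by_sha) + 1
        return [self.revno_by_sha[sha] for sha in shas]


master = MasterRepo()
print(master.receive_push(["83c7ac39", "e337cb8a"]))  # first developer pushes
print(master.receive_push(["e337cb8a", "0f3bc3df"]))  # second push overlaps the first
```

A commit keeps the number it first received, so a later push that contains already-known commits does not renumber anything.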

Lots of people who use an alternative VCS like Mercurial, Bazaar, etc., bitch about Git because of the lack of revision numbers. To those who are unfamiliar, each commit in Git has a SHA1 hash which is used as an identifier instead of a revision number. Unfortunately, they are very unwieldy to communicate to others. At work we always use the name and date-time instead, but that has problems as it doesn't convey the branch for instances when it matters.
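
For readers unfamiliar with where those identifiers come from, here is a minimal sketch of how Git derives an object ID from content: SHA-1 over a small header plus the object body. The example hashes a blob (file content) because it is the simplest case; commit objects are hashed the same way with a "commit" header over their text. The helper name is made up for illustration.

```python
import hashlib


def git_object_id(kind, body):
    """SHA-1 of '<kind> <size>' + NUL + body, which is how Git names objects."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The empty blob has a well-known ID in every Git repository.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Different content gives a different ID with no central coordination,
# which is why two people committing at the same time never collide.
print(git_object_id("blob", b"alice's change\n"))
print(git_object_id("blob", b"bob's change\n"))
```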

--
Just because the U.S. is a republic does not mean it is not a democracy. Democracy/republic are not mutually exclusive.

Re:Git could use revision numbers by mgiuca · 2011-07-26 17:23 · Score: 2

That would certainly help. I was extolling the merits of Bazaar's revision numbering scheme on my post above (titled "Bazaar"). The problem as I see it is that unlike Bazaar, Git doesn't assign any significance to the master parent of a commit. I'll shamelessly rip some text from my own explanation above:
For example, in Git, when you commit a merge (say, from a feature branch to master), you create a commit object with two parents, in no particular order: a) the most recent commit on the master, and b) the last commit on the feature branch. Looking back at that commit, all you see is a commit called 0f3bc3df with two parents: 83c7ac39 and e337cb8a: you can't say which of those two parents was the previous "stable" version and which was from the feature branch. In Bazaar, in the same scenario, the parents are specified explicitly as the primary parent and the merged commit. So you would see a commit called 173 with two parents: 172 and 168.1.32 -- you know that the parent called 172 is the version immediately preceding 173 in the trunk, while 168.1.32 was the last commit of a feature branch.
The difference is that in Bazaar, a merge is asymmetrical: you merge from one branch to another, and it makes a difference: the "to" branch (typically the trunk) gets to keep its original revision numbering scheme, while the "from" branch gets relegated to the x.y.z scheme. In Git you merge between two branches (the order does not matter) and then commit the result to one of them (typically the master).
I would like a scheme where Git places significance on the to branch like Bazaar does, but then the Git fanatics would cry that it's possible to break things by merging the wrong way (which is certainly possible in Bazaar, but I think it's better than not having that metadata at all).
Re:Git could use revision numbers by m.dillon · 2011-07-26 17:32 · Score: 2

Yes, for a git user the sha key is effectively the commit id / revision number, and it works incredibly well. I don't miss the crazy multi-dotted revision numbers from e.g. CVS, or even the simplified version numbers from svn, or anything else. The sha commit id works so well in git that our kernels include the first few digits of it in their version string printed out in the dmesg, which makes figuring out the basis for a bug report very easy.
Our use of git effectively has a master repo as well, and it is kept very clean relative to developer's local repos which have all of their local development branches.
But I think the most important feature is the utterly trivial incremental replication git supports. When we ship a new release we just include the current git repo in the disk image. If someone installs from that image they can then update their on-disk repo incrementally using the shipped repo as a base and only pull down a small amount of data over the network versus having to download the entire repo.
The incremental replication is also extremely reliable when using the git server feature (git://...), night and day compared to trying to distribute a CVS repo. My CVS repo synchronization scripts are ridiculously complex. rsync, find the most recent change, rsync again over and over again until the repo is found to be stable and even then there is no guarantee that you have a stable copy. (and no, cvsup doesn't work too well either).
Being able to have a chain of git repos from a single master which are incrementally updated in a reliable fashion makes distributing code bases utterly trivial, and being able to ship a git repo and then incrementally update it over the net to bring it up to current is priceless. It's impossible to replicate with server-only repos.
I have no regrets switching the DragonFly project over to git.
Even dealing with pkgsrc is a lot easier in git than it is with CVS. We want to make pkgsrc available to our users but the master repo (which is in NetBSD's CVS repo) is impossible to keep synchronized using CVS, let alone be able to distribute to our user base in an efficient fashion. So what we do instead is track the CVS with a cron job and dump it into a git repo which we distribute to our userbase instead. The scripts are complex, but they work quite well and we can use the same trick of shipping out the current pkgsrc set as a git repo and have users simply do a small, simple, short, low-bandwidth incremental update to synchronize it with the latest available data.
-Matt
Re:Git could use revision numbers by EvanED · 2011-07-26 18:25 · Score: 2

That's precisely why most DVCS don't use version numbers, but you'll also notice that the poster who started this thread proposed having a master repository which sets the numbers.
You'll also notice I didn't say "DVCS should have version numbers" in my post, I said "here are the drawbacks with the fact that DVCSs don't (usually) have version numbers."
Also you could look at Bzr... another poster in this thread has elaborated on the way it does numbering in a distributed setting.
Re:Git could use revision numbers by mfwitten · 2011-07-26 22:15 · Score: 3, Informative

Now, Git works around this mostly, because you can say 483b3ced^ to go to the previous revision (and actually SVN supports this too because you can say HEAD^). But it's not a full solution. What's the next revision? Git doesn't have a way of getting you that information.
That question doesn't make any sense, because in absolute terms, there are an indeterminate number of immediate children across every repository and across time.
There are a number of things you could do to narrow the search (such as something like git log --reverse -1 483b3ced..master, but ultimately you'd have to account for merge commits there, too; perhaps the --children flag for git rev-list and git log might be useful).
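
A small sketch of why "the next revision" has no single answer: commits only record their parents, so children have to be computed by inverting those links over whatever history your repository happens to know about. The commit names and the history are hypothetical.

```python
from collections import defaultdict

# Hypothetical history: each commit maps to the list of its parents.
PARENTS = {
    "a1": [],
    "b2": ["a1"],
    "c3": ["b2"],        # continues on one line of development
    "d4": ["b2"],        # a second commit that also builds on b2
    "e5": ["c3", "d4"],  # merge commit with two parents
}


def children_index(parents):
    """Invert the parent links so each commit maps to its known children."""
    children = defaultdict(list)
    for commit, commit_parents in parents.items():
        for parent in commit_parents:
            children[parent].append(commit)
    return {commit: sorted(kids) for commit, kids in children.items()}


index = children_index(PARENTS)
print(index.get("b2", []))  # b2 has two children in this copy of the history
print(index.get("e5", []))  # the newest commit has none (yet)
```

Another clone may hold commits this one has never seen, so the same query can give a different answer there.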
Re:Git could use revision numbers by minkie · 2011-07-27 00:31 · Score: 3, Interesting

Revision numbers are easier for humans to deal with. For example, here's a common flow I use every day:
$ hg pull
$ hg log | less { see some change I'm curious about and note the change number }
$ hg export 3742 | less
With change numbers, it's easy to remember 4 (or even 5) digits for the time it takes to type the export command. If I only had hashes, I'd have to copy-paste the string. Things like this matter less to people who only use GUIs. For command line folks, however, being able to easily read, pronounce, remember, and type change numbers is essential. Even if you're just talking with other people, it's a lot easier to say, "Oh, I see what happened, in change 2456, you did..." than to refer to hash strings.
I've used rcs, cvs, clearcase, perforce, dabbled in svn, dabbled in git, and am currently using hg. Of the centralized bunch, perforce is my favorite (not free, but reasonably priced and amazingly excellent tech support). I can't see anybody wanting to use svn for any new projects today. When it first came out, it was a significant improvement over cvs and people naturally flocked to it, but there's just so many better alternatives today.
Clearcase is an interesting beast. For sure, it's overpriced bloatware that's on life support, being kept alive mostly by big legacy customers with neanderthalic IT and Release Engineering departments who still believe IBM can do no wrong. But, it did have some interesting ideas. That every revision of every file exists simultaneously in the file system namespace is really powerful.
Between git and hg, I'd say they are fundamentally identical in capability, but I find the hg command set easier to get my head around. All the people who say, "X is the best possible vcs. I used to use cvs and when I switched to X my sex life improved overnight", fail to understand that "X is way better than cvs" is true for pretty much any value of X, and says nothing about the relative merits of the various X instances.
Re:Git could use revision numbers by mfwitten · 2011-07-27 12:54 · Score: 2

how do I get a list of every commit ever made to a Git repository, on any branch? I have no clue. With Subversion, it's easy -- you just ask for the log higher up the tree.
Like this: git log --all
RTFM :-)

Second, what if I'm only interested in a single branch? There there can't be more than one child, because the second child is on another branch.
What you call a `branch' here is not what git calls a `branch'. In git, a branch is simply a pointer (as a nice human-readable name) that points to a particular commit object at any given time, and this pointer's value can be changed at any time; it just provides a convenient way to refer to a commit, and in practical terms it is all that is necessary to be productive (tags take care of the rest).
What you're describing as a `branch' is a much more absolute concept, something that would better be called a `linear ancestry' or `line [of development]', and calculating that line is the crux of the problem; you're begging the question by assuming you already know exactly what you want to calculate.
Consider the history in Figure 0 (I would have inlined it, but Slashdot's commenting system erroneously flags it as spam).
In git's terminology, branch `master' as shown in the diagram just [currently] means commit `D0', and branch `some-branch' just [currently] means commit `E'. There are 3 linear ancestries or lines (see Figures 1, 2, and 3 at the same link).
As you can see, the first 2 lines are traversable from the same git branch, `some-branch'.
Now, let's say you want to know "the" child of commit `B'. Firstly, which line are you talking about? Well, you must already know which line (or set of lines, as you'll see) that you want to consider. As per my last comment, you could do something like the following (where I've simply replaced `483b3ced' in the original with `B'):

git log B..master --reverse -1
The command says:
* git log: show me log information for...
* B..master: the set of commit objects reachable from `master' but not reachable from `B' (this would be {D0,C0}),
* --reverse: order that set from oldest ancestor (C0) to the newest descendant (D0), but...
* -1: display only the first in that ordering (C0).
Now, in this case, you can actually be a bit more ambiguous with your lines and still get the same answer:

git log B..some-branch --reverse -1
The command says:
* git log: show me log information for...
* B..some-branch: the set of commit objects reachable from `some-branch' but not reachable from `B' (this would be {E,D0,D1,C0}),
* --reverse: order that set from oldest ancestor (C0) to the newest descendant (E), for a total ordering of something like (C0,D0,D1,E), but...
* -1: display only the first in that ordering (C0).
What about the third line of development, though? Assuming you know (unlikely) that `D2' comes after `B' in a line, consider:

git log B..D2 --reverse -1
The command says:
* git log: show me log information for...
* B..D2: the set of commit objects reachable from `D2' but not reachable from `B' (this would be {D2,C1}),
* --reverse: order that set from oldest ancestor (C1) to the newest descendant (D2), but...
* -1
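
The set arithmetic in the walkthrough above can be sketched in a few lines. The figures are not reproduced here, so the history below is an assumed reconstruction from the sets quoted in the comment (A, B, then C0 leading to D0 and D1, which merge into E, plus C1 leading to D2). The sketch follows the semantics as the comment describes them: build the range, order it oldest first, take the first entry. As far as I know, real `git log` applies the `-1` limit before `--reverse`, so the literal command may print the newest commit of the range instead; treat this as a model of the described idea, not of the CLI.

```python
# Assumed reconstruction of the history: commit -> list of parents.
PARENTS = {
    "A": [], "B": ["A"],
    "C0": ["B"], "C1": ["B"],
    "D0": ["C0"], "D1": ["C0"], "D2": ["C1"],
    "E": ["D0", "D1"],  # merge commit
}

# A branch is just a name pointing at one commit.
BRANCHES = {"master": "D0", "some-branch": "E"}


def reachable(tip):
    """All commits reachable from `tip` by following parent links, tip included."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def commit_range(exclude, include):
    """Model of `exclude..include`: reachable from include but not from exclude."""
    resolve = lambda name: BRANCHES.get(name, name)
    return reachable(resolve(include)) - reachable(resolve(exclude))


def oldest_first(commits):
    """Ancestors before descendants; ties broken by name for a stable order."""
    return sorted(commits, key=lambda c: (len(reachable(c)), c))


for tip in ("master", "some-branch", "D2"):
    ordered = oldest_first(commit_range("B", tip))
    print(f"B..{tip}: {ordered} -> first is {ordered[0]}")
```

Note that the answer for "the child of B" changes with the tip you pick: C0 for the first two ranges, C1 for the third.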

Re:It's because by wagnerrp · 2011-07-26 17:13 · Score: 3, Informative

With git, you have no option but to pull the entire repository, and all of its data, and all of its history. Aptly described by the command, you have your own local clone of the whole thing. As such, with larger projects, it becomes necessary to break the repository up into smaller, more manageable submodules. If using subversion, or some other version control system where you 'check out' rather than 'clone', it becomes possible to simply pull the current version of just the directory you want to work on. In essence each folder is automatically made a submodule.

Both strategies have their advantages and disadvantages. Every programmer is going to have their own style of work, which will be better suited towards one VCS or another. Claiming git is the perfect VCS for all occasions, as the OP did, is simply naive.

Re:Tower, GitHub, GitX client (Some Mac only) by syzler · 2011-07-26 17:42 · Score: 3, Interesting

Let's not forget that Xcode 4 uses Git by default and that Git is tightly integrated into the interface. Examples:

* Xcode creates a git repository by default when creating a new project
* When saving a file, Xcode will place a "M" marker next to a file to indicate it needs to be committed
* Re-naming a file in Xcode will perform the rm and add operations automatically in Git
* Xcode allows you to view the current version and past versions side by side in the editor

PostgreSQL CVS-git conversion by greg1104 · 2011-07-26 17:46 · Score: 4, Interesting

I had a small role in getting the PostgreSQL project to convert from CVS to git. There's a good summary of what happened at Lessons from PostgreSQL's Git transition. With a pretty conservative development community, the bar for converting from CVS to git was set pretty high: the entire CVS repository had to come through, such that every single release ever tagged could be checked out and get exactly the same files as checking it out of CVS (a little binary diff tool was used to confirm). With around 15 years of history in there, that took some upstream fixes to the cvs2git tool to finally accomplish; it took just over a year to work out all the details to everyone's satisfaction. My checked out copy of the current repo is 272MB right now, so neither small nor giant.

I would say that everyone who works regularly on the code is at least a little bit more productive than they used to be, with the older CVS experts having seen the least such improvement. But some people are a whole lot more productive. I'd put myself in that category--my patch contribution rate is way up now that it's so much easier to pop out a branch to hack on some small thing and then submit the result for review.

And the conversion seems to have improved the uptake of new developers getting involved in working on the code. Having to deal with CVS was a major drag for younger developers in particular, and Subversion is equally foreign to most of them now. As suggested in the article, anyone under 25 will only touch a corporate style CVS or Subversion repo if dragged kicking and screaming into it. As more of that generation rises through IT, old style repos will continue to get shredded at a good rate every year. It could have been any of the DVCS systems that ended up in this position, but git was the one that got the right balance of features, innovation rate, and publicity. Now that it's got such a wide user base, too, I don't see any of the other VCS software options competing with it successfully in the near future.

Re:Eclipse has adopted Git [for] for Eclipse proje by greg1104 · 2011-07-26 17:54 · Score: 2

Meh, call me when there's tortoisegit, and by then it will be too late.

You missed the call, it was a while ago. I considered TortoiseGit mature enough to use around V1.3, which was January of 2010. The upward spike in downloads shown on their page, which really took off around V1.2, shows quite a few people agree.

Re:It's because by m.dillon · 2011-07-26 17:54 · Score: 2

Well, you should post with your real name if you are going to make such an encompassing statement, instead of anonymously. I'm kinda wondering what repo management tools you are using that can handle 50GB+ data sets that you are trying to compare against something like git?

From my experience handling large data sets is less a function of the repo and more a function of disk bandwidth and memory. Putting, say, a million files into a repo (any repo) is not a big deal but managing it will definitely be dependent on the operations you are trying to do, memory, and storage.

I've found, in general, for the DragonFly project which manages ~500MB and ~1GB repos, that regardless of the repo if you want operations to be efficient you need to have a high speed caching layer helping out the filesystem. For DragonFly, of course, that means having an SSD in the system and using the swapcache feature to cache filesystem data and meta-data on the SSD. Then repo operations run fast regardless of the repo system used.

A nominally priced SSD can cache 100G of data fairly cheaply, so handling a large repo isn't a big issue. In our case we have two machines which keep about a dozen repos from different projects synchronized, in order to make them available to our developers, as well as perform incremental translations from CVS to GIT for pkgsrc (which is an ultra-nasty script). The scripts run twice a day and have a run-time of around an hour with the SSD caching layer. Without the SSD caching layer those scripts will take 6+ hours of time to run (12 hours a day of run time if I run them twice a day). The SSD makes a huge difference in manageability.

In terms of trying to manage fewer larger files, such as images... large numbers of binary files are best managed outside the repo infrastructure. Sure, a few here and there (such as a web site's icons) can easily be managed inside a repo, but trying to manage large amounts of bulk data in a repo generally just results in a lot of unnecessary pain. It's better to manage bulk data in a filesystem capable of performing snapshots.

Similarly for backups... repos aren't good mechanisms for making backups. You want something more closely integrated with the filesystem (and the filesystem's snapshot capabilities presuming you are using a filesystem with snapshot capabilities) to do LAN and off-site backups. Not a repo.

So the question here is: Are you complaining about the amount of time it takes to do an operation due to being disk-bound, or are you talking about bugs in the repo system causing the program(s) to crash or eat too much memory? I haven't had any significant memory issues with git myself though I can definitely see needing a 64-bit VM space if the repo becomes large enough for certain operations.

-Matt

Re:I don't Git it.... by man_of_mr_e · 2011-07-26 18:08 · Score: 2

I wasn't talking about the command line. The command line is probably roughly equal, annoying wise that is..

I'm talking about the GUI tools that are currently available. They suck, and doing tasks like cherry picking files is a pain in the butt. Of course, the fact that there's a term called "Cherry Pick Commit" that has nothing to do with "Cherry picking" files for commit.. might be part of the reason... You are right, though.. not having to checkin all files in one command is nice.

My beefs with GIT include some of yours. The huge amount of time to download. With SVN, you just download the latest.. but with Git, you have to download the entire change tree locally. Also, I find the git terminology to be (what seems to me) deliberately obscure. Terminology that has been in use for decades is changed for no apparent reason, other than to say "Hey, we're different". This leads to making mistakes when you confuse similar terminology between systems that do different things.

My other major beef is that, while it's nice to be able to do version control disconnected, I dislike having my check ins local.. version control is also a "save my ass", and if my laptop takes a trip down a flight of stairs, anything that's not pushed is lost as well.

--
If you need web hosting, you could do worse than here

Re:I don't Git it.... by EvanED · 2011-07-26 18:22 · Score: 2

I'm talking about the GUI tools that are currently available. They suck, and doing tasks like cherry picking files is a pain in the butt. Of course, the fact that there's a term called "Cherry Pick Commit" that has nothing to do with "Cherry picking" files for commit.. might be part of the reason... You are right, though.. not having to checkin all files in one command is nice.

So I can't speak to GUI tools on anything but Windows, but there's a TortoiseGit that functions nearly identically to TortoiseSVN. It even (at least mostly) hides the index from you.

My other major beef is that, while it's nice to be able to do version control disconnected, I dislike having my check ins local.. version control is also a "save my ass", and if my laptop takes a trip down a flight of stairs, anything that's not pushed is lost as well.

That sort of gets back to the "it's easy to forget to push" problem. If you're not subject to this problem, then I disagree that there's much of a difference: if I lose work because I deliberately didn't push, that's because I don't have repository access, and then I'd have "lost" that work under Subversion anyway because I wouldn't have done it in the first place.

As for remembering to push, there is a problem there. Tortoise is nice because on the "yes, you've committed" dialog there's a nice "push" button staring you in the face, so it's pretty easy to remember there, especially if you get in the habit of pushing after every commit.

For the command line, I haven't found a perfect solution... I think I want to write a shell alias that will run git as normal, but if I said "git commit" will print out "don't forget to push!" when it's done. I haven't gotten to that yet.
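
A minimal sketch of that kind of wrapper, assuming a POSIX-style shell such as bash. It is written as a shell function rather than an alias, since an alias cannot inspect its arguments. The reminder text is the one mentioned above; the choice to print it on stderr and only after a successful commit is an assumption, not something the poster specified.

```shell
# Wrap git: run the real command unchanged, then nag after a successful commit.
git() {
    command git "$@"          # "command" bypasses this function and runs the real git
    rc=$?                     # remember git's own exit status
    if [ "$1" = "commit" ] && [ "$rc" -eq 0 ]; then
        echo "don't forget to push!" >&2
    fi
    return "$rc"              # hand git's exit status back to the caller
}
```

Dropped into something like ~/.bashrc, this leaves every other git subcommand untouched. It only looks at the first argument, so an invocation such as "git -C somedir commit" would slip past the reminder.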

And one of the two biggest repository tangles I've had to unravel had at its root the fact that I forgot to push from one copy of a repository, developed in another, and then tried to sync everything up. That took some time to even figure out what happened, and rather longer to figure out the best way to fix it.

That said, I've also had a time when I've left dirty copies of files sitting around in a Subversion working copy for months without noticing, and that caused a problem too.

TLDR I do think that this is a drawback of Git, but for me it's so drastically outweighed by being able to work disconnected that it almost doesn't register.

Re:It's because by m.dillon · 2011-07-26 18:39 · Score: 3, Informative

Yes, but at the same time I only recall a few minor instances where I ever wanted to extract just a portion of a CVS archive, and the only reason was because, at the time, the system I was running on wasn't all that fast.

These days extracting a repo, even a large one, doesn't take all that much time, nor is disk space that big an issue. I just extract the whole thing (git, cvs, whatever) and then pick out what I want.

It only takes ~3 seconds or so to switch branches on a checked out repo of around ~100,000 files, and certainly less than ~10 seconds to do an initial checkout of such a repo. Not to mention the fact that 2TB hard drives are $100 these days so there's no real excuse to be tight on disk space.

When I first started using git I did worry somewhat about disk space. I quickly came to the conclusion that a few extra gigabytes didn't matter in today's world of cheap multi-terabyte hard drives. I typically have 4-5 copies of the DragonFly source base broken out, each with its own copy of the .git repo. A simple git pull is all I need to synchronize whatever directory I've decided to work in (since I'm often reviewing other developers' branches I have multiple independent copies). That's how little I care these days.

That said, it *is* possible to tell git to use hard links or otherwise share repo files in order to reduce the size of the .git/ subdirectory in the checkout directories. We do this on our developer box (where each account is given its own private repo which syncs against the DragonFly master repo). I don't bother optimizing my own personal copies though.
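
As a rough illustration of the two sharing modes that stock git clone offers (which of them the DragonFly developer box actually uses is not stated here, and the throwaway repo and directory names below are made up for the demo):

```shell
# Build a small throwaway "master" repo to clone from.
work=$(mktemp -d)
git init -q "$work/master"
echo 'hello' > "$work/master/file.txt"
git -C "$work/master" add file.txt
git -C "$work/master" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m 'initial commit'

# Cloning from a plain local path: git hard-links the object files
# instead of copying them, when the filesystem allows it.
git clone -q "$work/master" "$work/linked"

# --shared: no objects are copied or linked at all; the clone records the
# source's object directory in an "alternates" file and borrows from it.
git clone -q --shared "$work/master" "$work/shared"

# Show where the shared clone borrows its objects from.
cat "$work/shared/.git/objects/info/alternates"
```

The --shared form saves the most space but ties the clone to the source: if objects it relies on are pruned from the source repo, the clone can break, so it is best suited to a setup where the master repo is never rewritten.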

And one final thing to note... if the filesystem can de-duplicate data, having a lot of copies lying around is even less of an issue. I've never had to depend on de-dup... it's kinda hard to actually run a 2TB drive that isn't being used to archive media files out of space... but it does work particularly well on backup machines.

-Matt

Re:Mercurial by euroq · 2011-07-26 19:01 · Score: 4, Insightful

Mercurial's most touted advantage is that it's easier to learn, but this is a joke. If you develop, you interact with the version control system all day. A tiny advantage in learning it faster is nothing compared to not being able to do what you want afterwards, or having to redo something because the version control works against you instead of with you.

I work at a company that has used Git professionally. My team isn't dumb people, but they have fucked up with Git dozens of times. What I quoted is an okay argument at a personal level. However, there is something to be said, as an organization, for easy-to-use tools being better.

I am not making the argument that either Mercurial or Git is better; I am making the argument that tools which are easier to use will lead to fewer fuck-ups in an organization.

--
Just because the U.S. is a republic does not mean it is not a democracy. Democracy/republic are not mutually exclusive.

Re:It's because by buchner.johannes · 2011-07-26 19:54 · Score: 5, Informative

see "git clone --depth"
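
A small sketch of what that option does, using a throwaway local repo so it runs without a network (the repo, file names, and commit messages are made up for the demo). Note the file:// URL: with a bare local path git takes a local-copy shortcut and ignores --depth.

```shell
# Build a throwaway repo with three commits of history.
work=$(mktemp -d)
git init -q "$work/full"
for n in 1 2 3; do
    echo "change $n" > "$work/full/file.txt"
    git -C "$work/full" add file.txt
    git -C "$work/full" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "commit $n"
done

# Shallow clone: fetch only the most recent commit, not the whole history.
git clone -q --depth 1 "file://$work/full" "$work/shallow"

# Compare how much history each copy holds.
full_count=$(git -C "$work/full" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full: $full_count commits, shallow: $shallow_count commit(s)"
```

This truncates history rather than checking out a subdirectory, so it addresses download size, not the Subversion-style partial checkout.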

--
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.

try SmartGit by Barryke · 2011-07-26 20:18 · Score: 2

I find SmartGit more useful for day-to-day stuff.
I have TortoiseGit installed (and it works) but I never use it. Having the correct icons show up in Explorer is nice though.

--
Hivemind harvest in progress..

Whatever happened to Tom Lord's Arch? by Relyx · 2011-07-26 20:18 · Score: 2

Tom Lord, developer of rival Arch, must be spitting blood at the success of Git.

I followed Arch's development back in 2004 and quickly lost interest. The last crazy thing I remember was Tom trying to build a home-brew LISP derivative *into* his version control system. It was going to revolutionise everything. He even wrote a long manifesto-cum-design document in three parts. At that point I gave up and moved to Subversion. I just wanted a modern version control system that worked.

Re:Whatever happened to Tom Lord's Arch? by paskie · 2011-07-26 22:24 · Score: 2

Actually: http://wiki.bazaar.canonical.com/HistoryOfBazaar Bazaar pretty much evolved from GNU Arch, though it is of course a very different beast now and there is AFAIK no shared code - but the developers migrated there from a GNU Arch branch and they took some ideas with them, so it still can be seen as a spiritual successor. :-)

--
It's not the fall that kills you. It's the sudden stop at the end. -Douglas Adams

Re:Eclipse has adopted Git [for] for Eclipse proje by DrXym · 2011-07-26 20:18 · Score: 2

Git on Windows has gotten a lot better. You need to install msysgit and then TortoiseGit on top. With those two things you're more or less free to do everything visually. I think TortoiseGit is still rough around the edges compared to TortoiseSVN though.

In Eclipse you now also have top tier support for Git through the EGit plugin. This is sitting over a pure Java implementation of git called JGit (i.e. no need for msysgit). It works pretty well and in the manner you would expect if you've ever used a VCS with Eclipse before. JGit also powers Gerrit, which is a git server and web app that slots a code review & approval system into the workflow.

Re:because the others still suck by bigpresh · 2011-07-26 21:33 · Score: 2

I checked out the full repository of an open source project I have been tinkering with in both SVN and Git (libgdx). The SVN was MUCH larger than the Git repository on my hard drive (I think 33% more, but I can't remember).

I think the point being made was that, in Subversion, you can check out just a small part of the repository if you want to do so, rather than the whole thing. I'm not aware of that possibility in Git.

Mercurial for the Win by cowwoc2001 · 2011-07-27 02:36 · Score: 3, Informative

Mercurial has 95% of Git's functionality and is far easier to use. The extra features are simply not worth the headache.

Git's Windows support is atrocious. The installation process is an easy indication of that. Mercurial is packed with "just works" moments.

Re:It's because by m.dillon · 2011-07-27 06:26 · Score: 2

Well, I certainly was not expecting you to use RCS as a comparison point. RCS is utterly horrible when dealing with large data sets. Any modification to a file requires rewriting the entire rcs file and doing something like, oh, tagging, requires rewriting every single file in the repo. Every single one.

RCS is a very filesystem-heavy repo management system. Updates, checkouts, pretty much everything you do *except* single-file log displays are expensive. Such operations have to scan or access nearly every file in the repo and at least stat every file in the checked out tree. For large repos with hundreds of thousands of files RCS/CVS is nasty as hell.

Nor can you reliably mirror or replicate an RCS or CVS repo. Neither rsync nor cvsup is capable of reliably replicating a live, heavily used RCS/CVS repo. I've tried many times... I have to mirror the NetBSD CVS repo to get their pkgsrc into a git mirror and it takes a complex script to try to detect a point where the entire CVS repo is quiescent. Even with the quiescence check my script *still* has to do a full cvs checkout and an actual diff -r between the checked out CVS repo and the checked out git repo to catch occasional failures.
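
The final diff -r check can be sketched roughly as follows. This is not the actual pkgsrc script; the helper name, the demo directories, and the decision to skip the CVS/ and .git metadata directories are all assumptions made for illustration.

```shell
# Hypothetical helper: succeeds only when two checked-out trees have identical
# contents, ignoring each system's own metadata directories.
trees_match() {
    diff -r -q -x CVS -x .git "$1" "$2" >/dev/null
}

# Demo on two throwaway trees standing in for the CVS and git checkouts.
demo=$(mktemp -d)
mkdir -p "$demo/cvs_tree/CVS" "$demo/git_tree/.git"
echo 'placeholder' > "$demo/cvs_tree/CVS/Entries"   # metadata, must be ignored
echo 'hello' > "$demo/cvs_tree/README"
echo 'hello' > "$demo/git_tree/README"

if trees_match "$demo/cvs_tree" "$demo/git_tree"; then
    match_before=yes
else
    match_before=no
fi

# Simulate a conversion failure: the git side drifts from the CVS side.
echo 'drift' >> "$demo/git_tree/README"

if trees_match "$demo/cvs_tree" "$demo/git_tree"; then
    match_after=yes
else
    match_after=no
fi

echo "before drift: $match_before, after drift: $match_after"
```

A real mirror script would presumably retry or abort the import when the comparison fails instead of just reporting it.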

In short RCS/CVS is a mess. GIT is not a mess. With git you just use git-daemon and git:// URLs and you can get massive, reliable replication of the repo.

The only other issue involved here seems to be one of machine resources. But in today's world machine resources are cheap. Even a large 50G+ repo trivially fits on a sub-$100 2TB hard drive, and it takes only a moderately-sized SSD caching layer (~100G) to make the repo operations efficient. That's cheap enough that every developer can keep multiple full repos on their workstations.

In many respects the GIT concept has grown into its own by virtue of the greatly improved storage resources available on today's machines. In the 80's and the early 90's a centralized repo would have been far more important simply by virtue of the relative disk space required. In 2011 the relative disk space required for even a large repo is tiny.

-Matt
