Slashdot Mirror


10 Years of Git: An Interview With Linus Torvalds

LibbyMC writes Git will celebrate its 10-year anniversary tomorrow. To celebrate this milestone, Linus shares the behind-the-scenes story of Git and tells us what he thinks of the project and its impact on software development. From the article: "Ten years ago this week, the Linux kernel community faced a daunting challenge: They could no longer use their revision control system BitKeeper and no other Software Configuration Management (SCMs) met their needs for a distributed system. Linus Torvalds, the creator of Linux, took the challenge into his own hands and disappeared over the weekend to emerge the following week with Git. Today Git is used for thousands of projects and has ushered in a new level of social coding among programmers."

24 of 203 comments

  1. And yet, no one understands Git. by Anonymous Coward · · Score: 3, Informative

    I hear from so many people who love git, and also from so many people who see it as needlessly complicated to the point of getting in the way of getting things done. If that latter view didn't have any truth to it, this page wouldn't exist:

    http://git-man-page-generator.lokaltog.net/

    So, which is it? A useful tool, or simply a way for the brightest technology people to feel smarter than everyone?

    1. Re:And yet, no one understands Git. by jones_supa · · Score: 5, Informative

      It isn't complicated. Check out Git - The Simple Guide.

    2. Re:And yet, no one understands Git. by kthreadd · · Score: 3, Interesting

      Very few people actually know their version control software. Most people know the basic commands, and that's the case for pretty much all of them. Git is not much different in that regard.

    3. Re:And yet, no one understands Git. by angel'o'sphere · · Score: 3, Insightful

      Use a GUI frontend like SourceTree from Atlassian; there is not much difference between Git and other systems.
      The difference only shows when you actually look into the file layout of a git repository ... and who is doing stuff like this (besides me) anyway?

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    4. Re:And yet, no one understands Git. by gbjbaanb · · Score: 3, Funny

      It's true, git is complex like Linux is. It suits the needs of Torvalds, but I think its popularity exceeds its ability, and many people use it without using it properly. For example, a previous company I worked for used git for their SCM, and when I asked where the backups were I was told they didn't need backups because it was distributed and everyone had a copy of the repo... Of course, that relies on everyone having a copy of each repo, or at least one other person having an up-to-date copy of each repo, which wasn't the case. This kind of thinking wouldn't happen if there was more of a concept of distributed-but-from-a-central-repo. It needs the concept of a golden root from which everything else is sourced (and I know you can have this, but it's more a convention, due to the distributed nature).

      Still, it ushered in a new style of version control that wasn't catered for before.

      Now we're seeing easier, more accessible systems, such as Fossil, which attempts to bridge the gap between DVCS freedoms and centralised repositories, includes other useful features such as a bug tracker in the SCM, and is still geared towards branches that are more collaborative than git's 'private playground' branches (i.e. git is designed for people to work on their own and hopefully merge changes back; many other SCMs are designed for branches that hold common code worked on by several people, thus requiring less merging). Git works well because of how the Linux project is structured, as a very large hierarchy, but it starts to fall down in a small team where people don't have that arm's-length working environment, or where they work on multiple branches at the same time (e.g. at work, I have my big feature and I have bug fixes that come and go regularly; git doesn't help in that environment unless I have multiple repos checked out).

    5. Re:And yet, no one understands Git. by Anrego · · Score: 3, Interesting

      As someone mostly in the "I dun get it" crowd, I'll say the problem for me is that I feel like while I can use it, I don't have a great deal of understanding as to what it's actually doing outside of the basics. I feel like I'm following a bunch of recipes that I know work.

      With svn (which admittedly I've used for many years and on sizable projects vs git which I've used for months and on small stuff), I feel like I have a really good grasp of the whole thing. Sure there are some subtle bits I don't know because I've never needed, but I know the important bits, and I feel like from that I can solve just about any problem I run into by understanding what svn is trying to do and why it's not working.

      I get that at least some of this is just inexperience, but I think even with experience, git seems far more complex and nuanced than svn, which has a relatively consistent way of working and seems to have a much smaller set of features. I feel like I got comfortable with svn way faster, and at that point I was only mildly familiar with version control in general.

      I know I'm gonna get flamed for this, but just wanted to provide some insight into the mind of someone who hasn't jumped on the git bandwagon yet.

    6. Re:And yet, no one understands Git. by bobbied · · Score: 3, Interesting

      I hear from so many people who love git, and also from so many people who see it as needlessly complicated to the point of getting in the way of getting things done. If that latter view didn't have any truth to it, this page wouldn't exist:

      http://git-man-page-generator.lokaltog.net/

      So, which is it? A useful tool, or simply a way for the brightest technology people to feel smarter than everyone?

      Personally, I'm in both camps. I both hate and love Git. I hate that I have to explain and provide "scripts" to developers that explain how the project uses Git and I love that I can manage my project in multiple ways, depending on my needs.

      Git's problems stem from its Unix-like, command-line-based user interface. It's not a surprise that they decided to go with this kind of interface; they were basically Linux developers, after all. In the tradition of good CLIs, Git is full featured, meaning it does a LOT of things, or really it supports doing things in a lot of different ways. I love the flexibility. But unless you understand what Git is doing for you under the covers, you may not know which of the confusing commands in Git you need to use. If you don't understand how your project is using Git, it may be difficult for the newbie to come up with the necessary commands to get things done.

      Personally, I end up writing scripts for my developers. I force them into following a set procedure to "check out" the source, do their local development and get their changes through the review cycles and into the main repository again. Developers don't like scripts like this and because it's a script that describes how they use git, they think they don't like git. What they really hate is being told exactly what to do...

      I love git because it allows me to control my project's source. It keeps local backups and my history on MY machine, but doesn't expose others to my mindless rambling commits unless I decide to push them. I love the flexibility to manage my source how I want to locally....

      So, IMHO, git is great and a curse at the same time. Much like AWK and SED were a huge boost to the Unix CLI (if you understand them), git is wonderfully complex and thus frustrating to learn. Git doesn't force you into a configuration management model, but lets you roll your own process. Being flexible is great, but git doesn't stop you from shooting yourself in the foot, so be careful: know how you want to manage your repo, figure out how to make git do that, document how it's done, and test your process.

      Remember, it's the PROCESS you need to have straight in your head. Just googling Git is going to cause you trouble because how THEY use git is unlikely to match how YOU want to use it. You got to know your tools and git is no exception but you REALLY need to know what you are trying to do with git.

      --
      "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
    7. Re:And yet, no one understands Git. by Guy+Harris · · Score: 5, Funny

      You have to understand the data-structure, how files, directories and commits are all content-addressable objects. The linkage of the commits by means of their id's must be understood.

      Git: the best file system anybody ever confused with a version control system. :-)
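
To make "content-addressable" concrete, here is a minimal Python sketch of how an object ID can be derived from content alone. It assumes git's SHA-1 object naming scheme (a header of object type, a space, the payload size and a NUL byte, followed by the payload); the function name `git_object_id` is made up for this illustration and is not part of git.

```python
import hashlib


def git_object_id(kind: str, payload: bytes) -> str:
    """Hash a payload the way git names objects: '<type> <size>' + NUL + payload."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


# The ID depends only on the content, not on the file name or location.
readme_id = git_object_id("blob", b"hello world\n")
copy_id = git_object_id("blob", b"hello world\n")
changed_id = git_object_id("blob", b"hello world!\n")

print(readme_id)
print(readme_id == copy_id)     # True: identical content, identical ID
print(readme_id == changed_id)  # False: any change yields a new ID
```

Because the name of an object is a hash of what it contains, two files with the same bytes are stored once, and a commit that refers to other objects by ID pins their exact content.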

    8. Re:And yet, no one understands Git. by Yunzil · · Score: 4, Insightful

      You have to understand the data-structure, how files, directories and commits are all content-addressable objects. The linkage of the commits by means of their id's must be understood.

      See, here's the thing. Why should I have to understand internal data structures in order to use a piece of software? Imagine if you made a word processor and people found it difficult to understand, and you said, "It's easy once you understand that the words in the text are stored in a hash map along with a structure with various flags that encode things like whether it's italic or not." People would look at you funny and go back to using Word.

    9. Re:And yet, no one understands Git. by swillden · · Score: 4, Interesting

      I worked for used git for their SCM and I asked where the backups were I was told they didn't need backups because it was distributed and everyone had a copy of the repo

      This is only tangentially-related, but a good story, and it's been a few years since I posted it.

      About 20 years ago, I worked for a company which I shall not name, which used CVS as its source repository. All of the developers' home directories were NFS mounted from a central Network Appliance shared storage (Network Appliance was the manufacturer of the NAS device), so everyone worked in and built on that one central storage pool. The CVS repository also lived in that same pool. Surprisingly, this actually worked pretty well, performance-wise.

      One of the big advantages touted for this approach was that it meant that there was a single storage system to back up. Backing up the NA device automatically got all of the devs' machines and a bunch more. Cool... as long as it gets done.

      One day, the NA disk crashed. I don't know if it was a RAID or what, but whatever the case, it was gone. CVS repo gone. Every single one of 50+ developers' home directories, including their current checkouts of the codebase, gone. Probably 500 person-years of work, gone.

      Backups to the rescue! Oops. It turns out that the sysadmin had never tested the backups. His backup script hadn't had permission to recurse into all of the developers' home directories, or into the CVS repo, and had simply skipped everything it couldn't read. 500 person-years of work, really gone.

      Almost.

      Luckily, we had a major client running an installation of our hardware and software that was an order of magnitude bigger and more complex than any other client. To support this big client, we constantly kept one or two developers on site at their facility on the other side of the country. So those developers could work and debug problems, they had one of our workstations on-site, and of course *that* workstation used local disk. The code on that machine was about a week old, and it was only the tip of the tree, since CVS doesn't keep a local copy of the history, only a single checked-out working tree.

      But although we lost the entire history, including all previous tagged releases (there were snapshots of the releases of course... but they were all on the NA box), at least we had an only slightly outdated version of the current source code. The code was imported into a new CVS repo, and we got back to work.

      In case you're wondering about the hapless sysadmin, no he wasn't fired. That week. He was given a couple of weeks to get the system back up and running, with good backups. He was called on the carpet and swore on his mother's grave to the CEO that the backups were working. The next day, my boss deleted a file from his home directory and then asked the sysadmin to recover it from backup. The sysadmin was escorted from the building two minutes after he reported that he was unable to recover the file.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    10. Re:And yet, no one understands Git. by pz · · Score: 2

      The team I was on was using cvs for a long time (quite successfully) and then switched to git. I could never use git without having a page of cheat-sheet notes in front of me. There were some good things about it, some really good things (the code merger was magic), but you had to stay on top of the state of your code in a way that CVS never required.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    11. Re:And yet, no one understands Git. by MechaStreisand · · Score: 2

      The problem with what you're saying is that Mercurial exists, and it can do everything that git can do with an easy to use interface.

      --
      Disclaimer: IANAL. This post is, however, legal advice, and creates an attorney-client relationship.
    12. Re:And yet, no one understands Git. by Penguinisto · · Score: 2

      Step 2 - if you're not a working developer or in DevOps, you really shouldn't be using this thing, so, like, stop there. ;)

      Okay, just kidding. In all seriousness, Git can have a steep learning curve to the uninitiated. Then again, so can CG compositing/modeling, systems administration on a CLI-only install of any UNIX/Linux flavor you care to name, or even PowerShell on Windows for beginners.

      But then, like most things, I've found that after *using* the thing, it goes from impossible to tolerable, then to easy, then drop-simple. If you're a *nix sysadmin, many of the commands should already be familiar (git rm, git add, git mv, etc).

      Besides: You can always alias a lot of those commands and save yourself a lot of trouble/time...

      --
      Quo usque tandem abutere, Nimbus, patientia nostra?
    13. Re:And yet, no one understands Git. by multi+io · · Score: 2

      Oh come on. The hex revision numbers are there because the programmer was too stupid or too lazy to figure out something people could actually use. Typical programmer attitude---code for other nerds, not normal people.

      No. git is a distributed version control system, which means that, among other things, operations like "commit" and "merge" that create new commits must operate purely locally, without synchronizing with any remote copy of the repository. Then, much later, when the user decides to push those commits to a remote copy, and other users push their new commits to the same remote copy, the remote copy must be able to tell which of the incoming commits it already had locally, which ones are actually new to it, whether multiple incoming commits from different source repositories represent the same commit (for example because those two source repositories pushed to each other before) and thus must be collapsed into one new local commit, and which ones are different commits and thus must be imported as separate local commits. The fundamental problem that the DVCS has to solve here is merging/synchronizing multiple directed acyclic graphs coming from different remote sources, all of which can independently add new nodes to their local version of the graph at any time without communicating with any of the other copies or any other sort of "central repository".

      This means that you have to have some sort of globally unique identifier for the nodes of the graph, and those identifiers must be creatable locally, using only information that's available in the local copy of the graph, and then still be unique across all copies of the graph that might exist elsewhere. That's what the SHA1 checksums achieve. They also have the nice feature that they're not random numbers, but actual checksums over the entire contents of the graph up to and including that node. But the fundamental issue is that you can't have human-readable commit identifiers like "1.2" or "1.4.1" because there is no central authority that could generate those names and guarantee that they're unique across all copies. Mercurial uses the same solution (they have a linearly increasing "commit number" on top of that, but those numbers are only valid locally, i.e. they might be different in each copy of the graph).
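
The argument above can be sketched with a toy model in Python. This is an illustration under stated assumptions, not git's actual commit format: a real commit also hashes a tree ID, timestamps and so on, and the names `commit_id`, `commit` and `push` are hypothetical. The point is only that an ID computed from the parents plus the commit's own data can be created offline and still lets a remote copy recognise what it already has.

```python
import hashlib


def commit_id(parents, author, message):
    """Toy commit ID: a SHA-1 over the parent IDs plus the commit's own data."""
    body = "\n".join(list(parents) + [author, message]).encode()
    return hashlib.sha1(body).hexdigest()


def commit(repo, parents, author, message):
    """Add a commit to a repo (a dict of id -> parent ids) with no network access."""
    cid = commit_id(parents, author, message)
    repo[cid] = list(parents)
    return cid


def push(source, target):
    """Copy the commits the target lacks; return how many were actually new."""
    new = {cid: parents for cid, parents in source.items() if cid not in target}
    target.update(new)
    return len(new)


# Alice and Bob start from the same root, then work offline.
alice = {}
root = commit(alice, [], "alice", "initial import")
bob = dict(alice)

a1 = commit(alice, [root], "alice", "fix parser")
b1 = commit(bob, [root], "bob", "add tests")

server = {}
print(push(alice, server))  # 2 new commits: root and a1
print(push(bob, server))    # 1 new commit: b1 (root is recognised as already present)
print(len(server))          # 3
```

A counter such as "1.2" could not play this role: Alice and Bob would both have produced "revision 2" offline, and the server would have no way to tell that they are different commits.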

  2. Like Coca Cola, git is the real thing by Johnny+Loves+Linux · · Score: 5, Interesting

    As a software developer who's been a git user for 7 years, I don't know how I could have written any serious code without git. Branching and merging is trivial. Cloning is trivial. The staging area makes choosing what to commit trivial. git rebase makes life much easier when it comes to reordering/editing/removing commits out of the history. git blame --- such a nice tool. Binary searching to find bugs is trivial. Every git tool is documented to within an inch of its life.
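
The "binary searching to find bugs" mentioned above is what `git bisect` automates. A minimal Python sketch of the idea follows, assuming a linear history in which the oldest commit is good, the newest is bad, and every commit after the first bad one is also bad. The commit names and the `is_bad` check are invented for the example.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and badness never
    disappears again once it has appeared.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return bad


history = [f"c{i}" for i in range(100)]  # hypothetical commit names, oldest first
tested = []


def is_bad(commit):
    """Stand-in for 'build and run the test suite at this commit'."""
    tested.append(commit)
    return int(commit[1:]) >= 73         # pretend the bug arrived in c73


culprit = history[first_bad(history, is_bad)]
print(culprit, "found after testing", len(tested), "commits")
```

Each test halves the remaining range, so even a hundred commits need only a handful of builds.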

    And the icing on the cake? Code cowboy hates git. Like sunlight or garlic to a vampire, Code cowboy abhors git. He can't hold the source code hostage to his every brain damaged whim. He can't hose anybody with a distributed version control system. It's no wonder why Code Cowboy is always yapping away at git -- he can't show off his genius if his code can be ignored.

  3. Let's not forget Mercurial by Digana · · Score: 3, Informative

    Let's not forget the other contender for replacing Bitkeeper: Mercurial. We will also be celebrating its 10th year anniversary next week during the Pycon sprints.

  4. this is really a story about.. by Anonymous Coward · · Score: 5, Insightful

    how bitkeeper fucked up and was swiftly relegated to irrelevance... you have to wonder how many of these are even still using bk......

  5. Re:The real story by TheCarp · · Score: 2

    I think you missed the part where one developer reverse engineered how the protocol worked and the developer had enough of a shit fit that it became apparent that continuing to use their software was going to be problematic AND it was already the case that many people didn't want to use it for issues of licensing.

    Even so, nobody is under any obligation to keep using a tool someone else makes, even if they like it. He put in the work. Nobody has some right to have others continue to use a service that they don't need and can handle for themselves.

    --
    "I opened my eyes, and everything went dark again"
  6. Re:The real story by Anonymous Coward · · Score: 3, Informative

    Also, Git is and was an improvement over BitKeeper from a purely technical standpoint. Merging was easier, particularly with file renames. And Git was more performant. Here's a contemporaneous comparison from 2005, only one month after Git was publicly released:

    http://www.selenic.com/pipermail/mercurial/2005-May/000334.html

    BitKeeper sucked compared to both Git and Mercurial at the time. BitKeeper was definitely an improvement when it came out, but it was quickly surpassed by the open source alternatives.

  7. Re:The real story by gmack · · Score: 3, Informative

    It's worse than that. Linus would tell everyone not to worry and go on about how Bitkeeper was a great improvement and Larry would prove him wrong by throwing public tantrums and generally playing stupid licensing games. E.g. banning IBM from using the free version since they had a competing SCM being built by another (far removed) department. Banning anyone who worked directly on a competing SCM from using Bitkeeper at all. And responding to said developer reverse engineering one of the export interfaces by discontinuing the free version of Bitkeeper.

    The best part of it all was that Linus helped him design the thing in the first place.

  8. Command non-orthogonality is its weak point by Traf-O-Data-Hater · · Score: 3, Insightful
    Like others, I both love git and hate it. The bit I dislike the most is the inconsistency in commands and their opposites.
    For instance, it is easy to add files to staging:
    git add .
    Oops! A bunch of other things got added, because I'm a newbie and haven't yet tuned my .gitconfig. Fine, I'm still learning.
    OK, have a guess at undoing it:
    git unadd
    wtf?...nope..
    After some frustrating searching, I found that git reset is really unadd. Yeah, I could guess that! Not.
    And that's the crux of it. Sure you can add git aliases, but an xxx/unxxx pattern could have been built in right from the (ahem) git-go for any sensible command. Git commit/uncommit... merge/unmerge... etc etc.
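
For readers wondering why the undo of `git add` ended up being called reset, here is a toy Python model of the three states involved. It is a sketch, not git's implementation: `ToyRepo` and its methods are invented for the illustration, and it only models the path-limited forms `git add <path>` and `git reset <path>`.

```python
class ToyRepo:
    """Three copies of each path: the last commit (head), the index, the working tree."""

    def __init__(self, head):
        self.head = dict(head)      # path -> content as of the last commit
        self.index = dict(head)     # what the next commit would contain
        self.worktree = dict(head)  # files on disk

    def add(self, path):
        """Like `git add <path>`: copy the working-tree version into the index."""
        self.index[path] = self.worktree[path]

    def reset(self, path):
        """Like `git reset <path>`: make the index entry match the last commit again."""
        if path in self.head:
            self.index[path] = self.head[path]
        else:
            self.index.pop(path, None)  # path was new, so un-staging removes it

    def staged(self):
        """Paths whose index entry differs from the last commit."""
        paths = set(self.index) | set(self.head)
        return sorted(p for p in paths if self.index.get(p) != self.head.get(p))


repo = ToyRepo({"main.c": "v1"})
repo.worktree["main.c"] = "v2"
repo.worktree["debug.log"] = "noise"

repo.add("main.c")
repo.add("debug.log")
print(repo.staged())   # ['debug.log', 'main.c']

repo.reset("debug.log")
print(repo.staged())   # ['main.c']
```

In this model "reset" is named after what happens to the index entry (it is reset to the committed version), not after the command it undoes, which is exactly the asymmetry complained about above. The file on disk is left untouched. Newer git releases also offer `git restore --staged <path>` for the same purpose.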

    And the great thing about git: Linus realised disk space was becoming too cheap to meter. Why bother crunching a delta on something when it was easier to just store compressed blobs? Thus the advantage of simple, fast and cheap (pick any three) branching.
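
The link between whole compressed snapshots and cheap branching can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `objects` and `branches` dictionaries and the `store`/`load` helpers are invented here, the hash is taken over the raw content rather than git's real object format, and git can additionally delta-compress objects when it packs them, which this sketch ignores.

```python
import hashlib
import zlib

objects = {}   # object ID -> zlib-compressed snapshot
branches = {}  # branch name -> object ID


def store(snapshot: bytes) -> str:
    """Store a full snapshot, compressed, under the hash of its content."""
    oid = hashlib.sha1(snapshot).hexdigest()
    objects.setdefault(oid, zlib.compress(snapshot))
    return oid


def load(oid: str) -> bytes:
    """Return the uncompressed snapshot for an object ID."""
    return zlib.decompress(objects[oid])


branches["master"] = store(b"int main(void) { return 0; }\n" * 200)

# Creating a branch copies one 40-character ID, not the data.
objects_before = len(objects)
branches["feature"] = branches["master"]
print(len(objects) == objects_before)             # True: no new objects were written

# Committing on the new branch stores one new snapshot and moves one pointer.
branches["feature"] = store(load(branches["feature"]) + b"/* new feature */\n")
print(len(objects))                               # 2
print(branches["master"] != branches["feature"])  # True
```

Since a branch is only a name pointing at an ID, creating or deleting one costs the same no matter how large the project is.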

  9. Git Internals by Tenebrousedge · · Score: 2

    Why should I have to understand internal data structures in order to use a piece of software?

    Because you're not used to thinking about source code the way Git thinks about source code. Git is very much like a database from a usability standpoint, and you will probably get into bad trouble trying to use either without understanding both the problem that they are trying to solve and the implementation. If you do read about these things, you will understand that git's internals make sense, the decisions it makes are logical, and the user interface is (mostly) transparent and simple. Revisions are harder to manipulate than a Word document, though there are plenty of ways to manage them that are conceptually simpler. Git however was made to manage them efficiently. More specifically, it was designed to be efficient for Linus Torvalds's workflow. That happens to be very effective for a large number of other software projects, and no worse than any other solution for many others. There are other workflows for which other RCS systems are better (particularly when working with binary files). If you don't need git's features, by all means use something else. However, your decision to use it or not should probably be informed by knowledge of what exactly it does and why: again, this is no different than choosing a database.

    --
    Those who advocate genocide deserve every protection afforded by law, and none afforded by common human decency.
  10. Git is its own worst enemy by Foresto · · Score: 3, Insightful

    Sigh... Git. Ten years later, and it's still making people suffer with its unforgivably awful user interface. Seriously. I like the command line, and git is my primary version control system, but git's UI is the single most user-hostile example of human-computer interaction that I have had the misfortune to encounter in years. Maybe decades.

    Git's command structure is a train wreck of inconsistencies, some of its most important terminology is worse than worthless, and its man pages and built-in help text are idiotically obtuse. I have been following its development closely enough to understand how it got this way. A lot of it has to do with placeholder terms that were never updated, synonyms that were never reconciled, features that were grafted onto existing commands and never properly organized, and its origin as a set of low-level components rather than a tool intended for humans. In other words, a pattern of evolution much like any other software, except for one thing: Even after years of being relatively stable, its maintainers still haven't addressed its glaring usability problems.

    These aren't just minor warts that only affect a few people, either. There are countless articles, blog posts, and forum threads expressing frustration with git and detailing specific improvements that could transform it from a usability nightmare to an elegant piece of work. Sadly, the maintainers either ignore them or respond with some half-witted reason to resist change. Frankly, I am embarrassed to see my fellow software developers failing so miserably to recognize the importance of usability, and failing to fix it.

    What is the cache? It's a place where you're expected to manually arrange your data before you commit it. Does it function like a person would expect a cache to function? No, but we call it that anyway. What is the index? It's the same thing. Does it function like a person would expect an index to function? No, but we call it that anyway. You're referring to the same thing in both cases? Yes, for the most part. Does it function like anything that might be familiar to anyone? Yes, it's essentially a staging area. Why don't you call it a staging area? We do, but only in the minority of cases. You mean you have three names for the same thing, and the most accurate name is the one that you use the least? Yes. Why? Because the meaningful name might be harder to translate into other languages. So you deliberately use a confusing variety of misleading names when writing in English, the single most widely used language in computer science, because one of your translators didn't want to describe a staging area in another language? Yes. Well, that's probably okay, because this thing is probably some obscure piece of git that most people don't have to use, right? No, it's actually one of git's most distinguishing features, and interacting with it is absolutely required in order to use git. I see.

    Newcomers shouldn't have to be encouraged to "take the time to learn git." It should be easy. A programmer familiar with version control systems should be able to pick up a new one in five minutes, and find the answer to most intermediate-to-advanced problems in maybe ten or fifteen. They should be able to walk away for a month or two, come back, and still remember how to use it. That doesn't generally happen with git. One has to invest quite a bit of time and patience to confidently use anything beyond its most basic operations without screwing something up, and stay in practice with it, or else end up having to learn most of it all over again.

    The ridiculous thing is that it doesn't have to be this way. Mercurial is real-world proof of that.

    I hate git for these reasons. It's a cantankerous bastard of a tool that will just as soon kneecap you as handle your data. I only use it because of github (which is brilliant, by the way.) If you want to see an example of how version control should be done, get to know mercurial. Its internal de

  11. Re:BitKeeper was fine - Slashdot summary wrong aga by Anonymous Coward · · Score: 2, Informative

    Note that they guaranteed BitKeeper would always be free for Kernel developers. So as usual the Slashdot summary is wrong. They could in fact continue to use BitKeeper just fine.

    That's not correct. McVoy pulled the free-licensed version of BitKeeper (which Linus and OSDL were using) because of Tridge's work on a compatible client, and refused to sell a BK license to OSDL because they had allegedly broken the terms of the free license. This despite the fact that Tridge was an OSDL contractor, not an employee, was working on an unrelated project, and never used the BK client itself to do his work.

    Effectively, Linus and a bunch of other kernel devs would have had to quit OSDL (possibly forking Linux) to keep using BK.