Slashdot Mirror


Interview with Tom Lord of Arch Revision System

comforteagle writes "Every revision control system has its supporters and detractors, but none is as polar as Arch. Either you hate it or think it is the best thing in revision control ever. Built more around what our beloved kernel hackers use (BK), Arch is definitely a departure from CVS and Subversion. I've interviewed Tom Lord, Arch's daddy, about the application, and he has some -ahem- interesting answers and opinions."

16 of 334 comments

  1. Re:I'm left out... by Curtman · · Score: 4, Interesting

    They forget those of us who have never heard of it before.

    And those of us who have heard of it, but have no idea if it's a good thing or not.

    I noticed freedesktop.org has started using it to some degree. But like I say, I have no idea if that's a good thing. It is slightly inconvenient in that I have to go read yet more docs to use it. :(

  2. Most polar? by Sean+Starkey · · Score: 3, Interesting

    I think the most polar source control system is Rational's ClearCase. You really love it or really hate it. It's a very complex software package, but very powerful.

    Personally, I really like ClearCase. Too bad it's so expensive, otherwise I'd use it for all my open source work.

    1. Re:Most polar? by wintermute42 · · Score: 4, Interesting

      Cost issues aside, I think that perception of ClearCase is affected by whether you have to set ClearCase up yourself or not.

      The first time I used ClearCase I had to set up the ClearCase environment. I did not like the ClearCase documentation much. Rather than just telling you what you need to know to get the system set up, they provide their grand vision of the world. I couldn't care less about their grand vision; I want to get the source control system working. After this experience I was not a big fan of ClearCase.

      I used ClearCase again in an environment where the release engineering group managed ClearCase, along with the releases. They would "freeze" the branches for release (and let you in when you had a bug fix). They would also create new development branches and they managed the main line branch. In this environment ClearCase was really nice. I liked it a lot and prefer it over CVS.

      In summary I'd say that ClearCase is a higher cost source control system. You not only have to pay for the software license for ClearCase but part of someone's time to manage it as well. For small projects and software development groups this does not make sense. But once a group reaches a certain size, the cost can be justified and ClearCase is nice.

      I am currently working on a project where there is a core set of software that is used by three different groups, each of which will probably want their own changes. In this environment I think that a release engineering group and ClearCase would be justified (of course that does not mean that we're going to get a release engineering group and ClearCase).

    2. Re:Most polar? by bheading · · Score: 3, Interesting

      Clearcase, when you couple it with the UCM product and Multisite, unlimited budget, and big machines to run it on along with a dedicated crew, is an outstanding product and it's impossible to beat. You basically can't get anything better. The trouble is that it's extremely, extremely expensive (in $$$ terms) and requires big-ass hardware, and it falls to bits when you've got developers who are on the road a lot (snapshot views just don't hack it, and to support them properly you have to eliminate all reliance on Clearcase's fancy build avoidance features). I worked with Clearcase in a 60-developer Multisite environment with two large SMP Sun boxes running the Clearcase servers, with multiple gigabytes of RAM between them, and even then the Clearcase environment would freeze up during the day from time to time simply because it was trying to handle everyone working on it at once.

      The killer feature of Clearcase is its automatic dependency computation and build avoidance capabilities. These are possible because Clearcase provides an entire version-controlled filesystem called MVFS which you mount like a regular FS; it's completely transparent to the tools running on it. Thus, their custom build tool (clearmake) can watch what's going on during your build on the MVFS and automatically track build dependencies in a completely reliable manner. Of course, the flip side of this is that performance sucks; each time you stat() or otherwise access a file on that MVFS you're talking to the server, which has to look up the version you're accessing (by reading your configuration spec - the powerful mechanism which defines which files you want to read out of the database) and pull it out of its database.
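As a rough illustration of the build-avoidance idea described above, here is a toy Python sketch. It assumes a plain dict stands in for the versioned filesystem and a content digest stands in for a file version; `AuditedBuilder`, `build`, and `read` are hypothetical names and are not part of Clearcase or clearmake. The builder records every input a build step actually reads and skips the step next time if none of those inputs changed.

```python
import hashlib


class AuditedBuilder:
    """Toy model: audit which inputs a build step reads, skip it if unchanged."""

    def __init__(self):
        self.records = {}     # target -> {"inputs": {name: digest}, "output": str}
        self.build_count = 0  # number of times a step was really executed

    @staticmethod
    def _digest(text):
        # Stand-in for "which version of the file was selected".
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def build(self, target, files, step):
        """files: dict of name -> content (hypothetical stand-in for the
        versioned filesystem). step: function that takes a read(name)
        callable and returns the build output."""
        previous = self.records.get(target)
        if previous is not None and all(
            name in files and self._digest(files[name]) == digest
            for name, digest in previous["inputs"].items()
        ):
            return previous["output"]  # every audited input is unchanged

        seen = {}

        def read(name):
            # Every access goes through here, so dependencies are recorded
            # automatically instead of being declared by hand.
            seen[name] = self._digest(files[name])
            return files[name]

        output = step(read)
        self.build_count += 1
        self.records[target] = {"inputs": seen, "output": output}
        return output
```

Changing a file the step never read does not trigger a rebuild, while changing any file it did read does. The sketch also hints at the cost mentioned above: every single read has to pass through the auditing layer.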

      Clearcase works pretty well if you've large, and separated, teams of developers who basically don't move around, and you've a team of people to babysit the servers. In this post-dot.com era, the cost of such a system is hard to justify. Smaller consulting businesses whose developers need to build while constantly on the road will find Clearcase more of a hindrance than a help.

  3. I disagree... by CaptainPinko · · Score: 4, Interesting

    Don't we get enough marketing droids that can't ever say what they mean? I agree he was upfront, blunt, and brutal, but in the end he didn't seem crazy or wild or unreasonable. He even backed up some of his more inflammatory statements. I think he was a very good interviewee. He did seem to be a little too forgiving of his own project's weaknesses, but that's not unexpected and relatively forgivable.

    --
    Your CPU is not doing anything else, at least do something.
  4. No people skills. by Ectospheno · · Score: 5, Interesting

    Tom Lord has tried to work more closely with other revision control packages before (including the Subversion team) but he has been hampered by his complete and total lack of people skills. I don't think he tries to, but he ends up offending everyone he tries to have a "discussion" with. It's comical and sad at the same time.

  5. Re:All that and he doesn't explain... by tlord · · Score: 4, Interesting

    > As to svn backends... I think it is prudent to
    > point out a false statement made by Lord.
    > [Hey, FSFS exists.]

    I agree it is good to point out FSFS. The
    interview is, indeed, misleading in that
    respect.

    As far as I know, back when the interview was
    conducted, FSFS did not exist or at least was
    not on many radars.

    A separate question is whether or not FSFS
    really makes the server-side of svn all nice
    now or not --- but certainly that is not going
    to be worked out in /. comments.

    -t

  6. darcs by The+Pim · · Score: 4, Interesting

    Ok, I admit I just want to get darcs mentioned here, but I really want to know what Tom (as well as Larry McVoy) thinks about darcs. In particular, whether the theory will stand up to real use and scale to large projects. I have a hunch that David Roundy has discovered much of what Larry McVoy said was a dozen PhD theses worth of research behind BitKeeper.

    --

    The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
    1. Re:darcs by Anonymous Coward · · Score: 5, Interesting

      Hi,

      I (Larry McVoy) have looked over Darcs, Monotone, Arch, Codeville, and I think some others that I can't remember and I can easily say that no, they haven't discovered much of what we have done.

      Let's take darcs as an example. It's a cool system if you are a math or physics person. You can write proofs about how it works, much like BitKeeper. We like that and applaud anyone who is thinking that hard (and if you are looking for a job please come talk to us, we are always hiring). However, darcs suffers from the math problem. It's all about math and not at all about being pragmatic. Here's a for instance. The BitKeeper tree holding the 2.6 kernel has about 55,000 changesets. A null update using BK is 4 seconds (which is insanely slow in our opinion). Try doing the same thing with darcs and you will wait and wait and wait... That's just the first example of how it doesn't scale. The openlogging tree for linux is somewhere north of 110,000 changesets. *All* other systems die with that sort of load. We're slow but we work and we know how to fix the slow part.

      This problem space is strange, it is part math and part pragmatism. You have to do both and darcs does one of them. And it does it in only one of the areas, there are many many more. Repository synchronization, rename handling, merging, user interface, installation tools, working well on Windows as well as Unix, etc., etc.

      Our payroll is higher than any open source SCM system has generated by a factor of 50. It's higher than the reiserfs payroll, it's higher than lots of well known little companies doing useful stuff. It's high because there are lots and lots of corner cases *in addition* to the hard math stuff which needs to be done.

      Since we're talking about Arch, here's another example: we recently got a commercial customer who tried out arch on windows and came back and told us BK was at least 10x faster. And we told him that we think BK is way too slow on Windows. He liked that. The point is that it isn't just about architecture, or licensing, or features, it's about a lot of not-so-fun stuff and that's why a commercial answer will always be better than a free answer. It costs a lot of money to solve the non-fun problems. Open source solves the fun problems (extremely well, I might add) but unless the project is very visible (i.e., the kernel) it starts to fall down when you hit the non-fun problems. Think about it - if no one is paying you money or telling you that you rock while you are doing the grunt work - how long are you going to do that? Not very long, just look at 90% of the "projects" on sourceforge, all talk, no code.

      It's worth repeating that last bit. SCM is an undervalued field. Every engineer thinks that they can reproduce what BK does with a few scripts wrapped around CVS or RCS. While they may think that, it flies in the face of the over 100 man years we have in BK, and we know we are nowhere near good enough. The bummer is that the perception is that this stuff is easy but the reality is that it is hard. Both technically hard and detail hard. It's way more work than people think. But precisely because people don't value it, the only real answer is a commercial answer. Yeah, yeah, you all love to give me crap because BK isn't GPLed but *none* of you have put in 1/10th as much effort as I have or have made 1/10th as much of a difference in this space. Talk is cheap, show me a better answer and I'll be impressed. It won't happen because it costs way way way too much money to deliver a better answer. How's the arch installer on windows? Graphical? Is it careful about not screwing up the registry? Can you have two different versions installed at the same time? What about the transport layers? Works over http? Really? Through all the wacky proxies out there? You get the idea, right?

      That's why all this discussion of arch or darcs or whatever is just nonsense. You all think this stuff is easy so you are never going to cough up the $30M or so it will take to solve it right. Sad but true. I guess it's good for us, it means we have a market, but it would be nice if you knew a bit more about the topic. I love it every time it comes up, the world is definitely becoming more aware at least.

      --lm

    2. Re:darcs by The+Pim · · Score: 4, Interesting
      (Someone mod parent up--this is really Larry.)

      I agree that "darcs suffers from the math problem", at least in that the implementation has focused on getting the semantics right and not on performance. (And unfortunately, the semantics are still not all right.) David maintains a kernel tree in darcs as a reminder of all the ways it doesn't scale. However, he also thinks most of them are fixable "post 1.0", and given how smart and capable he's proven to be, I give that claim some respect. Alas, I haven't had time to learn the math well enough to really be sure.

      Regarding the economics, I don't think SCM is an undervalued field. Or at least, the free software community can find a way to value any field it needs to in order to make progress. (And for SCM, you're helping!) People said we didn't value desktops, or help, or installers, or web browsers, or couldn't do webdav or other protocols "at the top of the stack". "No fun" is what people have said about all of these. (And we're still not great at all these, but I think we're on a clear path to get there.)

      What does this mean for darcs? It already has good semantics, is easy to use, and has a solid theoretical foundation. I think that free software folks will increasingly value distributed SCM and it will get more development man-power (if not as much as bk). These are excellent growth factors, and I suspect darcs will be able to handle 90% of projects out there in a few years. Unless the foundation is found to be weak (which is why I asked about that). Unless David loses interest before someone else steps up. Unless, unless, unless, but I like its chances.

      Put it this way: I agree that open source does not solve things that are too hard or no fun. But the second is actually a non issue: when we need something, powerful economic and selective forces will make it fun for someone. So I really care about the first, and I'm trying to gauge whether distributed SCM is too hard for David and others attracted to darcs. I suspect that it's not too hard, at least to get to the 90% mark.

      Thanks for taking the time to reply. I do enjoy reading what you have to say.

      --

      The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
  7. Re:Argument by Slashdot(r) ? by legLess · · Score: 3, Interesting
    > This actually convinced me to read the linked article.
    No greater praise for a /. comment :) I feel beatified.
    --
    This isn't as much "normalization" as it is "don't take so many drugs when you're designing tables."
  8. Arch's biggest bug by dozer · · Score: 4, Interesting

    A quote from an email conversation with an unnamed Arch user in January: "I think Arch's biggest bug is the one up the developer's collective asses."

    This article is a good example. Tom Lord just hand-waves his way past every question. Subversion sucks!!! CVS users are teh stupid!!! If he tones it down a bit, he definitely has a future in politics. But I don't think he's a very good software architect.

    OK, it's true that CVS and Subversion have problems. But, gak, so does Arch. Good God is it slow for big projects (something they've been promising to fix for years). And it's got some horrifying naming conventions: "tla--devo--1.3". And the files! "{arch}", "++default-version", ",,inode-sigs". Whatever Lord was smoking, it must have been good. The branching and merging operators are powerful but, thanks to all the punctuation, they are also ugly. It's like the entire UI goes out of its way to be downright unfriendly.

    Every time someone mentions these deficiencies on the mailing list, they just get flamed for not truly understanding Arch. "Namespaces! Namespaces! Namespaces!" "Win32 is for lusrs!" Whatever. I just want a tool that helps me get the job done.

    Personally, I'm in the middle of transitioning to Subversion. It's better than CVS, and it is faster and nicer to use than Arch. Works for me.

  9. Re:I don't like CVS, Subversion, or Arch by iabervon · · Score: 3, Interesting

    The article says that Tom Lord claims that a comprehensible interface for arch should be ready by the end of the year. Arch really is the right design, and will be ideal once there's a sane interface.

  10. Distributed development under arch? by iabervon · · Score: 3, Interesting

    I'd be interested to hear if anyone has actually gotten happy with distributed development under arch. I tried a reasonably simple case a few weeks ago, and couldn't get it to feel right.

    What I was trying to do was to have a two-layer revision control system, where I have a private archive in addition to the project archive, and I check into the private one all the time, and transfer changesets to the project archive when I'm happy with it. That way, I can be halfway through refactoring a big chunk of code, have it completely broken, but have the work so far revision controlled so that, if I accidentally wipe out my build tree, I can recover it.

    The problem I ran into was that I couldn't get the two archives to agree exactly on the current status: whenever I transferred my changes up from the private archive, it added a log message to the project archive, and my private archive wasn't up to date, because it didn't have the message. When I updated my private archive from the project archive (either to pick up the message or to get other people's changes), I had to put in a log message, which the project archive then didn't have.

    It seems like arch really ought to support getting two archives in perfect sync, as well as disregarding a commit to a remote archive that only adds changesets already in the local archive (as well as disregarding the changesets themselves, which it does do).
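The ping-pong described above can be sketched as a toy Python model. It assumes each archive is just a set of entries and that every merge writes one new log entry into the destination; this is not arch's real data model or command set, and `naive_merge`, `quiet_merge`, and `rounds_until_in_sync` are hypothetical names. The second merge function models the requested behaviour of not logging a merge that brings in no new changesets.

```python
import itertools

_ids = itertools.count(1)  # source of unique ids for merge-log entries


def naive_merge(source, dest):
    """Copy entries missing from dest, then always record a merge log in dest."""
    dest |= source
    dest.add(("merge-log", next(_ids)))


def quiet_merge(source, dest):
    """Like naive_merge, but skip the log when no real changesets arrive."""
    missing = source - dest
    dest |= source
    if any(kind == "change" for kind, _ in missing):
        dest.add(("merge-log", next(_ids)))


def rounds_until_in_sync(merge, limit=5):
    """Push private -> project, then pull project -> private, up to `limit`
    times. Returns the round in which both archives held identical entries,
    or None if they never did."""
    private = {("change", "refactor-1")}
    project = set()
    for round_no in range(1, limit + 1):
        merge(private, project)  # transfer work up to the project archive
        if private == project:
            return round_no
        merge(project, private)  # update the private archive from the project
        if private == project:
            return round_no
    return None
```

Under these assumptions `rounds_until_in_sync(naive_merge)` returns `None`, because each transfer leaves the destination one log entry ahead of the source, while `rounds_until_in_sync(quiet_merge)` returns `1`.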

  11. Tom Lord was my roommate in '88 by js7a · · Score: 3, Interesting

    I have a huge amount of respect for him. He taught me that compromise is way overvalued.

  12. Re:Does he know ANYTHING about Subversion? by catenos · · Score: 3, Interesting
    > > Give me an example scenario that shows me just how fucked I would be with svn and how Arch would ride in on a white horse and save the day. Then give me four or five more. Write a couple of whitepapers explaining how Arch is fundamentally much better than Subversion in its theoretical design
    > It's funny and sad to know he (mostly) already did this. Search the subversion-devel list back to 2002. ;)

    Huh? Did you read the same mails as I? Back then, Tom Lord's ramblings on the svn-dev mailing list had the same problem as this interview. And also those the grandparent complained about:

    > > What exactly is bad about Subversion? Give me an example scenario that shows me just how fucked I would be with svn and how Arch would ride in on a white horse and save the day.

    TL talked big about how Subversion's design was broken, but when asked to give concrete examples he always kept talking about theories.

    IMHO, it's not much unlike saying that Linux sucks because it isn't a micro-kernel architecture. And when being asked about details, being unable or unwilling to come up with an example of how a micro-kernel design would fix an existing major flaw (without sacrificing the existing good points of the software).

    For example, I like QNX's design very much. But that doesn't imply that Linux is broken or sucks. Both have their strong and their weak points depending on the task at hand. (And for my daily desktop work I would fall into a crisis if I had to use QNX instead of Mandrake due to some QNX usability issues... oh wait, that reminds me of arch! ;-)
    --
    Keep an eye on which arguments are silently dropped in replies. Not always, but often times it's very telling.