Slashdot Mirror


Interview: David Roundy of Darcs Revision Control

comforteagle writes "In the aftermath of our last interview with Tom Lord, regardless of personalities, it became apparent that the idea of decentralizing CVS is a big deal. Many mentioned darcs as an alternative to Arch. Mark Stosberg has interviewed project head honcho David Roundy about darcs, his 'theory of patches,' what's next, and on using Haskell for the project."

18 of 173 comments

  1. CVS-style development with darcs by Ashish Kulkarni · · Score: 3, Informative

    There was a good post on this on the mailing list a while ago.

  2. Re:Haskell just won't cut it by QuantumG · · Score: 5, Insightful

    So basically you didn't read the article. He gets more developers because it is written in Haskell, not fewer: it's one of the few real applications written in Haskell, which means that if you're someone who learnt Haskell just for the hell of it, you've got somewhere to apply those skills.

    --
    How we know is more important than what we know.
  3. Re:Haskell just won't cut it by PhYrE2k2 · · Score: 5, Interesting

    I did read the article; however, I do DISAGREE with that comment in it. People won't learn a language for one program, and there isn't a large enough body of people who know the language well enough to really truly UNDERSTAND the program and make modifications and additions to it. Compare that with the massive community of C/C++/Java/Perl developers: there it's a lot easier to get people to contribute. -M

    --

    when you see the word 'Linux', drink!
  4. Re:Haskell just won't cut it by pnot · · Score: 5, Insightful
    You wrote:
    While "Darcs is written in Haskell, a functional language that is relatively unknown compared to C or Perl", this really does hurt its common use.

    How will the choice of language hurt darcs's use? Why on earth would the users of a piece of software care about the language it's written in?

    You wrote:
    Not being able to attract as large a group of developers as C, C++, or even some interpreted-language projects do means that only one or a few developers end up working on this project

    From the article:
    I've been surprised by the number and quality of contributors darcs has had. There seem to be quite a few people out there just looking for somewhere to use Haskell! :) And in fact, there have also been developers who learned Haskell expressly for the purpose of contributing to darcs. It's such a pleasant language to work with that I think it's more of a draw to developers than a put-off.

    So perhaps you should attempt to assimilate some facts before trotting out your tedious, ill-informed prejudices, hmmm?

    Furthermore, it's not just about the sheer number of developers, it's about the power of the language. A million monkeys writing code are still only monkeys, and the more developers you have on a project, the more co-ordination is required (read Fred Brooks' The Mythical Man-Month if you don't believe me).

    If "number of potential developers" were the only criterion for choosing a project's programming language, everything would be written in BASIC. And Paul Graham makes a good case for coding in less common languages: you'll get people smart enough to learn unusual languages for the hell of it, rather than a mass of monkeys who have little interest in building great software and just want to learn this week's marketable language to improve their employment prospects.
  5. Re:Haskell just won't cut it by cduffy · · Score: 4, Informative

    Some people will learn a language because they want to know a language that has that specific set of features, regardless of what applications have already been written in that language.

    It's a small group, but if you're the only game in town (in terms of OSS projects for them to work with)... well, that works out pretty well for you!

    No, if Darcs has a major issue, it's the RAM and CPU time it requires, and the design makes some of that inherently unresolvable.

  6. Darcs is KISS by Earlybird · · Score: 5, Informative
    Among the plethora of emerging version control systems -- Subversion, Arch, Monotone and so on -- Darcs stands out for its simplicity and thoughtful design.

    As with CVS, you can be productive within minutes; the same cannot be said for Arch or even Subversion. Let's see:

    john@somewhere$ cd ~/myproject
    john@somewhere$ darcs init

    You now have a Darcs repository! Let's do something with it:

    john@somewhere$ darcs add -r *
    john@somewhere$ darcs record -am "Initial import."
    Finished recording patch 'Initial import.'

    Now your repository contains all your files. Let's look at the changelog:

    john@somewhere$ darcs changes
    Thu Nov 25 06:26:19 CET 2004 johndoe@example.com
    * Initial import.

    Now, where's the server? You need a server to share your repository, right? Not quite: every repository is a potential server, as long as it's accessible through the file system, SSH/SFTP, HTTP or email. Let's go to another machine and check out the repository we just made:

    jane@elsewhere$ darcs get john@somewhere:~/myproject
    Copying patches...
    .
    Finished getting.

    We now have a repository on Jane's box. Let's make a modification:

    jane@elsewhere$ echo "#include <foo.h>" >>foo.c
    jane@elsewhere$ darcs whatsnew --summary
    M ./foo.c +1
    jane@elsewhere$ darcs whatsnew
    {
    hunk ./foo.c 2
    +#include <foo.h>
    }

    This last output, by the way, is Darcs' patch format. A "hunk" is a line-based diff. Other types of changes that may be contained in a changeset include renames, moves and binary changes. (Yes, you can also get a GNU-patch-compatible output similar to "cvs diff".)
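    To make the hunk format above concrete, here is a minimal Python sketch under a simplified model: a hunk names a file, a 1-based line number, the lines it removes and the lines it adds. The `Hunk` class and the one-line contents assumed for foo.c are hypothetical; this is not darcs' own (Haskell) data type, and details of the real format such as whitespace escaping are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hunk:
    """Hypothetical simplified hunk: replace `old` lines with `new` lines at `line`."""
    path: str
    line: int                 # 1-based line number where the change starts
    old: Tuple[str, ...]      # lines removed
    new: Tuple[str, ...]      # lines added

    def apply(self, lines: List[str]) -> List[str]:
        start = self.line - 1
        # Refuse to apply if the file does not contain the lines we expect to remove.
        if tuple(lines[start:start + len(self.old)]) != self.old:
            raise ValueError(f"hunk does not apply at {self.path}:{self.line}")
        return lines[:start] + list(self.new) + lines[start + len(self.old):]

    def render(self) -> str:
        # Mimic the brace-delimited layout printed by `darcs whatsnew`.
        body = [f"hunk {self.path} {self.line}"]
        body += ["-" + text for text in self.old]
        body += ["+" + text for text in self.new]
        return "{\n" + "\n".join(body) + "\n}"


# Hypothetical contents of foo.c before Jane's edit (a single line).
foo_c = ["/* foo.c */"]
patch = Hunk("./foo.c", 2, (), ("#include <foo.h>",))
print(patch.render())
print(patch.apply(foo_c))  # ['/* foo.c */', '#include <foo.h>']
```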

    Now let's commit and push the changes back to John's repository:

    jane@elsewhere$ darcs record -am "Added a missing include."
    jane@elsewhere$ darcs push -a
    [...]
    Finished applying...

    Now we can go back to John's machine and look:

    john@somewhere$ darcs changes
    Thu Nov 25 06:26:10 CET 2004 janedoe@example.com
    * Added a missing include.

    Thu Nov 25 06:26:19 CET 2004 johndoe@example.com
    * Initial import.

    (Note how Darcs generates a GNU-style changelog for you automatically.)

    Where are the revision numbers, you ask? Well, they don't exist, because they're not needed. Darcs is changeset-oriented, not file-oriented. You can refer to a changeset by name, date, or a special hash identity.

    Darcs changesets aren't just GNU patches; they have context. For example, someone can check out a repository, move "foo.c" into the directory "bar" and commit; meanwhile, another person, working on an older copy of the same repository, edits foo.c (which is still in its old location) and commits that. Darcs knows that this edit should apply to foo.c in its new location. And unlike CVS, you don't need to do anything similar to "cvs update" if you're committing files that have been changed on the server. In other words, people can freely commit changes, and the only kind of visible "conflict" will occur when two people actually edit the exact same line.
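    A minimal Python sketch of the rename case, under a simplified model: an edit recorded against ./foo.c is rewritten so that it applies after a parallel move to ./bar/foo.c. Darcs reaches this result by commuting the move and the hunk; the `Move`/`Hunk` classes and the helper functions here are hypothetical and only show the path rewrite, not darcs' actual algorithm.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Move:
    src: str
    dst: str


@dataclass(frozen=True)
class Hunk:
    path: str
    line: int                 # 1-based line number
    old: Tuple[str, ...]      # lines removed
    new: Tuple[str, ...]      # lines added


Tree = Dict[str, List[str]]   # path -> file contents


def adjust_for_move(hunk: Hunk, move: Move) -> Hunk:
    """Rewrite a hunk recorded in parallel with `move` so it applies after the move."""
    if hunk.path == move.src:                      # the file itself was moved
        return replace(hunk, path=move.dst)
    if hunk.path.startswith(move.src + "/"):       # a parent directory was moved
        return replace(hunk, path=move.dst + hunk.path[len(move.src):])
    return hunk                                    # unrelated file: nothing to do


def apply_move(tree: Tree, move: Move) -> Tree:
    # This sketch only handles moving a single file.
    tree = dict(tree)
    tree[move.dst] = tree.pop(move.src)
    return tree


def apply_hunk(tree: Tree, hunk: Hunk) -> Tree:
    tree = dict(tree)
    lines = tree[hunk.path]
    start = hunk.line - 1
    if lines[start:start + len(hunk.old)] != list(hunk.old):
        raise ValueError(f"hunk does not apply at {hunk.path}:{hunk.line}")
    tree[hunk.path] = lines[:start] + list(hunk.new) + lines[start + len(hunk.old):]
    return tree


tree = {"./foo.c": ["/* foo.c */"]}                    # hypothetical starting tree
move = Move("./foo.c", "./bar/foo.c")                  # recorded by one developer
edit = Hunk("./foo.c", 2, (), ("#include <foo.h>",))   # recorded by another, in parallel
merged = apply_hunk(apply_move(tree, move), adjust_for_move(edit, move))
print(merged)  # {'./bar/foo.c': ['/* foo.c */', '#include <foo.h>']}
```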

    Unlike CVS and Subversion, but like Arch and Monotone, Darcs is a distributed version control system. Repositories are islands that are constantly out of sync with each other, and Darcs' patch commutation system takes care of integrating the changes that flow between them.

    This system has several extremely useful effects:

    • Offline mode. You can commit changes even if you're on the road with no access to the server. That's because your own working directory is a repository in its own right.
    1. Re:Darcs is KISS by Earlybird · · Score: 4, Interesting
      • One of the nice things about subversion (recently converted user, very happy so far) is the support for multiple url formats and communications methods.

      Darcs and Arch both have this. (Arch undoubtedly has the most extensive protocol support of any revision control system.)

      • Another notable thing (for Windows users) is TortoiseSVN (an Explorer shell extension), which is just great.

      Tortoise is quite nice indeed -- I used TortoiseCVS for years.

      • I can see how the distributed, multi-repo model of bitkeeper/darcs/arch is superior but svn looks good if you only need single-repo.

      Need is just one aspect of the development process; right now CVS gives most people what they need, despite the cracks in the lacquer. Darcs doesn't just erase the cracks; it improves the process.

      For example, I occasionally submit patches to certain open-source projects. The easiest way to do this is to check out the CVS repository, make my changes, and do "cvs diff -u" to get the patches in that format, which I tend to post to some Bugzilla server or email to somebody. But I can't commit them. I don't mean to the master repository; I mean locally. There's no way I can bundle my file patches in a changeset and keep its history. I'm basically managing a CVS working directory where my changes are never checked in.

      With Darcs, I just do "darcs get" to get the master repository, make my changes, and commit them locally. I can use "darcs send" to submit my changes to the project maintainer. Anyone else can grab my patches with "darcs get" or "darcs pull". I can be Alan Cox to some Linus without breaking my back over patch management.

  7. Self-hosting by Earlybird · · Score: 3, Informative
    I believe the term you're looking for is self-hosting. Subversion, for example, was originally maintained in CVS, and has a CVS gateway for maintaining redundant systems.

    For pilot-testing a migration to Darcs, there are scripts available that convert other repository formats (Subversion, CVS, possibly others) into Darcs (and back, actually), so you avoid losing history when making the transition.

  8. Re:Haskell just won't cut it by Mr. Slippery · · Score: 3, Insightful
    Why on earth would the users of a piece of software care about the language it's written in?

    If your users are FOSS developers, they quite likely care about the ability to modify the tool, which includes caring about the language in which it is written.

    --
    Tom Swiss | the infamous tms | my blog
    You cannot wash away blood with blood
  9. Re:Haskell just won't cut it by Pseudonym · · Score: 4, Interesting
    The problem with it is that because it's functional you often end up restructuring half the program for what would have been a trivial change in an imperative language.

    While I don't disagree with this, there are some counter-arguments too:

    • If you find yourself doing that, you may have written your original program in an imperative style in the first place. Allen Holub's argument about getter/setter methods applies to declarative programming too. If you wrote in a style more idiomatic for the language, you might not be facing a huge restructure at all.
    • Much the same problem can happen in imperative languages, only the class of changes which would trigger such a restructure are different. For example, in a non-GC'd language, you may end up restructuring your program if some critical data lifetime changes. Or, instead of restructuring your program, you might prefer to hack it up instead, making it less maintainable. (It might be argued that languages like Haskell, which discourage this kind of hackery, might be a good thing in the hands of a certain kind of programmer.)
    • Even if you do have to restructure half the program, tools like Haskell's type system make this a less painful task than it would otherwise be.

    Knowing a language also means knowing what kinds of changes are painful and what kinds of changes are not. Knowing this in advance helps you write your programs to be more future-proof.

    --
    sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  10. It's the clients and API that matter by monkeyboy87 · · Score: 3, Informative

    No matter what changeset "theory" is implemented in the product, if it's not easy to do things with it, it will languish. VSS and CVS are still widely used because there are lots of clients and tools that make them useful. The draw of SVN (loved or hated) is that it has a good client, and the command-line client is easy to drive from scripting tools.

  11. Re:Haskell just won't cut it by pnot · · Score: 3, Interesting

    me: Why on earth would the users of a piece of software care about the language it's written in?

    you: If your users are FOSS developers, they quite likely care about the ability to modify the tool, which includes caring about the language in which it is written.

    Interesting point. Certainly FOSS developers care about being legally allowed to modify code, but I'm not sure that they care, on the whole, about the language.

    emacs, for example, is largely written in elisp -- hardly a mainstream language. Yet it's extremely popular, even among people who don't know any lisp. People who find the need to extend it get a good excuse to learn lisp in a well-motivated, incremental way.

    Speaking personally, I'd be more inclined to hack on a project written in something interesting like Haskell, ML, Smalltalk, or Lisp. (In fact I chose my current main project partly as an excuse to learn Lisp.) A lot of people like having a motivation for learning a new language, or a practical use for an "academic" language they happen to know. (I learned Haskell in university, so I'd be quite keen to get to use it in the "real world".)

    I think the only time I'd care about the language of a program I'm using would be if it were written in something particularly horrible -- "urgh, if I ever want to modify this I'll have to learn befunge!" But perhaps that's the way some people view Haskell ;-).

    It's a matter of taste, I suppose. I do acknowledge that I'm a bit of a language nut.

    I still maintain that quality is more important than quantity, though. I've been teaching C and Java to second-year undergraduates this year -- having seen some of their code, I can safely say that if I were starting an OSS project, I'd rather have one seasoned Haskell hacker on board than the entire lot of 'em :-).

  12. Needs wider adoption by haeger · · Score: 4, Interesting
    While darcs is nice, it needs wider adoption. A project has almost as many boxes as it has developers, and for a revision control program to be adopted and used, there have to be binaries for all of those devels. AFAIR there are some issues with the win32 binary? One of our devels had major problems with it, and now we're living with both a CVS and a darcs repository, and no one really knows where to send patches. I think it's safe to say that our project is dying, if not dead already.

    Not that I blame darcs or anything; it's just that one needs to be sure that darcs works for everyone before committing to it. CVS works on all platforms and is well tested. Darcs will hopefully get there.

    And yes, I did my part and created a package for my platform. It's linked from the binary download page.

    .haeger

    --
    You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison
  13. Re:Theory of patches by Earlybird · · Score: 3, Insightful
    • Let me summarize the "theory of patches": you reverse patches in the opposite order of applying them.

    No. Darcs can, and will, apply patches out of order. From the Darcs manual:

    • The development of a simplified theory of patches is what originally motivated me to create darcs. This patch formalism means that darcs patches have a set of properties, which make possible manipulations that couldn't be done in other revision control systems. First, every patch is invertible. Secondly, sequential patches (i.e. patches that are created in sequence, one after the other) can be reordered, although this reordering can fail, which means the second patch is dependent on the first. Thirdly, patches which are in parallel (i.e. both patches were created by modifying identical trees) can be merged, and the result of a set of merges is independent of the order in which the merges are performed. This last property is critical to darcs' philosophy, as it means that a particular version of a source tree is fully defined by the list of patches that are in it, i.e. there is no issue regarding the order in which merges are performed.

    A distributed version control system that required all patches to be applied in order would be painful indeed to use.
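    The first two properties quoted from the manual, inversion and reordering of sequential patches, can be sketched in Python for line hunks within a single file. The commute rule below is a simplified model written for illustration, not code taken from darcs: it shifts a hunk's line number by the other hunk's change in length, and it conservatively treats hunks that overlap or touch as dependent. Merging of parallel patches (the third property) is not shown.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hunk:
    """Hypothetical single-file hunk: at `line`, replace `old` lines with `new` lines."""
    line: int                 # 1-based line number where the change starts
    old: Tuple[str, ...]      # lines removed
    new: Tuple[str, ...]      # lines added

    def apply(self, lines: List[str]) -> List[str]:
        start = self.line - 1
        if tuple(lines[start:start + len(self.old)]) != self.old:
            raise ValueError(f"hunk does not apply at line {self.line}")
        return lines[:start] + list(self.new) + lines[start + len(self.old):]

    def invert(self) -> "Hunk":
        # Every patch is invertible: swap the removed and added lines.
        return Hunk(self.line, self.new, self.old)


def commute(p1: Hunk, p2: Hunk) -> Optional[Tuple[Hunk, Hunk]]:
    """Given p1 followed by p2, return (p2', p1') with the same combined effect,
    or None if p2 depends on p1. Overlapping or touching hunks count as dependent."""
    if p1.line + len(p1.new) < p2.line:
        # p2 lies wholly below p1's output: undo p1's change in length.
        return replace(p2, line=p2.line - len(p1.new) + len(p1.old)), p1
    if p2.line + len(p2.old) < p1.line:
        # p2 lies wholly above p1: shift p1 by p2's change in length.
        return p2, replace(p1, line=p1.line + len(p2.new) - len(p2.old))
    return None


base = ["A", "B", "C", "D", "E"]
p1 = Hunk(1, ("A",), ("A1", "A2"))   # line 1 becomes two lines
p2 = Hunk(5, ("D",), ("d",))         # then edit "D", which is now on line 5

p2_, p1_ = commute(p1, p2)
print(p2_)                                                      # Hunk(line=4, old=('D',), new=('d',))
print(p2.apply(p1.apply(base)) == p1_.apply(p2_.apply(base)))   # True: same result in either order
print(p1.invert().apply(p1.apply(base)) == base)                # True: inversion undoes p1
print(commute(p1, Hunk(2, ("A2",), ("x",))))                    # None: edits a line p1 created
```

    Note that reordering rewrote the second hunk's line number from 5 to 4: the same change has a different representation depending on which patches precede it.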

    • I have to agree with many other comments: the use of Haskell eliminated it as a choice for me.

    Why? Are you a Subversion contributor?

  14. Our experience CVS vs. DARCs by ites · · Score: 4, Informative

    We are looking at something to replace our ageing CVS system. We have large OSS projects, worked on by teams of 3-10 people. CVS is very good for what it does but we are feeling its limitations. The biggest problems are that forks are too delicate to use, so we don't use them, and that in order to work you need access to the central archive.

    Darcs looked like the best choice. We converted and imported some of our archives. Then we tried checking them out. With CVS, 2-3 minutes. With darcs, 30 minutes.

    Our conclusion: darcs is not scalable. Admittedly our code base is large and has a huge history, but in order to use darcs we would have had to break our projects into many small pieces, each with their own repository.

    Darcs looks good. But it needs to be made much, much faster if it's to work with large projects.

    --
    Sig for sale or rent. One previous user. Inquire within.
    1. Re:Our experience CVS vs. DARCs by David Roundy · · Score: 5, Informative

      Darcs get (equivalent to CVS checkout) is the single least efficient command in darcs. People keep telling me I need to fix this, since it's the first thing users see, but it's really not an important command to optimize (apart from first impressions issues). When run locally (to create a new branch) it's fast.

      And comparing darcs get with cvs checkout really isn't fair, since darcs gives you a copy of the full history of the repository, a separate branch on which to record changes before committing them to the centralized repository, and the ability to browse the history offline.

      If you want a fast get, just run darcs optimize --checkpoint on the parent repository (assuming you've tagged recently--if not, tag the current state first), and then use the --partial flag when running darcs get. It'll still give you more flexibility than a cvs checkout, and will be much faster.

  15. Re:Here: by boa13 · · Score: 3, Interesting

    One important feature is missing from the page:

    Support for signing patches and archives

    Allows verifying who created/committed the patches, and verifying the integrity of a repository in case of compromise.

    * Arch: Excellent. Each patch can be signed, repositories can be fully verified.
    * Darcs: Incomplete. Patches sent by email can be signed so the recipient can verify the identity of the submitter. No support for verifying repository integrity. [1]
    * Subversion: N/A

    [1] The problem is that you can only sign something that will not vary once distributed, and Darcs patches do vary once distributed.
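    A small Python illustration of footnote [1], assuming a hypothetical text serialization (not darcs' real on-disk format) and SHA-256 as a stand-in for whatever digest a signature would cover: the same edit, reordered ahead of a patch that inserted one line above it, gets a new line number and therefore different bytes, so a signature made over the original bytes would no longer verify.

```python
import hashlib
from typing import Sequence


def serialize(path: str, line: int, old: Sequence[str], new: Sequence[str]) -> bytes:
    """Hypothetical flat serialization of a hunk, for illustration only."""
    text = f"hunk {path} {line}\n"
    text += "".join(f"-{s}\n" for s in old)
    text += "".join(f"+{s}\n" for s in new)
    return text.encode("utf-8")


# The same edit (change "D" to "d") as first recorded ...
as_recorded = serialize("./foo.c", 5, ["D"], ["d"])
# ... and after being reordered ahead of a patch that inserted one line above it.
as_reordered = serialize("./foo.c", 4, ["D"], ["d"])

digest_recorded = hashlib.sha256(as_recorded).hexdigest()
digest_reordered = hashlib.sha256(as_reordered).hexdigest()
print(digest_recorded == digest_reordered)  # False
```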

  16. Re:Interesting app. non-troll questions by David Roundy · · Score: 5, Informative

    1. It's actually hard to use the patch commutation code to do any good outside the concept of a darcs repository.

    1.5 I've thought about creating a C library for manipulating/querying darcs repositories, but haven't gotten around to it. The hard part would be of course designing the API. Ideally I'd like the interface to be such that programs using the library couldn't accidentally corrupt the repository.

    2. Darcs requires ghc, since it uses some library code only available in ghc to do more efficient IO and string manipulation and to access zlib. It turns out to be a pain on many systems to link with the necessary libraries when using the interpreted version of ghc. So accessing darcs from perl will probably have to go through the executable until a C library is written (which could of course have perl bindings).

    3. Rewriting darcs in perl (or parts of it) would be possible, but would be a pain. In particular, the commutation of patches which have conflicts is pretty complicated.