Interview: David Roundy of Darcs Revision Control
comforteagle writes "In the aftermath of our last interview with Tom Lord, regardless of personalities, it became apparent that the idea of decentralizing CVS is a big deal. Many mentioned darcs as an alternative to Arch. Mark Stosberg has interviewed project head honcho David Roundy about darcs, his 'theory of patches,' what's next, and on using Haskell for the project."
So basically you didn't read the article. He gets more developers because it is written in Haskell than he would otherwise, since it's one of the few real applications written in Haskell. That means that if you're someone who just learnt Haskell for the hell of it, you've got somewhere to apply those skills.
How we know is more important than what we know.
I did read the article; however, I DISAGREE with that comment in the article. People won't learn a language for one program, and there is not a large enough body of people who know the language well enough to really UNDERSTAND the program and make modifications and additions to it. Compare that to a bunch of C/C++/Java/Perl developers with a massive community body: it's a lot easier to get people to contribute. -M
when you see the word 'Linux', drink!
How will the choice of language hurt darcs's use? Why on earth would the users of a piece of software care about the language it's written in?
You wrote:
From the article:
So perhaps you should attempt to assimilate some facts before trotting out your tedious, ill-informed prejudices, hmmm?
Furthermore, it's not just about the sheer number of developers, it's about the power of the language. A million monkeys writing code are still only monkeys, and the more developers you have on a project, the more co-ordination is required (read Fred Brooks' The Mythical Man-Month if you don't believe me).
If "number of potential developers" were the only criterion for choosing a project's programming language, everything would be written in BASIC. And Paul Graham makes a good case for coding in less common languages: you'll get people smart enough to learn unusual languages for the hell of it, rather than a mass of monkeys who have little interest in building great software and just want to learn this week's marketable language to improve their employment prospects.
Like CVS, you can get productive within minutes; the same cannot be said for Arch or even Subversion. Let's see:
You now have a Darcs repository! Let's do something with it:
Now your repository contains all your files. Let's look at the changelog:
Now, where's the server? You need a server to share your repository, right? Nearly -- every repository is a potential server, as long as it's accessible either through the file system, through SSH/SFTP, HTTP or email. Let's go to another machine and check out the repository we just made:
We now have a repository on Jane's box. Let's make a modification:
This last output, by the way, is Darcs' patch format. A "hunk" is a line-based diff. Other types of changes that may be contained in a changeset include renames, moves and binary changes. (Yes, you can also get a GNU-patch-compatible output similar to "cvs diff".)
Now let's commit and push the changes back to John's repository:
Now we can go back to John's machine and look:
(Note how Darcs generates a GNU-style changelog for you automatically.)
Where are the revision numbers, you ask? Well, they don't exist, because they're not needed. Darcs is changeset-oriented, not file-oriented. You can refer to a changeset by name, date, or a special hash identity.
Darcs changesets aren't just GNU patches; they have context, which means, for example, that someone can check out a repository, move a file "foo.c" into the directory "bar" and commit; meanwhile, another person, working on an older copy of the same repository, edits foo.c (which is still in its old location) and commits that. Darcs knows that this edit should apply to foo.c in the new location -- and unlike CVS, you don't need to do anything similar to "cvs update" if you're committing files that have been changed on the server. In other words, people can freely commit changes, and the only kind of visible "conflict" will occur when you actually edit the exact same line.
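To make the foo.c scenario concrete, here is a toy model in Python. It is only a sketch of the idea, not Darcs' actual algorithm or patch format: the names Move, Hunk, apply_patch and commute_hunk_past_move are made up for illustration, and a "repository" is just a dict from path to a list of lines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """Hypothetical patch type: rename a file from src to dst."""
    src: str
    dst: str


@dataclass(frozen=True)
class Hunk:
    """Hypothetical patch type: replace `old` lines with `new` at a 0-based line index."""
    path: str
    line: int
    old: tuple
    new: tuple


def apply_patch(patch, tree):
    """Return a new tree (dict of path -> list of lines) with the patch applied."""
    tree = dict(tree)
    if isinstance(patch, Move):
        tree[patch.dst] = tree.pop(patch.src)
    else:
        lines = tree[patch.path]
        end = patch.line + len(patch.old)
        if tuple(lines[patch.line:end]) != patch.old:
            raise ValueError("hunk does not apply: context mismatch")
        tree[patch.path] = lines[:patch.line] + list(patch.new) + lines[end:]
    return tree


def commute_hunk_past_move(hunk, move):
    """Rewrite a hunk recorded before `move` so that it applies after `move`."""
    if hunk.path == move.src:
        return Hunk(move.dst, hunk.line, hunk.old, hunk.new)
    return hunk  # the hunk touches an unrelated file and is unchanged


# Both people start from the same repository state.
base = {"foo.c": ["int main() {", "  return 1;", "}"]}

move = Move("foo.c", "bar/foo.c")                               # first person moves the file
edit = Hunk("foo.c", 1, ("  return 1;",), ("  return 0;",))    # second person edits the old path

# Merge order 1: the move is already in place, then the edit arrives.
moved_first = apply_patch(commute_hunk_past_move(edit, move), apply_patch(move, base))

# Merge order 2: the edit is already in place, then the move arrives.
edited_first = apply_patch(move, apply_patch(edit, base))

# Either order ends with the edited file at its new location.
assert moved_first == edited_first
```

The point of the sketch is that the edit carries enough context (which file, which lines) to be rewritten against the moved file, so neither person has to update first. Real conflicts, where two hunks touch the same line, are not modeled here.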
Unlike CVS and Subversion, but like Arch and Monotone, Darcs is a distributed version control system. Repositories are islands which are constantly out of sync with each other, and Darcs' patch commutation system takes care of integrating the changes that flow between them.
This system has several extremely useful effects:
Darcs get (equivalent to CVS checkout) is the single least efficient command in darcs. People keep telling me I need to fix this, since it's the first thing users see, but it's really not an important command to optimize (apart from first impressions issues). When run locally (to create a new branch) it's fast.
And comparing darcs get with cvs checkout really isn't fair, since darcs gives you a copy of the full history of the repository, a separate branch on which to record changes before committing them to the centralized repository, and the ability to browse the history offline.
If you want a fast get, just run optimize --checkpoint on the parent repository (assuming you've tagged recently--if not, then tag the current state first), and then use the --partial flag when running darcs get. It'll still give you more flexibility than a cvs checkout, and will be much faster.
1. It's actually hard to use the patch commutation code to do any good outside the concept of a darcs repository.
1.5 I've thought about creating a C library for manipulating/querying darcs repositories, but haven't gotten around to it. The hard part would be of course designing the API. Ideally I'd like the interface to be such that programs using the library couldn't accidentally corrupt the repository.
2. Darcs requires ghc, since it uses some library code only available in ghc to do more efficient IO, string manipulation and to access zlib. It turns out to be a pain on many systems to link with the necessary libraries when using the interpreted version of ghc. So probably accessing darcs from perl will have to go through the executable until a C library is written (which could of course have perl bindings).
3. Rewriting darcs in perl (or parts of it) would be possible, but would be a pain. In particular, the commutation of patches which have conflicts is pretty complicated.