Slashdot Mirror


Usenix President - Linux Needs Better Paper Trail

Anonymous Coward writes "Usenix Association president Marshall Kirk McKusick is a veteran of BSD's intellectual property scuffle with AT&T in the 1990s, and he's got some thoughts and advice for the keepers of the Linux kernel going forward, commenting: 'There isn't a well-documented ownership trail with Linux. So, they have opened themselves up to a swamp of 'he said-she said' about where code came from'."

21 of 166 comments

  1. Good timing for this then. by Anonymous Coward · · Score: 2, Informative

    http://news.com.com/Linux+contributors+face+new+rules/2100-7344_3-5218724.html?tag=nefd.top

  2. News at 11 by joib · · Score: 2, Informative

    It's not like this is some surprising new insight, see another article posted today: here.

  3. Coming soon...Pamela Jones' Grokline.... by LouisvilleDebugger · · Score: 3, Informative

    Is intended to allow the developers of Linux, as well as the various UNI*es, to register and tell what they know of their own roles, as well as the development of each feature of each version of each UNIX-flavored operating system. Stay tuned to Groklaw for the official announcement...we're working on getting the site up within the next couple of days.

  4. Site's a little slow by karmatic · · Score: 2, Informative

    Site's a little slow already (darn subscribers), so here's a Mirror.

    Note: This doesn't mean I agree with this crap. As a coder, I can certainly understand their wanting to write code more than document everything. Really, shouldn't CVS logs be as much "proof" you wrote it as you need? It's far more work to try to fake writing it by changing others' code than it is to just do the work itself.

    1. Re:Site's a little slow by GridPoint · · Score: 2, Informative

      One of the main points of the article is that the (earlier) Linux versions lack any kind of CVS logs. The situation was remedied when Linus started using BitKeeper, but there are years of development that cannot be tracked using a single source revision control system. This makes things quite complicated, as the developers must dig through mailing lists and other means of communication to find out who really wrote what. "[...] they will have to dig themselves out of the swamp [...]", as said by McKusick.

      (Oh yes, and just so you know, Marshall Kirk McKusick isn't just some law-monkey, he is one of the leading BSD developers and has, among a lot of other things, written the SoftUpdates FreeBSD filesystem extension, which allows for running fsck as a background process during normal system operation.)

  5. Re:The only problem with that quote is... its enti by Anonymous Coward · · Score: 3, Informative

    Ehh. Linux /always/ had a version number. Since day one, with v0.01, back in 1991.

  6. Funny how this coincides with... by yarrick · · Score: 5, Informative

    Slashdot: Process Improvements Wasn't Linus just talking about authors signing kernel submissions?

  7. Re:Ownership trail? by chromatic · · Score: 2, Informative

    No. It's that no one can take away your right to fork the software, your right to use the software as you see fit, your right (or your proxy's right) to examine and change the software if you desire, and your right to redistribute the software, as long as you allow other people the same rights.

  8. Re:The only problem with that quote is... its enti by imp · · Score: 5, Informative

    The changelog is insufficient documentation. It contains vague attributions that something changed somewhere in the code. It isn't specific as to what lines of code changed. Later, when you go back and try to find where a set of lines came from, a changelog doesn't help much.

    With a source code control system, you know that so and so added a given line on such and such a date. You can then go to that person and ask them where they got it from if there's ever any question.

    In the BSD world (I do a lot with FreeBSD), this has come in very handy when code disputes come up. Being able to talk to the actual people that inserted the code into FreeBSD has helped to clear up what otherwise might have been viewed as something improper.

    I've tried to do similar things with versions of Linux in the past, only to discover that I could, at best, find what version they came into the tree at, and who collected the patch and sent it to Linus. I wasn't able to track it further without searching public mailing lists for the information (with mixed results).

    While you might believe that it will take 20 minutes to identify the code in question, my guess is that's overly optimistic, unless the code in question was contributed since bk. It usually takes me at least 5 minutes to find out where code comes from in FreeBSD when there's a question, and cvs annotate makes the process *MUCH* faster.

    I'm not sure I'd disagree with your comments about SCO being able to come up with where the code came from relative to Linux.

  9. Re:Ownership trail? by julesh · · Score: 2, Informative

    No, the purpose of the GPL is to provide everyone with access to the code and allow them to use it in their own GPL programs.

    All contributors to Linux still own the sections that they contributed. Some projects are run differently; for instance, the FSF owns the code to all of the official GNU projects, because they ask contributors to assign copyright to them.

    The ownership is important if you later want to change the license, for example by granting somebody permission to do something that isn't usually allowed by the GPL (e.g. distribute a modified version that isn't under the GPL).

    If ownership of the code is restricted to a few well-known people, this can be done. In the case of the Linux kernel it couldn't, because if any contributor couldn't be contacted or refused (there'll be quite a few, I suspect), their code would have to be removed. If it were important it would then have to be replaced.

  10. Good time for a link, then. by Anonymous Coward · · Score: 1, Informative

  11. Re:A new agreement by Anonymous Coward · · Score: 4, Informative

    You are using digital signatures that aren't based on a standard, documented algorithm like SHA1? Better make sure your closed-source Windows implementation isn't snake oil... You should read what Schneier has to say about unpublished proprietary encryption algorithms (for example in 'Applied Cryptography 2nd Ed'). FWIW, there are Linux implementations of just about every significant published digital signature standard.

  12. Don't worry about it, except for big additions by Animats · · Score: 2, Informative

    I wouldn't worry about it. Look how much effort SCO has put into finding infringements, how unsuccessful they've been, and how much trouble they're in now. Once the SCO case is over, nobody is going to challenge Linux for a long time.

    Meanwhile, SCOX is down to 4.74 today. Volume is about a third of the 3-month average; they're falling off the investment radar. IBM's latest set of legal moves put SCO in worse shape than they've been since the litigation started. SCO has an earnings call and webcast on June 2. Tune in and hear Darl try to talk his way out of this one.

  13. Re:The only problem with that quote is... its enti by hughk · · Score: 4, Informative

    In theory, you need a CVS diff list at least. However, unless the commit comments are linked to a meaningful entry somewhere that shows where a change came from, you will have problems. It doesn't matter whether you use CVS or BK, you still need underlying mechanisms. One issue with Linux is that it has a lot more contributors than *BSD, which tends to make things more complicated.

    In the commercial world, you have change numbers which link to a documentation trail which shows who implemented something and why and who approved it. Linus is trying at least to improve the code provenance by looking at a certification chain between the patch generator, the maintainer and eventually Linus as release manager. Unfortunately, it still looks like a hunt through LKML for the documentation as you suggest.

    --
    See my journal, I write things there

  14. Commercial SW needs better paper trails too. by Anonymous Coward · · Score: 1, Informative

    The paper trails of Linux are far better than those of most corporations.

    Just because a corporation has a SourceSafe system doesn't mean people actually enter into the comments when they steal GPL'd code.

  15. [RFD] Explicitly documenting patch submission by Thoron · · Score: 4, Informative

    Linus has already acted.

    Date: Sun, 23 May 2004 06:48:09 GMT
    From: Linus Torvalds <torvalds@osdl.org>
    To: Kernel Mailing List <linux-kernel@vger.kernel.org>
    Subject: [RFD] Explicitly documenting patch submission

    Hola!

    This is a request for discussion..

    Some of you may have heard of this crazy company called SCO (aka "Smoking
    Crack Organization") who seem to have a hard time believing that open
    source works better than their five engineers do. They've apparently made
    a couple of outlandish claims about where our source code comes from,
    including claiming to own code that was clearly written by me over a
    decade ago.

    People have been pretty good (understatement of the year) at debunking
    those claims, but the fact is that part of that debunking involved
    searching kernel mailing list archives from 1992 etc. Not much fun.

    For example, in the case of "ctype.h", what made it so clear that it was
    original work was the horrible bugs it contained originally, and since we
    obviously don't do bugs any more (right?), we should probably plan on
    having other ways to document the origin of the code.

    So, to avoid these kinds of issues ten years from now, I'm suggesting that
    we put in more of a process to explicitly document not only where a patch
    comes from (which we do actually already document pretty well in the
    changelogs), but the path it came through.

    Why the full path, and not just originator?

    These days, most of the patches in the kernel don't actually get sent
    directly to me. That not just wouldn't scale, but the fact is, there's a
    lot of subsystems I have no clue about, and thus no way of judging how
    good the patch is. So I end up seeing mostly the maintainers of the
    subsystem, and when a bug happens, what I want to see is the maintainer
    name, not a random developer who I don't even know if he is active any
    more. So at least for me, the _chain_ is actually mostly more important
    than the actual originator.

    There is also another issue, namely the fact that when I (or anybody else,
    for that matter) get an emailed patch, the only thing I can see directly
    is the sender information, and that's the part I trust. When Andrew sends
    me a patch, I trust it because it comes from him - even if the original
    author may be somebody I don't know. So the _path_ the patch came in
    through actually documents that chain of trust - we all tend to know the
    "next hop", but we do _not_ necessarily have direct knowledge of the full
    chain.

    So what I'm suggesting is that we start "signing off" on patches, to show
    the path it has come through, and to document that chain of trust. It
    also allows middle parties to edit the patch without somehow "losing"
    their names - quite often the patch that reaches the final kernel is not
    exactly the same as the original one, as it has gone through a few layers
    of people.

    The plan is to make this very light-weight, and to fit in with how we
    already pass patches around - just add the sign-off to the end of the
    explanation part of the patch. That sign-off would be just a single line
    at the end (possibly after _other_ peoples sign-offs), saying:

    Signed-off-by: Random J Developer <random@developer.org>

    To keep the rules as simple as possible, and yet making it clear what it
    means to sign off on the patch, I've been discussing a "Developer's
    Certificate of Origin" with a random collection of other kernel
    developers (mainly subsystem maintainers). This would basically be what
    a developer (or a maintainer that passes through a patch) signs up for
    when he signs off, so that the downstream (upstream?) developers know
    that it's all ok:

    Developer's Certificate of Origin 1.0

    By making a contribution to this project, I certify that:

    (a) The contribution was created in whole or in part by me and I
    have the

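The sign-off convention proposed in the message above lends itself to mechanical checking. As a rough illustration only, and not part of the proposal or of any kernel tool, a few lines of Python can pull the chain of `Signed-off-by:` lines out of a patch description in the order the patch travelled. The function name `signoff_chain`, the sample description, and the second address are all made up for this sketch.

```python
import re

# Matches a line such as:
#   Signed-off-by: Random J Developer <random@developer.org>
SIGNOFF_RE = re.compile(
    r"^Signed-off-by:\s*(?P<name>.+?)\s*<(?P<email>[^<>]+)>\s*$"
)


def signoff_chain(description):
    """Return (name, email) pairs for each sign-off line, in order.

    The first entry is the earliest sign-off (usually the originator);
    later entries are the people who passed the patch along.
    """
    chain = []
    for line in description.splitlines():
        match = SIGNOFF_RE.match(line.strip())
        if match:
            chain.append((match.group("name"), match.group("email")))
    return chain


# Hypothetical patch description used only for illustration.
example = """Fix an off-by-one error in a made-up driver

Signed-off-by: Random J Developer <random@developer.org>
Signed-off-by: Some Maintainer <maintainer@example.org>
"""

for name, email in signoff_chain(example):
    print(name, email)
```

Because each hop appends its own line, the list reads as the path the patch took, which is the "chain of trust" the message describes; an empty list would flag a patch whose path is undocumented.
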
  16. It's out now :] by Xenographic · · Score: 2, Informative

  17. Re:And ironically... by Brandybuck · · Score: 2, Informative

    When I said "you Linux advocates", I was not referring to you specifically. I was referring to the masses of posts here saying that there isn't a problem. The fact is that there is a problem. Even Linus Torvalds admits it.

    I apologize if I singled out your specific instance of this attitude.

    --
    Don't blame me, I didn't vote for either of them!

  18. Need To Show No One Previously Owned The Code by reallocate · · Score: 2, Informative

    His point is that you need to be able to document that no one else owned the code before it was merged into the kernel. If someone did own it, you need to document that they legally passed rights to the code to your project.

    What the GPL says is not pertinent to that issue. Put the SCO hysteria aside momentarily. This guy is speaking from his own experience in a very similar environment: When someone gets a lawyer and says they owned some of the code in your project, you'd better come up with documentation that proves them wrong. If you can't, it is your word against theirs.

    --
    -- Slashdot: When Public Access TV Says "No"

  19. Re:The only problem with that quote is... its enti by imp · · Score: 4, Informative

    In theory, you need a CVS diff list at least. However, unless the commit comments are linked to a meaningful entry somewhere that shows where a change came from, you will have problems. It doesn't matter whether you use CVS or BK, you still need underlying mechanisms. One issue with Linux is that it has a lot more contributors than *BSD, which tends to make things more complicated.

    cvs annotate is an excellent first start to see where code came into the tree. Other tools allow one to see where the code really came from in the face of formatting changes and the like.

    Like I've said in prior posts, having this information is invaluable. It also allows one to more easily back out changes that might be tainted, regardless of where they come from, since you know all the parts to that change, which is impossible with the changelog data. In this respect, bk is better than cvs since bk's change mechanism links multiple files that have changed, while CVS does not.

    In the commercial world, you have change numbers which link to a documentation trail which shows who implemented something and why and who approved it. Linus is trying at least to improve the code provenance by looking at a certification chain between the patch generator, the maintainer and eventually Linus as release manager. Unfortunately, it still looks like a hunt through LKML for the documentation as you suggest.

    You *MAY* have this, or you may not. There are many shops that don't have this level of bureaucracy. However, I've never worked for any place that has had this independent of an underlying source code control system (and many places that didn't have source code control systems, let alone change numbers).

    The issue can be further complicated if there's been cross-fertilization between projects for things like device drivers. Project A figures out how to do feature Z and project B integrates it. B then figures out Y and project A integrates that. Project C takes code from a data sheet and includes it under license X, and Project A then takes it and includes it under license Y. Then Project B wants to bring it in, but is unsure whether it can, because it sees substantially similar code under both X and Y licenses and, not being aware of the common data sheet code example, gets confused. In situations like this, a clear SCM trail can help sort out who to talk to and how to resolve what might appear to be something bad.

    I've seen many organic patches/drivers grow up over the years in Linux for which it is literally impossible to track down who wrote what originally. Some have email addresses, some do not, some have had them removed, some email addresses are stale, etc. In such a chaotic environment, it can be difficult to know where code came from. There are many strengths to this model, but code history isn't one of them.

    Warner

  20. Re:foo by Anonymous Coward · · Score: 1, Informative

    This is why the old 4-clause BSD license enforced the notion of not being able to remove the copyright notice itself, and always giving credit for authorship of the code, plus the normal lack of warranty bits.

    This is misleading. The problem with the old BSD licence was to do with advertising, not copyright attribution. In fact, what you're saying with regards to "not being able to remove the copyright notice" hasn't changed one jot from the old licence to the new. The relevant paragraph in both licences:

    "1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer."

    RMS has quotes on the internet and his fsf.org site about this, and to summarize he says that it is too much of a burden to mark the names of each and every contributor to the code.

    Any quotes you may have will be to do with advertising. It is incredibly misleading to drag this issue up again in a story about copyright attribution.