Slashdot Mirror


ESR to Shred SCO Claims?

webmaven writes "According to this article in eWEEK, ESR has released a utility called comparator for analyzing the similarity of source code trees. The technical details are interesting, in that ESR says he is using an implementation of a refined version of the 'shred' algorithm, with higher performance (on machines with enough RAM) than other versions. ESR won't say whether he intends the comparator to be used to compare older Unix code to Linux so as to be able to refute SCO's claims, but it's obviously well suited for such a purpose. Interestingly, as the shred algorithm can run reports on source trees using only the MD5 signature shreds (once generated), it is possible to use it to compare trees without direct access to the source code itself, leading to a possible use in comparing various proprietary source trees with each other and with Freely available code bases such as Linux and *BSD without requiring actual disclosure of the proprietary source code (a neutral third party could generate the shreds on a company's premises, and leave without taking a copy of the source with them). I'll be interested to see if (or which of) the proprietary vendors allow their source trees to be 'shredded' for such comparisons, and whether this becomes a standard forensic technique in source-code copyright and trade-secret disputes."

17 of 554 comments (clear)

  1. Re:maybe... by jmv · · Score: 5, Interesting

    Actually, combine this with the "shared source" program from MS and it would be easy to see if MS did (or did not) copy GPL code into Windows as some suggest.

  2. Other uses? by Not_Wiggins · · Score: 4, Interesting

    It might be interesting to see how different families of Linux/Unix compare... maybe generate a veritable "family tree" of relationships.

    Of course, that also depends more on how differences are actually calculated. Still, could make an interesting project to relate OSes based on how much shared code they still retain and show it in a graphical tree format, ala "family tree." 8)

    --
    Diplomacy is the art of saying, "Nice doggie!" until you can find a rock.
  3. What respect? by Anonymous Coward · · Score: 3, Interesting

    Most people *I* know consider ESR to be a bloated windbag with a penchant for fanatical gunrights. He's regarded as pretty much being on the same level as the late Jon Katz.

  4. Re:Can Someone Explain? by stratjakt · · Score: 4, Interesting

    Perhaps if you parsed them both, and compared the resulting object code, right before compilation?

    That way if your variable is called numOfPorts and mine is called countOfPorts, the parsed code is the same for both, when stuff like that becomes meaningless.

    Even if not, SCO seems to be saying that much of the code is copy-n-paste anyways.

    --
    I don't need no instructions to know how to rock!!!!
  5. Be careful... by nolife · · Score: 4, Interesting

    The more points you discover and disprove now with SCO's claims.. the higher quality, more refined, and detailed SCO's evidence will be when this setup finally gets to a court in front of a judge. If they went to court two months ago or even today, they would have been sent home quickly with bascially easy to disprove evidence. With the help of the open source community, they are slowly changing their weapon of choice from a shotgun to a rifle.

    --
    Bad boys rape our young girls but Violet gives willingly.
  6. Re:Nah... by jmv · · Score: 3, Interesting

    That's true in general. However, SCO has explicitly stated that thousands of lines of code have been illegaly copied *verbatim* from System V. This tool could at least prove that they lied (because of the verbatim copy allegation).

  7. Re:fire the "laser" by be-fan · · Score: 3, Interesting

    You know the sad thing about all this? I can't tell the difference between the auto-generator or your average Slashdotter. Does this mean that the auto-generator passes the Turing Test, or that the average Slashdotter doesn't?

    --
    A deep unwavering belief is a sure sign you're missing something...
  8. IBM has a project called History Flow by TedTschopp · · Score: 5, Interesting

    This is perhaps a better project and it would be interesting to see this tool run against the source.

    History Flow The following is from their website:

    history flow
    visualizing dynamic, evolving documents and the interactions of multiple collaborating authors:

    Motivation
    Most documents are the product of continual evolution. An essay may undergo dozens of revisions; source code for a computer program may undergo thousands. And as online collaboration becomes increasingly common, we see more and more ever-evolving group-authored texts. This site is a preliminary report on a simple visual technique, history flow, that provides a clear view of complex records of contributions and collaboration.

    --
    Fantasy remains a human right; we make in our measure and in our derivative mode... -- JRR Tolkien
  9. Its been around for years by Anonymous Coward · · Score: 3, Interesting

    check out this research project coming out of berkeley CAP

    Drop in the code you are interested in and it will tell you where its found in a bunch of open source stuff, including the linux kernel.

  10. Who says SCO gets to court first? by JoeBuck · · Score: 4, Interesting

    If we can show that SCO's violating the BSD license, maybe we can convince some BSD copyright holder to sue them first, and demand as part of discovery the MD5 checksums from "shred", showing duplicated BSD code but no duplicated BSD copyright.

  11. Re:Slim to None by JoeBuck · · Score: 4, Interesting

    But IBM already has a copy of SCO's code; they licensed it after all. They can release the output of "shred" without violating their agreements with SCO.

  12. derivative work? by donutz · · Score: 4, Interesting

    Presumably any of the many people with legal rights to SCO source code can publish the hash list without divulging any of SCO's (ahem) "IP".

    Would these hashes of SCO source code be considered derivative works? That could have copyright implications...

  13. Open Source by digidave · · Score: 3, Interesting

    THIS is exactly why Open Source works. It's not because of IBM or Red Hat or geeks from Finland. It's because people in the community are willing to step up to any challenge.

    Thanks, ESR.

    --
    The global economy is a great thing until you feel it locally.
  14. Re:No source = no copyright by poptones · · Score: 3, Interesting

    Apparently your reading comprehension skills are right on par with the dolt who modded the post down.

  15. Nobody has mentioned this yet ... by Mostly+a+lurker · · Score: 3, Interesting
    As currently designed, Shred would obviously not defeat deliberate source misappropriation. If (big if) the method could adapted such that it could not be easily fooled by a determined violator (and without revealing how the code works) then I believe registration of the results should be required by law. BUT ...

    In order that the method should not be fooled by simple changes, at least the following is required

    * White space must be ignored

    * Comparison must be at the statement level, not the code line level

    * Variable names must be replaced by standard placeholders

    * Routine names, other than standard library calls, must be replaced by standard placeholders

    * (Probably difficult) logic will be needed in the tool to detect and ignore noops: how do you deal with

    i++;
    %include noop.i;
    a[i]=b[i];

    The trouble is: a high proportion of the code sections thus simplified will fall into a relatively small number of possibilities, vulnerable to dictionary type attacks. Thus, most of the code could be reconstructed, though admittedly as obfuscated source code. IMHO this provides a valid objection to its use.

  16. Re:MD5 easily fooled by dmiller · · Score: 4, Interesting

    So, you've downloaded Comparator, and run tests, then.

    I didn't need to, the following is in the readme:

    comparator does not attempt to do semantic analysis and catch relatively trivial changes like renaming of variables, etc. This is because comparator is designed not as a tool to detect plagiarism of ideas (the subject of patent law), but as a tool to detect copying of the expression of ideas (the subject of copyright law).

    He's wrong BTW (and he is smart enough to know it, which makes this a deliberate deception). A work is no less subject to copyright if someone does a global search and replace on a variable name.

  17. Re:No source = no copyright by IM6100 · · Score: 3, Interesting

    Get a clue. Nobody who copyrights a work is under any obligation to widely spread around the work. Copyright is inherent in any written work. I can write a poem intended only for my lover, just give the one copy of the poem to that lover, and it's protected by copyright. Break into my lover's house, steal a copy of the poem, and publish it, and you've broken copyright and I have standing to nail you good for it.

    Patents, in order to be patented, need to be fully disclosed. That's inherent in the patent process, you're saying 'this is MY idea, here's the whole deal laid out, I assert that it's mine.' There's no comparable oblication for copyright.

    People like you who try to mush it all up are just trying to loot other people's property.

    --
    A Good Intro to NetBS