ESR to Shred SCO Claims?

Posted by michael on Tuesday September 9, 2003 @09:52AM from the woodchipper dept.

webmaven writes "According to this article in eWEEK, ESR has released a utility called comparator for analyzing the similarity of source code trees. The technical details are interesting, in that ESR says he is using an implementation of a refined version of the 'shred' algorithm, with higher performance (on machines with enough RAM) than other versions. ESR won't say whether he intends the comparator to be used to compare older Unix code to Linux so as to be able to refute SCO's claims, but it's obviously well suited for such a purpose. Interestingly, as the shred algorithm can run reports on source trees using only the MD5 signature shreds (once generated), it is possible to use it to compare trees without direct access to the source code itself, leading to a possible use in comparing various proprietary source trees with each other and with Freely available code bases such as Linux and *BSD without requiring actual disclosure of the proprietary source code (a neutral third party could generate the shreds on a company's premises, and leave without taking a copy of the source with them). I'll be interested to see if (or which of) the proprietary vendors allow their source trees to be 'shredded' for such comparisons, and whether this becomes a standard forensic technique in source-code copyright and trade-secret disputes."
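
A minimal Python sketch of the shredding idea described in the summary, not ESR's actual comparator code: each run of a few consecutive lines is reduced to an MD5 digest, and two trees are compared by intersecting their digest sets. The window size of 3, the whitespace stripping, and the names `shred` and `common_shreds` are assumptions made for illustration.

```python
import hashlib


def shred(lines, window=3):
    """Return {md5 hex digest: first line number} for every run of `window` lines.

    Assumption: blank lines are dropped and surrounding whitespace is stripped
    before hashing, so pure layout differences do not hide a match.
    """
    cleaned = [(n, ln.strip()) for n, ln in enumerate(lines, start=1) if ln.strip()]
    shreds = {}
    for i in range(len(cleaned) - window + 1):
        chunk = "\n".join(text for _, text in cleaned[i:i + window])
        digest = hashlib.md5(chunk.encode("utf-8")).hexdigest()
        # Remember only the first place each shred occurs.
        shreds.setdefault(digest, cleaned[i][0])
    return shreds


def common_shreds(shreds_a, shreds_b):
    """Digests present in both signature sets; no source text is needed here."""
    return sorted(set(shreds_a) & set(shreds_b))


# Two tiny made-up "trees" that share one three-line run.
tree_a = ["int a;", "a = b + c;", "return a;", "}"]
tree_b = ["/* other */", "int a;", "  a = b + c;", "return a;", "x++;"]

sig_a = shred(tree_a)
sig_b = shred(tree_b)
matches = common_shreds(sig_a, sig_b)
for digest in matches:
    print(digest, "tree_a line", sig_a[digest], "tree_b line", sig_b[digest])
```

The comparison at the end needs only `sig_a` and `sig_b`, not the source lines, which is the property the summary relies on.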

17 of 554 comments

Doubt it will help by Brahmastra · 2003-09-09 09:57 · Score: 5, Insightful

I think the question here is not about whether there is common code between SCO and Linux. There is no doubt that there will be common code because of the common origins. The issue here is that SCO does not own that code.
1. Re:Doubt it will help by djh101010 · 2003-09-09 10:00 · Score: 4, Insightful
  
  If there's going to be a line-by-line comparison, this is the tool to do it. Once those lines are identified, *then* it's simply a matter of finding out the origins of them; that's where we can roll it back to a textbook published in 1973 or whatever.
  
  Until the lines that are common are identified, it's impossible to defend against the accusations. Because of that, I bet Darling Darl won't allow it to be used. The question is, how to turn the inevitable refusal into something that shuts him (up|down).
2. Re:Doubt it will help by Azog · 2003-09-09 10:37 · Score: 5, Insightful
  
  Well, this would still help determine what the common code is.
  
  If ESR is given the big list of MD5 sums of SCO's kernel by someone who has legitimate access to it, and he runs his shred tool to compare it to the Linux kernel, and a bunch of stuff turns up matching (as expected) he can still see WHAT was matching because he has the Linux sources.
  
  So then he can look at that and say, "hmmm, it looks like part of this ethernet driver is the same, and this NAT implementation, and bits and pieces of the VFAT filesystem code..." and then, find out how those got to be the way they are in Linux.
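
A hedged Python sketch of that lookup step: one side publishes only a list of digests, the other side holds real source and uses an index to see where each matching digest lives. The window size, the `{path: lines}` tree layout, the file names, and the function names are all hypothetical, and this is not how comparator itself is implemented.

```python
import hashlib
from collections import defaultdict

WINDOW = 3  # assumed shred size; both sides must use the same value


def digests(lines, window=WINDOW):
    """Yield (starting line number, md5 hex digest) for each run of `window` lines."""
    stripped = [ln.strip() for ln in lines]
    for i in range(len(stripped) - window + 1):
        chunk = "\n".join(stripped[i:i + window])
        yield i + 1, hashlib.md5(chunk.encode("utf-8")).hexdigest()


def build_index(tree):
    """Map digest -> [(path, line), ...] for a tree given as {path: [lines]}."""
    index = defaultdict(list)
    for path, lines in tree.items():
        for lineno, digest in digests(lines):
            index[digest].append((path, lineno))
    return index


def locate(foreign_digests, index):
    """Return the local (path, line) positions whose shreds appear in the foreign list."""
    hits = []
    for digest in foreign_digests:
        hits.extend(index.get(digest, []))
    return sorted(set(hits))


# Made-up open tree that we can read in full.
open_tree = {
    "net/example.c": ["a();", "b();", "c();", "d();"],
    "fs/example.c": ["x();", "y();", "z();"],
}
# Made-up closed file; only its digests are ever handed over.
closed_lines = ["q();", "b();", "c();", "d();", "r();"]
published = [digest for _, digest in digests(closed_lines)]

hits = locate(published, build_index(open_tree))
print(hits)
```

Each hit is a starting point for tracing history on the open side, since the file and line are known there even though the closed source was never seen.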
  
  If it can be proved that the matching code is totally legit in Linux (which is what I would expect), then it follows that either (a) SCO actually stole stuff out of Linux, rather than the reverse, or (b) Linux and SCO both took the code from a third source, like BSD.
  
  Otherwise, option (c) is that Linux actually contains code from SCO which it should not. But this is still an improvement on the current situation, because it would allow the Linux development team to FIX THE PROBLEM.
  
  Either way, (sooner or later, depending on if Linux fixes are required) it will shoot SCO's claims so full of holes that any reputable journalist reporting on SCO's latest insane claims will have to mention that "... but the source code has been analyzed and all code in Linux similar to SCO's software has been shown to be completely legitimate...", or "... but all code in Linux which SCO might have had a valid issue about has been removed..."
  
  SCO's big stick right now is FUD. Fear, Uncertainty, and Doubt. The shred tool can remove the uncertainty and doubt. Only SCO will still have the Fear. :-)
  
  --
  Torrey Hoffman (Azog)
  "HTML needs a rant tag" - Alan Cox
Nah... by SargeZT · 2003-09-09 09:58 · Score: 4, Insightful

This shouldn't be relied upon in a court of law. Although I acknowledge that SCO likely has no IP claim over Linux, it should have a fair case. A program that would rule out code similarities does not rule out code that is based on the SCO code. There are hundreds of ways to do a single thing, and if GNU/Linux took ideas from the SCO kernel, SCO may be as eligible for compensation as if the code were directly copied from SCO.

--
And why did you staple the trout to the RAM?
Re:SCO! by mik · 2003-09-09 10:05 · Score: 5, Insightful

The point is that we don't need SCO to do anything. Presumably any of the many people with legal rights to SCO source code can publish the hash list without divulging any of SCO's (ahem) "IP". Even more interesting is the theoretical possibility of comparing historical releases of SCO trees against GPL-licensed code, thus (perhaps) demonstrating that SCO has illegally violated the IP of OSS developers. Of course, hash comparisons alone would be unlikely to convince a judge/jury of anything. They ought to be sufficient grounds for some embarrassing subpoenas, and maybe some really neat cease-and-desist orders, though.
Slim to None by tomRakewell · 2003-09-09 10:10 · Score: 5, Insightful

Chances are slim to none that a software company would allow its "shredded" source code to be publicly released. What happens if the proprietary source is found to violate the GPL?

Proprietary (closed) source companies have a tremendous advantage over open source software when it comes to violating intellectual property. Who will ever know if they did it? A source code "comparator" eliminates that crucial advantage.
Results Will Appear "Tainted" by zapf · 2003-09-09 10:11 · Score: 5, Insightful

While I fully support ESR and the rest of the open source movement's defense of Linux against SCO, I have a feeling that this tool's results will not immediately be accepted by established media simply because of ESR's bias. A reporter looking into the SCO story who knows little about open source wouldn't trust a tool made by one side of the disagreement.

It seems very important to me that "third parties" and experts who are not an integral part of the open-source movement validate that comparator works as intended and is effective at detecting code similarities. Hopefully we'll see some articles on respected sites in the next week or so with conclusive analyses of comparator. Not to mention a chance for someone to use it on SCO's code!

Oh, and "Yes, I'm being deliberately vague and tantalizing" is quite funny.
1. Re:Results Will Appear "Tainted" by Brandybuck · 2003-09-09 11:26 · Score: 5, Insightful
  
  A reporter looking into the SCO story who knows little about open source wouldn't trust a tool made by one side of the disagreement.
  
  Then why would a reporter trust the press releases that SCO puts out on a daily basis?
  
  The unfortunate reality is that they DO trust them. We may all think this is a joke here in our insular community, but the great majority of reporters report the press releases "as is". Then the analysts come along and refine those press releases into easily digestible chunks. Then the pundits come along with preconceptions based on those chunks. Ever wonder why the SCO stock keeps going up and up and up? It's because the only thing the general public knows about this issue has come from SCO.
  
  Anything that can help get the truth before the public eye is a Good Thing(tm). A tool that can mathematically "prove" that SCO is lying is valuable, even if most reporters suspect a bias.
  
  --
  Don't blame me, I didn't vote for either of them!
Re:Nonsensical idea by El · 2003-09-09 10:15 · Score: 4, Insightful

Comparing the hashes doesn't give you a definitive answer; it does, however, tell you where to look. Or which submitters to ask for clarification on the origins of potentially infringing code. That's more than we have now!

--
"Freedom means freedom for everybody" -- Dick Cheney
What if...? by bladernr · 2003-09-09 10:34 · Score: 4, Insightful

What if this ESR tool runs and finds commonality, and the research shows that, in fact, SCO's rights were breached? Remember, this type of analysis is a two-edged sword. The purpose of this ESR tool is to remove doubt... but remember, doubt could be removed in either direction.

So, given that hypothetical, what would people here think? Would you forgive SCO? Would you concede SCO's point, but think that SCO defended their rights in a very poor manner? (This, btw, is what I would probably do.) Would you stick your fingers in your ears and refuse to accept the outcome, and believe in some vast -wing conspiracy?

Obviously, the Linux movement would carry on. I don't think the death of Linux is even worth discussion. Some recourse would happen, probably monetary damages, and the offending code would be removed.

My real curiosity is how people's attitudes or feelings would change (or not change) if it turns out SCO is right (however unlikely that is).

--
Sarcasm and hyperbole are the final refuges for weak minds
Re:maybe... by fireboy1919 · 2003-09-09 10:48 · Score: 4, Insightful

Right. Because as we all know, people who pay Microsoft the huge bag 'o money that it costs to see their source are primarily interested in the pursuits of OSS to see if Microsoft has copied anything it shouldn't have. And Microsoft's NDA surely gives them the right to do this.

If anyone is able to prove Microsoft is doing something illegal via the shared source initiative, they'll probably have to do it illegally.

--
Mod me down and I will become more powerful than you can possibly imagine!
Re:In all fairness.. by Dr_Marvin_Monroe · 2003-09-09 11:38 · Score: 4, Insightful

In all fairness, SCO's value is not in being purchased so that the source code can be freed...

SCO's value is in acting as a totem against future companies who would try this same stunt....Their value is in their smoking carcass with Darl's charred head mounted prominently on a high pike...

At this point, there can be no compromise with people who commit fraud to inflate their stock price and to promote FUD.... I believe that Darl KNOWS that his claims are false...he deserves to fry....

I say, "smoking head on stake" for all the SCO/Canopy group members.... leave all the execs at SCO without a job and discredited like the MCI/ENRON execs....Leave all the investors holding worthless stock certs....Somebody needs to be an example, and SCO volunteered by inflating/changing/hyping/FUDing their claims.

I could have had a little sympathy for them if they had just filed their suit and shut up until the trial....but at $17/share now, we need to destroy some wallets to remind everyone that it's not over till the gavel falls......
Re:This is actually a darn good idea by Trailer+Trash · 2003-09-09 11:56 · Score: 5, Insightful

So, this method of identifying copied code would only work if the code had never been run through an obfuscator.

You've hit the nail on the head, possibly without knowing it. The source code needs to be run through an obfuscator *before* shredding. Actually, I'm thinking a special obfuscator, let me explain.

Let's take a piece of C source, not randomly chosen:

    malloc(mp, size)
    struct map *mp;
    {
        register int a;
        register struct map *bp;

        for (bp = mp; bp->m_size; bp++) {
            if (bp->m_size >= size) {
                a = bp->m_addr;
                bp->m_addr =+ size;
                if ((bp->m_size =- size) == 0)
                    do {
                        bp++;
                        (bp-1)->m_addr = bp->m_addr;
                    } while ((bp-1)->m_size = bp->m_size);
                return(a);
            }
        }
        return(0);
    }

Now, the structure of the code is 99% of what matters. Variable names can change, but few people would change anything beyond that. Let's modify the code in a couple of important ways. First, all variable names are changed to new names, on a per-line basis. Blank lines and unneeded blanks are all removed. Each statement is on its own line, and formatting styles (such as curly bracket placement) are standardized.

    malloc(a, b)
    struct a *b;
    {
    register int a;
    register struct map *b;
    for (a=b;a->c;a++) {
    if (a->b>= c) {
    a=b->c;
    a->b=+c;
    if ((a->b=-c)==0)
    do {
    a++;
    (a-1)->b=a->b;
    } while ((a-1)->b=a->b);
    return(a);
    }
    }
    return(0);
    }

This might not be perfect, but it should do the trick. A programmer can change variable names, spacing, or format, but as long as the code is the same, it'll match. Obviously, changing the code would have an impact, but nearly every line would have to be changed, and in a substantial way, for it to not match. It's not always even possible to do that in a way that would trick this function.
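
One way the per-line normalization could look, as a rough Python sketch rather than a finished tool. It renames every non-keyword identifier on a line to `v0`, `v1`, ... in order of first appearance, squeezes out spacing, and hashes the result. The keyword list is deliberately partial, the tokenizer is a crude regex that ignores comments and string literals, and the names `normalize_line` and `line_digest` are made up for this example. Unlike the hand-edited listing, it renames function and type names as well.

```python
import hashlib
import re

# Partial keyword list, enough for the malloc snippet; a real tool would need all of them.
C_KEYWORDS = {"register", "int", "char", "struct", "for", "if", "else", "do", "while", "return"}

# Identifiers, integer literals, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize_line(line):
    """Rename identifiers per line and join tokens with single spaces."""
    names = {}
    out = []
    for tok in TOKEN.findall(line):
        if (tok[0].isalpha() or tok[0] == "_") and tok not in C_KEYWORDS:
            # The same identifier gets the same placeholder within this line.
            tok = names.setdefault(tok, f"v{len(names)}")
        out.append(tok)
    return " ".join(out)


def line_digest(line):
    """MD5 of the normalized form, so renamed or respaced lines collide on purpose."""
    return hashlib.md5(normalize_line(line).encode("utf-8")).hexdigest()


original = "bp->m_addr =+ size;"
renamed = "  p->addr=+n ;"

print(normalize_line(original))
same = line_digest(original) == line_digest(renamed)
different = line_digest(original) == line_digest("a = bp->m_addr;")
print(same, different)
```

Splitting multi-character operators such as `->` into single characters is harmless here because it happens identically on both sides of a comparison.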

Anyone want to write it?

Michael

--
Do you have ESP?
No source = no copyright by poptones · 2003-09-09 12:03 · Score: 4, Insightful

This entire argument is happening for ONE reason: various governments of the world (specifically, in this case, the US) have afforded COPYRIGHT protection to works that contribute nothing to "furtherance of the state of the art" and nothing to "the progress of science." If I build a power saw, I can patent unique aspects of its design but have to reveal those aspects.

Copyright is misapplied to source code. Either REVEAL THE SOURCE or you only get protection on that which you "publish" - namely, the binary.

Put up or shut up; no source, no copyright on the source. You won't share it, you don't need it protected.
Re:Who says ESR can't code? by finkployd · 2003-09-09 12:33 · Score: 4, Insightful

Actually fetchmail proves that he can code.

This program just proves that md5 is not the correct hash for doing this kind of comparison. It is TOO GOOD of a one way hash, and will only return a positive if the lines being compared are 100% equal.
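
A small Python illustration of that point: two lines that differ by one swapped pair of characters produce unrelated MD5 digests, while a fuzzy measure still rates them as close. The use of `difflib.SequenceMatcher` is a stand-in chosen for this sketch; nobody in the thread proposes it, and it needs the actual text, so it does not help when only digests are available.

```python
import difflib
import hashlib

old_style = "bp->m_addr =+ size;"
new_style = "bp->m_addr += size;"

digest_old = hashlib.md5(old_style.encode("utf-8")).hexdigest()
digest_new = hashlib.md5(new_style.encode("utf-8")).hexdigest()

# Exact-match hashing: any difference at all means no match.
exact_match = digest_old == digest_new

# Fuzzy comparison on the raw text: a ratio in [0, 1], where 1.0 means identical.
similarity = difflib.SequenceMatcher(None, old_style, new_style).ratio()

print(exact_match, round(similarity, 2))
```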

Finkployd
Re:maybe... by Courageous · 2003-09-09 13:31 · Score: 5, Insightful

And Microsoft's NDA surely gives them the right to do this.

A term in any contract, including any NDA, as stipulated by any party, which would obligate the other party to not report a violation of law, either statute or criminal, is PER SE unlawful and cannot be enforced within any jurisdiction of most first world countries. Any contract bearing such a stipulation would in fact be at significant risk of invalidating the ENTIRE contract, not just the unlawful provisions therein.

C//
Re:derivative work? by Wumpus · 2003-09-09 13:43 · Score: 4, Insightful

Let's apply a powerful legal tool: The silly analogy.

Take a copyrighted work (Harry Potter and The Chamber of Secrets, for example).

Now, rearrange all the letters randomly, and pick (say) every 10th letter. Apply rot13 to the result, and print it.

Is this derivative work? If you think it is, then, yes, copyright holders should be able to control MD5 hashes produced from their work.