SCO Berates Linus' Approach To Kernel Contributions
Matthias_305 writes "The New York Times has an article about a new court document in which SCO criticizes Linus Torvalds, touting the 'inability and/or unwillingness of the Linux process manager, Linus Torvalds, to identify the intellectual property origins of contributed source code.' They claim to have got evidence from a conversation on the kernel mailing list in which Torvalds advocates that programmers shouldn't care about patents. According to the article he stands by his view, which is at least 'candid'." On a related note, BobDowling points to a proposal at The Inquirer ("Shutting down SCO's FUD machine") regarding SCO's claims. "SCO won't let people see the contested source code without signing an outrageous NDA but the article gives a mechanism for publishing appropriate MD5 checksums which allow code trees to be compared without anyone else seeing the code. This is offered as a means to locate the source of SCO's contested code. ... This mechanism gives a concrete procedure that SCO can be challenged to follow as part of the community's "put up or shut up" response. There would be no threat to SCO's claimed IPR."
Is it just me, or is SCO now acting significantly more "evil" than Microsoft is? Talk about frivolous legal harassment! This is just one of many stories like this.
'inability and/or unwillingness of the Linux process manager, Linus Torvalds, to identify the intellectual property origins of contributed source code.'
Seems they want to bully Linus to present the evidence for their cause they failed to present. This seems at least irrational to me.
SCO's approach seems to scare everyone that Linux is illegal dynamite, waiting to blow a hole through their purses. If they're really concerned and ethical, should they not go upfront and declare the violations in the code and be done with it?
Secondly, what if someone had poisoned the code over a period and SCO's blowing the whistle now? Something like the tcpdump files getting infected with a trojan?
If you keep throwing chairs, one day you'll break windows....
Did anyone RTFA? Linus' comments struck me at first, until I realized that having engineers trying to be patent lawyers just makes the situation worse. However, the quotes are quite bloody loaded and could do substantial harm.
My advice to Linus would be to get yourself a good intellectual property lawyer, and talk to him about all your potential liability issues. Do it now.
One point seems of major importance and I have not seen it addressed so far. In such a situation, how could SCO prove they did not steal a part of the Linux kernel? Is there an official body in the US where companies can register source code in case of future legal problems? If not, how is that supposed to work? Experts would look around at SCO's and get convinced (or not) by the internal memos and CVS logs? I know we are talking about the US legal system, but that's totally surreal to me ...
Like a Microsoft smear campaign. SCO clearly isn't concerned about 'resolving IP issues' -- if they were, they would only have to produce the code and show they have clear title to it, and I'm quite certain the 'programmer's should ignore patents' Linus they are slamming, would quickly remove the code. Problem solved.
Judging from IBM's stance, SCO probably doesn't have the goods to win anything in court; and they won't even have good FUD if they tip their hand. Clearly the gain for SCO is coming from another source. I'd love to know how much Microsoft is paying them for those 'unix patent rights' it clearly doesn't need. I'm certain if we could see the flow of money from Microsoft to SCO (or at least SCO's execs), all would become clear.....
What I don't understand about the SCO/IBM case, is why IBM isn't taking action to immediately stop SCO from doing what they are doing. I am sure it must be affecting their AIX business, and I can't believe that there isn't a legal method they can use to take some kind of cease and desist out on SCO.
If such a law doesn't exist in the USA, does that mean Pepsi can say they have proof that Coke has dog poo in it, but they aren't going to show the proof? I doubt it somehow.
Furthermore, if SCO are doing these things just to manipulate their share price, and the allegations turn out to be baseless, surely that is fraud?
I have not seen any post from any SCO people standing up for or against anything lately. Can SCO management legally gag their employees during this litigation? Not trolling or stirring, just deafened by the silence.
If there were no IPR (and thus no copyright), all source code anybody publishes could be used approximately as if it was published under a BSD license today.
It would always be possible to publish only binaries, but it would not be possible to restrict distribution of these licenses. (It would also be allowed to re-engineer the binaries.)
So while we couldn't have the protection that the GPL offers today, we would have BSD-like Free Software (you don't deny that the BSDs are FS/OSS?) plus the right to re-distribute, change or disassemble any binaries anybody might publish.
The shredding was done in 5 line groups but on each line.
Thus while A B C D E wouldn't match Q A B C D the next hash value in file 2 would be A B C D E which WOULD match.
The idea was that the process would ONLY hit on 5 line matches to avoid all of the things like #include <stdio.h> hits.
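For anyone who wants to play with the idea, here is a minimal Python sketch of the sliding 5-line shredding described above. It is only an illustration under my own assumptions, not the script anyone actually ran: the names `window_hashes` and `common_windows` are made up, and no whitespace normalisation is done.

```python
import hashlib


def window_hashes(lines, size=5):
    """Map the MD5 of every run of `size` consecutive lines to its first start line (1-based)."""
    hashes = {}
    for start in range(len(lines) - size + 1):
        chunk = "\n".join(lines[start:start + size])
        digest = hashlib.md5(chunk.encode("utf-8")).hexdigest()
        # Keep only the first place a given group of lines appears.
        hashes.setdefault(digest, start + 1)
    return hashes


def common_windows(lines_a, lines_b, size=5):
    """Return sorted (start_in_a, start_in_b) pairs for groups present in both files."""
    a = window_hashes(lines_a, size)
    b = window_hashes(lines_b, size)
    return sorted((a[h], b[h]) for h in a.keys() & b.keys())


file_1 = ["A", "B", "C", "D", "E"]
file_2 = ["Q", "A", "B", "C", "D", "E"]
matches = common_windows(file_1, file_2)
```

Because a hash is taken starting at every line rather than every fifth line, an extra line at the top of one file only shifts where the match is found. A single line such as an include can never match on its own, since only whole groups of five are hashed.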
--- I wish I could hear the soundtrack to my life. That way I'd know when to duck.
Sounds like a PERL CHALLENGE!!!
Usage: ./script.pl < code.c
http://www.cs.helsinki.fi/linux/linux-kernel/2002-32/0160.html
A friend of mine was telling me yesterday that an older version of SCO he has includes some excellent tools that he has never seen anyone else write. I don't have any details, but if anyone knows what tools these are, write a GPL version and release it (out of the US of course) just to piss SCO off. And do it quickly, before IBM destroys them.
- The procedure throws away all code pieces
which occur more than once in the same version
of the code. Okay, most of them will be
trivial, but there might be some that aren't.
These pieces aren't compared to the other
version of the code. Might be an idea to use
a frequency threshold instead.
- During comparison of the two versions, all code
pieces with the same checksum are disregarded.
But different checksum does not mean different
code! MD5 are computed on string level - let
there be an additional comment, or a linebreak,
and you won't get a match. Some simple
operation to bring the code into a kind of
canonical formatting can take care of that.
If you don't do that, you run the risk of losing some correspondences, I'm afraid.
The following describes the common sections found by the Inquirer reader (although I have only looked at the linux source files).
Of course, this assumes that the line numbers the Inquirer published are for the linux files and not the BSD files (why did they only publish one set?!?)
Do I smell Microsoft and SCO lawyers having coffee and cookies behind the doors ?
Can we please keep these SCO stories in the Caldera category? I literally signed up for an account solely so that I could filter them out. Now you're disguising them as Linux stories!? Argh!!
"If source code is copied from protected Unix code," the SCO document adds, "there is no way for Linus Torvalds to identify that fact."
True. It is impossible to find out if someone else has a trade secret.
But then there is a lawyer problem. Linus (or some other kernel hacker) puts a GPL tag on the source. Is he/she allowed to do that? Is the GPL legal? The point is, who GPLed it, and does that make the GPL viral? (And is the Sys V copyright viral?)
They could always sue Linus for being a bastard who does not care. He called himself that multiple times in interviews.
SCO has said that they are afraid that if the lines are known, the problem will be fixed and they won't be able to sue any more.
Where have you seen that?
- Following that logic it would mean that if you stole somebody's stereo, and gave it back at some point later, then you would not have done anything illegal.
> (BTW, Sun invested $5M in SCO/Caldera in 2000)
$5M is only 4 or 5 E10k or E15k servers. I'd hardly see any major connection there.
This is the same sort of advice I got in my Technical Writing class.
As you know, plagiarism is a very serious offense in many types of writing. It can be serious enough to get a student disqualified or expelled.
The less you worry about what other people have said on the subject you are writing about, the less your chances are of plagiarism. Writers who constantly check their sources for exact phraseology to ensure they aren't "infringing" actually tend to use idioms and vocabulary from the sources they are trying to avoid copying.
Someone posted in the SCO article yesterday, that SCO was in a stock scam, and that their aim was to make money for the board for a while by keeping a high volume in the press before going under when the actual court case proves they do not have any real basis in their case.
I agree that SCO must be one or more of the following things:
1. SCO is indeed doing a stock scam as their actual products are close to worthless. An SEC investigation would be very appropriate here, but would only happen after the fact, sadly.
2. SCO is being funded by another party to pursue this scheme, the most likely candidates being Microsoft or SUN, both of whom have a vested interest in seeing Linux and IBM suffer. I would go for Microsoft because while SUN has something to gain in seeing IBM suffer, they also have something to lose if Linux suffers. Microsoft is the only party that has something to gain if both Linux and IBM suffer. It would need a leaked email or something to start the ball rolling on an investigation into this side of the matter though. I also wonder at the same time why no leaked emails have as yet appeared from any SCO employees.
3. SCO's products are absolutely worthless and SCO is indeed putting up a last-ditch fight in order to legally force some kind of artificial market share for its products. The fact that SCO has changed its public statements on numerous occasions and even changed the official claim recently (IBM bypassing export controls, even though it is no business of SCO's to enforce this, and the RCU claim, which is just as patchy) means that SCO knows it is on shaky ground. The latest official claims show that SCO is indeed scraping the bottom of the barrel and is truly frightened by the fact that IBM hasn't taken them seriously. Their lawyers' nerves must be frayed. The accusation against Linux is simply something they are doing in order to try and strengthen their claim. It does however mean that they are actually poring through every piece of available information in order to come up with some kind of evidence, because they truly do not have any that would stand up in court.
The only thing that worries me is that Linus should perhaps learn when to shut the fuck up and think before he speaks. Courts are not democracies and crap like his statements on patents can and will be used against him.
It's amazing that SCO has, in a relatively short time, taken over the reins of "most reviled on slashdot" from the usual list of suspects including: Rambus, Microsoft, RIAA, MPAA, UCITA, DMCA, AOL, DRM....
Just to be clear, U.S. patents expire 20 years after filing. Patents, unlike copyrights, have been holding the line against term expansion pretty well over the years. Copyrights (though theoretically limited to life of the author + about 70 years) keep getting extended so as to be effectively perpetual. Trademarks are yours as long as you use and protect them.
Also, there is _always_ an economic incentive to invent something better. If I invent a better mouse trap, people will buy it over your old-but-still-patented mousetrap.
A corporation and bunch of lawyers won't ever invent anything and shouldn't be allowed to own a patent
Some fields of research require a $5M or a $50M laboratory and a team of twenty. There are some inventions that will not and cannot be made with a chemistry set in someone's basement. (Of course, software generally is not such a field.)
the patents should be non-transferable and with a relatively short patent period.
We already have a relatively short patent period: 20 years. Formerly patented material continues to pour into the public domain every single week, unlike copyright.
Patents already have periodic maintenance fees that must be paid every few years during the 20-year term.
These fees increase in size in the later years and they are higher for large companies than for small ones. (Both notions that might help with the copyright problem.)
Non-transferrability, now, there is an interesting idea. Let's talk about that.
"We reject as false the choice between our safety and our ideals." --The American President (20.1.2009)
It is based on something like this:
- Preprocess the code (replace all variables with the letter 'V', strip the comments, replace white space strings with a single character)
- Divide the result into fixed sized units of length k that overlap, each starting at a succeeding character. They call these k-grams
- Efficiently calculate a hash for each of these k-grams
- Divide the result into windows that contain a number of these k-grams
- Within each window, use a method of selecting a subset of these k-grams that does not depend on position, but rather on the k-gram itself, such as the minimum hash value within that window; if there are ties, select the right-most hash value within the window
- The result is the fingerprint of the code
- Any document with fingerprints in common has some code in common with the original source
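The steps above can be sketched in a few lines of Python. Treat this as a rough illustration under my own assumptions rather than the published method: the regex preprocessing only understands C-style comments and turns every identifier (keywords included) into `V`, CRC32 stands in for the efficient rolling hash, and the names `preprocess`, `winnow` and `fingerprint` as well as the default `k` and `window` values are invented for the example.

```python
import re
import zlib


def preprocess(code):
    """Strip C-style comments, rename identifiers to V, collapse whitespace."""
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.S)  # /* ... */ comments
    code = re.sub(r"//[^\n]*", "", code)                # // comments
    code = re.sub(r"[A-Za-z_]\w*", "V", code)           # crude: keywords become V too
    return re.sub(r"\s+", " ", code).strip()


def kgram_hashes(text, k=5):
    """Hash every overlapping k-character unit, one starting at each character."""
    return [zlib.crc32(text[i:i + k].encode("utf-8")) for i in range(len(text) - k + 1)]


def winnow(hashes, window=4):
    """From each window keep the minimum hash; on ties keep the right-most one."""
    if not hashes:
        return set()
    window = min(window, len(hashes))
    picked = set()
    for start in range(len(hashes) - window + 1):
        win = hashes[start:start + window]
        smallest = min(win)
        rightmost = window - 1 - win[::-1].index(smallest)
        picked.add((smallest, start + rightmost))  # (hash, position in hash list)
    return picked


def fingerprint(code, k=5, window=4):
    """The set of selected hash values; positions are dropped for comparison."""
    return {h for h, _ in winnow(kgram_hashes(preprocess(code), k), window)}


original = "int total = 0; /* sum */\nfor (i = 0; i < n; i++) total += a[i];"
renamed = "int  sum = 0;\nfor (j = 0; j < len; j++)   sum += v[j]; // add"
shared = fingerprint(original) & fingerprint(renamed)
```

The two snippets differ only in variable names, spacing and comments, so after preprocessing they are the same string and their fingerprints coincide. Real code would overlap only partially, and how much overlap counts as suspicious is a judgment call this sketch does not make.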
Okay, that's a very rough idea of the process, but you might have some idea of it now. Check it out yourself if you're interested.
The MD5 idea is a good idea, but I think it needs some refining.
- You want to get EVERY example, for potential manual review
- You want to avoid any problems with white space leading to different MD5s for "identical" code
- Doing a 5 line compare seems flawed: what if you compare lines 1-5 in A and B, but lines 1-5 in A match lines 2-6 in B?
I therefore propose that:
1. Before calculating any checksums, both files should be massaged into some common "base" format.
- Remove all white space inc. tabs and spaces
- Concatenate on one long line, but line break immediately after any semi-colon (;) or end-comment (*/) or immediately before begin comment (/* or //) [obviously additional rules for script files, maybe #]
- With comments, line break at least every (say) 20 characters or if there was a line break in original file.
- Maintain some kind of map back from massaged file to Linux source (line 237 in massaged = line 40-42 in Linux source)
- In the massaged file, mark any line less than say 20 characters in a non-comment section as being potentially and probably too small to be copyrightable. This would eliminate stuff like i++; or #include . Matches for these should still show up in the overall results, but be considered as less important unless there are also lots of "more important" matches in the same source file as well.
2. Run both sets of sources thru this algorithm, and calculate two or more hashes for each line, say MD5 and some kind of CRC. If both sets of sources match for all the hashes, a match is found. This is to reduce number of false positives.
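Here is a cut-down Python sketch of this proposal, with several assumptions of my own: it skips the comment rules entirely, breaks only after semi-colons, and uses CRC32 as the second hash. `massage`, `signatures`, `matches` and `MIN_LEN` are names invented for the example.

```python
import hashlib
import zlib

MIN_LEN = 20  # statements shorter than this are flagged as probably too small to matter


def massage(source):
    """Strip all whitespace and split after ';', remembering the source line range."""
    units, buf, first, lineno = [], [], None, 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        for ch in line:
            if ch.isspace():
                continue
            if first is None:
                first = lineno
            buf.append(ch)
            if ch == ";":
                units.append(("".join(buf), first, lineno))
                buf, first = [], None
    if buf:  # trailing text with no closing semi-colon
        units.append(("".join(buf), first, lineno))
    return units


def signatures(source):
    """Map (md5, crc32) of each massaged statement to [(first_line, last_line, important)]."""
    table = {}
    for text, first, last in massage(source):
        data = text.encode("utf-8")
        key = (hashlib.md5(data).hexdigest(), zlib.crc32(data))
        table.setdefault(key, []).append((first, last, len(text) >= MIN_LEN))
    return table


def matches(source_a, source_b):
    """A match requires both hashes to agree; return the line ranges on each side."""
    a, b = signatures(source_a), signatures(source_b)
    return [(a[key], b[key]) for key in a.keys() & b.keys()]


tree_a = "int x;\nresult = compute_checksum(buffer,\n    length);\n"
tree_b = "result=compute_checksum( buffer, length );\ni++;\n"
found = matches(tree_a, tree_b)
```

In this toy input the statement spread over lines 2-3 of one file matches line 1 of the other despite the different layout. The map back to original line numbers is what lets someone go and review the match by hand.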
The NY Times article had a surprisingly insightful closing quote.
Indeed, because Linux code is published publicly, it is easier to track what I.B.M. contributed to the operating system. But the issue, of course, is whether SCO's Unix license covered any of the code I.B.M. put into Linux.
Should the SCO suit turn up any offending code, the open nature of Linux -- and the many programmers working on it -- will ensure a quick solution, according to open-source software experts.
Now, that should be old news for /. readers, but it's refreshing when a mainstream article makes this point explicit. Slowly, perhaps, the general (non-geek) public will understand open source software and the issues surrounding it.
Phiwum's law: anyone that names an obvious law after himself and then puts it in his own sig is just pathetic.
In all reality, Microsoft has been paying SCO big bucks to do these things; why would a Unix distributor decide to go completely nuts like this all of a sudden, out of the blue? Something doesn't fit here. Then Microsoft comes along and buys a crapton of Unix licenses for an undisclosed amount of moolah. So it's obvious (and anyone who doubts me is an idiot supreme) that SCO is doing this for Microsoft.
I don't think any major victory will come about for the linux community. Most likely, the linux source code will be contested for several years while hackers ignore the law and their contracts and do what they will, as they always have.
Something you've got to understand here is that microsoft doesn't want to spread fud about linux. They'd rather get the IP and whatnot for it, contest they own it and then sell it.
SCO will use the same tactics (and most likely, legal department) Microsoft has to try to get the rights to use Linux any way they see fit. And if that happens, then I think there will be an outcry from the open source community, and probably people organizing to take down Microsoft's computer network in its entirety. I'd be angry if some multibillion dollar corporation destroyed my community, way of life, and said my code was theirs and there was nothing I could do about it. Angry enough to get out the source code, start finding errors and make a killer virus with the same people with whom I built the thing.
And fine, mod me down if you must but there's only one resolution in a democratic system after all nonviolent solutions (including protest, due process, etc) have been tried, and that's violent protest, aka, war. I don't like violent protest, but if that's what they want to start then that's what they are going to get.
Candy-Coated Knowledge
White space would obviously change the MD5 right? So all the infringer (or someone trying to hide the infringement, to take the argument SCO might use) would have to do is add some space here and there and the MD5 won't be at all the same. I don't think that's a valid method to determine if the code is stolen or not.
Random is the New Order.
SCO won't let people see the contested source code without signing an outrageous NDA
This SCO thing is really starting to f**k me off. It's all just insubstantial FUD with sod-all solid facts. SCO even looks like it's aiming its guns at BSD - which is crazy as there has NEVER EVER been any System V code in any of the BSDs. I'm of the opinion that SCO's strategy is to declare total war on the entire Unix community in the hope that people fold. Criticize SCO and you'll be next....
So my message to SCO has to be put-up or shut-up.
I'm not a Linux user, I don't hack kernels. But I can find my way around source-code. So come on SCO I'll sign your fscking NDA to see what you're carping on about. I'll even check it against the BSD sources too....
Do you mind, your karma has just run over my dogma.
I'm wondering why nobody in OSS community is considering hitting SCO where it hurts - in the pocket book.
Is there any legal or other reason why this is not feasible?
1. Find out what applications run on SCO and drive their O/S sales. Then write better, free, clean-room, open source versions to run on any platform (Linux, other UNIX, Apple, even Windows) other than SCO Unices
2. Some Linux organization in each country, contact all Linux developers in that country and invite them to a free legal briefing. The company can pay their travel, etc, and for a lawyer to come brief them about their legal rights, etc. This would be peanuts for a big company, and would get the ball rolling if Linux contributors wanted to follow the example of (and actually follow up properly on) the guy who sent the Cease and Desist about SCO's Linux distribution of his GPL code in Germany.
I too have been explicitly told not to do patent searches when I come up with an idea, for exactly that reason.
I was also told explicitly that, partially because of the above, violating someone else's patents is essentially unavoidable. Therefore the purpose of patenting things is to have a bigger stick when you sit down to discuss any patent dispute.
So you have tons of engineers inventing things and deliberately remaining ignorant of whether anyone else has invented it, then trying to patent it for the sole purpose of ensuring that if it turns out to already be invented, that inventor will be violating some other patent that did get through.
Does that not sound completely fscked to anyone?
The enemies of Democracy are
Changing space would change the MD5 sum, yes, but that is easily normalised out by feeding both code bases through either the same code pretty printer or through a simple sed filter which replaces any string of whitespace with a single space character prior to the chopping and checksumming process.
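A minimal Python sketch of the simple-filter variant, assuming the filter just collapses every run of whitespace to one space before checksumming. The names `normalise` and `checksum` are made up for illustration.

```python
import hashlib
import re


def normalise(code):
    """Replace any string of whitespace with a single space, like the sed filter."""
    return re.sub(r"\s+", " ", code).strip()


def checksum(code):
    """MD5 of the normalised text, so layout-only edits do not change the sum."""
    return hashlib.md5(normalise(code).encode("utf-8")).hexdigest()


tidy = "a = b; c = d;"
padded = "a  =  b;\n\tc = d;   "
same_after_filter = checksum(tidy) == checksum(padded)
same_raw = hashlib.md5(tidy.encode()).hexdigest() == hashlib.md5(padded.encode()).hexdigest()
```

Note the limit of the simple filter: it only squeezes whitespace that is already there, so `a=b;` and `a = b;` still produce different sums. Catching that kind of change is where running both trees through the same pretty printer comes in.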
I'm old enough to remember when discussions on Slashdot were well informed.
IBM knows how to handle public relations; they have some experience in the matter. Slashdot (especially Slashdot) shouldn't be trying to play with the big boys by printing every story they see.
If SCO wants to claim Linus was behind the WTC terrorist attack, let them. If they want to sue Alan Cox for 98 billion dollars, let them. Whether or not they have a case is irrelevant (and for any mis-quoters out there in SCO-land, you don't). You do not ever argue with a fool or a drunk.
Let them shout from the hill-tops that the entire world is against them, let them buy the senators if they want to/can afford to. If you ignore them, they'll have nothing left to do except go to court and show IBM's big hairy lawyer their ass.
If and when this does go to court, OSS doesn't need SCO to be able to say "look at these terrorists, they're saying stealing is OK".
Whilst the tech-news sites keep reporting this as the story, SCO has no reason to go to court, because having their name in the NYTimes is what they want. It doesn't matter if they're in the cookery section with a recipe for SCO-nuts (Tm) or whether they have a fifteen page expose on Linus and his law-breaking dog (does Linus have a dog? I don't know, geeks seem to like cars more..go figure).
While everybody is running around like headless chickens talking about how SCO is evil, SCO is wrong, no way can SCO be right.... the SCO board are sitting in their office laughing at your comments. They set up the fire and you are stoking it for them.
They know they're going to lose a court case against IBM, whether by fair or patent-foul. They don't want the court case, they can't have the court case. What they want is "IBM" to appear next to "owe 3 billion" or "can't be trusted" in the Post or Times so that Big Blue has no choice but to make it go away.
If the tech-savvy readers start talking about this to their friends, their friends become interested and so will the Post and Times, and suddenly the headlines appear...(yes, more often and wilder than at present)
Ignore the lunatics, ignore the fools and the drunks, this will go away if you let it go away, it very likely doesn't involve you, don't demand retribution, don't demand countersuits. If you must do something find a nice OSS-hackers email address and express your support, email IBM and tell them you love them (all they really want is love :). If you want to be more substantial, donate to the EFF, donate to Linus, donate to the FSF or the OSI.
You can't win by shouting louder, you only lose your voice.
An idea: the method of 'shingles' (fingerprints of n-grams of lines/words/characters) could be used for creating a big, shared repository of copyrighted code -- without the code. This can avoid this kind of claims in the future, without the need of manually checking for every line of code contributed to open source projects.
A 'client' program is run by people that have access to copyrighted code. The program generates the fingerprints, which are uploaded to the repository (including information about the copyright holder, software name, version, filename, linenums, fingerprint). Whenever anyone wants to check if a piece of code is copyrighted, s/he can generate the fingerprints and compare them against the repository.
False positives?: MD5 checksums in general don't collide. Poisoning?: probably a person can upload huge amounts of fake MD5 checksums. That's why some redundancy is necessary: an MD5 checksum is valid if it has been uploaded by at least X people.
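To make the redundancy rule concrete, here is a small in-memory Python sketch. Everything in it is an assumption for illustration: the class `FingerprintRepository`, the `quorum` parameter standing in for "X people", the helper `shingle_digests`, and the uploader and holder names. A real service would also need some way to tell uploaders apart, which this ignores.

```python
import hashlib
from collections import defaultdict


def shingle_digests(lines, size=5):
    """MD5 of every run of `size` consecutive lines (the 'shingles')."""
    return [
        hashlib.md5("\n".join(lines[i:i + size]).encode("utf-8")).hexdigest()
        for i in range(len(lines) - size + 1)
    ]


class FingerprintRepository:
    """Stores fingerprints without the code; a digest counts once enough people upload it."""

    def __init__(self, quorum=2):
        self.quorum = quorum
        self._uploads = defaultdict(dict)  # digest -> {uploader: metadata}

    def upload(self, uploader, digest, holder, filename, first_line):
        # Re-uploading by the same person overwrites, so it cannot inflate the count.
        self._uploads[digest][uploader] = {
            "holder": holder, "filename": filename, "first_line": first_line,
        }

    def is_valid(self, digest):
        return len(self._uploads.get(digest, {})) >= self.quorum

    def lookup(self, digests):
        """Return the digests that are known and confirmed by at least `quorum` uploaders."""
        return {d for d in digests if self.is_valid(d)}


repo = FingerprintRepository(quorum=2)
protected = ["line a", "line b", "line c", "line d", "line e", "line f"]
digests = shingle_digests(protected)
for start, digest in enumerate(digests, start=1):
    repo.upload("alice", digest, "ExampleCorp", "example.c", start)
before_second_upload = repo.lookup(digests)
for start, digest in enumerate(digests, start=1):
    repo.upload("bob", digest, "ExampleCorp", "example.c", start)
after_second_upload = repo.lookup(digests)
```

With a quorum of two, one person flooding the repository with fake checksums gets nowhere, while code held by several independent parties becomes checkable without anyone publishing it.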