More on SCO Code Snippets
anoopsinha writes "A story in linuxworld reports that SCO itself has no idea what the history of a particular snippet of code might be - even a high profile snippet like the one SCO highlighted at SCO Forum. Having no idea if its claims have merit has not stopped SCO so far, so we can expect more from SCO along the lines of big claims with no merit."
First off, you fail to explain or indicate why you think Linus Torvalds is being hypocritical.
SCO is giving the open source community a look at the problems in the code
This is a flat-out lie.
the time has come to actually step up and figure out what's going on.
"The open source community" has TRIED, time and time again, to figure out what is going on. SCO will not *tell* anyone. The Linux developers community WANTS to have the copyright issues resolved. However, they cannot read SCO's mind! There is a clear and documented method of dealing with copyright infringements in the linux kernel; the time and source of all contributions is logged, and if at any point someone identifies infringing code it can be noted as such and removed. HOWEVER: The linux community cannot remove SCO's code unless they know what it is!!! SCO ardently refuses to give any indication what this mystery code is.
The fact that they seem intent on keeping the Linux developers from learning what to remove to fix the infringement has led some to believe this code does not exist.
the people that leech off the hard work of others
Who do you refer to here?
Just in case someone doesn't understand this and wants to know more about what "shorting" a stock means:
http://www.fool.com/FoolFAQ/FoolFAQ0033.htm
First off, small nitpick: BLAST doesn't look for homology. It looks for identity. Homology refers to a speciation event separating two genes. On the basis of identity, you might hypothesize homology, but mere identity doesn't necessarily lead to homology. Translation: it's not possible to have homology between two sequences of the same species, which an ORGN{HUM} BLAST, for example, will turn up. (Yes, for those of you keeping score, I've made the error before, and you'll occasionally see this mistake in publications.)
BLAST is designed to deal with a specific set of optimal string alignment problems, namely the matching of nucleotide to nucleotide or amino acid to amino acid. (Actually it's even narrower than that. It's really good at finding a match of a small sequence against a database of millions because of the way it indexes the database... but you can read the paper describing it if you really want to know.) It acts on the assumption that any change is as valid as any other change (ignoring secondary, tertiary and quaternary structure.) Because of this, it's not well suited to determining distances between two code bases where what the code says actually has testable meaning.
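For anyone who wants a feel for the "index the database, then look up short words" idea without reading the paper, here is a toy sketch in Python. It is not BLAST: real BLAST scores substitutions and tolerates mismatches while extending, whereas this version only grows exact, ungapped matches. The names build_index and best_hit and the word size k=4 are made up for illustration.

```python
from collections import defaultdict

def build_index(db, k):
    """Map every k-letter word in db to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[db[i:i + k]].append(i)
    return index

def best_hit(query, db, k=4):
    """Return (query_start, db_start, length) of the longest exact,
    ungapped match grown from any shared k-letter seed, or None."""
    index = build_index(db, k)
    best = None
    for q in range(len(query) - k + 1):
        # Seed step: find where this query word occurs in the database.
        for d in index.get(query[q:q + k], []):
            # Extend the seed to the left while characters agree.
            ql, dl = q, d
            while ql > 0 and dl > 0 and query[ql - 1] == db[dl - 1]:
                ql -= 1
                dl -= 1
            # Extend the seed to the right while characters agree.
            qr, dr = q + k, d + k
            while qr < len(query) and dr < len(db) and query[qr] == db[dr]:
                qr += 1
                dr += 1
            hit = (ql, dl, qr - ql)
            if best is None or hit[2] > best[2]:
                best = hit
    return best
```

Calling best_hit("TTACAG", "GATTACAGATTTCCA") finds the query at database offset 2. The point of the index is that the lookup cost depends on the number of seed hits, not on scanning the whole database for every query.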
In this case, we're pretty much stuck to wandering through the code samples by hand with judicious use of grep and diff.
http://www.donarmstrong.com
Um, what's in it for Darl?
Arrg!
No, BLAST won't work. ESR's SHRED won't work. These are, at heart, text matching algorithms, which are easily defeated and of little relevance. Let me explain.
Any simple code obfuscation techniques (changing variable names, adding/removing comments, inserting newlines, changing for loops to while loops, etc.) will totally defeat SHRED and will likely give BLAST a hard time, if not break it entirely.
Why? SHRED searches for lines with identical MD5 sums. If every (or nearly every) line of purloined code has been changed, even slightly, SHRED fails. BLAST works by finding "seed" regions of identity and then growing those regions out to "near matches." Unfortunately, the idea of a "near match" is a lot more clear-cut in DNA/protein than in code, and the initial seeding breaks if the code has been obfuscated at all.
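To make the line-hash weakness concrete, here is a minimal sketch in Python. It is not SHRED's actual code, just the general "compare per-line MD5 sums" idea described above; line_hashes, shared_fraction and the two sample snippets are invented for the illustration.

```python
import hashlib

def line_hashes(source):
    """MD5 digest of every non-blank line, with surrounding whitespace stripped."""
    return {
        hashlib.md5(line.strip().encode("utf-8")).hexdigest()
        for line in source.splitlines()
        if line.strip()
    }

def shared_fraction(a, b):
    """Fraction of a's distinct lines whose hash also appears in b."""
    ha, hb = line_hashes(a), line_hashes(b)
    return len(ha & hb) / len(ha) if ha else 0.0

# Hypothetical "original" fragment.
ORIGINAL = """\
for (i = 0; i < n; i++) {
    total += buf[i];
}
return total;
"""

# Same logic with every variable renamed.
RENAMED = """\
for (k = 0; k < len; k++) {
    sum += data[k];
}
return sum;
"""
```

Comparing ORIGINAL against itself gives 1.0, but comparing it against RENAMED gives 0.25, and the only surviving "match" is the lone closing brace. A trivial rename is enough to make the two fragments look unrelated to this kind of check.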
SCO would (wisely) never accept a negative SHRED or BLAST result as proof for just these reasons.
What is necessary is a comparison of the code structure, NOT the simple text of the code. Stanford's CS department, for example (and many others), detects cheating by chewing through source files and turning them into an intermediate representation (think: parse tree) which describes directly the STRUCTURE and FUNCTION of a bit of code in a way that is completely divorced from the text of that code. To find out if people cheated, they compare the parse trees from their code -- not the text of the code.
In this way, they can easily detect (with a surprisingly low false positive rate) when two pieces of textually different code actually stemmed from the same source (but one was then obfuscated to cover up the cheating.)
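A rough sketch of the parse-tree idea, using Python's standard ast module on Python source purely because it is easy to demo (the kernel is C, and this is certainly not what Stanford's tool or anyone else's actually does). The _Anonymize class and fingerprint function are hypothetical names. Every identifier is collapsed to a single placeholder, which is crude: it throws away which variable is which, and it will not see through rewrites like turning a for loop into a while loop.

```python
import ast

class _Anonymize(ast.NodeTransformer):
    """Replace every identifier with the same placeholder name."""

    def visit_Name(self, node):
        node.id = "_"
        return node

    def visit_arg(self, node):
        node.arg = "_"
        node.annotation = None
        return node

    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)  # descend into arguments and body
        return node

def fingerprint(source):
    """Structural fingerprint: a dump of the tree with names erased.
    Comments and layout never reach the tree, so they cannot affect it."""
    tree = _Anonymize().visit(ast.parse(source))
    return ast.dump(tree)

A = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

# Same structure: renamed, re-commented, extra blank line.
B = """
def add_up(xs):  # totally my own work
    acc = 0

    for item in xs:
        acc += item
    return acc
"""

# Genuinely different code.
C = """
def total(values):
    return len(values)
"""
```

Here fingerprint(A) == fingerprint(B) holds even though the two functions share almost no text, while fingerprint(A) and fingerprint(C) differ. A serious tool would compare subtrees and tolerate partial matches rather than demand whole-file equality.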
This is the way to compare code fragments. Not borrowing text-matching (or near-matching) from unrelated disciplines.
This just shows how thorough SCO really is. The article says: That this code "emanated" from SGI was news to SCO.
:-)
Well, the linux code clearly states
* Copyright (C) 1992 - 1997, 2000-2002 Silicon Graphics, Inc. All rights reserved.
I hope they know how Silicon Graphics Inc. relates to SGI
>There may still be many lines of code that were stolen from SCO Unix.
For the zillionth but I'm sure not the last time, according to the US Supreme Court, copyright infringement is not theft.
If you were blocking sigs, you wouldn't have to read this.
Yes, IBM has made it clear with the Canopy subpoena that they intend to "pierce the corporate veil" between Canopy and SCO. Ralph Yarro may want to take the precaution of deeding his house over to a trusted family member, just to make sure he'll have a place to live when this is all over.
Please mod parent up, this is the most significant development in the fiaSCO so far.
For those who aren't familiar with the legend of Godwin's Law, check out How to post about Nazis and get away with it - the Godwin's Law FAQ. Although Godwin's Law is technically a USENET thing, it is frequently mentioned in regard to long /. threads, topics, and the like.
perl -e 'print $i=pack(c5, (41*2), sqrt(7056), (unpack(c,H)-2), oct(115), 10)'
Your link was a bit off. The BLAST manpage is more helpful, and contains a brief description of the algorithm. The algorithm incorporates certain assumptions about biological sequences that are not reflected in computer code. A good bit can be reduced to simple Bayesian logic.
It may be possible to devise a similarly minded algorithm that describes the evolution of computer code (using, perhaps, the CVS trees of large projects) but such a project would be a massive undertaking.
Oops, working link. Well, working at the time of posting. ;-)
SCO's comments do not indicate that it was news to them that the code emanated from SGI. All they said was that word got out that SGI was the source, and they were fielding a lot of questions about it, so they went public with their position. It was news to the public, not to SCO.
They don't flat out state it, but from their comments it seems likely they've been in negotiations with SGI on this matter for quite some time. Given that some of this code was from XFS, they'd have to be complete idiots not to know that it likely came from SGI.
sigs are a waste of space
http://www.microsoft.com/presspass/press/1997/Nov97/scopr.asp
REDMOND, Wash.-November 24, 1997 - Microsoft Corporation today applauded the decision of the European Commission to close the file and take no further action on a dispute between Microsoft and Santa Cruz Operation (SCO) involving a 1987 contract. The Commission's decision follows progress by Microsoft and SCO to resolve a number of commercial issues related to the contract, and upholds Microsoft's right to receive royalty payments from SCO if software code developed by Microsoft is used in SCO's UNIX products.
this article estimates manure cost at about $40/acre... not sure of the tonnage.