More on SCO Code Snippets
anoopsinha writes "A story in LinuxWorld reports that SCO itself has no idea what the history of a particular snippet of code might be, even a high-profile snippet like the one SCO highlighted at SCO Forum. Having no idea whether its claims have merit has not stopped SCO so far, so we can expect more from SCO along the lines of big claims with no merit."
First off, you fail to explain or indicate why you think Linus Torvalds is being hypocritical.
SCO is giving the open source community a look at the problems in the code
This is a flat-out lie.
the time has come to actually step up and figure out what's going on.
"The open source community" has TRIED, time and time again, to figure out what is going on. SCO will not *tell* anyone. The Linux developer community WANTS the copyright issues resolved. However, they cannot read SCO's mind! There is a clear, documented method for dealing with copyright infringement in the Linux kernel: the time and source of every contribution is logged, and if anyone ever identifies infringing code, it can be flagged as such and removed. HOWEVER: the Linux community cannot remove SCO's code unless they know what it is!!! SCO adamantly refuses to give any indication of what this mystery code is.
The fact that SCO seems intent on keeping the Linux developers from learning what to remove to fix the infringement has led some to believe this code does not exist.
the people that leech off the hard work of others
Who are you referring to here?
Just in case someone doesn't understand this and wants to know more about what "shorting" a stock means:
http://www.fool.com/FoolFAQ/FoolFAQ0033.htm
Arrg!
No, BLAST won't work. ESR's SHRED won't work. These are, at heart, text matching algorithms, which are easily defeated and of little relevance. Let me explain.
Simple code obfuscation techniques (changing variable names, adding or removing comments, inserting newlines, changing for loops to while loops, etc.) will totally defeat SHRED and will likely give BLAST a hard time, if not break it entirely.
Why? SHRED searches for lines with identical MD5 sums. If every line (or most lines) of purloined code has been changed, even slightly, SHRED fails. BLAST works by finding "seed" regions of identity and then growing those regions out into "near matches." Unfortunately, the idea of a "near match" is far more clear-cut for DNA and protein sequences than for code, and the initial seeding breaks if the code has been obfuscated at all.
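To make the SHRED point concrete, here is a minimal Python sketch of line-level MD5 matching as described above. It is modeled only on that one-sentence description, not on ESR's actual SHRED source; the function names (`line_hashes`, `shared_lines`) and the two C snippets are hypothetical illustrations. Renaming the identifiers is enough to leave only a trivial line in common.

```python
import hashlib


def line_hashes(source: str) -> set[str]:
    """Return the MD5 digest of every non-blank line, surrounding whitespace stripped."""
    digests = set()
    for line in source.splitlines():
        stripped = line.strip()
        if stripped:
            digests.add(hashlib.md5(stripped.encode("utf-8")).hexdigest())
    return digests


def shared_lines(a: str, b: str) -> int:
    """Count distinct line hashes that appear in both sources."""
    return len(line_hashes(a) & line_hashes(b))


original = """\
int sum(int *buf, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += buf[i];
    return total;
}
"""

# Identical logic with every identifier renamed, so almost no line survives byte-for-byte.
renamed = """\
int add_all(int *data, int len) {
    int acc = 0;
    for (int k = 0; k < len; k++)
        acc += data[k];
    return acc;
}
"""

print(shared_lines(original, original))  # 6: every line matches itself
print(shared_lines(original, renamed))   # 1: only the bare closing brace still matches
```

The one surviving match is a lone `}`, which is exactly the kind of trivial hit that proves nothing either way.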
SCO would (wisely) never accept a negative SHRED or BLAST result as proof for just these reasons.
What is necessary is a comparison of the code's structure, NOT the raw text of the code. Stanford's CS department, for example (like many others), detects cheating by chewing through source files and turning them into an intermediate representation (think: parse tree) that directly describes the STRUCTURE and FUNCTION of a piece of code in a way completely divorced from its text. To find out whether people cheated, they compare the parse trees of their code, not the text.
In this way, they can easily detect (with a surprisingly low false-positive rate) when two textually different pieces of code actually stem from the same source, one of which was then obfuscated to cover up the cheating.
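As a rough illustration of comparing structure instead of text, the sketch below uses Python's standard `ast` module (standing in for a C parser, which the standard library lacks) to parse two functions, rename every identifier to a positional placeholder, and compare the resulting tree dumps. This is a toy built on stated assumptions, not Stanford's tool or any real plagiarism detector; `_Normalize`, `structure`, and the sample functions are hypothetical.

```python
import ast


class _Normalize(ast.NodeTransformer):
    """Replace identifiers with positional placeholders so only structure remains."""

    def __init__(self):
        self.names = {}

    def _canon(self, name):
        # The first distinct name seen becomes v0, the next v1, and so on.
        if name not in self.names:
            self.names[name] = f"v{len(self.names)}"
        return self.names[name]

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)  # then normalize arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        self.generic_visit(node)  # covers any type annotation
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node


def structure(source):
    """Parse source and return a dump of its identifier-normalized syntax tree."""
    tree = _Normalize().visit(ast.parse(source))
    # Line/column attributes are omitted by default, so layout does not matter.
    return ast.dump(tree, annotate_fields=False)


copied = """
def total(values):
    result = 0
    for item in values:
        result += item
    return result
"""

# Same code with renamed identifiers, an added comment, and extra blank lines.
disguised = """
def add_everything(numbers):
    # running sum
    acc = 0


    for n in numbers:
        acc += n
    return acc
"""

unrelated = """
def largest(values):
    best = values[0]
    for item in values:
        if item > best:
            best = item
    return best
"""

print(structure(copied) == structure(disguised))  # True
print(structure(copied) == structure(unrelated))  # False
```

This toy sees through renaming, comments, and blank lines, but rewriting the `for` loop as a `while` loop would still change the tree; catching that would take further normalization beyond what this sketch does.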
This is the way to compare code fragments. Not by borrowing text matching (or near-matching) from unrelated disciplines.