Slashdot Mirror


Claimed Proof That UNIX Code Was Copied Into Linux

walterbyrd writes "SCO's ex-CEO's brother, a lawyer named Kevin McBride, has finally revealed some of the UNIX code that SCO claimed was copied into Linux. Scroll down to the comments where it reads: 'SCO submitted a very material amount of literal copying from UNIX to Linux in the SCO v. IBM case. For example, see the following excerpts from SCO's evidence submission in Dec. 2005 in the SCO v. IBM case:' There are a number of links to PDF files containing UNIX code that SCO claimed was copied into Linux (until they lost the battle by losing ownership of UNIX)." Many of the snippets I looked at are pretty generic. Others, like this one (PDF), would require an extremely liberal view of the term "copy and paste."

14 of 578 comments

  1. Re:What's so liberal about it? by marcansoft · · Score: 4, Interesting

    Of course it looks rearranged. It's a header file. Some of the ELF constants come straight from the ELF spec. The #ifndef stuff is bog standard code, there are a finite number of ways of writing that and the one presented happens to be the most common. The #include is another "duh" - of course you have to #include the right header, that doesn't mean it's copied. The header file is presumably deliberately compatible with the original, hence the function definitions are prototype-compatible (while being considerably different in style).

    There is nothing indicative of code copying in that PDF. The Linux header is just about as different as it can be while remaining source-compatible, as it should be.
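    For anyone who hasn't stared at many C headers, here is a rough sketch of the kind of boilerplate being described. The file name, the guard macro, and every DEMO_/demo_ identifier are invented for illustration and come from neither code base; only the numeric values are the ones the ELF spec assigns.

```c
/* demo_elf.h: hypothetical header; all DEMO_/demo_ names are invented. */
#ifndef DEMO_ELF_H          /* the "bog standard" include guard */
#define DEMO_ELF_H

#include <stddef.h>         /* size_t: needed whether or not anything was copied */

/* Object file types; the numeric values are fixed by the ELF spec. */
enum demo_elf_type {
    DEMO_ET_NONE = 0,
    DEMO_ET_REL  = 1,
    DEMO_ET_EXEC = 2,
    DEMO_ET_DYN  = 3,
    DEMO_ET_CORE = 4
};

/* Length of the e_ident array, also fixed by the spec. */
#define DEMO_EI_NIDENT 16

/* Trivial helper (hypothetical) returning the identification array size. */
static inline size_t demo_elf_ident_size(void)
{
    return DEMO_EI_NIDENT;
}

#endif /* DEMO_ELF_H */
```

    Two people writing this independently would differ in spacing and comments at most, since the guard idiom and the values leave nothing else to vary.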

  2. variable names and data structures. by goombah99 · · Score: 4, Interesting

    Comparing a variable named elf_t_arname to one named elf_c_arname is not very convincing. The suffix is generic, the prefix is activity specific, and the middle letter is presumably some datatype indicator.
    Where it gets dicey is when there are structs and every variable in the struct has a somewhat similarly named variable in the other one. This does arouse suspicion. Even if you forget the variable names for a moment, any pattern like bool,real,real, *real, int, *char,*char,*bool,.... that is identical between two structs would be an improbable occurrence. And when you see it in back-to-back structs it becomes nearly impossible to happen by chance.

    The key question then is whether there is some structural reason why the two might share an identical struct. For example, is there an ELF spec that defines a protocol for communication or the way a record on disk is serialized (i.e. packed)? If so then of course these will occur like this. Or perhaps both are derived from a common BSD ancestor so both vary only slightly.

    If the answer is no, there was no reference implementation and no ancestor, then I'd say that for examples like 251, McBride has some evidence.

    However for most of the ones he cites there is no there, there.
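    On the question of whether a spec can force two structs to match field for field: the 32-bit ELF file header is one case where it does. The sketch below follows the field order and widths from the ELF spec; the demo_ type name and the use of <stdint.h> types in place of the spec's Elf32_* typedefs are my own choices, and it is not taken from either side's exhibit.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* fixed-width integer types */

#define DEMO_EI_NIDENT 16

/* 32-bit ELF file header. Field order and widths follow the ELF spec;
 * the demo_ type name is invented so it cannot clash with <elf.h>. */
typedef struct {
    unsigned char e_ident[DEMO_EI_NIDENT]; /* magic number, class, encoding */
    uint16_t e_type;       /* object file type */
    uint16_t e_machine;    /* target architecture */
    uint32_t e_version;    /* object file version */
    uint32_t e_entry;      /* entry point address */
    uint32_t e_phoff;      /* program header table file offset */
    uint32_t e_shoff;      /* section header table file offset */
    uint32_t e_flags;      /* processor-specific flags */
    uint16_t e_ehsize;     /* size of this header in bytes */
    uint16_t e_phentsize;  /* size of one program header entry */
    uint16_t e_phnum;      /* number of program header entries */
    uint16_t e_shentsize;  /* size of one section header entry */
    uint16_t e_shnum;      /* number of section header entries */
    uint16_t e_shstrndx;   /* index of the section name string table */
} demo_elf32_ehdr;

/* The on-disk layout is part of the format, so these cannot vary. */
static_assert(sizeof(demo_elf32_ehdr) == 52, "ELF32 header is 52 bytes");
static_assert(offsetof(demo_elf32_ehdr, e_entry) == 24, "e_entry at byte 24");
```

    Reorder or resize any field and the struct no longer describes a valid ELF file, so an identical type pattern here says nothing about who copied whom.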

    --
    Some drink at the fountain of knowledge. Others just gargle.
    1. Re:variable names and data structures. by ipX · · Score: 3, Interesting
  3. Re:More details and downloadable archive by tomhudson · · Score: 4, Interesting

    I've seen cases where another person and I were working on code independently, and when it came time to merge, we had both ended up creating the same variable names, and pretty much the same code.

    About the only difference was in indentation - mine is "always put the opening brace on the same line, one true tab, else in same column as if, no braces for any single-line condition to a control structure (for, if, else, while, etc)". Even the comments were pretty much the same.

    In this case, though, some of the code is from BSD - which is perfectly fine.

  4. Re:What's so liberal about it? by Jahava · · Score: 3, Interesting

    Of course it looks rearranged. It's a header file. Some of the ELF constants come straight from the ELF spec. The #ifndef stuff is bog standard code, there are a finite number of ways of writing that and the one presented happens to be the most common. The #include is another "duh" - of course you have to #include the right header, that doesn't mean it's copied. The header file is presumably deliberately compatible with the original, hence the function definitions are prototype-compatible (while being considerably different in style).

    There is nothing indicative of code copying in that PDF. The Linux header is just about as different as it can be while remaining source-compatible, as it should be.

    Commenting further on that, here is a link to the System V Reference Specs, one of which is the ELF Tool Interface Standard Specification. This contains not only several constants, structures, and function names, but suggests function prototypes and programming style.

    Like you said, any author wishing to build an ELF-capable system would almost have to have that exact same code. There are only so many ways to build an enum or struct following the exact TIS specifications, and there is no virtue in paraphrasing C code.

    Much of the rest of the code is libc and POSIX prototypes (and more headers), all of which are covered in the System V ABI specification. Anybody wishing to build a POSIX-compatible system would have to define those prototypes.

    Several of the function implementations with similarities are very basic functions. Most of the similarities are in the constant names (rather than the specific implementation of those simple functions), and the constant names are defined by ... the TIS spec. The remainder is a no-brainer. See, for example, Tab 422. This is a simple accessor method. There are only so many ways to retrieve a value from a structure...
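    To illustrate how little room a simple accessor leaves, here is a sketch of one. The struct and function are hypothetical, named demo_ to make that obvious, and are not a reproduction of Tab 422 or of any real libelf.

```c
#include <stddef.h>   /* NULL, size_t */

/* Hypothetical descriptor type; not taken from any real library. */
struct demo_desc {
    int    kind;   /* what sort of file this descriptor refers to */
    size_t size;   /* size of the underlying file in bytes */
};

/* Return the kind stored in the descriptor, or 0 if there is no descriptor. */
int demo_kind(const struct demo_desc *d)
{
    if (d == NULL)
        return 0;
    return d->kind;
}
```

    Apart from brace placement and the name of the parameter, it is hard to see what a second author could do differently.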

  5. Unix kernel hacker and almost IP lawyer view by harlows_monkeys · · Score: 5, Interesting

    I spent several years as a Unix kernel hacker, working extensively with AT&T source code. I also went to law school and was one bad case of writer's block away from becoming a copyright lawyer. Thus I found those code snippets quite interesting, both from my Unix kernel hacker perspective and my almost-became-a-copyright-lawyer perspective.

    My conclusion, from the half dozen or so of his samples that I looked at? They show nothing remotely resembling copyright violation.

    Copyright covers expression, not ideas. What that means when dealing with functional works, such as computer programs, is that things that anyone implementing that functionality will have to do are unlikely to be covered by copyright.

    All of the functions I saw that were allegedly copied were very simple functions. All they did was check arguments to make sure they were legal, return the expected error code if not, or return some very simple value otherwise.

    Even if the corresponding functions in Linux were exact matches to the SCO code, it would probably not be enough to support an inference of copying, because there just aren't a lot of ways to reasonably express such simple functions. And they were not exact matches. One would check for a null pointer by comparing to NULL, one would use if(!p), for instance.
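    The NULL versus if(!p) difference can be shown side by side. This is a minimal sketch with made-up names (demo_buf, demo_len_a, demo_len_b) and an assumed convention of returning EINVAL for bad arguments; it is not either party's code.

```c
#include <errno.h>    /* EINVAL */
#include <stddef.h>   /* NULL, size_t */

/* Hypothetical structure holding a length. */
struct demo_buf {
    size_t len;
};

/* Style A: explicit comparison against NULL. */
int demo_len_a(const struct demo_buf *p, size_t *out)
{
    if (p == NULL || out == NULL)
        return EINVAL;
    *out = p->len;
    return 0;
}

/* Style B: the same checks written with the ! operator. */
int demo_len_b(const struct demo_buf *p, size_t *out)
{
    if (!p || !out)
        return EINVAL;
    *out = p->len;
    return 0;
}
```

    Both versions behave identically for every input, so the difference between them is purely one of spelling.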

    The header files are more similar, so copying is more believable there. The problem with SCO's case there is that the elements in the header files I looked at are entirely dictated by compatibility requirements. There's no copyrightable expression in them.

    To summarize, SCO's claims appear to fall into two groups. First, things where the implementation is so simple that it is not possible to infer copying from similarity since the similarity is imposed by the nature of the function. Second, things where there may have been copying--of things that aren't protected by copyright.

  6. Re:First post by Cylix · · Score: 3, Interesting

    The PDF linked in the document is a snippet for what looks like a struct for the ELF API. This specification is open and judging by the code they are using it exactly as intended.

    I'm going to guess the majority of their findings are specifically computer generated. They may have known first hand what the code was or even where it came from. However, if pressed to say how they discovered these violations I'm quite sure they would fall back on "the program made the mistake your honor." This would generate a plausible stance when the foundation began to crumble.

    Going further out on a limb, I'm also guessing this is why they would never release any of the alleged violations. Within days a website similar to Groklaw would be up for everyone to review, identify and mark the source of the "violation", i.e., this is a struct for the ELF library specification or this is a header of a BSD library. (Remember that BSD ancestry is likely still there in large chunks.)

    All of this was happening in the courtroom, and they had to know there were big holes in the allegations. Even a cursory glance reveals that some of the crap submitted is just that. This was a courtroom poker face with a huge bluff, betting that many parties would just settle. I suppose it worked because too many people rolled over and handed out free cash.

    --
    "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
  7. Re:What's so liberal about it? by Dantoo · · Score: 4, Interesting

    Ostriches aren't Australian, they're African. Omelettes can be made from emu eggs and I have tasted one. It really wasn't any different to one prepared from hen's eggs. It looked no different to this observer. Compare an emu egg to a hen's egg and they are quite different in size, colour and even texture internally and externally. The formula (recipe) however was just for a standard omelette that we would all recognise by sight instantly. Interestingly, it tasted like one prepared from hen's eggs as well. Couldn't tell the finished product apart.

    Posix header files also look remarkably similar to this observer. If code is being written to a required formula so that it interacts correctly with other code (a standard) then there should be little surprise that it looks the same.

    Egg analogies make me hungry.
     

  8. Re:More details and downloadable archive by moronoxyd · · Score: 4, Interesting

    The truth is that code was reused from a UNIX derivative, which is now (somewhat disputably) owned by SCO.

    Did I miss a verdict here?
    As far as I know, it is right now only a claim, not yet proven.

    And using the terms "truth" and "SCO" in one sentence... well, it just feels wrong.

  9. Re:More details and downloadable archive by MSG · · Score: 3, Interesting

    OF COURSE he is going to write the same commands he has used a thousand times in the same way

    I'm sure this is one of the reasons it's best to call the system GNU: Linus didn't write any of the "commands". Linus wrote a kernel and GNU ended up adopting it. The GNU project wrote the system "commands".

    Just do the search.

    Trivia: Actually, the people at exbiblio found that there is very little repetition of text in literature. Any four or five word sequence in a common magazine article is likely to appear in very few or no other texts. That fact is foundational to their technology.

  10. Re:More details and downloadable archive by jimicus · · Score: 4, Interesting

    Now, open any dozen books that are 50,000 words in length. Search for strings that are duplicated between the books. Entire sentences, or phrases, it hardly matters. Just do the search. Anyone who is used to playing with databases can probably search those dozen books, and find numerous instances of phrases that were copy/pasted from one author's book to another. In fact, I'll bet that technical and factual books will have a higher incidence of matching phrases and sentences than works of fiction - but fiction will have its share as well.

    Actually, that's not true. There is some evidence to suggest you only need a remarkably short string of words to uniquely identify a piece of English prose - it's this kind of thing that cheating-detection algorithms rely on.

    But we're talking about a structured programming language - with far more structure and rules than the English language - and the things that are at issue are by and large implementations of existing standards. The final link in TFS is a comparison of ELF utility header files, FFS. They've got to look fairly similar or they won't be any use for dealing with ELF executables! Even then they're sufficiently different that it would probably have been easier to write from scratch than it would be to execute the "copy/paste/obfuscate" cycle that is being alleged.

  11. Re:More details and downloadable archive by MrHanky · · Score: 3, Interesting

    Is that so? Let's see if we take a phrase from your own comment: "a higher incidence of matching phrases". One hit. Not bothering with linking to them all, but how about "rips it from his predecessors"? One hit. "strings that are duplicated between the books"? One hit. "his programming background came directly from Unix"? One hit. "open any dozen books"? One.

    I have, of course, duplicated them in this comment, meaning there will be two hits very soon. BTW, these are all the strings I searched for, giving your comment a 100% originality rating (admittedly, I didn't search for "I'm not a coder", which I expect would show up several times).

    Duplication of whole sentences in ordinary human language is actually quite uncommon for all but the most trivial declarations and stock phrases ("Just do the search" gives 3 million hits; "Just do the twist" gives 105 000).
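    The manual search above can be approximated in code by counting word sequences that two texts share. What follows is a naive sketch under my own assumptions: words are split on whitespace only, comparison is case-sensitive, and each text is capped at 256 words. Real duplicate detectors presumably hash and index rather than compare everything against everything, and all demo_ names are invented.

```c
#include <ctype.h>    /* isspace */
#include <stddef.h>   /* size_t */
#include <string.h>   /* memcmp */

#define DEMO_MAX_WORDS 256

/* A word is a pointer into the original text plus a length. */
struct demo_word {
    const char *s;
    size_t      len;
};

/* Split text on whitespace; returns the number of words stored in out. */
static size_t demo_split(const char *text, struct demo_word *out, size_t max)
{
    size_t n = 0;
    const char *p = text;

    while (*p != '\0' && n < max) {
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        const char *start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        out[n].s = start;
        out[n].len = (size_t)(p - start);
        n++;
    }
    return n;
}

/* Return 1 if the n words starting at a equal the n words starting at b. */
static int demo_same(const struct demo_word *a, const struct demo_word *b,
                     size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i].len != b[i].len || memcmp(a[i].s, b[i].s, a[i].len) != 0)
            return 0;
    }
    return 1;
}

/* Count the n-word sequences of a that also occur somewhere in b. */
size_t demo_shared_ngrams(const char *a, const char *b, size_t n)
{
    struct demo_word wa[DEMO_MAX_WORDS], wb[DEMO_MAX_WORDS];
    size_t na = demo_split(a, wa, DEMO_MAX_WORDS);
    size_t nb = demo_split(b, wb, DEMO_MAX_WORDS);
    size_t hits = 0;

    if (n == 0 || na < n || nb < n)
        return 0;
    for (size_t i = 0; i + n <= na; i++) {
        for (size_t j = 0; j + n <= nb; j++) {
            if (demo_same(&wa[i], &wb[j], n)) {
                hits++;
                break;   /* count each sequence of a at most once */
            }
        }
    }
    return hits;
}
```

    Raising n makes chance matches in prose rarer, which is the effect being described; whether the same holds for code written to a spec is the whole argument of this thread.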

  12. Re:First post by jimfrost · · Score: 4, Interesting

    That's true, but in the push to get UNIX into the commercial space the SysV interfaces were released as an open specification. This was actually covered during the trial.

    The fact of the matter was that the Linux folk didn't copy code, something that would have been obvious to any observer following its development. The idea that there were vast amounts of stolen code was ludicrous if you knew anything at all about the internal structure of the two operating systems.

    There was always the possibility of code that got injected during the large commercial code donations by e.g. IBM or SGI, and in fact the only piece of code that showed actual derivation came from SGI ... But it turned out to be both a very small amount of code and buggy to boot. As soon as people got a look at it they excised it in favor of working, original, code.

    I personally expected it to go more the way of the AT&T versus BSD case, where it turned out that AT&T had stolen tons of code from BSD, not the other way around. The Linux emulation layer in SCO UNIX seemed a particularly likely candidate. Either that turned out not to be the case or IBM simply didn't push the issue (perhaps because SCO was having so much trouble proving anything in their claims) though.

    SCO's strategy always seemed to me to be a shakedown: scare companies into license agreements. Why they went after one of the deepest pockets first is beyond me. IBM was very likely to fight given their investment, but it was clear early on that management was not very competent.

    --
    jim frost
    jimf@frostbytes.com
  13. Re:First post by Anonymous Coward · · Score: 5, Interesting

    And the idea that this key book to early '80s PC tech (still worryingly relevant today!) was somehow missing from all the bookshelves reachable by the Compaq BIOS writing department is just silly.

    You don't know what you're talking about. I was there at the time: Compaq had administrative staff remove the BIOS listings from all IBM tech ref manuals before they were given to the engineers. (This was especially easy to do because they came in the form of ring binders.)

    At one point, since I didn't work on writing BIOS code, I was assigned to be the one designated guy who could disassemble the IBM BIOS for a certain model. When the BIOS developers got stumped by a compatibility problem, they could send me a question, and I was allowed to poke around in the IBM ROM and then give a "Magic 8 Ball" type vague answer.

    Here's a bit of trivia: A few PC applications wouldn't work unless the ID string "IBM" appeared at a certain address within the BIOS code. Compaq developers worked out a way to make those bytes at that address appear in part of an actual executable code sequence instead.
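    The comment doesn't say which instructions were used, so treat the following purely as an illustration of how such a trick is possible. The ASCII codes for 'I', 'B' and 'M' (0x49, 0x42, 0x4D) happen to be valid single-byte 8086 opcodes, so a byte run that reads as "IBM" can also be executable code. The demo_ names and the checker function are hypothetical.

```c
#include <stddef.h>   /* NULL, size_t */
#include <string.h>   /* memcmp */

/* Three one-byte 8086 instructions whose opcodes spell "IBM" in ASCII.
 * Illustration only; not a claim about what any real BIOS contained. */
static const unsigned char demo_rom[] = {
    0x49,   /* dec cx  ('I') */
    0x42,   /* inc dx  ('B') */
    0x4D    /* dec bp  ('M') */
};

/* Return 1 if the three bytes at offset read as the ASCII string "IBM",
 * the way an application probing a fixed ROM address might check. */
int demo_has_ibm(const unsigned char *rom, size_t len, size_t offset)
{
    if (rom == NULL || offset > len || len - offset < 3)
        return 0;
    return memcmp(rom + offset, "IBM", 3) == 0;
}
```

    A check like demo_has_ibm(demo_rom, sizeof demo_rom, 0) succeeds even though the bytes were written as instructions rather than as a string.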