Slashdot Mirror


Can Watermarking Help Find GPL Violations?

bitkid writes "I recently ran across techniques that can be used to watermark program code. While I have yet to see some source code for this to play with, the authors claim that the watermarks can be introduced into the source code and can be found in the compiled executable. My question for the Slashdot crowd is: Do you think free software (GPL or other viral licenses) should be watermarked? This could help to find GPL violations (think Everybuddy or Linksys) or could be used in court someday against the next SCO to prove authorship. What might be the ramifications of this?"

26 of 265 comments

  1. Useful, but easy to get around. by The Head Sage · · Score: 5, Insightful

    This would be useful to prove that code is under the GPL, but it could be gotten around simply by looking at the code and then rewriting it yourself. Of course, that takes time and money, something big business hates to spend. Still, the technology is useful.

    --
    To NULL or not to NULL.
    1. Re:Useful, but easy to get around. by floydigus · · Score: 5, Insightful

      Absolutely right.

      Furthermore, you could automate the process by writing a script to do things like randomising white space, replacing variable names, and even rewriting simple flow control constructs.
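      A minimal sketch of the kind of script described above, using Python source as the stand-in language. It renames every assigned variable and function argument through the standard `ast` module; the helper names (`collect_variables`, `Renamer`, `scramble`) are made up for illustration. Round-tripping through `ast` also throws away the original whitespace and comments, so any mark hidden in layout is lost as a side effect. It is a toy: keyword arguments at call sites, globals and attributes are not handled.

```python
import ast


def collect_variables(tree):
    """Gather names that are assigned to or declared as function arguments."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names


class Renamer(ast.NodeTransformer):
    """Rewrite the collected names to v0, v1, ... and leave everything else alone."""

    def __init__(self, names):
        self.mapping = {old: f"v{i}" for i, old in enumerate(sorted(names))}

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def scramble(source):
    """Return equivalent source with renamed variables and normalised layout."""
    tree = ast.parse(source)
    tree = Renamer(collect_variables(tree)).visit(tree)
    return ast.unparse(tree)  # requires Python 3.9 or later
```

      Function names and builtins are left untouched because they never appear as assignment targets, so calls keep working after the rewrite.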

      I would suggest that if it is deemed important to be able to establish the originator of the code, then the originator should publish it as theirs as soon as it is written, or at least give it to an independent witness for safekeeping.

      --

      All things in moderation; including moderation

    2. Re:Useful, but easy to get around. by kasperd · · Score: 5, Informative

      randomising white space, replacing variable names

      Those are things that cannot be seen in the resulting executable, and the watermark is claimed to be found even in the resulting executable. (Yes, I know in some cases variable names can be visible in the executable, but you can easily prevent them from being there.) I somehow doubt this watermarking is possible at all. With optimizing compilers it is hard to find any resemblance between source and executable. Finally, knowing how the watermarks are made, it is probably easy to write another, similar algorithm that removes the watermark.

      --

      Do you care about the security of your wireless mouse?
  2. Beware the flipside by egg troll · · Score: 5, Insightful

    I would be very careful with using something like this. It's nice to think that one could use watermarking for protecting GPL'ed code. However, should the technique prove successful, expect to see everything under the sun watermarked by less benevolent entities.

    --

    C - A language that combines the speed of assembly with the ease of use of assembly.
    1. Re:Beware the flipside by Directrix1 · · Score: 3, Insightful

      It doesn't matter what you do to code as far as watermarking goes. If the watermarking method is publicly known, then the code can easily be changed anyway to look like it was watermarked by someone else. For instance, you could watermark your code by having variable-length whitespace before your comments or something. But that could easily be changed.
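      A rough sketch of that whitespace idea, and of how little it takes to undo it. The scheme here is an assumption made for illustration: one bit per commented line, with one space before the `#` meaning 0 and two spaces meaning 1. The function names are hypothetical, and a `#` inside a string literal would confuse the regular expression.

```python
import re

# code, gap of spaces, trailing comment
COMMENT = re.compile(r"^(.*?\S)( +)(#.*)$")


def embed(lines, bits):
    """Encode one bit per commented line: 1 space = 0, 2 spaces = 1."""
    remaining = list(bits)
    out = []
    for line in lines:
        m = COMMENT.match(line)
        if m and remaining:
            gap = " " * (1 + remaining.pop(0))
            line = m.group(1) + gap + m.group(3)
        out.append(line)
    return out


def extract(lines):
    """Read the bits back from the width of the gap before each comment."""
    bits = []
    for line in lines:
        m = COMMENT.match(line)
        if m:
            bits.append(1 if len(m.group(2)) >= 2 else 0)
    return bits


def normalise(lines):
    """Collapse every gap to a single space, which erases the mark."""
    return [COMMENT.sub(r"\1 \3", line) for line in lines]
```

      Running any code formatter over the file has the same effect as `normalise`, which is the weakness the comment points at.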

      --
      Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
  3. I think not by Espectr0 · · Score: 3, Insightful

    The GPL appeals to the common sense still found in people, and to simple decency.

    If the trademark stuff gets too hectic, then maybe this will be needed, but for now I don't think it is.

    1. Re:I think not by LuxFX · · Score: 5, Insightful

      If the trademark stuff gets too hectic

      If?

      Can I have directions to your hole, I'd like to live there too.

      --
      Punctanym: alternate spelling of words using punctuation or numerals in place of some or all of its letters; see 'leet'
  4. The ramifications. by caluml · · Score: 4, Funny
    What might be the ramifications of this?

    It might cause the sky to fall down on our heads, or the atmosphere to evaporate, killing us all with solar radiation.

  5. Re:Watermark? by Doomrat · · Score: 5, Funny

    we are talking about a bunch of 1s and 0s here. If it can be watermarked, it can be unwatermarked. A simple script will be able to rearrange stuff to disrupt the watermark without affecting the execution of the program.

    Yes, a bit like how it's easy to reconstruct a burned down house from its ashes.

  6. Just an extra step by Moeses · · Score: 3, Interesting

    I think this would only help against the most blatant copying. If the watermark code is embedded in the data structures of the source code, either it would be fairly easy to remove or the software would be in such a state that it would be hard to maintain and evolve. The attempt to avoid piracy would have a negative long-term effect on the project.

    I can still see this being useful if blatant copying of the software is the biggest problem the project faces; however, I'm having trouble envisioning a scenario where that's the case.

  7. details about watermarking techniques by gripdamage · · Score: 5, Informative

    The paper cited in the first link is from a professor I once had.

    On his website I found his full article, if you want some details about watermarking techniques. It has a lot more meat than the presentation slides.

    1. Re:details about watermarking techniques by Theoria · · Score: 4, Informative

      The original poster made a comment about never getting to play with watermarking code. Along with some informative papers about software watermarking, obfuscation, tamperproofing, etc and uses for such techniques, there is an implementation on the SandMark website.

  8. as usual by snarkh · · Score: 5, Insightful
    The submitter did not bother to look at the article (or rather the presentation).

    The main idea is that you embed the watermark into the code and then obfuscate it. The resulting code is unreadable (otherwise the watermark would be trivial to remove), which makes it absolutely useless as far as open source is concerned.

    1. Re:as usual by dspeyer · · Score: 4, Informative
      From the GPL (section 3):
      You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:
      * a) Accompany it with the complete corresponding machine-readable source code,
      ...
      The source code for a work means the preferred form of the work for making modifications to it. (emphasis added)

      So, unless you plan to do maintenance on obfuscated code, this is no good for GPL software. In fact, it's no good for Open Source software of any kind.

      Admittedly, you could use unobfuscated code and refuse to reveal the watermark, but it's kind of tricky to keep things secret in the OSS world.

  9. Re:Watermark? by Naerbnic · · Score: 5, Informative

    Perhaps this is true for static data (as in a bunch of source code), but you can insert a watermark into code that will create a dynamic watermark (i.e. something that depends on the runtime operation of the program). To make a long story short, you cannot easily remove it by rearranging binary code, and it's difficult (i.e. NP-complete for those in the know) to analyze the software in order to remove it. Tack on the fact that you can tamperproof the code (i.e. make the behavior of the program depend on the existence of the watermark), and you have a pretty difficult path to walk if you want to remove it.

    More info can be found in this paper, if you're into reading that sort of thing.

    --


    So there I was, juggling apples and small animals, when I accidentally bit into the wrong one...
  10. It's for Java and it's a binary watermark, not source by Anonymous Coward · · Score: 3, Interesting

    Caveat - I haven't read the paper, but from the description it looks like you apply your watermark to the class files after compilation.

    So,
    1) It only protects binaries, not source ... therefore it is not applicable in its current form to source code, which would be required for any usefulness to the GPL.

    2) It's for Java, which is easier due to the canonical form (bytecodes) that can be manipulated by the watermarking tool. You could probably do this to protect GPL binaries, but with less portability.

    IMHO, not useful for source, but sure, if you're worried that some of your precompiled binaries are being ripped, then maybe.

    For source, you need to detect common code patterns and use source tools that have been discussed elsewhere on /.

  11. Does it really matter??? by Pedrito · · Score: 5, Insightful

    I wrote a book ages ago about Windows File Formats. Included in the book was some code which was written by a third party. I obtained permission from the code's author to put it in the book, but it was very clearly copyrighted by the author of the code, both in the code, and in the book.

    So Intel is working on a product and they just swipe up the code out of the book, never ask for permission or anything, and use it in a commercial product (VTune). The author of the code, of course, was furious. He approached Intel. They blew him off. He had reverse engineered their code. He could produce an exact replica of the binary with his own code using the MS C compiler.

    He never got anything out of Intel. I suppose he could have hired attorneys, but he wasn't a wealthy guy. He couldn't find attorneys to take it without cash up front. So my question is: How do watermarks help him? I mean the guy could put the binaries side-by-side, and there was no question, it was his code.

    Your code is as protected as the lawyer you can afford...

  12. No, it can't by anthony_dipierro · · Score: 4, Insightful

    Isn't the code itself a watermark? Sure, you can change things here and there, but ultimately the similarities are going to be far too much to be pure coincidence.

    The purpose of digital watermarking seems to be to identify unique instances of the thing being watermarked. So if I have a copy of Britney Spears' album, it's obviously copyrighted by her record company. With watermarking I can get more specific, and see that it was burned from a CD which was sold to Bob Jones. With the GPL this isn't useful. Sure, the code might have been derived from a copy sold to Bob Jones, but he may have legally made a million copies and distributed them around the globe before the GPL was violated, by someone else. You can't control the watermarks, because you can't control the distribution.

  13. Not easy -- story submitter is confused by 0x0d0a · · Score: 5, Interesting

    Look at the techniques. This stuff is designed for use on binary-only software (with the sole exception of the comment embedding, which is easy to strip, and the embedded strings, which are easy to remove/modify).

    The approaches they're talking about are done at the compilation phase or post-compilation on Java bytecode.

    It's *extremely* difficult to produce good, reliable watermarks, because different compilers will build software differently, as will different optimization options.

    I'd essentially say that source-based watermarks are a lost cause (at least with C, and with the current constraints of readability and simplicity on code).

    A much better approach would be a project that does fuzzy comparisons on binaries, and is somewhat aware of ELF. Basically, you'd have a program that would have a set of known GPL code (a compiled Linux system would work well) and compare it to a set of compiled code.

    This is still not perfect if the person is malicious and just tries using a different compiler. This has happened before with xvid and use of icc. However, there aren't *too* many compilers out there.

    Hmm...this is an interesting problem.

    A more interesting approach that just occurs to me now -- in general, the proportions of compiled code should be roughly the same, independent of compiler -- adding padding, etc. Generate a call graph of the function tree in a set of GPL code. Then your checker would do fuzzy matching on chunks of that call graph against the suspicious code. It'd take a bit of massaging. It'd also still need some manual looking at the target once identified. However, this should be able to run in a pretty automated manner (even if it takes a long time to run) and could potentially turn up some interesting goodies. It'd certainly discourage commercial folks from ripping off GPL-using authors and companies.
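    A very rough sketch of the call-graph comparison, under some loud assumptions: the call graphs have already been extracted somehow (here they are plain dicts mapping a function to the set of functions it calls), symbol names are ignored because a stripped binary will not have them, and each function is reduced to a made-up "shape" signature built from fan-out counts. The score is a multiset Jaccard similarity over the whole graph rather than over chunks, so this is a simplification of what is described above, not an implementation of it.

```python
from collections import Counter


def shape_signatures(call_graph):
    """Describe each function by (its fan-out, sorted fan-outs of its callees)."""
    fan_out = {f: len(callees) for f, callees in call_graph.items()}
    sigs = Counter()
    for f, callees in call_graph.items():
        callee_shape = tuple(sorted(fan_out.get(c, 0) for c in callees))
        sigs[(fan_out[f], callee_shape)] += 1
    return sigs


def similarity(graph_a, graph_b):
    """Multiset Jaccard similarity of the two graphs' signatures, from 0.0 to 1.0."""
    a, b = shape_signatures(graph_a), shape_signatures(graph_b)
    shared = sum((a & b).values())  # Counter & keeps the minimum counts
    total = sum((a | b).values())   # Counter | keeps the maximum counts
    return shared / total if total else 1.0
```

    A high score would only flag a candidate for the manual inspection mentioned above; small graphs with common shapes will collide by chance.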

    Try taking a Windows system with a lot of installed (non-GPL) software and a Linux system with a lot of (GPL) installed software. Start a comparison running. See what turns up.

    1. Re:Not easy -- story submitter is confused by sICE · · Score: 4, Informative

      It's not that fuzzy. You seem to know what all this stuff is about, and no offense is intended here, but sadly you underestimate the power of the modern cracking and reverse engineering tools you have at your disposal.

      Even with compiler optimizations and processor-specific instructions AND EVEN different compilers, you can actually find and detect "similar HLL code" (there's a tool called DATING that can do that, whose name is a pun on the IDA FLIRT abilities; contact me for a copy, it's hard to find). I don't know about different CPUs, but I guess it would be resource-hungry, and I don't know of a tool that can catch those for now. Anyway, try having a look at the VMWARE binaries (win32/linux) with it; you'd probably be surprised.

      blah, dunno what i wanted to say next it's late here... ~<:(

  14. Not possible with open source by HoleNdaBitBucket · · Score: 3, Interesting

    Read the presentation. Although complete sentences aren't exactly present, there seems to be the indication that access to the source can provide an attack on the watermarking scheme: well, duh, if it's open source just modify the source to eliminate the watermark.

    But what's the likelihood a lazy company/individual will actually do this before violating the GPL? Probably slim, but more of the world seems to be going GPL anyway; and if the whole world did GPL, why would you need watermarks?

    Point is: if the monopolies of the world insist on using GPL code without releasing the source, they'll expend the effort to remove the watermark.

  15. UK Method by Gonoff · · Score: 3, Informative

    Put a copy in an envelope - printed or CD, whatever you like. Post it to your solicitor and have them put it in their safe unopened.

    Later, when Parasitesoft tries to claim you stole it from them, the solicitor can produce this as legally acceptable evidence of its date of existence.

    --
    I'll see your Constitution and raise you a Queen.
    1. Re:UK Method by malthusan · · Score: 3, Informative

      Address it on the backside of the envelope (the side with the flap) and place the postage over the flap once it's sealed. When the post office postmarks it, the stamp will cross the flap onto the envelope. The intact postage and postmark serves to show the envelope hasn't been opened since it was posted.

      I do this with my own writing (that is, I post it to myself) so I have the means to prove creation date should it ever become an issue.

  16. How does this help GPL? by scdeimos · · Score: 5, Informative

    Having read the .PDF paper and then skimmed the /. comments it would seem few people have taken the time to actually read (or understand) the paper before commenting on it. Hats-off to those who have.

    What is the essence of this watermarking technique?:
    - For embedding copyright information into individual .class files, as opposed to signing .cab's for whole Java apps/applets.
    - It modifies compiled Java bytecode, shuffling eight bytecode operators in targeted "dummy" class methods. The shuffling is able to encode only three bits per operation, so watermarks need to be short or dummy methods need to be large.
    - It relies on the watermarked dummy method(s) appearing in the stolen (decompiled/recompiled) .class files, which is achieved by pretending to call the dummy method(s) from other methods using always-false logic constructs.
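    To make the "three bits per operation" point concrete, here is a small sketch of the arithmetic, not the paper's actual tool. It assumes eight interchangeable operators (the JVM mnemonics below are real instructions, but which eight the paper uses is an assumption) and picks one per 3-bit group of the watermark. The function names are hypothetical.

```python
# Eight operators give 2**3 choices, so each operator slot can carry 3 bits.
OPERATORS = ["iadd", "isub", "imul", "idiv", "irem", "iand", "ior", "ixor"]


def encode(watermark: bytes):
    """Turn the watermark into a sequence of operators, 3 bits at a time."""
    bits = "".join(f"{byte:08b}" for byte in watermark)
    bits += "0" * (-len(bits) % 3)  # pad to a multiple of 3
    return [OPERATORS[int(bits[i:i + 3], 2)] for i in range(0, len(bits), 3)]


def decode(ops, length):
    """Recover `length` bytes from the operator sequence."""
    bits = "".join(f"{OPERATORS.index(op):03b}" for op in ops)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, length * 8, 8))
```

    A 3-byte mark such as b"GPL" already needs 8 operator slots in the dummy method, which is why the watermark has to stay short or the dummy methods have to grow.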

    What are its downfalls?:
    - The technique is specific to Java. Forget about using it for other languages which output platform-specific machine code binaries, although it might be possible to modify it for use in .NET and other bytecode environments.
    - If an intelligent thief (or smart optimizing compiler) is able to detect the always-false condition used to shield the dummy method(s) the watermark(s) will be removed.
    - The larger your watermark, the larger you need to make your dummy method(s), or you need to embed more of them. The larger you make your dummy methods, the more obvious it will be that there's something strange about them.
    - Optimizing compilers could still destroy the modified operators used to form the watermarks.
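    For readers unfamiliar with the always-false guard, a toy illustration follows. The predicate is an assumption picked for the example, not the one from the paper: the product of two consecutive integers is always even, so the test can never succeed, yet a naive tool cannot tell that without doing the arithmetic reasoning. All names are hypothetical.

```python
def dummy_watermark_carrier():
    """Stands in for the dummy method whose operators carry the watermark."""
    raise AssertionError("never reached")


def opaquely_false(x: int) -> bool:
    # x * (x + 1) is a product of consecutive integers, so it is always even
    # and the remainder can never be 1.
    return (x * (x + 1)) % 2 == 1


def real_work(x: int) -> int:
    # The call keeps the dummy method "reachable" for tools that strip dead code,
    # but the guard is never true at run time.
    if opaquely_false(x):
        dummy_watermark_carrier()
    return x * 2
```

    Anyone, or any optimizer, that proves the guard is constant can delete both the branch and the dummy method, which is the second downfall listed above.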

    The paper also claims it protected more .class files from decompile/recompile attacks than *I* feel it should have: five of the ten .class files crashed their test decompiler (Mocha), thereby "protecting" their watermarks. If someone is keen to re-source your .class file, particularly if there's money to be made, I'm fairly certain they'd try another decompiler instead of giving up after just one crash. I suspect that these five .class files could be decompiled by another utility, so the question of their watermark protection remains unanswered. Potentially this could result in up to 18 (instead of 3) of their 23 watermarks actually being defeated. This is entirely feasible, since only 3 of the 8 watermarks fully tested survived (the other 15 being in the five .class files which crashed Mocha).

    How does this technique benefit GPL? I'm not sure that it would. Even if the above problems were fixed:
    - To submit "source code" for your protected .class, you'd have to compile it, watermark it, decompile it and then post the decompiled version. Not very pretty and what about comments? I suppose you could have a Perl script reinsert comments from the original source, or copy-and-paste the watermarked dummy methods back in.
    - It's really designed to embed personal/corporate copyrights into code, protecting the IP of the submitter not the GPL community. I suppose the GPL community could design a community-wide watermark policy, but then that would become public knowledge and so thieves would be aware of its existence and be inclined to search harder to remove it.

  17. You missed the point of Free Software by Brandybuck · · Score: 4, Insightful

    Do you think free software (GPL or other viral licenses) should be watermarked? This could help to find GPL violations (think Everybuddy or Linksys)

    You missed the point of Free Software. Ignoring some of the antics of the zealous fringe, the idea of "Free Software" isn't to be a separate-but-equal analogue to proprietary software. The point of Free Software is freedom, not surveillance. Too many advocates for Free Software say their contributions are free, but act as proprietary masters with their obsession over owning, controlling and regulating the software.

    It saddens me to see people advocating watermarking Free Software. Next they'll want a "FSSA" analogue to the BSA and their brownshirts.

    --
    Don't blame me, I didn't vote for either of them!
  18. ubiquitous GPL code == BAD? by natron8080 · · Score: 4, Interesting

    Ok, assume a corporation CAN successfully steal GPL code, with or without a watermark. Let's say M$ paints an IE browser look on top of the Mozilla Firebird codebase:

    1. Is it a bad thing that their software just got better, faster, and more standards compliant?
    2. Doesn't this even out the playing field, as far as proprietary technology goes? Everyone starts at 0.
    3. The Mozilla developers would have real speed/memory/feature competition from M$, as opposed to the "we'll never touch IE code again" stance of M$.
    4. More company coders would be familiar with and able to develop on open source projects in their spare time (or convert even!).
    5. GPL projects aren't really in competition with corporate firms. GPL software doesn't lose profit margins if there's better software out there.

    So aside from ethical issues, why should the GPL community really care?