Slashdot Mirror


Can Watermarking Help Find GPL Violations?

bitkid writes "I recently ran across techniques that can be used to watermark program code. While I have yet to see some source code for this to play with, the authors claim that the watermarks can be introduced into the source code and can be found in the compiled executable. My question for the Slashdot crowd is: Do you think free software (GPL or other viral licenses) should be watermarked? This could help to find GPL violations (think Everybuddy or Linksys) or could be used in court someday against the next SCO to prove authorship. What might be the ramifications of this?"

2 of 265 comments

  1. Not easy -- story submitter is confused by 0x0d0a · · Score: 5, Interesting

    Look at the techniques. This stuff is designed for use on binary-only software (with the sole exception of the comment embedding, which is easy to strip, and the embedded strings, which are easy to remove/modify).

    The approaches they're talking about are done at the compilation phase or post-compilation on Java bytecode.

    It's *extremely* difficult to produce good, reliable watermarks, because different compilers will build software differently, as will different optimization options.

    I'd essentially say that source-based watermarks are a lost cause (at least with C, and with the current constraints of readability and simplicity on code).

    A much better approach would be a project that does fuzzy comparisons on binaries, and is somewhat aware of ELF. Basically, you'd have a program that takes a corpus of known GPL code (a compiled Linux system would work well) and compares it against a set of suspect compiled code.
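
    To make "fuzzy comparison" concrete, here is a minimal sketch of one possible scoring scheme: treat each binary as a set of overlapping n-byte windows and score the overlap (Jaccard similarity). This is an illustration, not an existing tool. The function names and the window size of 4 are assumptions, and it works on raw bytes; an ELF-aware version would presumably feed it only the code sections.

```python
from __future__ import annotations


def shingles(data: bytes, n: int = 4) -> set[bytes]:
    """Return the set of all n-byte windows found in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: bytes, b: bytes, n: int = 4) -> float:
    """Score two byte strings from 0.0 (no shared windows) to 1.0 (same window set)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        # Both inputs are shorter than n bytes: nothing to compare.
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    known = b"abcdefgh"      # stand-in for bytes from a known GPL binary
    suspect = b"abcdefXY"    # stand-in for bytes from the binary under test
    print(f"similarity: {jaccard(known, suspect):.2f}")
```

    A score like this survives small edits but not a change of compiler or optimization flags, which is the weakness discussed next.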

    This is still not perfect if the person is malicious and just tries using a different compiler. This has happened before with xvid and use of icc. However, there aren't *too* many compilers out there.

    Hmm...this is an interesting problem.

    A more interesting approach that just occurs to me now -- in general, the proportions of compiled code should be roughly the same regardless of compiler (allowing for added padding, etc.). Generate a call graph of the function tree in a set of GPL code. Then your checker would do fuzzy matching on chunks of that call graph against the suspicious code. It'd take a bit of massaging. It'd also still need someone to look at the target manually once it's identified. However, this should be able to run in a pretty automated manner (even if it takes a long time to run) and could potentially turn up some interesting goodies. It'd certainly discourage commercial folks from ripping off GPL-using authors and companies.
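
    A rough sketch of what matching on call-graph chunks could look like, assuming the call graphs have already been pulled out of the binaries by some disassembler (that step is not shown). Function names are stripped or changed in a real target, so each function is described only by the shape of its call subtree, and the two graphs are compared as multisets of those shapes. The names `signature` and `graph_similarity`, the depth of 2, and the dict-of-lists graph format are all assumptions made for illustration.

```python
from collections import Counter


def signature(graph, node, depth=2):
    """Name-independent shape of the call subtree under node, cut off at depth."""
    if depth == 0:
        return ()
    callees = graph.get(node, ())
    return tuple(sorted(signature(graph, c, depth - 1) for c in callees))


def graph_similarity(known, suspect, depth=2):
    """Fraction of subtree shapes shared between two call graphs (0.0 to 1.0)."""
    def shapes(graph):
        sigs = (signature(graph, f, depth) for f in graph)
        # Leaf functions all look alike, so they carry no evidence; skip them.
        return Counter(s for s in sigs if s)

    k, s = shapes(known), shapes(suspect)
    total = sum((k | s).values())
    return sum((k & s).values()) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical graphs: each function maps to the functions it calls.
    gpl = {"main": ["parse", "run"], "parse": ["read", "tokenize"],
           "run": ["step"], "read": [], "tokenize": [], "step": []}
    # Same structure with every symbol renamed, as in a stripped binary.
    renamed = {"f0": ["f2", "f1"], "f1": ["f4", "f3"], "f2": ["f5"],
               "f3": [], "f4": [], "f5": []}
    print(graph_similarity(gpl, renamed))
```

    Shallow shapes like these will collide between unrelated programs, so a high score is only a reason to go look by hand, not proof of copying.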

    Try taking a Windows system with a lot of installed (non-GPL) software and a Linux system with a lot of (GPL) installed software. Start a comparison running. See what turns up.

  2. ubiquitous GPL code == BAD? by natron8080 · · Score: 4, Interesting

    Ok, assume a corporation CAN successfully steal GPL code, with or without watermark. Let's say M$ paints an IE browser look on top of the mozilla firebird codebase:

    1. Is it a bad thing that their software just got better, faster, and more standards compliant?
    2. Doesn't this even out the playing field, as far as proprietary technology goes? Everyone starts at 0.
    3. The mozilla developers would have real speed/memory/feature competition from M$, as opposed to the "we'll never touch IE code again" stance of M$.
    4. More company coders would be familiar with and able to develop on open source projects in their spare time (or convert even!).
    5. GPL projects aren't really in competition with corporate firms. GPL software doesn't lose profit margins if there's better software out there.

    So aside from ethical issues, why should the GPL community really care?