Can Watermarking Help Find GPL Violations?
bitkid writes "I recently ran across techniques that can be used to watermark program code. While I have yet to see any source code for this to play with, the authors claim that the watermarks can be introduced into the source code and can be found in the compiled executable. My question for the Slashdot crowd is: Do you think free software (GPL or other viral licenses) should be watermarked? This could help find GPL violations (think Everybuddy or Linksys) or could someday be used in court against the next SCO to prove authorship. What might be the ramifications of this?"
Look at the techniques. This stuff is designed for use on binary-only software (the only exceptions are comment embedding, which is easy to strip, and embedded strings, which are easy to remove or modify).
The approaches they're talking about are done at the compilation phase or post-compilation on Java bytecode.
It's *extremely* difficult to produce good, reliable watermarks, because different compilers will build software differently, as will different optimization options.
I'd essentially say that source-based watermarks are a lost cause (at least with C, and with the current constraints of readability and simplicity on code).
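To make the point about embedded strings concrete, here is a minimal Python sketch, assuming the watermark is a plain ASCII marker compiled into the binary. The marker value and function names are hypothetical; the scan mimics what the Unix `strings` tool does, and the strip function shows that removing such a mark takes one line.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return runs of printable ASCII at least min_len bytes long (like `strings`)."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


def has_string_watermark(data: bytes, marker: bytes) -> bool:
    """True if the marker appears inside any printable string in the binary."""
    return any(marker in s for s in printable_strings(data))


def strip_string_watermark(data: bytes, marker: bytes) -> bytes:
    """Overwrite every copy of the marker with NUL bytes, keeping all offsets intact."""
    return data.replace(marker, b"\x00" * len(marker))


if __name__ == "__main__":
    # Hypothetical marker; a real binary would be read with open(path, "rb").read().
    marker = b"GPL-WATERMARK-ab12"
    blob = b"\x7fELF\x02\x01\x00" + marker + b"\x00more data\x00"
    print(has_string_watermark(blob, marker))                                  # True
    print(has_string_watermark(strip_string_watermark(blob, marker), marker))  # False
```

Overwriting in place rather than deleting keeps every offset in the file unchanged, so the stripped binary still loads and runs.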
A much better approach would be a project that does fuzzy comparisons on binaries and is somewhat aware of ELF. Basically, you'd have a program holding a set of known GPL code (a compiled Linux system would work well) and comparing it against a set of suspect compiled code.
This is still not perfect if the person is malicious and just tries using a different compiler. This has happened before with xvid and use of icc. However, there aren't *too* many compilers out there.
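A minimal Python sketch of such a fuzzy binary comparison, assuming 64-bit little-endian ELF input. It is "somewhat aware of ELF" only in that it pulls out the .text section, and it uses byte n-gram (shingle) Jaccard similarity as a stand-in for whatever fuzzy matching a real tool would use; the function names and the shingle size of 8 are arbitrary choices, not anything from the thread. Because it compares raw machine-code bytes, it is exactly the kind of check that a different compiler or different optimization flags would defeat.

```python
import struct

ELF64_HEADER = "<16sHHIQQQIHHHHHH"  # Elf64_Ehdr, little-endian
ELF64_SECTION = "<IIQQQQIIQQ"       # Elf64_Shdr, little-endian


def text_section(elf: bytes) -> bytes:
    """Return the raw bytes of the .text section of a 64-bit little-endian ELF file."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    hdr = struct.unpack_from(ELF64_HEADER, elf, 0)
    shoff, shentsize, shnum, shstrndx = hdr[6], hdr[11], hdr[12], hdr[13]

    def section(i: int) -> tuple:
        return struct.unpack_from(ELF64_SECTION, elf, shoff + i * shentsize)

    # Section names live in the section-header string table.
    names = section(shstrndx)
    strtab = elf[names[4]:names[4] + names[5]]
    for i in range(shnum):
        sh = section(i)
        name = strtab[sh[0]:strtab.index(b"\x00", sh[0])]
        if name == b".text":
            return elf[sh[4]:sh[4] + sh[5]]  # sh_offset, sh_size
    raise ValueError("no .text section")


def shingles(code: bytes, n: int = 8) -> set[bytes]:
    """All overlapping n-byte windows of the machine code."""
    return {code[i:i + n] for i in range(len(code) - n + 1)}


def similarity(a: bytes, b: bytes, n: int = 8) -> float:
    """Jaccard similarity of the two shingle sets (0.0 = disjoint, 1.0 = same set)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

For example, `similarity(text_section(open("known_gpl", "rb").read()), text_section(open("suspect", "rb").read()))` yields a score between 0 and 1 (the file names are placeholders); what threshold counts as suspicious would have to be found empirically.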
Hmm...this is an interesting problem.
A more interesting approach just occurred to me: in general, the relative proportions of compiled code should be roughly the same regardless of compiler, give or take padding, etc. Generate a call graph of the functions in a body of GPL code. Then your checker would do fuzzy matching on chunks of that call graph against the suspicious code. It'd take a bit of massaging, and it'd still need some manual examination of the target once identified. However, it should be able to run in a fairly automated manner (even if it takes a long time) and could potentially turn up some interesting goodies. It'd certainly discourage commercial folks from ripping off GPL-using authors and companies.
Try taking a Windows system with a lot of installed (non-GPL) software and a Linux system with a lot of (GPL) installed software. Start a comparison running. See what turns up.
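The call-graph idea can be sketched in Python too. This is a minimal sketch under several assumptions: the call graphs have already been extracted (from source on the GPL side, from a disassembler on the suspect side), function names are ignored, and the "fuzzy matching on chunks" is approximated with Weisfeiler-Lehman-style neighborhood labels compared as multisets. That relabeling technique is a swapped-in choice, not something proposed in the thread, and all names here are hypothetical.

```python
from collections import Counter

# Caller -> set of callees. Names are only local IDs; in a stripped binary
# they would be addresses assigned by a disassembler, not real symbols.
CallGraph = dict[str, set[str]]


def structural_labels(graph: CallGraph, rounds: int = 2) -> dict[str, tuple]:
    """Weisfeiler-Lehman-style labels: each function is described by its own
    call counts plus, round by round, the labels of its callees and callers."""
    nodes = set(graph) | {c for callees in graph.values() for c in callees}
    callers: dict[str, set[str]] = {n: set() for n in nodes}
    for f, callees in graph.items():
        for c in callees:
            callers[c].add(f)
    # Round 0: (number of callees, number of callers), independent of names.
    labels = {n: (len(graph.get(n, ())), len(callers[n])) for n in nodes}
    for _ in range(rounds):
        # Each round folds in the previous labels of direct neighbors,
        # so the label describes a progressively larger "chunk" of the graph.
        labels = {
            n: (labels[n],
                tuple(sorted(labels[c] for c in graph.get(n, ()))),
                tuple(sorted(labels[p] for p in callers[n])))
            for n in nodes
        }
    return labels


def callgraph_similarity(a: CallGraph, b: CallGraph, rounds: int = 2) -> float:
    """Multiset Jaccard similarity of the two graphs' structural labels."""
    la = Counter(structural_labels(a, rounds).values())
    lb = Counter(structural_labels(b, rounds).values())
    union = sum((la | lb).values())
    return sum((la & lb).values()) / union if union else 0.0
```

Two graphs that differ only in function names score 1.0, and adding unrelated functions lowers the score gradually instead of breaking the match; a high score would still only flag a target for the manual inspection the comment mentions.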
May we never see th
OK, assume a corporation CAN successfully steal GPL code, with or without a watermark. Let's say M$ paints an IE browser look on top of the Mozilla Firebird codebase.
So aside from ethical issues, why should the GPL community really care?