Can Watermarking Help Find GPL Violations?
bitkid writes "I recently ran across techniques that can be used to watermark program code. While I have yet to see some source code for this to play with, the authors claim that the watermarks can be introduced into the source code and can be found in the compiled executable. My question for the Slashdot crowd is: Do you think free software (GPL or other viral licenses) should be watermarked? This could help to find GPL violations (think Everybuddy or Linksys) or could be used in court someday against the next SCO to prove authorship. What might be the ramifications of this?"
I think this would only help against the most blatant copying. If the watermark is embedded in the data structures of the source code, then either it would be fairly easy to remove, or the software would end up in such a state that it would be hard to maintain and evolve. The attempt to prevent copying would have a negative long-term effect on the project.
I can still see this being useful if blatant copying of the software is the biggest problem the project faces; however, I'm having trouble envisioning a scenario where that's the case.
Caveat - I haven't read the paper, but from the description it looks like you apply your watermark to the class files after compilation.
... therefore not applicable in its current form to source code, which would be required for it to be of any use to the GPL.
So,
1) it only protects binaries, not source
2) it's for Java, which is easier due to the canonical form (bytecode) that can be manipulated by the watermarking tool. You could probably do this to protect GPL binaries, but with less portability
IMHO, not useful for source, but sure, if you're worried that some of your precompiled binaries are being ripped, then maybe.
For source, you need to detect common code patterns and use source tools that have been discussed elsewhere on /.
Look at the techniques. This stuff is designed for use on binary-only software (with the sole exception of the comment embedding, which is easy to strip, and the embedded strings, which are easy to remove/modify).
The approaches they're talking about are done at the compilation phase or post-compilation on Java bytecode.
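To make the "easy to remove" point about embedded strings concrete, here is a minimal sketch. The marker text and both helper names are made up for illustration and are not taken from the presentation; the "binary" is just a byte string.

```python
# Hypothetical marker string; a real scheme would choose its own.
MARK = b"GPL-WM:example-project"


def has_mark(binary: bytes, mark: bytes = MARK) -> bool:
    """Detect the watermark by plain substring search."""
    return mark in binary


def strip_mark(binary: bytes, mark: bytes = MARK) -> bytes:
    """Overwrite the marker with same-length filler so file offsets stay intact."""
    return binary.replace(mark, b"\x00" * len(mark))


blob = b"\x7fELF" + b"code" + MARK + b"more"
print(has_mark(blob))              # marker is found
print(has_mark(strip_mark(blob)))  # marker is gone after one replace
```

Anyone who knows (or guesses) the marker can defeat detection with a single replace, which is why a plain string watermark only catches careless copying.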
It's *extremely* difficult to produce good, reliable watermarks, because different compilers will build software differently, as will different optimization options.
I'd essentially say that source-based watermarks are a lost cause (at least with C, and with the current constraints of readability and simplicity on code).
A much better approach would be a project that does fuzzy comparisons on binaries, and is somewhat aware of ELF. Basically, you'd have a program that would have a set of known GPL code (a compiled Linux system would work well) and compare it to a set of compiled code.
This is still not perfect if the person is malicious and just tries using a different compiler. This has happened before with xvid and use of icc. However, there aren't *too* many compilers out there.
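A rough sketch of what such a fuzzy comparison could look like, assuming the inputs are raw byte strings (for example, code sections already pulled out of the ELF files by some other tool). It uses Jaccard similarity over byte n-grams, which is only one of many possible measures; the function names and the window size of 8 are arbitrary choices for illustration.

```python
def shingles(data: bytes, n: int = 8) -> set:
    """Return the set of all n-byte windows in data (empty if data is shorter than n)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(known: bytes, suspect: bytes, n: int = 8) -> float:
    """Score between 0.0 (no shared windows) and 1.0 (identical window sets)."""
    return jaccard(shingles(known, n), shingles(suspect, n))


known = bytes(range(64))
suspect = b"\x00" * 16 + known  # same bytes with extra padding in front
print(similarity(known, suspect))
```

A byte-level measure like this tolerates inserted padding but would score poorly across different compilers or optimization flags, which is exactly the weakness described above.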
Hmm...this is an interesting problem.
A more interesting approach that just occurs to me now -- in general, the proportions of compiled code should be roughly the same, independent of compiler -- adding padding, etc. Generate a call graph of the function tree in a set of GPL code. Then your checker would do fuzzy matching on chunks of that call graph against the suspicious code. It'd take a bit of massaging. It'd also still need some manual looking at the target once identified. However, this should be able to run in a pretty automated manner (even if it takes a long time to run) and could potentially turn up some interesting goodies. It'd certainly discourage commercial folks from ripping off GPL-using authors and companies.
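A minimal sketch of the call graph idea, assuming the graph has already been extracted (by a disassembler, say) into a dict mapping each function to the functions it calls. Since symbol names are usually stripped, each function gets a name-independent label built from its number of callees and, recursively, its callees' labels. This is a simplified Weisfeiler-Lehman style refinement, chosen here for illustration; the function names and the `rounds` parameter are assumptions.

```python
from collections import Counter


def signatures(graph: dict, rounds: int = 2) -> Counter:
    """Count name-independent structural labels over all functions in the graph."""
    nodes = set(graph) | {c for callees in graph.values() for c in callees}
    # Start with the out-degree, then fold in the callees' labels each round.
    labels = {f: str(len(graph.get(f, []))) for f in nodes}
    for _ in range(rounds):
        labels = {
            f: labels[f] + "(" + ",".join(sorted(labels[c] for c in graph.get(f, []))) + ")"
            for f in nodes
        }
    return Counter(labels.values())


def graph_overlap(known: dict, suspect: dict, rounds: int = 2) -> float:
    """Fraction of the known graph's function labels that also appear in the suspect graph."""
    a, b = signatures(known, rounds), signatures(suspect, rounds)
    shared = sum((a & b).values())  # multiset intersection
    return shared / max(1, sum(a.values()))


known = {"main": ["parse", "run"], "parse": ["read"], "run": ["read", "log"]}
renamed = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "x": ["a"]}
print(graph_overlap(known, renamed))
```

A high score only flags a candidate; small utility functions look alike everywhere, so the manual look at the target is still needed.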
Try taking a Windows system with a lot of installed (non-GPL) software and a Linux system with a lot of (GPL) installed software. Start a comparison running. See what turns up.
May we never see th
Read the presentation. Although complete sentences aren't exactly present, there seems to be the indication that access to the source can provide an attack on the watermarking scheme: well, duh, if it's open source just modify the source to eliminate the watermark.
But what's the likelihood a lazy company/individual will actually do this before violating the GPL? Probably slim, but more of the world seems to be going GPL anyway; and if the whole world did GPL, why would you need watermarks?
Point is: if the monopolies of the world insist on using GPL code without releasing the source, they'll expend the effort to remove the watermark.
OK, assume a corporation CAN successfully steal GPL code, with or without a watermark. Let's say M$ paints an IE browser look on top of the Mozilla Firebird codebase:
So aside from ethical issues, why should the GPL community really care?