New Method To Detect and Prove GPL Violations
qwerty writes "A paper to be presented at the upcoming academic conference Automated Software Engineering describes a new method to detect code theft that could be used to detect GPL violations in particular. While the so-called birthmarking method is demonstrated for Java, it is general enough to work for other languages as well. The API Birthmark observes the interaction between an application and (dynamic) libraries that are part of the runtime system. This captures the observable behavior of the program and cannot be easily foiled using code obfuscation techniques, as shown in the paper (PDF). Once such a birthmark is captured, it can be searched for in other programs. By capturing the birthmarks from popular open-source frameworks, GPL-violating applications could be identified."
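As a rough illustration of the idea in the summary, and not the paper's actual algorithm, a birthmark can be approximated as the set of short windows (n-grams) of library calls a program makes at run time. The sketch below assumes the call traces have already been recorded somehow; the call names, the window size of 3, and the containment score are all assumptions made for the example.

```python
from typing import Iterable, Set, Tuple

Birthmark = Set[Tuple[str, ...]]


def birthmark(calls: Iterable[str], n: int = 3) -> Birthmark:
    """Return the set of n-grams (windows of n consecutive calls) in a trace."""
    seq = list(calls)
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def containment(library_bm: Birthmark, program_bm: Birthmark) -> float:
    """Fraction of the library's n-grams that also occur in the program."""
    if not library_bm:
        return 0.0
    return len(library_bm & program_bm) / len(library_bm)


# Hypothetical traces: the call names below are invented for illustration.
library_trace = ["File.open", "Reader.read", "Parser.parse",
                 "Writer.write", "File.close"]
# A program that embeds the library's behaviour between its own calls.
suspect_trace = ["Logger.init"] + library_trace + ["Socket.send"]
# A program that uses some of the same calls in an unrelated order.
unrelated_trace = ["Socket.open", "Socket.send", "Socket.close",
                   "File.open", "File.close"]

if __name__ == "__main__":
    lib = birthmark(library_trace)
    print("suspect:", containment(lib, birthmark(suspect_trace)))
    print("unrelated:", containment(lib, birthmark(unrelated_trace)))
```

A high score only says the two programs talk to the runtime libraries in the same way; it is a reason to look closer, not proof of copying.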
I used to be a research assistant, and at university, we used this technique to see if students copied their assignments. They could rename variables, move pieces of text, and change comments all they liked, but the execution profile stayed the same. We caught a lot of students, and they never figured out how we did it.
Let's just set the code free. Let's not chase it down the street to make sure it stays free; just let it go as it will.
Did you know? All modern PCs ship with a special Symbolics Lisp co-processor to support the Emacs text editor. Vi users often refer to this $79 chip as the "Emacs Tax".
What is the false positive rate for this method? What if two programs just happen to do the same thing and the authors happened to choose similar ways to do it. Would this method conclude that one originated with the other? It's not a copyright violation because neither is a derivative work of the other.
Also, it occurs to me that this method would probably not be as useful as expected for detecting GPL violations. I would think it would only be effective where you have source code available, or at the very least enough symbol table information to make comparisons, which you are not likely to have if somebody is violating the GPL, because that implies no source code anyway (and almost certainly no symbol table information for the binary).
File under 'M' for 'Manic ranting'
An identical library call signature for a nontrivial part of the execution could be produced by a clean-room analysis or even independent development of an equivalent component. Neither of these is a GPL violation.
This is not to say that the technique wouldn't be useful for hunting down GPL violations. But a positive is not definitive by itself.
Meanwhile code obfuscation (even automatically generated obfuscation) could easily modify at least the timing, if not the order, of such calls.
Nevertheless this is a powerful tool: A hunk of GPL code that hasn't had its flow obfuscated systematically (even code that HAS been obfuscated but not systematically) will have large swaths of code that trip the detector. And it doesn't require reverse engineering until after the alarm goes off.
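To make the point about unsystematic obfuscation concrete, here is a small sketch using Python's standard difflib. It assumes two already-recorded call traces (the call names are invented) and looks for the longest run of calls that survives a couple of local edits; it is an illustration of the argument, not the detector from the paper.

```python
from difflib import SequenceMatcher
from typing import List


def longest_shared_run(original: List[str], suspect: List[str]) -> int:
    """Length of the longest contiguous run of calls common to both traces."""
    matcher = SequenceMatcher(None, original, suspect, autojunk=False)
    match = matcher.find_longest_match(0, len(original), 0, len(suspect))
    return match.size


# Hypothetical trace of the original component.
original_trace = ["open", "read", "parse", "validate",
                  "transform", "write", "flush", "close"]
# The same trace after two local edits: a dummy call inserted near the
# start and the last two calls swapped.
obfuscated_trace = ["open", "noop", "read", "parse", "validate",
                    "transform", "write", "close", "flush"]

if __name__ == "__main__":
    run = longest_shared_run(original_trace, obfuscated_trace)
    print(f"{run} of {len(original_trace)} calls survive as one unbroken run")
```

Two local edits still leave five of the eight calls in one unbroken run, which is the kind of swath that would trip an alarm. An obfuscator that reorders calls everywhere would shrink that run considerably.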
Good job, guys.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
GGA! The GNU Genuine Advantage program!
Karma cannot be described by words alone.
Pitchfork? ... Check
Torch? ... Check
Map of Corporate Castle locations? ... Check
FSF Lawyers programmed to be speed dialed in emergencies? ... Check
Desire to burn the non-believers? ... Check
Okay, I'm ready! What IRC Channel are we meeting in?
load "$",8,1
+10 points for FOSS :-D
Good thing no one asked you to. GPL code is open-source, so "keeping it out of their grubby little hands" is not an option or even wanted. You had probably better come to understand the purpose of the GPL and what a GPL violation is before you post.
I looked through the paper, and it is cool stuff. But I couldn't see where it supposed the system would work well for other languages, and I wonder if it really would be so good.
Java has a very large standard library that is always dynamically linked, and hence can easily be instrumented as the technique requires. C allows static linking, which would make such hooking much more difficult. Additionally, Java executes in a very standard environment due to the Virtual Machine, whereas other languages may have varying ABIs, type sizes, and other properties that could add significant noise to the birthmark.
That said, system calls are always hookable and reasonably standard, so maybe this technique could be applied successfully there for malware detection or similar?
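For what it's worth, getting a system-call sequence is fairly easy on Linux with a tracer such as strace, and turning its output into a list of call names takes only a few lines. The sketch below assumes log lines of the usual "name(args) = result" shape; the sample log is hand-written for illustration, not captured from a real program.

```python
import re
from typing import List

# A syscall line starts with the call name followed by an opening parenthesis.
SYSCALL_RE = re.compile(r"^([a-z_][a-z0-9_]*)\(")


def syscall_names(log: str) -> List[str]:
    """Extract the ordered list of system-call names from a trace log."""
    names = []
    for line in log.splitlines():
        m = SYSCALL_RE.match(line.strip())
        if m:  # skip status lines such as "+++ exited with 0 +++"
            names.append(m.group(1))
    return names


# Hand-written sample in the style of a tracer's output (an assumption).
SAMPLE_LOG = """\
openat(AT_FDCWD, "/tmp/input.txt", O_RDONLY) = 3
read(3, "hello\\n", 4096) = 6
write(1, "hello\\n", 6) = 6
close(3) = 0
+++ exited with 0 +++
"""

if __name__ == "__main__":
    print(syscall_names(SAMPLE_LOG))
```

The resulting list could then be compared between programs in the same way as library-call traces, though whether system calls are distinctive enough to serve as a birthmark is an open question.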
-- Mike
When people go to these lengths to prove misuse of commercial licenses, they're called fascists. When it's done to prove misuse of free licenses, it's OK.
I see the community is still working as it always has.
- They didn't copy your code, and the program tells you this.
- They copied your code, and the program detects it.
- They didn't copy your code, but they did implement it in such a similar way that the program thinks they did.
In the first case, you stop checking. In the second and third, you run additional tests and see if you can find more evidence of a common origin.
I am TheRaven on Soylent News
This is very cool and potentially useful. By itself, it wouldn't be enough to force compliance or win a violation suit, but it could well be enough to meet the threshold for filing a suit and forcing source code analysis in discovery. Really, it is a great tool to have to ensure that open source license terms are respected by removing the "code anonymity" inherent in a binary.
It's the same old problem of protecting software. Big companies like M$ have spent billions of dollars trying to control unauthorized use of software. The problem is the same, although we are now protecting source code instead of executable code. Does it mean that we are treading the same path and people (developers) will need to spend so much effort and money to protect their rights?
Instead of coding open source projects, now we're coding projects to detect license violations.
Next, the Open Source Business Software Alliance and raids by the Secret Service...
When is the last time we read anything about open source that wasn't about licensing?
When did it stop being about the code and the value?
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
How well does it work with the Wine versus Windows comparison?
The Tao of math: The numbers you can count are not the real numbers.
In our case it was perhaps a little more understandable. The other student was a friend and we'd been collaborating on a project. We had adopted common naming conventions, etc.
Our code was virtually identical. I know it sounds unlikely but it does happen.
I realise this is going off on a tangent, but I'm concerned about the use of the word theft. Usually I'm one of the first people to jump up and down when I hear the RIAA or MPAA accuse people of stealing, and I've noticed that quite a few other people on Slashdot do the same. I think it's mis-representative of the paper to represent copyright infringement as anything other than exactly what it is, which is copyright infringement.
Language is what it is, and it changes over time, but I'd be really disappointed if this one was let to slip, because rather than the language changing because it's more convenient or better, it's changing because a group of powerful corporations want to confuse the issue for their own control and commercial benefit.
False positives....
The story is presented with a stage light focused on linux but then the house lights come up and show linux in jail along with most of the audience.
This is just one paper for one Automated Software Engineering (ASE) conference.
But if you really want to ensure software becomes genuinely free, then the level of automated software development will have to become easy enough for the typical user to apply it, much like almost anyone knows how to use a calculator and uses it as they need.
There is currently some effort being applied in the ASE overall focus that will become unimportant and not used once such a user level is reached, not to mention the changes on hardware that enables users to take their system with them on a key chain. Though there will be servers, the majority of use of such automated software creation at such a level will be at the users level, be they a system developer or a casual end user.
And like a calculator calculations sequence...uh err... finger print.... this finger printing becomes pointless.
as it will be found to be something of a reflection of the underlying knowledge system, and not so unique of the users.
To get to the basics of automation and realize the commonality of it is to then know the GPL advantage is NOT having "its mine and you cannot use, overhead and land mines"
For some automation basics - http://threeseas.net/abstraction_physics.html
Code theft is trivial to detect: just see if your code is missing. Please, can't even Slashdot get the terms consistently correct? This is not about theft at all; it's about a tool that helps find copyright infringement.
??
Linux is still GPL'ed as far as I know. If Linus doesn't want to enforce the GPL (Linux Foundation really) then he (they) could release it under a BSD license.
And I read what he says on GPLv3 as not good enough to switch to or not worth switching to without good reason. He hasn't been super consistent with his statements though.
The technique is quite old and usually used for detecting malware. Their particular implementation is also pretty primitive.
Microsoft Corp. today offered to settle claims of GPL violations in its flagship products, Microsoft Office 2007 and Microsoft Windows XP, by agreeing to pay the Free Software Foundation (FSF) a sum of $4.8 billion, and agreeing to discontinue the marketing of said products as closed source until all infringing code in the software is identified and removed to FSF's satisfaction. FSF sources could not be reached for immediate comment. Microsoft co-founder, CEO, and chief software architect, William Gates, in a press conference late Friday, stated that the episode had been a difficult one for Microsoft, and appealed for patience from shareholders and customers as his company scrambled to respond to the latest news.
The present case is a landmark case arising out of a series of sensational revelations last year that these products included infringing code, and that Microsoft stood in violation of the GPL. Present readers will recall that the then CEO of Microsoft, Steven Ballmer, initially rubbished the claims. However, that stance became untenable when scores of former Microsoft employees, notably in India and China, came out with startling accusations that their managers actively encouraged wholesale copying of GPL'ed code available in public domain to meet the unreasonable deadlines imposed by senior Microsoft functionaries. It now appears that certain portions of a failed Microsoft operating system (Windows Vista) released four years ago included a significant amount of infringing GPL'ed code, and could form the basis of a series of new lawsuits.
Microsoft profits have been in a gradual decline for the past few years since its sponsored format (OOXML) failed to meet widespread acceptance as a document standard. Attempts to reverse engineer OOXML to make it compatible with the industry standard, ODF, have been less than successful. Industry analysts think that the decline and possible demise of Microsoft Office, unthinkable even 3 years ago, will cause only minor disruptions as most enterprise customers are standardized on GPL compatible OpenOffice.org v.4. Commercial solutions exist for migration of legacy documents to OpenDocument format and should be suitable for most customers.
A year ago, William Gates resumed his work at Microsoft as its CEO after Steven Ballmer resigned under acrimonious circumstances, and tried to remake the once vast organization as a services company. However, the company has faced severe competition from established global players like IBM, Wipro, Infosys and TCS and struggled to meet market expectations.
The Justice Department's two-year-old investigation into questionable marketing practices by Microsoft and allegations of misleading customers is still underway. Justice Department spokespersons would not be drawn into commenting on rumors of impending charges under the RICO statute for blackmailing computer system manufacturers over a period of 10 years.
Microsoft stock (MSFT) fell 12% in moderate trading at close.
A couple years ago, a manager outsourced some programming work to India. When I reviewed their work, I was impressed, but the code was inconsistent (quality, indent style, variable names, etc). I figured maybe parts were written by a new programmer. A couple days later, I accidentally discovered that a lot of the code (the part that impressed me) had been copied from a GPL program. I alerted my manager, but he didn't care. I alerted the outsource company, they didn't care. I alerted our legal department, and they seemed to care a lot.
Long story short, the manager got fired and I replaced him. We ended up using the original GPL software with some modifications (which were contributed back).
Ballmer is throwing chairs again......
I am the unwilling control for my Origin.
What would a Windoze user like you know about freedom? Why should anyone listen to what you have to say about free software licenses?
What would a zealot like you know about "Windoze"? Why should anyone listen to what you have to say about Microsoft/Apple/your "non-free" enemy du jour?
By summer it was all gone...now shesmovedon. --
I suppose you've never seen the sheer amount of whinging that takes place every time a story about enforcing the GPL comes up? "Holy Shit! If the GPL is enforced then business will avoid Open Source and MS will 0wnz3r the werld!!!!" I take the reverse tack to the one you take. If it is OK for the likes of MS and Adobe to enforce their licenses then why is it the sheer height of "hippie zealotry" for FOSS coders to enforce theirs?
Proprietary source code tends to be preyed upon by other proprietary interests. That isn't right either but the lawyers will fight it out. There really isn't a community to outrage. What DOES provoke outrage is the sheer amount of patent and trade-secret treachery that goes on. Open Source projects DON'T WANT tainted code. OSS code that can't be freely redistributed legally is mostly useless.
Could the difference between TheRaven64's Hello World in 2 dozen system calls and AC's Hello World in 2 system calls have something to do with use of int main(void) vs. int main(int argc, char *argv[]), or perhaps printf() vs. puts() vs. write()? Or is there something deeper?
That's because the major labels in the RIAA, along with their music publisher counterparts, have engaged in anticompetitive behavior. For instance, through payola classic and new payola, the major labels have forced their works on shoppers in grocery stores and forced their works on children riding school buses. With the effect of cryptomnesia case law such as Bright Tunes Music v. Harrisongs Music, this forced listening contaminates the public with potential liability for copyright infringement.
Does the free software community do anything like that?
libvorbis and libtheora are trivial? Please.
http://outcampaign.org/
They should have known this earlier, but now it's too late.
This isn't exactly DRM. Neither has it been adopted by the FSF at all, nor endorsed by any FSF members or important developers that I know of. I'd hardly say that it has been decided that "the GPL needs DRM". This is really little more than someone yelling "hey look what I did!" and a sensational Slashdot article touting it as a way to detect "GPL theft".
The world is not ending, what you always predicted is not true. Put down the pitchforks and return to your homes.
Great Intellect...
Ah twitter, I see your problem. You've got an AC trying to crawl up your ass. One dumb enough to think we need a link to your user page.
Seriously though AC, there's a difference between never using FOSS and using Windows for years, developing for it despite the crappy APIs, and supporting other users with it. One gives you *no* experience, the other gives you a great deal. Are you fucking daft?
Besides, the ultimate example of free software vs proprietary was Code Red. There were instructions for OSS (to block the infection) long before there were patches for Windows. The only safe way to use a Windows server then (all the time really) was to put it behind a non-windows machine and filter malformed requests. What more needs to be known about it?
Single-source proprietary crap. Does nobody understand the benefit of multiply-sourced commodity parts?
"It does not do to leave a live dragon out of your calculations, if you live near him." - Tolkien
Close your Eyes, Plug your Ears and go LA LA LA.
It's DRM. It is just done differently than other DRMs. Because with the GPL the freedoms are taken away from the developers, you use DRM to ensure that the developers are... It's still DRM, just targeted at a different group. See this for what it is: hypocrisy. Work to keep this out of the GPL, in the spirit of the GPL; don't just ignore the facts like a mindless GPL follower and treat it as a good thing, but see it as affecting our rights, possibly giving false positives for people who didn't use GPL code.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
HAHAHAH! Oh, the pain. Thanks for the link!
Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo
Microsoft basically took the BSD network code into Windows.
You claim this has not harmed BSD in any way, because they still have that code.
But this ignores the fact that if Microsoft had to develop their own network code instead of using BSD's code, they would have had less of an advantage in the OS market. This would almost certainly mean more BSD users (perhaps by a slight margin, but there are probably many more pieces of BSD'ish code in Windows), and fewer Windows users. More BSD users would bring more developers. By closing up a fork of BSD's code, Microsoft gained an unfair competitive advantage (BSD cannot take Microsoft's code) which took away resources from BSD.
As another poster mentioned, Microsoft used the BSD network code to sell more copies of Windows, that funded their work on Windows, and potentially on the BSD network code.
This funding may or may not prove an unfair advantage in their work on the network code derivative. By making a closed derivative to a BSD work far more attractive, it is de-facto "closing the code", as far as users are concerned. This is both because the practical advantages will for many require abandoning the older less-developed open version, and because it may become a de-facto standard or monopoly that forces users to use the code in its closed form.
So while the BSD writers may have had the best of intentions, their software users (Windows users) are not enjoying any of the freedoms that the BSD guys thought they were giving away.
So, if you are creating software with the purpose of reaching maximum popularity, or that the next "hop" (the next developers of it) can do what they want, BSD is for you.
If you want to develop software that is free, GPL is for you.
I used to think the same, but you can check modern dictionaries. The word theft already includes copyright infringement.
The battle was lost. The best way to act is to simply declare that some theft is bad, and some not that bad.
I think it's quite a good strategy, e.g. the "Pirate Party". Instead of trying to fight the language changes, embrace them and proudly claim you are a pirate. In this case, I can proudly claim I am a thief, as I do not support copyright law and it is not enforced anyhow.
Don't confuse disagreement with hatred. Hatred is what fills your life, not mine.
Because I don't consider them holy prayers.
I'm really looking forward to your pointing out where I said that. Really.
Otherwise, I suggest you shut the fuck up.