New Method To Detect and Prove GPL Violations
qwerty writes "A paper to be presented at the upcoming academic conference Automated Software Engineering describes a new method to detect code theft and could be used to detect GPL violations in particular. While the so-called birthmarking method is demonstrated for Java, it is general enough to work for other languages as well. The API birthmark observes the interaction between an application and (dynamic) libraries that are part of the runtime system. This captures the observable behavior of the program and cannot be easily foiled using code obfuscation techniques, as shown in the paper (PDF). Once such a birthmark is captured, it can be searched for in other programs. By capturing the birthmarks from popular open-source frameworks, GPL-violating applications could be identified."
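To make the summary's idea a little more concrete, here is a minimal Java sketch. It assumes a birthmark is simply the set of runs of k consecutive library calls recorded while a program executes, and that two birthmarks are compared with Jaccard similarity. The class name `ApiBirthmark`, the k-gram construction, the similarity measure, and the call names in the traces are all assumptions for illustration; the paper's actual birthmark may be built and compared quite differently.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: a birthmark built from a recorded trace of library calls. */
public class ApiBirthmark {

    /** Collects every run of k consecutive API calls in the trace (order-sensitive). */
    public static Set<List<String>> kGrams(List<String> trace, int k) {
        Set<List<String>> grams = new HashSet<>();
        for (int i = 0; i + k <= trace.size(); i++) {
            // Copy the sublist so the set does not hold views into the trace.
            grams.add(List.copyOf(trace.subList(i, i + k)));
        }
        return grams;
    }

    /** Jaccard similarity |A and B| / |A or B|; 1.0 means identical birthmarks. */
    public static double jaccard(Set<List<String>> a, Set<List<String>> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<List<String>> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<List<String>> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Traces a tracer might record; the call names are placeholders.
        List<String> original = List.of(
            "ArrayList.<init>", "ArrayList.add", "ArrayList.add",
            "Collections.sort", "StringBuilder.append", "PrintStream.println");
        // Models a copy with renamed identifiers: the library calls it makes are unchanged.
        List<String> renamedCopy = List.copyOf(original);
        List<String> unrelated = List.of(
            "HashMap.<init>", "HashMap.put", "HashMap.get", "PrintStream.println");

        System.out.println(jaccard(kGrams(original, 2), kGrams(renamedCopy, 2))); // 1.0
        System.out.println(jaccard(kGrams(original, 2), kGrams(unrelated, 2)));   // 0.0
    }
}
```

Searching for a captured birthmark in another program would then amount to computing such a score against that program's trace and flagging anything above some threshold; the threshold itself is exactly where the false-positive questions raised further down the thread come in.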
I used to be a research assistant, and at university we used this technique to check whether students had copied their assignments. They could rename variables, move pieces of text around, and change comments all they liked, but the execution profile stayed the same. We caught a lot of students, and they never figured out how we did it.
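A minimal Java sketch of what comparing execution profiles might look like, assuming a profile is a map from each executed library call to how many times it ran, and that two profiles are compared with cosine similarity. The commenter does not say which tool or metric they used, so both choices, the class name `ProfileCompare`, and the call names are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: compare two submissions by how often each library call ran. */
public class ProfileCompare {

    /** Builds a call-count profile from the library calls a run executed. */
    public static Map<String, Integer> profile(List<String> calls) {
        Map<String, Integer> counts = new HashMap<>();
        for (String call : calls) {
            counts.merge(call, 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine similarity of two count vectors; 1.0 means the same proportions. */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int other = b.getOrDefault(e.getKey(), 0);
            dot += (double) e.getValue() * other;
            normA += (double) e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // an empty profile is treated as dissimilar to everything
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Two submissions: variables renamed and statements moved around,
        // but the same library calls are executed the same number of times.
        List<String> alice = List.of("Scanner.nextInt", "Scanner.nextInt", "Math.max", "PrintStream.println");
        List<String> bob = List.of("Scanner.nextInt", "Math.max", "Scanner.nextInt", "PrintStream.println");
        System.out.println(cosine(profile(alice), profile(bob))); // approximately 1.0
    }
}
```

Because only call counts are kept, renaming identifiers or reordering independent statements leaves the profile unchanged, which matches the behavior described above; the flip side is that short, simple assignments will often produce near-identical profiles even when written independently.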
If your algorithm works, say, 95% like one in another GPL project, are you in for the legal ride of your life? I could see this maybe suggesting "this code here is a LOT like that code there; maybe you should check it out." I mean, after all, how many possible ways could there be to implement something like, say, displaying a simple pie chart?
Life is rarely fair. Cherish the moments when there is a right answer.
Let's just set the code free. Let's not chase it down the street to make sure it stays free; just let it go where it will.
Did you know? All modern PCs ship with a special Symbolics Lisp co-processor to support the Emacs text editor. Vi users often refer to this $79 chip as the "Emacs Tax".
What is the false positive rate for this method? What if two programs just happen to do the same thing and their authors happened to choose similar ways of doing it? Would this method conclude that one originated with the other? That would not be a copyright violation, because neither is a derivative work of the other.
Also, it occurs to me that this method would probably not be as useful as expected for detecting GPL violations. I would think it would only be effective where you have source code available, or at the very least enough symbol table information to make comparisons. You are not likely to have either if somebody is violating the GPL, because that implies no source code anyway (and almost certainly no symbol table information in the binary).
File under 'M' for 'Manic ranting'
An identical library call signature for a nontrivial part of the execution could be produced by a clean-room analysis or even independent development of an equivalent component. Neither of these is a GPL violation.
This is not to say that the technique wouldn't be useful for hunting down GPL violations. But a positive is not definitive by itself.
Meanwhile code obfuscation (even automatically generated obfuscation) could easily modify at least the timing, if not the order, of such calls.
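The reordering concern can be illustrated with a minimal Java sketch. It assumes two simple kinds of birthmark: an order-sensitive one (the set of adjacent call pairs) and an order-insensitive one (how many times each call occurs). Swapping two independent calls, as an obfuscator might, breaks the adjacent pairs while leaving the counts untouched. The class `ReorderDemo` and its call names are hypothetical, and the sketch says nothing about how the paper's own birthmark handles reordering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: how reordering independent calls affects two kinds of birthmark. */
public class ReorderDemo {

    /** Order-sensitive birthmark: the set of adjacent call pairs. */
    public static Set<String> pairs(List<String> trace) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + 1 < trace.size(); i++) {
            result.add(trace.get(i) + " -> " + trace.get(i + 1));
        }
        return result;
    }

    /** Order-insensitive birthmark: how many times each call occurs. */
    public static Map<String, Integer> bag(List<String> trace) {
        Map<String, Integer> counts = new HashMap<>();
        for (String call : trace) {
            counts.merge(call, 1, Integer::sum);
        }
        return counts;
    }

    /** Fraction of the original's adjacent pairs still present in the transformed trace. */
    public static double pairOverlap(List<String> original, List<String> transformed) {
        Set<String> kept = new HashSet<>(pairs(original));
        int total = kept.size();
        kept.retainAll(pairs(transformed));
        return total == 0 ? 1.0 : (double) kept.size() / total;
    }

    public static void main(String[] args) {
        List<String> original = List.of("FileReader.<init>", "Logger.info", "BufferedReader.readLine", "FileReader.close");
        // An obfuscator swaps two calls that do not depend on each other.
        List<String> reordered = new ArrayList<>(original);
        Collections.swap(reordered, 1, 2);

        System.out.println(pairOverlap(original, reordered));      // 0.0: every adjacent pair changed
        System.out.println(bag(original).equals(bag(reordered))); // true: call counts are unchanged
    }
}
```

The trade-off cuts both ways: the order-insensitive view survives reordering, but it is also coarser, so unrelated programs that happen to make similar calls are more likely to look alike.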
Nevertheless, this is a powerful tool: a hunk of GPL code whose flow hasn't been systematically obfuscated (or even code that HAS been obfuscated, but not systematically) will have large swaths that trip the detector. And it doesn't require reverse engineering until after the alarm goes off.
Good job, guys.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
GGA! The GNU Genuine Advantage program!
Karma cannot be described by words alone.
Pitchfork? ... Check ... Check ... Check ... Check ... Check
Torch?
Map of Corporate Castle locations?
FSF Lawyers programmed to be speed dialed in emergencies?
Desire to burn the non-believers?
Okay, I'm ready! What IRC Channel are we meeting in?
load "$",8,1
+10 points for FOSS :-D
Good thing no one asked you to. GPL code is open-source, so "keeping it out of their grubby little hands" is not an option or even wanted. You had probably better come to understand the purpose of the GPL and what a GPL violation is before you post.
I looked through the paper, and it is cool stuff. But I couldn't see where it supposed the system would work well for other languages, and I wonder if it really would be so good.
Java has a very large standard library that is always dynamically linked and hence can easily be instrumented as the technique requires. C allows static linking, which would make such hooking much more difficult. Additionally, Java executes in a very standard environment thanks to the virtual machine, whereas other languages may have varying ABIs, type sizes, and other properties that could add significant noise to the birthmark.
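One reason the Java case is convenient can be sketched with the standard `java.lang.reflect.Proxy` class: calls made through an interface can be routed through a handler that logs each one before forwarding it to the real object. This is only an illustration of call interception under that assumption. It works for interfaces only, and the paper's own tracing mechanism may be entirely different; `TracingProxy` and `traced` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: record library calls made through an interface. */
public class TracingProxy {

    /** Wraps target so every call through iface is appended to trace, then forwarded. */
    @SuppressWarnings("unchecked")
    public static <T> T traced(T target, Class<T> iface, List<String> trace) {
        InvocationHandler handler = (proxy, method, args) -> {
            trace.add(iface.getSimpleName() + "." + method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the real exception, not the reflection wrapper
            }
        };
        return (T) Proxy.newProxyInstance(
            TracingProxy.class.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        // List.class is a raw Class, hence the unchecked conversion here.
        @SuppressWarnings("unchecked")
        List<String> names = traced(new ArrayList<String>(), List.class, trace);
        names.add("alice");
        names.add("bob");
        names.size();
        System.out.println(trace); // [List.add, List.add, List.size]
    }
}
```

A statically linked C program offers no comparable seam: the library code is copied into the binary, so there is no shared entry point to wrap, which is the difficulty the comment above points out.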
That said, system calls are always hookable and reasonably standard, so maybe this technique could be applied successfully there for malware detection or similar?
-- Mike
Emacs isn't a text editor, it's a dated lisp runtime. Viper is its editor.
When people go to these lengths to prove misuse of commercial licenses, they're called fascists. When it's done to prove misuse of free licenses, it's OK.
I see the community is still working as it always has.
This is very cool and potentially useful. By itself it wouldn't be enough to force compliance or win a violation suit, but it could well be enough to meet the threshold for filing a suit and forcing source code analysis in discovery. Really, it is a great tool for ensuring that open source license terms are respected, because it removes the "code anonymity" inherent in a binary.
It's the same old problem of protecting software. Big companies like M$ have spent billions of dollars trying to control unauthorized use of software. The problem is the same, although we are now protecting source code instead of executable code. Does it mean that we are treading the same path, and that people (developers) will need to spend as much effort and money to protect their rights?
Instead of coding open source projects, now we're coding projects to detect license violations.
Next, the Open Source Business Software Alliance and raids by the Secret Service...
When is the last time we read anything about open source that wasn't about licensing?
When did it stop being about the code and the value?
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
That's great and everything, but could someone please explain how violating the GPL is "code theft"? I thought we reached a consensus that copyright violation is not stealing. Maybe they are trying to take credit for other people's work, but even that is not theft.
What a sad state of "freedom". I can't wait for the hunter-killers to be released.
How well does it work with the Wine versus Windows comparison?
The Tao of math: The numbers you can count are not the real numbers.
In our case it was perhaps a little more understandable. The other student was a friend and we'd been collaborating on a project. We had adopted common naming conventions, etc.
Our code was virtually identical. I know it sounds unlikely but it does happen.
I realise this is going off on a tangent, but I'm concerned about the use of the word theft. Usually I'm one of the first people to jump up and down when I hear the RIAA or MPAA accuse people of stealing, and I've noticed that quite a few other people on Slashdot do the same. I think it misrepresents the paper to describe copyright infringement as anything other than exactly what it is: copyright infringement.
Language is what it is, and it changes over time, but I'd be really disappointed if this one were allowed to slip, because rather than the language changing because it's more convenient or better, it's changing because a group of powerful corporations want to confuse the issue for their own control and commercial benefit.
False positives....
The story is presented with a stage light focused on linux but then the house lights come up and show linux in jail along with most of the audience.
This is just one paper for one Automated Software Engineering (ASE) conference.
But if you really want to ensure software becomes genuinely free, then automated software development will have to become easy enough for the typical user to apply it, much like almost anyone knows how to use a calculator and uses it as they need.
There is currently some effort being applied within the overall ASE focus that will become unimportant and unused once such a user level is reached, not to mention the changes in hardware that enable users to take their system with them on a key chain. Though there will still be servers, most use of automated software creation at that level will be by users, be they system developers or casual end users.
And like a calculator's calculation sequence... uh, err... fingerprint, this fingerprinting becomes pointless,
as it will be found to be something of a reflection of the underlying knowledge system, and not so unique to the user.
To get to the basics of automation and realize its commonality is to know that the GPL's advantage is NOT having the "it's mine and you cannot use it" overhead and land mines.
For some automation basics - http://threeseas.net/abstraction_physics.html
I used to be a teaching assistant at a university in Canada. The student body in most Comp. Sci. programs in most Canadian universities is quite diverse. There are students from all over the world, from all sorts of different cultures. Different cultures have different attitudes towards cheating on school work. I found this out first hand, when I TA'ed a first-year C++ course a number of years ago.
Out of a class of 150 students, we ran into about 33 cases of cheating on the first assignment. Due to the relatively simple nature of the programs at hand, comparing execution profiles or anything of that sort wouldn't have been feasible. Many of them would have been the same, even if developed completely independently. The cheating we did see was quite blatant. We're talking about three students handing in exactly the same code. Sometimes the original creator's name would accidentally get left in a comment somewhere!
But the two other TAs for that course and I noticed a trend: out of those 33 cases of cheating, 30 involved students from India. I remember the exact numbers just because they were so stunning. One of the other TAs, whom I knew from my undergrad days, was born and raised in BC. But his parents were from India, and he was proud of his Indian heritage. You wouldn't believe how disgusted and embarrassed he was with those students.
He talked to some relatives he knew about schooling in India. He was told that copying work from other students, even those in the same class, usually isn't considered inappropriate, even when the students are instructed to work individually. Of course, that's not how it works in North America. If you were a cheater, and you got caught, you got punished. The other TA, with the Indian heritage, saw to that.
About 75% of the students from India who got caught didn't like this policy of being held accountable. They caused a real ruckus by complaining to the administration. The other TA wasted many hours in meetings dealing with these complaints, rather than working on his thesis or performing research. But he prevailed in each and every case. The cheating was just that obvious.
So maybe a better indicator of whether cheating took place involves looking at the cultural background of the student in question. Those from places that don't take cheating seriously may, not surprisingly, be more inclined to cheat. Including this criterion in such cheat-detection programs may be quite worthwhile, based on the situation I witnessed.
I am with Linus on this one. For the life of me, I can't understand what this sucking up to RMS is about. Linus himself does not think GPLv3 is a good thing, so why do people keep adopting it?
Without Linus FOSS is tossed. Not following Linus is dangerous for the survival of FOSS.
"Making the code freer than the GPL lets eg. Microsoft's embrace, extend, extinguish a whole lot easier."
So bits can be locked up. Boy does that destroy a lot of anti-copyright arguments.
"Now they just have to copy/paste and slightly modify the code, compile it, and pass it off as theirs."
Information wants to be free...of consequences.
"And when has anyone ever had any problem with people going to lengths (whatever that means) to prove license violations?"
Oh Lord! Two bits of proof right under your nose and you still miss it.
Code theft is trivial to detect: just see if your code is missing. Please, can't even Slashdot get the terms consistently correct? This is not about theft at all; it's about a tool that helps find copyright infringement.
The technique is quite old and usually used for detecting malware. Their particular implementation is also pretty primitive.
As you can imagine, I really don't like the GPL or the FSF or Richard Stallman or any of his friends too much. While I recognize their contributions, I think that they've fallen into the trap of trying to force everyone to convert to what has become a quasi-religion [it goes on without getting better]
I don't have to imagine your hatred because you constantly display it. What would a Windoze user like you know about freedom? Why should anyone listen to what you have to say about free software licenses?
Friends don't help friends install M$ junk.
Microsoft Corp. today offered to settle claims of GPL violations in its flagship products, Microsoft Office 2007 and Microsoft Windows XP, by agreeing to pay the Free Software Foundation (FSF) a sum of $4.8 billion, and agreeing to discontinue the marketing of said products as closed source until all infringing code in the software is identified and removed to the FSF's satisfaction. FSF sources could not be reached for immediate comment. Microsoft co-founder, CEO, and chief software architect William Gates, in a press conference late Friday, stated that the episode had been a difficult one for Microsoft, and appealed for patience from shareholders and customers as his company scrambled to respond to the latest news.
The present case is a landmark case arising out of a series of sensational revelations last year that these products included infringing code, and that Microsoft stood in violation of the GPL. Readers will recall that the then CEO of Microsoft, Steven Ballmer, initially rubbished the claims. However, that stance became untenable when scores of former Microsoft employees, notably in India and China, came out with startling accusations that their managers actively encouraged wholesale copying of GPL'ed code available in the public domain to meet the unreasonable deadlines imposed by senior Microsoft functionaries. It now appears that certain portions of a failed Microsoft operating system (Windows Vista) released four years ago included a significant amount of infringing GPL'ed code, and could form the basis of a series of new lawsuits.
Microsoft profits have been in a gradual decline for the past few years since its sponsored format (OOXML) failed to meet widespread acceptance as a document standard. Attempts to reverse engineer OOXML to make it compatible with the industry standard, ODF, have been less than successful. Industry analysts think that the decline and possible demise of Microsoft Office, unthinkable even 3 years ago, will cause only minor disruptions as most enterprise customers are standardized on GPL compatible OpenOffice.org v.4. Commercial solutions exist for migration of legacy documents to OpenDocument format and should be suitable for most customers.
A year ago, William Gates resumed his work at Microsoft as its CEO after Steven Ballmer resigned under acrimonious circumstances, and tried to remake the once vast organization as a services company. However, the company has faced severe competition from established global players like IBM, Wipro, Infosys and TCS and struggled to meet market expectations.
The Justice Department's two-year-old investigation into questionable marketing practices by Microsoft and allegations of misleading customers is still underway. Justice Department spokespersons would not be drawn into commenting on rumors of impending charges under the RICO statute for blackmailing computer system manufacturers over a period of 10 years.
Microsoft stock (MSFT) fell 12% in moderate trading at close.
Ballmer is throwing chairs again...
I am the unwilling control for my Origin.
I suppose you've never seen the sheer amount of whinging that takes place every time a story about enforcing the GPL comes up? "Holy Shit! If the GPL is enforced then business will avoid Open Source and MS will 0wnz3r the werld!!!!" I take the reverse tack to the one you take. If it is OK for the likes of MS and Adobe to enforce their licenses, then why is it the sheer height of "hippie zealotry" for FOSS coders to enforce theirs?
Proprietary source code tends to be preyed upon by other proprietary interests. That isn't right either, but the lawyers will fight it out. There really isn't a community to outrage. What DOES provoke outrage is the sheer amount of patent and trade-secret treachery that goes on. Open Source projects DON'T WANT tainted code. OSS code that can't be freely redistributed legally is mostly useless.
Yes, it's called public domain. It is simpler, that is true. But it's full of as much crap as the GPL, just with no nonsense attached.
Could the difference between TheRaven64's Hello World in 2 dozen system calls and AC's Hello World in 2 system calls have something to do with the use of int main(void) vs. int main(int argc, char *argv[]), or perhaps printf() vs. puts() vs. write()? Or is there something deeper?
That's because the major labels in the RIAA, along with their music publisher counterparts, have engaged in anticompetitive behavior. For instance, through both classic payola and new payola, the major labels have forced their works on shoppers in grocery stores and on children riding school buses. Given the effect of cryptomnesia case law such as Bright Tunes Music v. Harrisongs Music, this forced listening contaminates the public with potential liability for copyright infringement.
Does the free software community do anything like that?
So basically only for trivial uses? Thanks for nothing, Stallman.
libvorbis and libtheora are trivial? Please.
http://outcampaign.org/
They should have known this earlier, but now it's too late.
This isn't exactly DRM. Neither has it been adopted by the FSF at all, nor endorsed by any FSF members or important developers that I know of. I'd hardly say that it has been decided that "the GPL needs DRM". This is really little more than someone yelling "hey, look what I did!" and a sensational Slashdot article touting it as a way to detect "GPL theft".
The world is not ending, what you always predicted is not true. Put down the pitchforks and return to your homes.
Great Intellect...
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}
"Language is what it is, and it changes over time, but I'd be really disappointed if this one was let to slip, because rather than the language changing because it's more convenient or better, it's changing because a group of powerful corporations want to confuse the issue for their own control and commercial benefit."
Uh huh. Like those individuals running on the "free the entertainment" platform would never stoop so low as to manipulate the English language for their own ends.
Close your Eyes, Plug your Ears and go LA LA LA.
It's DRM. It is just done differently than other DRMs. Because with the GPL the freedoms are taken away from the developers, you use DRM to ensure that the developers are... It's still DRM, just targeted at a different group. See this for what it is: hypocrisy. Work to keep this out of the GPL and in the spirit of the GPL; don't just ignore the facts and treat it as a good thing like a mindless GPL follower, but see it as affecting our rights, possibly giving false positives for people who didn't use GPL code.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Microsoft basically took the BSD network code into Windows.
You claim this has not harmed BSD in any way, because they still have that code.
But this ignores the fact that if Microsoft had had to develop their own network code instead of using BSD's code, they would have had less of an advantage in the OS market. This would almost certainly mean more BSD users (perhaps by a slight margin, but there are probably many more pieces of BSD-ish code in Windows) and fewer Windows users. More BSD users would bring more developers. By closing up a fork of BSD's code, Microsoft gained an unfair competitive advantage (BSD cannot take Microsoft's code), which took away resources from BSD.
As another poster mentioned, Microsoft used the BSD network code to sell more copies of Windows, which funded their work on Windows, and potentially on the BSD network code.
This funding may or may not provide an unfair advantage in their work on the network code derivative. By making a closed derivative of a BSD work far more attractive, it is de facto "closing the code" as far as users are concerned. This is both because the practical advantages will, for many, require abandoning the older, less-developed open version, and because it may become a de facto standard or monopoly that forces users to use the code in its closed form.
So while the BSD writers may have had the best of intentions, their software users (Windows users) are not enjoying any of the freedoms that the BSD guys thought they were giving away.
So, if you are creating software with the purpose of reaching maximum popularity, or that the next "hop" (the next developers of it) can do what they want, BSD is for you.
If you want to develop software that is free, GPL is for you.
I used to think the same, but you can check modern dictionaries. The word theft already includes copyright infringement.
The battle was lost. The best way to act is to simply declare that some theft is bad, and some not that bad.
I think it's quite a good strategy, e.g., the "Pirate Party". Instead of trying to fight the language changes, embrace them and proudly claim you are a pirate. In this case, I can proudly claim I am a thief, as I do not support copyright law and it is not enforced anyhow.
rm -f COPYING ; find . -type f | xargs touch
I just violated the GPL and they detected me! Why?
Well said. Thanks for posting that as well as your previous post.
I too am tired of the flawed logic of the BSD crowd. BSD has its uses. Expanding freedom, by taking away my rights to use my own code, isn't one of them.