Slashdot Mirror


New Method To Detect and Prove GPL Violations

qwerty writes "A paper to be presented at the upcoming academic conference Automated Software Engineering describes a new method to detect code theft that could be used to detect GPL violations in particular. While the so-called birthmarking method is demonstrated for Java, it is general enough to work for other languages as well. The API Birthmark observes the interaction between an application and the (dynamic) libraries that are part of the runtime system. This captures the observable behavior of the program and cannot be easily foiled using code obfuscation techniques, as shown in the paper (PDF). Once such a birthmark is captured, it can be searched for in other programs. By capturing the birthmarks of popular open-source frameworks, GPL-violating applications could be identified."
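
To make the idea in the summary concrete, here is a minimal sketch of searching for a birthmark in another program's behavior. It is not the paper's algorithm: it assumes a birthmark can be approximated as the set of short call sequences (n-grams) in a recorded trace of library calls, and every trace and call name below is hypothetical.

```python
def call_ngrams(trace, n=3):
    """Return the set of length-n windows over a sequence of API call names."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def containment(birthmark, candidate):
    """Fraction of the birthmark's n-grams that also occur in the candidate."""
    if not birthmark:
        return 0.0
    return len(birthmark & candidate) / len(birthmark)


# Hypothetical traces of calls into the runtime library (names are made up).
library_trace = ["File.open", "Reader.read", "Parser.parse",
                 "Reader.read", "File.close"]
suspect_trace = ["Log.init", "File.open", "Reader.read", "Parser.parse",
                 "Reader.read", "File.close", "Log.flush"]
unrelated_trace = ["Socket.connect", "Socket.send",
                   "Socket.recv", "Socket.close"]

library_mark = call_ngrams(library_trace)
suspect_score = containment(library_mark, call_ngrams(suspect_trace))
unrelated_score = containment(library_mark, call_ngrams(unrelated_trace))
```

Containment rather than symmetric similarity is used here because the suspect program may embed the framework inside a much larger body of its own code.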

8 of 218 comments

  1. new use of old trick by toolslive · · Score: 5, Informative

    I used to be a research assistant, and at university we used this technique to see whether students had copied their assignments. They could rename variables, move pieces of text around, and change comments all they liked, but the execution profile stayed the same. We caught a lot of students, and they never figured out how we did it.
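
A rough sketch of why renaming and reshuffling do not change an execution profile. This is only an illustration, not the tool described above: `CountingRuntime` and the three submissions are hypothetical, and the "profile" is simply a tally of operations performed at run time.

```python
from collections import Counter


class CountingRuntime:
    """Hypothetical stand-in for an instrumented runtime: tallies each operation."""

    def __init__(self):
        self.profile = Counter()

    def less_than(self, a, b):
        self.profile["less_than"] += 1
        return a < b

    def add(self, a, b):
        self.profile["add"] += 1
        return a + b


def submission_a(rt, values):
    # "Original": running maximum and running total.
    largest, total = values[0], 0
    for v in values:
        if rt.less_than(largest, v):
            largest = v
        total = rt.add(total, v)
    return largest, total


def submission_b(rt, nums):
    # Same logic with renamed variables, reordered statements, new comments.
    s = 0
    m = nums[0]
    for n in nums:
        s = rt.add(s, n)
        if rt.less_than(m, n):
            m = n
    return m, s


def submission_c(rt, nums):
    # Written differently: no comparisons go through the runtime at all.
    s = 0
    for n in nums:
        s = rt.add(s, n)
    return max(nums), s


def profile_of(submission, data):
    """Run a submission against a fresh runtime and return its operation tally."""
    rt = CountingRuntime()
    submission(rt, data)
    return rt.profile


data = [3, 9, 4, 7]
same = profile_of(submission_a, data) == profile_of(submission_b, data)
different = profile_of(submission_a, data) != profile_of(submission_c, data)
```

A matching profile shows only that two programs behave alike on the test input; it does not by itself show that one was copied from the other.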

    1. Re:new use of old trick by mark-t · · Score: 4, Insightful

      How did you know they were cheating and didn't derive their similar approaches from a common origin (presumably material that was presented in class or else from the textbook)? My experience with marking for a computer science professor showed that about 80% of the students approached any given programming assignment almost exactly the same way in terms of their final implementation... their common origin being something the teacher described during a lecture.

    2. Re:new use of old trick by Just+Some+Guy · · Score: 5, Interesting

      How did you know they were cheating and didn't derive their similar approaches from a common origin (presumably material that was presented in class or else from the textbook)?

      Amen to that. This is an old story, but I think it bears repeating. A friend of mine and I got "caught" turning in identical code for an assignment. I mean, identical. Same structures, variables, types, layout - everything. However, we wrote our programs separately and never saw each other's until our teacher asked about it.

      It sounds improbable, but consider that:

      1. We both directly transcribed variable names from the homework assignment. A sentence like "it is a fatal error condition for the user to specify a negative number of tasks" became "assert(numtasks >= 0);".
      2. We used the same editor and the same indenting style.
      3. We had done much of our homework together in previous classes because we tended to take the same approach to solving problems.
      4. The assignment wasn't terribly complex to begin with, so the resulting code was only a few pages long.

      We had a teacher who trusted us and we were both good students with good test grades, so it was dismissed as a humorous coincidence. I'm glad a human was willing to listen to our explanation and not just go along with the findings of an automated tester.

      --
      Dewey, what part of this looks like authorities should be involved?
  2. Coming soon... by koh · · Score: 5, Funny

    GGA! The GNU Genuine Advantage program!

    --
    Karma cannot be described by words alone.
  3. Sweet Mother of All Revolutions by fishthegeek · · Score: 4, Funny

    Pitchfork? ... Check
    Torch? ... Check
    Map of Corporate Castle locations? ... Check
    FSF Lawyers programmed to be speed dialed in emergencies? ... Check
    Desire to burn the non-believers? ... Check

    Okay, I'm ready! What IRC Channel are we meeting in?

    --
    load "$",8,1
  4. Re:No, really by The+Bungi · · Score: 4, Insightful

    That won't do. The GPL is really more of a social instrument than a software license, so for people like Stallman a BSD-style license (which is just one step above public domain and true freedom) would be unacceptable. A lot of bandwidth and keyboard lubricant has been spent over the years to ensure that everyone thinks the GPL is the "best" software license - and the thousands of developers who buy into the FSF "freedom, with caveats" spiel by using the GPL (because, well, that's what everyone uses) without really understanding what it's for are part of that problem.

    As you can imagine I really don't like the GPL or the FSF or Richard Stallman or any of his friends too much. While I recognize their contributions I think that they've fallen into the trap of trying to force everyone to convert to what has become a quasi-religion where the Inquisition is more important than celebrating mass.

  5. Other languages by Mike+McTernan · · Score: 4, Interesting

    I looked through the paper, and it is cool stuff. But I couldn't see where it supposed the system would work well for other languages, and I wonder if it really would be so good.

    Java has a very large standard library that is always dynamically linked, and hence can easily be instrumented as the technique requires. C allows static linking, which would make such hooking much more difficult. Additionally, Java executes in a very standard environment thanks to the Virtual Machine, whereas other languages may have varying ABIs, type sizes, and other properties that could add significant noise to the birthmark.

    That said, system calls are always hookable and reasonably standard, so maybe this technique could be applied successfully there for malware detection or similar?
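
The linking point can be illustrated with a loose analogy in Python. This is a sketch only, not how Java or C instrumentation actually works: a call that is looked up through the module at call time can be intercepted, while a reference bound before the hook was installed bypasses it, much as statically linked code would bypass a hook on a shared library. The `hook` helper is hypothetical.

```python
import functools
import math

# Bound before any hook is installed; loosely analogous to static linking.
early_bound_sqrt = math.sqrt

calls = []


def hook(module, name, log):
    """Replace module.name with a logging wrapper and return the original."""
    original = getattr(module, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        log.append(name)
        return original(*args, **kwargs)

    setattr(module, name, wrapper)
    return original


original_sqrt = hook(math, "sqrt", calls)
try:
    late = math.sqrt(16.0)          # resolved through the module: intercepted
    early = early_bound_sqrt(9.0)   # already bound to the original: not seen
finally:
    math.sqrt = original_sqrt       # always restore the real function
```

Only the late-bound call ends up in `calls`, so a birthmark built from this log would miss everything the early-bound reference did.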

    --
    -- Mike
  6. Re:No, really by Daishiman · · Score: 4, Insightful

    You know, I'm absolutely tired of the BSD trolls who claim that the BSD license is "freer", not because I have a beef with BSD, but simply because your definition of "freedom" is ludicrous.

    There are no absolute freedoms. Freedom to infringe on others' rights or freedoms gives more freedom to yourself, but limits it for other members of society. So long as there are things that cannot be owned or achieved communally without side effects on others, freedoms have a limit: the actions that you must not take so that others remain able to take them.

    The GPL definition of freedom is that a piece of software and its derivatives must always, under all conditions, be free. Yes, it is a restriction on the developer who would wish to close up his source and use a GPLed piece of code, but it is an additional freedom for all the users who now have access to this source, which would otherwise have been denied.

    Analogy time: the King is free to treat his peasants as dogs if he wishes and if he has sufficient power to repress any opinions the peasants might have about that. The peasants, however, are limited by the freedoms the king has. Therefore the balance of freedoms for a more equal society would be that the king's freedoms be limited in order to allow the peasants to live their lives.

    So as you said, the GPL is also a social instrument, but it is no less free than the BSD; it simply distributes freedoms in a different manner. If you have a problem with that, use whichever license you wish to use. But don't go around accusing the GPL of limiting freedoms when it gives others freedoms that the BSD could never guarantee.