Slashdot Mirror


New Method To Detect and Prove GPL Violations

qwerty writes "A paper to be presented at the upcoming academic conference Automated Software Engineering describes a new method to detect code theft and could be used to detect GPL violations in particular. While the so-called birthmarking method is demonstrated for Java, it is general enough to work for other languages as well. The API Birthmark observes the interaction between an application and (dynamic) libraries that are part of the runtime system. This captures the observable behavior of the program and cannot be easily foiled using code obfuscation techniques, as shown in the paper (PDF). Once such a birthmark is captured, it can be searched for in other programs. By capturing the birthmarks from popular open-source frameworks, GPL-violating applications could be identified."
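The summary's idea of capturing a birthmark and then searching for it in another program can be illustrated with a toy sketch. This is not the paper's algorithm; it assumes a birthmark is simply the set of short windows of consecutive API calls seen in a recorded trace, and the traces, call names, and the helper names `call_windows` and `containment` are all hypothetical.

```python
def call_windows(trace, k=3):
    """Return the set of length-k windows of consecutive API calls in a trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def containment(birthmark, candidate):
    """Fraction of the birthmark's windows that also occur in the candidate."""
    if not birthmark:
        return 0.0
    return len(birthmark & candidate) / len(birthmark)


# Hypothetical API-call traces recorded while each program runs.
library_trace = ["File.open", "File.read", "String.split", "List.add", "File.close"]
suspect_trace = ["Socket.connect"] + library_trace + ["Socket.close"]
unrelated_trace = ["File.open", "List.add", "File.read", "File.close"]

birthmark = call_windows(library_trace)
suspect_score = containment(birthmark, call_windows(suspect_trace))
unrelated_score = containment(birthmark, call_windows(unrelated_trace))
```

A high containment score only says that the observed call behavior overlaps; as the comments below point out, whether that indicates copying depends on the false positive rate.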

5 of 218 comments

  1. A couple of things.... by mark-t · · Score: 3, Interesting

    What is the false positive rate for this method? What if two programs just happen to do the same thing and the authors happened to choose similar ways to do it? Would this method conclude that one originated with the other? That would not be a copyright violation, because neither is a derivative work of the other.

    Also, it occurs to me that this method would probably not be as useful as expected for detecting GPL violations. I would think it would only be effective where you have source code available, or at the very least enough symbol table information to make comparisons, which you are not likely to have if somebody is violating the GPL, because that implies no source code anyway (and almost certainly no symbol table information for the binary).

  2. Other languages by Mike McTernan · · Score: 4, Interesting

    I looked through the paper, and it is cool stuff. But I couldn't see where it supposed the system would work well for other languages, and I wonder if it really would be so good.

    Java has a very large standard library that is always dynamically linked, and hence can easily be instrumented as the technique requires. C allows static linking, which would make such hooking much more difficult. Additionally, Java executes in a very standard environment due to the Virtual Machine, whereas other languages may have varying ABIs, type sizes, and other properties that could add significant noise to the birthmark.

    That said, system calls are always hookable and reasonably standard, so maybe this technique could be applied successfully there for malware detection or similar?
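As a rough illustration of the hooking described in this comment, the sketch below wraps every public function of a stand-in "library" so that each call is logged before being forwarded. It assumes the library is reachable through a replaceable seam, which is the dynamically linked case; `toy_lib`, `hook_library`, and `app` are hypothetical names, and nothing here reflects how the paper instruments the Java runtime.

```python
import functools
import types


def _logged(fn, label, log):
    """Wrap fn so that each call appends label to log before running."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.append(label)
        return fn(*args, **kwargs)
    return wrapper


def hook_library(library, log):
    """Replace every public callable on `library` with a logging wrapper."""
    for name in dir(library):
        attr = getattr(library, name)
        if callable(attr) and not name.startswith("_"):
            setattr(library, name, _logged(attr, name, log))


# Hypothetical stand-in for a dynamically linked runtime library.
toy_lib = types.SimpleNamespace(
    open_file=lambda path: f"handle:{path}",
    read=lambda handle: "data",
    close=lambda handle: None,
)


def app(lib):
    """Toy application whose observable behavior is its calls into lib."""
    handle = lib.open_file("a.txt")
    lib.read(handle)
    lib.close(handle)


log = []
hook_library(toy_lib, log)
app(toy_lib)
```

After the run, `log` holds the ordered call names. A statically linked C program offers no comparable seam at the library boundary, which is why the comment falls back to system calls as the hookable layer.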

    --
    -- Mike
  3. Re:new use of old trick by Just Some Guy · · Score: 5, Interesting

    How did you know they were cheating and didn't derive their similar approaches from a common origin (presumably material that was presented in class or else from the textbook)?

    Amen to that. This is an old story, but I think it bears repeating. A friend of mine and I got "caught" turning in identical code for an assignment. I mean, identical. Same structures, variables, types, layout: everything. However, we wrote our programs separately and never saw each other's until our teacher asked about it.

    It sounds improbable, but consider that:

    1. We both directly transcribed variable names from the homework assignment. A sentence like "it is a fatal error condition for the user to specify a negative number of tasks" became "assert(numtasks >= 0);".
    2. We used the same editor and the same indenting style.
    3. We had done much of our homework together in previous classes because we tended to take the same approach to solving problems.
    4. The assignment wasn't terribly complex to begin with, so the resulting code was only a few pages long.

    We had a teacher who trusted us and we were both good students with good test grades, so it was dismissed as a humorous coincidence. I'm glad a human was willing to listen to our explanation and not just go along with the findings of an automated tester.

    --
    Dewey, what part of this looks like authorities should be involved?
  4. Re:new use of old trick by Just Some Guy · · Score: 3, Interesting

    I take it your code was flawless?

    Of course! ;-)

    people who write flawless code can easily prove their innocence by answering a couple of questions about the implementation on the spot.

    I think there was a bit of that, too: (pointing at me) "Why did you do this?" "Because of this requirement in the last paragraph." (Pointing at friend) "And why didn't you use this approach?" "That wouldn't have worked because of this part here."

    --
    Dewey, what part of this looks like authorities should be involved?
  5. Re:No, really by The Bungi · · Score: 3, Interesting

    You know, I'm absolutely tired of the BSD trolls

    If by that you mean "you have a different definition of what freedom is, therefore I don't like you" then sure, I'm a "BSD troll" or whatever.

    your definition of "freedom" is ludicrous.

    GPL -> Distribution restrictions.
    BSD -> No restrictions.
    No restrictions -> More freedom.
    More freedom -> Possible unsavory side effects that people choose to live with

    Isn't logic great?

    The GPL definition of freedom is that a piece of software and its derivatives must always, under all conditions, be free.

    BSD has a similar one, except that it doesn't place restrictions on how that happens. No one can make BSD-licensed software "non free", it will always be available to everyone. The only difference is that it might not benefit from coerced third party improvements, but that's what you sign up for.

    it simply distributes freedoms in a different manner

    The Kool-Aid is strong with this one.

    But don't go around accusing the GPL of limiting freedoms when it gives others freedoms that the BSD could never guarantee.

    BSD licenses guarantee absolutely nothing. Here's the code, do whatever the heck you want with it. The perceived benefits to using the GPL are nice, but please don't insult people's intelligence by claiming they result in more freedom. A restriction to ensure X or Y is still that - a restriction. The distribution restrictions on the GPL are designed to further Stallman's social causes (some of which I actually agree with). If you feel that's fine, then by all means use the GPL. That's your choice.