Slashdot Mirror


Tracking Code to Its Origins?

openbear writes "While doing a code review for a closed source project at work I came across a few files that were stolen from an open source project. The individual that did this was dumb enough to leave the original license in one of the files, however he was smart enough to remove all trace of where the code came from. He since quit the organization, so we (the developers) can't get to him to find out where he got this code from. Now management wants us to ship the product as is (with the stolen code intact) because we can't point to the original source of his questionable code. A few of us scoured SourceForge and several Apache projects but couldn't find anything matching. My question is: What is the best way to track down where this code originated from? Is there an organization that would help? A tool? A website?"

59 comments

  1. Did I get stupid? by Issue9mm · · Score: 0, Offtopic

    I can't, for the life of me, figure out why this isn't on the front page. I know I'm not here very often, but did slash implement some sort of section only posting?

    Truly baffled,
    -9mm-

    1. Re:Did I get stupid? by kilrogg · · Score: 0, Offtopic

      There are a lot of stories that never make it to the front page. You either need to visit the sections to see them, or you can select "Collapse sections" in your options to see them all on your front page.

  2. Try www.google.com by Mordant · · Score: 0

    and groups.google.com.

    Do I have to tell you -everything-?

    1. Re:Try www.google.com by "Zow" · · Score: 2

      Except it's not as easy as just feeding in the file and saying "find it", partly because Google only allows you to feed in a few search terms and partly because it sounds like the files have been modified from their original form.

      Another problem is that it's very likely that the source files will only be stored within tarballs, which Google doesn't index (not that I've ever seen at least -- would be a nice feature though, seeing as how they do decode office docs and the like). The key, then, will probably be to search by the names of source files, unique-looking variable names, or phrases from the comments. With luck, some of these things will show up in some sort of online discussion about the source, such as diffs posted to mailing lists or something of that nature.

      Another thing to try -- if you know the nature of the original program that the source was taken from, go to Freshmeat and look through projects of that type and see if you can find a match.

      -"Zow"

    2. Re:Try www.google.com by kilrogg · · Score: 2, Informative
      Except it's not as easy as just feeding in the file and saying "find it", partly because Google only allows you to feed in a few search terms and partly because it sounds like the files have been modified from their original form.

      Assuming the code hasn't been too modified, he can try searching for function or variable names.

      Another problem is that it's very likely that the source files will only be stored within tarballs,

      True, but many open source projects have HTML front ends to their CVS trees, and Google sometimes indexes these. The same goes for mailing list archives: they'll sometimes contain patches or discussions that include parts of the code.

  3. easy really by Anonymous Coward · · Score: 0
  4. what about rewriting the code? by krs-one · · Score: 5, Insightful

    Couldn't you just rewrite the stolen code? If your program has a main API and such, then couldn't you just rewrite the code to match your API or something like that? Unless the code is the majority of your project, I see no reason why it simply couldn't be rewritten.

    -Vic

    1. Re:what about rewriting the code? by danielrose · · Score: 1

      Don't you mean "recode by removing the offending license" ? :P~

      --
      i hate pansy republicans
    2. Re:what about rewriting the code? by openbear · · Score: 3, Informative

      Yes the code could be rewritten, but the project is at the stage where it takes a show-stopping** bug or management approval to modify any code. The next version of this product will NOT have the questionable code in it, but there will still be customers running this version (with the stolen code) for about a year or so.

      ** And by show-stopping bug, I mean broken core functionality or something deemed important by management.

    3. Re:what about rewriting the code? by little_fluffy_clouds · · Score: 3, Informative

      ** And by show-stopping bug, I mean broken core functionality or something deemed important by management.

      I call getting the pants sued off you something "deemed important by management".

      Several of you fucked up - this code got into the project without being checked where and who wrote it. Now rewrite and reintegrate and retest, and remember this lesson.

      --
      What were the skies like when you were young?
    4. Re:what about rewriting the code? by openbear · · Score: 2

      I call getting the pants sued off you something "deemed important by management".

      You are missing the whole point of this article. Management doesn't consider this code stolen unless we can prove where it came from. Once we can point to the origins of this "questionable" code then we can remove it from this release. Otherwise it has to wait until the next release (in which case it is too late).

      Several of you fucked up - this code got into the project without being checked where and who wrote it. Now rewrite and reintegrate and retest, and remember this lesson.

      Again, this is another reason I posted the original article. I want to be able to prove this code is stolen so that we don't hire this guy again as a contractor, and maybe management won't be so blindly trusting of contractors in the future (LOL).

    5. Re:what about rewriting the code? by smithmc · · Score: 1
      Several of you fucked up - this code got into the project without being checked where and who wrote it.

      You have got to be kidding. What are you suggesting - every time a programmer in any shop in the world writes (or, as you might put it, "claims to write") a piece of code, his/her peers or manager should do a Google search on it?

      --
      Downmodding is the refuge of the weak. Don't downmod, make a better argument!
    6. Re:what about rewriting the code? by Thomas+Charron · · Score: 2

      Search the web for it? Look for hints as to its origin? Usually, most source files end up getting hit by one or more search engines, and hence, the code ends up there somewhere..

      Perhaps if you posted some snippets?

      --
      -- I'm the root of all that's evil, but you can call me cookie..
    7. Re:what about rewriting the code? by Thomas+Charron · · Score: 2

      No, but the GPL being present is usually a drop dead giveaway.. Last I checked anyway.. 8-P

      --
      -- I'm the root of all that's evil, but you can call me cookie..
  5. Tried Google? by rtaylor · · Score: 5, Informative

    Find a line or 2 of code that look non-standard.

    Run through google groups, etc. If it's from a popular project, Web based cvs is gonna be on it and Google will have sucked up the source.

    Other than that, I really don't know.

    --
    Rod Taylor
    1. Re:Tried Google? by openbear · · Score: 1

      We have tried Google, but the code that is in question had a time stamp (in the one section of comments this guy didn't remember to remove) of about two years ago. I think it was November 2000, I'll have to look at the code when I get back in the office on Monday. I was searching on file name and method signatures, I'll try searching again on "non-standard" looking lines of code. Great idea. Thanks.

    2. Re:Tried Google? by Martin+S. · · Score: 2

      try searching again on "non-standard" looking lines of code.

      Try searching using variable names. If you choose a *number* of the longer ones and search using OR semantics, I would expect some success.
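
      The OR search over long variable names can be sketched roughly as below. This is an illustration only: the class IdentifierQuery, its method names, and the sample line are hypothetical, and the regex is a crude stand-in for a real Java tokenizer (it also picks up keywords and words inside comments or strings).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentifierQuery {
    // Collects identifier-like tokens of at least minLen characters, in first-seen order.
    static List<String> longIdentifiers(String source, int minLen) {
        Set<String> seen = new LinkedHashSet<>();
        Matcher m = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*").matcher(source);
        while (m.find()) {
            if (m.group().length() >= minLen) {
                seen.add(m.group());
            }
        }
        return new ArrayList<>(seen);
    }

    // Joins up to `count` of the longest identifiers with OR, longest first.
    static String orQuery(String source, int minLen, int count) {
        List<String> ids = longIdentifiers(source, minLen);
        ids.sort(Comparator.comparingInt((String s) -> s.length()).reversed());
        return String.join(" OR ", ids.subList(0, Math.min(count, ids.size())));
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from the code in question.
        String sample = "int totalRetryBudget = computeBackoffWindow(i) + totalRetryBudget;";
        System.out.println(orQuery(sample, 8, 3));
    }
}
```

      Raising minLen trims common names such as length or toString, which match far too much to be useful search terms.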

      Also use the meta-search engines like http://www.go2net.com to cover more ground more quickly.

      Have you considered that this 'may be' the contractor's own lib and may not exist as such in the wild?

      What language are we talking about?

  6. Errr, you still need to try harder... by Jerf · · Score: 5, Interesting

    You'd better speak to your corporate lawyer. If you don't have one, get one. I'd advise bringing a camera... it's gonna be a real Kodak(TM) Moment when he first understands what you're saying.

    You didn't mention what license this is. Is it the GPL? If so, that means that you have actually managed to stumble on one of the rare situations where the GPL is actually viral! If you release this code, you will be legally obligated to provide source to any customer, just for the asking!

    If it's not one of the 'viral' licenses, then you haven't got a problem anyhow.

    This isn't even a copyright law issue per se; the onus is on you/your company to find the source of the code, and get permission to use it, or face the consequences of not doing so. This is a general principle in the law.

    The law only rarely lets "I tried as hard as I could!" be an excuse. If you can't get permission, you can't use it, end of (legal) story.

    You are asking for it. Hate to say it, but consult a lawyer! Consult a lawyer! Consult a lawyer!

    1. Re:Errr, you still need to try harder... by Anonymous Coward · · Score: 1, Insightful
      This isn't even a copyright law issue per se

      You don't have to agree to the terms of the GPL (or many of the other open source licenses). But if you don't agree, standard copyright applies, and so you are now violating copyright law by re-distributing the source code.

      So his company can probably pick: license violation or copyright violation. Which is worse, I don't know, but copyright law isn't "viral".

      Either way if word gets out which company this is, I hope people copy their programs to every corner of the web and send them into bankruptcy.

    2. Re:Errr, you still need to try harder... by Jerf · · Score: 4, Insightful

      So his company can probably pick: license violation or copyright violation.

      No, there's the two legal options, too: Find the author and obtain permission, possibly with the judicious use of cash, or dike the code out and replace it with something they wrote.

      but copyright law isn't "viral".

      I can derive no meaning from that phrase. My best-guess rebuttal is that yes, if the code was GPL'ed and they release it, then they are legally obligated to release the source to the whole program under the terms of the GPL. They may refuse; they may also go on a murderous rampage, slaughtering all in their path. But not legally.

      (I admit it, I posted this reply just for the last mental image.)

    3. Re:Errr, you still need to try harder... by Anonymous Coward · · Score: 2, Insightful
      I can derive no meaning from that phrase. My best-guess rebuttal is that yes, if the code was GPL'ed and they release it, then they are legally obligated to release the source to the whole program under the terms of the GPL.

      If you steal source code from another proprietary project (say Microsoft), once you get caught Microsoft doesn't necessarily own your project. You usually just pay fines and restitution, maybe get jail time, and of course are forced to remove the offending code. It's copyright violation. You don't need to "accept" any terms of any license to steal the code.

      An example of pure copyright violation is the Cadence vs. Avanti case settled last year. A few ex-Cadence employees took Cadence code with them when they left to create Avanti. They paid hundreds of millions in restitution, and one guy (Yuh-Zen Liao) even got 1 year of jail time. I submitted this story when it happened, as it involved source code and would seem to be a good story for the Slashdot crowd, but sadly it was rejected. A full recap here. May this story act as a deterrent to anyone thinking of stealing source code.

    4. Re:Errr, you still need to try harder... by linzeal · · Score: 1

      Has anyone been to Cadence? It is across the street from a fruit stand.

    5. Re:Errr, you still need to try harder... by Anonymous Coward · · Score: 0

      Interesting point, pity the moderators don't see it.

    6. Re:Errr, you still need to try harder... by Jerf · · Score: 2

      Ahhhh, OK. Rock and hard place. I was assuming that you want to stay legal, which is an admittedly strong assumption. Nice background info.

      Still, I think my original post stands reasonably true. If you distribute code containing GPL'ed code, then by the terms of the GPL, you have agreed to the terms of the license and are under the obligations. That you can decide to simply ignore them and basically go renegade really doesn't change my point; ignoring the law is always an option, and I tend not to advise it, except under extreme circumstances ;-)

      On the off chance the original poster is watching, be aware that doing this knowingly will probably make things worse. AFAIK, there are no explicit provisions in the law for intent in this case, but come penalty time, if the opposition can show foreknowledge, the judge will be more inclined towards the higher side of penalties; that's exactly the kind of decision human judges are there for.

  7. This is a first... by infonography · · Score: 2, Informative

    Someone at Micro$oft actually admitting to stealing code? (Kidding.) But seriously, if you could tell us in very rough detail what the code does, we might be able to help. You already told us it's a web app (Apache sites?). You'll still get the kudos for trying to be a sport about it, without violating your NDA.

    --
    Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
    1. Re:This is a first... by openbear · · Score: 2, Informative

      The code that he forgot to remove the original comments from was doing base64 encoding/decoding. It was Java code (a class named Base64) with only the following two methods:

      public static String encode(String data)
      public static String decode(String data)

      Most implementations of base64 that I have seen use byte arrays instead of Strings. I have tried searching Google using the filename "Base64.java" and the various method signatures, but no luck. The original stolen code is dated (in the comments he forgot to remove) from about two years ago. This is probably why I can't find it on Google or SourceForge.

      I realize that this isn't much to go on, but like you stated, I don't want to violate the NDA and lose my job.

    2. Re:This is a first... by mperham · · Score: 1

      In the time it took you to write all these posts you could have rewritten the code from scratch. I've written Base64 converters in Java before and it's no more than 30-40 lines.

      Not only that, but because it's a utility method with well-defined pre/post conditions, it would be trivial to put a complete JUnit test suite around it so you can be assured it works.
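
      A minimal sketch of such a check, using plain assertions rather than JUnit so it stays self-contained. The encode/decode bodies here are stand-ins built on java.util.Base64, which assumes Java 8 or later; the class name Base64RoundTrip is hypothetical, and the expected strings are the RFC 4648 test vectors.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64RoundTrip {
    // Stand-in for the method under test; swap in the implementation being checked.
    static String encode(String data) {
        return Base64.getEncoder().encodeToString(data.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Stand-in for the method under test; swap in the implementation being checked.
    static String decode(String data) {
        return new String(Base64.getDecoder().decode(data), StandardCharsets.ISO_8859_1);
    }

    // Checks both directions against a known plain/encoded pair.
    static void check(String plain, String expected) {
        if (!encode(plain).equals(expected)) {
            throw new AssertionError("encode failed for: " + plain);
        }
        if (!decode(expected).equals(plain)) {
            throw new AssertionError("decode failed for: " + expected);
        }
    }

    public static void main(String[] args) {
        // Test vectors from RFC 4648, covering every padding case.
        check("", "");
        check("f", "Zg==");
        check("fo", "Zm8=");
        check("foo", "Zm9v");
        check("foob", "Zm9vYg==");
        check("fooba", "Zm9vYmE=");
        check("foobar", "Zm9vYmFy");
        System.out.println("all vectors pass");
    }
}
```

      Inputs of length 1, 2, and 3 modulo 3 matter most, since they exercise the two-pad, one-pad, and no-pad paths.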

      good luck.

    3. Re:This is a first... by openbear · · Score: 2, Interesting

      There is more than just that one file. There are about twelve classes that were "borrowed". Besides, like I said in a different post, the project is at the stage where only "show-stopping" bugs and things with management approval get in. At this point my main objectives are to 1) be able to prove this guy stole code so I can convince management to let me replace it, and 2) make sure he is never able to do contract work with our company again.

      Believe me, this whole thing has taken way too much of my time. I'm just trying to stay focused on doing the ethical thing.

  8. Management is just as guilty by Anonymous Coward · · Score: 2, Insightful
    Now management wants us to ship the product as is (with the stolen code intact) because we can't point to the original source of his questionable code

    If your management believes this, they are just as guilty as the original thief. Call the police on the original coder, and when the shit hits the fan he'll take the blame instead of your company. Either way, get that code out of your program ASAP!

  9. What does it do? by mini+me · · Score: 1

    Are you at liberty to tell us what the code does? The Slashdot crowd probably is pretty versed in all the open source software out there. Someone on here could probably help.

  10. Are you sure its stolen? by Gaetano · · Score: 2, Interesting

    "The individual that did this was dumb enough to leave the original license in one of the files,..."

    Did he leave on good terms? Was he angry at anyone when he left?

    I just thought of a great way to mess with a company if I'm a coder who doesn't care about references. Insert the GPL into a bunch of my source files that I spent a lot of time on. As long as I was working alone on that code they wouldn't know I didn't swipe it from a GPL project. They may even spend a bunch of time looking for the original source. They may even post a Slashdot story about it. :)

    I suppose you tried calling this guy and asking him.

  11. A setup? by Anonymous Coward · · Score: 2, Insightful

    Are you even sure that the code is open source in the first place? Did the moron who put it there do so to set the company up before he left? He could do that by 1) adding open source code to your product knowing it's wrong, or 2) simply adding the appropriate license to fsck with the company after he left.

  12. How do you know it was stolen? by cperciva · · Score: 3, Insightful

    This might be a dumb question, but how do you know the code was stolen? Maybe he just decided to stick a license at the top of some code he wrote in order to confuse people. Or maybe he wrote the code himself for a different project, and when asked to write the same thing just copied his work across intact.

    There are any number of legal possibilities, and I can't see that they can be simply discarded based on the information provided.

    1. Re:How do you know it was stolen? by openbear · · Score: 2, Interesting

      We know the code was stolen because he admitted that he didn't write it and "borrowed it from the Internet". He consistently refused to tell us where on the Internet he got it. The whole thing seems way too suspicious for it to be legal.

  13. I wrote it by Bald+Wookie · · Score: 3, Funny

    Don't worry. I was the one who wrote it. Just deposit $50,000 in my Paypal account and you can do whatever you want with it.

  14. Best way to track down who owns some code... by Innomi · · Score: 0, Flamebait

    Post it on slashdot...

  15. Do a different search? by martyb · · Score: 0, Redundant
    He since quit the organization, so we (the developers) can't get to him to find out where he got this code from.

    Okay, so you've tried to search for the code,/b>, and came up empty... Did he die? If not, then I'd suggest you try to search for him! There's not a lot of info in your post, so some of these may not be appropriate -- don't know if he's still in the same city, state, or country, for that matter.


    • Call and/or write him at home (get his phone number and address from HR - Human Resources),
    • Check with the post office for a forwarding address,
    • Search google for his resume (get latest resume from HR; look for name and key words such as the name of your company) use contact info on it,
    • Use one of the on-line "Find Anyone" tools;
    • Hire a Private Investigator (PI),

    That should be enough to get you started; I'm sure if you brainstorm you can come up with some other sources and/or techniques, too.

    1. Re:Do a different search? by martyb · · Score: 2

      (Blargh, it's 0430 and I made one "little" change after previewing my post. Here it is with the bold tag closed; sorry for the "yelling.")

      He since quit the organization, so we (the developers) can't get to him to find out where he got this code from.

      Okay, so you've tried to search for the code, and came up empty... Did he die? If not, then I'd suggest you try to search for him! There's not a lot of info in your post, so some of these may not be appropriate -- don't know if he's still in the same city, state, or country, for that matter.

      • Call and/or write him at home (get his phone number and address from HR - Human Resources),
      • Check with the post office for a forwarding address,
      • Search google for his resume (get latest resume from HR; look for name and key words such as the name of your company) use contact info on it,
      • Use one of the on-line "Find Anyone" tools;
      • Hire a Private Investigator (PI),

      That should be enough to get you started; I'm sure if you brainstorm you can come up with some other sources and/or techniques, too.

    2. Re:Do a different search? by openbear · · Score: 3, Interesting

      Several of us spoke with him before he left and got nowhere. He admitted that he didn't write the code and that he "borrowed it from the Internet". That is all he would tell us. He refused to tell us where he "borrowed" it from. He since left the company, so we can't threaten him with disciplinary actions. The main point of going through this search is 1) for ethical reasons and 2) to make sure that we never hire this guy back as a contractor again.

    3. Re:Do a different search? by Mr+Guy · · Score: 3, Insightful

      No no no. YOU don't talk to him. YOUR LAWYER explains where providing illegal services is a breach of contract, and how you will be suing for damages, compounded by the damages to your customers.

  16. Grep for it! by phr1 · · Score: 4, Insightful
    Get a big source code compilation CD like the Yggdrasil Internet archives, or even a regular Red Hat source CD. Then run a script which unpacks the zip files as needed, and greps for some sample strings from the code.

    Also, you might paste a few lines into a comment on this thread and see if anyone recognizes it.
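
    The unpack-and-grep idea might look like the sketch below for zip archives. ArchiveGrep and its grep method are hypothetical names; the sketch assumes the archives are zip files (tarballs would need a different reader) and reads every entry as ISO-8859-1 so that binary files do not abort the scan.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ArchiveGrep {
    // Returns "entryName:lineNumber" for every line in the archive that contains the needle.
    static List<String> grep(Path archive, String needle) throws IOException {
        List<String> hits = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                try (BufferedReader in = new BufferedReader(new InputStreamReader(
                        zip.getInputStream(entry), StandardCharsets.ISO_8859_1))) {
                    String line;
                    int lineNo = 0;
                    while ((line = in.readLine()) != null) {
                        lineNo++;
                        if (line.contains(needle)) {
                            hits.add(entry.getName() + ":" + lineNo);
                        }
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: ArchiveGrep <archive.zip> <sample string>");
            return;
        }
        for (String hit : grep(Paths.get(args[0]), args[1])) {
            System.out.println(hit);
        }
    }
}
```

    A distinctive literal or variable name makes a better sample string than a method signature, since signatures like encode(String) turn up in many unrelated projects.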

    1. Re:Grep for it! by Louis_Wu · · Score: 2

      There might be a legal problem with posting some code here. If the code wasn't actually OS/GPL (as a few have postulated), then the poster might have a legal problem with his company, disclosing company property, etc.

  17. Please tell us the company and the product by Anonymous Coward · · Score: 1, Funny

    So that way we can post a slashdot story about STOLEN GPL CODE and get everyone's underwear in a knot.

    1. Re:Please tell us the company and the product by Tablizer · · Score: 1

      (* ... and get everyone's underwear in a knot. *)

      Undergarment surface contact point alteration and helixification algorithms are already heavily patented. I thus suggest another outcome be attempted.

  18. Try a more careful set of search terms?? by OmniGeek · · Score: 2

    I just searched Yahoo with search terms: +Java +base64 +String, and I saw things that looked very like what you're describing. Some hits had just the 2 methods you describe in your comments. Bear in mind, the ziphead who stole this code in the first place got it through a basic Internet search, so a repeat search has a high probability of success if it's done correctly. A slightly over-broad search that produces a hundred hits can still be winnowed by hand in a practical length of time, and will have a better probability of netting the desired target than a very narrow search.

    Best of luck in your efforts.

    --

    "My strength is as the strength of ten men, for I am wired to the eyeballs on espresso."
  19. I'm almost sure you've tried this, but... by X86Daddy · · Score: 1

    ... just in case you haven't, here's what I do sometimes to track down copies of text on the web:

    Find a unique line in the text/code/whatever, and search for it as a string in several search engines. If it's anywhere on the web, this tends to be a success.

    I've used it with source code a time or two, but it's most frequently useful when I hear a song on the radio... I just memorize a line or two, because the DJ invariably fails to name it after it plays. :-)

    1. Re:I'm almost sure you've tried this, but... by NorthDude · · Score: 0

      The problem is just that
      "hit me baby one more time"
      is far more common than
      "public String getNotSoGoodLyricsFromBS(int singerHitNbr)"
      Not meant to flame, was just an unsuccessful attempt at humor again...
      Just that I also had to search for some "example code" back when I was trying to do an open-source version of a Windows systray... Finally, I came up with geoshell and had to look at how this was implemented to understand it (never tell me now that Windows is documented, this damn window which happens to be the tray... Only one window of this type can receive the tray system events...)

      An example is NOT off-topic

      --


      I'd rather be sailing...
  20. link to the code. by gonar · · Score: 3, Informative

    http://java.sun.com/j2se/1.4/docs/api/java/net/URLEncoder.html

    --
    The difference between Theory and Practice is greater in Practice than in Theory.
  21. just release the code by ReidMaynard · · Score: 1

    My cousin....Joey, yeah, Joey and I can 'take care' of any ... say ... unforeseen circumstances...?

    Our standard 20% [of all you got] is fine.

    --
    -- www.globaltics.net

    Political discussion for a new world

  22. Here are two methods ... by openbear · · Score: 3, Informative

    Ok, I thought about it a bit and I think I can post some of the source without violating my NDA. Here are two methods from code that I know is stolen. It is only doing Base 64 encoding and decoding so it is not giving away any company secrets. I removed all comments and package names so it is just the bare code. If anyone can locate the origins please reply to this post. Remember this particular code is dated about two years old. Thanks to all of those who put effort into giving ideas and opinions. I still haven't been able to locate the origins of this code, so if nothing more comes out of this last post then I suppose I will just accept the fact that sometimes sleazy people get away with thievery and walk away without a care. Thanks again.

    public class Base64 {
        public static String encode(String data) {
            int c;
            StringBuffer ret = new StringBuffer();
            try {
                byte[] arr = data.getBytes("iso-8859-1");
                int len = arr.length;
                for (int i = 0; i < len; ++i) {
                    c = (arr[i] >> 2) & 0x3f;
                    ret.append(cvt.charAt(c));
                    c = (arr[i] << 4) & 0x3f;
                    if (++i < len)
                        c |= (arr[i] >> 4) & 0x3f;
                    ret.append(cvt.charAt(c));
                    if (i < len) {
                        c = (arr[i] << 2) & 0x3f;
                        if (++i < len)
                            c |= (arr[i] >> 6) & 0x3f;
                        ret.append(cvt.charAt(c));
                    } else {
                        ++i;
                        ret.append((char) fillchar);
                    }
                    if (i < len) {
                        c = arr[i] & 0x3f;
                        ret.append(cvt.charAt(c));
                    } else {
                        ret.append((char) fillchar);
                    }
                }
            } catch (Exception e) {}
            return(ret.toString());
        }
        public static String decode(String data) {
            int c;
            int c1;
            StringBuffer ret = new StringBuffer();
            byte[] arr = data.getBytes();
            int len = arr.length;
            for (int i = 0; i < len; ++i) {
                c = cvt.indexOf(arr[i]);
                ++i;
                c1 = cvt.indexOf(arr[i]);
                c = ((c << 2) | ((c1 >> 4) & 0x3));
                ret.append((char) c);
                if (++i < len) {
                    c = arr[i];
                    if (fillchar == c)
                        break;
                    c = cvt.indexOf((char) c);
                    c1 = ((c1 << 4) & 0xf0) | ((c >> 2) & 0xf);
                    ret.append((char) c1);
                }
                if (++i < len) {
                    c1 = arr[i];
                    if (fillchar == c1)
                        break;
                    c1 = cvt.indexOf((char) c1);
                    c = ((c << 6) & 0xc0) | c1;
                    ret.append((char) c);
                }
            }
            return(ret.toString());
        }
        private static final int fillchar = '=';
        private static final String cvt = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            + "abcdefghijklmnopqrstuvwxyz"
            + "0123456789+/";
    }
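
    One detail of the listing above that may help when comparing it against candidate matches: encode masks the shifted second byte with 0x3f rather than 0x0f. The sketch below, which is not part of the posted code and uses the hypothetical names SextetMasks, asListed, and masked, isolates that one expression. It assumes nothing beyond Java's signed byte arithmetic, where >> sign-extends bytes at or above 0x80.

```java
public class SextetMasks {
    // Second output sextet as the listing's encode() computes it.
    static int asListed(byte b0, byte b1) {
        return ((b0 << 4) & 0x3f) | ((b1 >> 4) & 0x3f);
    }

    // Same sextet with the high nibble of b1 masked to four bits.
    static int masked(byte b0, byte b1) {
        return ((b0 << 4) & 0x30) | ((b1 >> 4) & 0x0f);
    }

    public static void main(String[] args) {
        // ASCII input: both forms agree.
        System.out.println(asListed((byte) 'M', (byte) 'a') + " " + masked((byte) 'M', (byte) 'a'));
        // Latin-1 input at or above 0x80: sign extension leaks into bits 4 and 5.
        System.out.println(asListed((byte) 0xE9, (byte) 0xE9) + " " + masked((byte) 0xE9, (byte) 0xE9));
    }
}
```

    For 'M' and 'a' both forms give 22; for two 0xE9 bytes the listed form gives 62 where the four-bit mask gives 30. A candidate source that shares this quirk would be a stronger match than one that only shares the method signatures.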

  23. web logs by ddent · · Score: 2

    Does your company have a proxy of some sort which keeps logs? Is it recent enough that his old computer would still have it in its history and/or cache?

  24. Country by Martin+S. · · Score: 2


    Your country may be important.

    In the UK, breaching copyright law for commercial gain is a criminal (theft by deception) as well as a civil offense, and it is the company's Officers (Directors) who are deemed responsible and do the Gaol (jail) time.

  25. Here it is. by IainHere · · Score: 1

    http://141.76.120.181/javadoc/acid-javadoc/de/acid / til/Base64.html And that's it.

    1. Re:Here it is. by IainHere · · Score: 1

      OK, I can't figure out why that went wrong. Here is the end of that URL, corrected:

      /de/acid/util/Base64.html

    2. Re:Here it is. by GregWebb · · Score: 1

      The long word filter has started deleting letters to put in its spaces, rather than just inserting spaces.

      Not bright (especially not on a tech site), and it bit me a few days ago.

      --

      Greg

      (Inside a nuclear plant)
      Aaaarrrggh! Run! The canary has mutated!