Tracking Code to Its Origins?
openbear writes "While doing a code review for a closed source project at work, I came across a few files that were stolen from an open source project. The individual that did this was dumb enough to leave the original license in one of the files, but he was smart enough to remove all trace of where the code came from. He has since quit the organization, so we (the developers) can't get to him to find out where he got this code from. Now management wants us to ship the product as is (with the stolen code intact) because we can't point to the original source of his questionable code. A few of us scoured SourceForge and several Apache projects but couldn't find anything matching. My question is: what is the best way to track down where this code originated? Is there an organization that would help? A tool? A website?"
Couldn't you just rewrite the stolen code? If your program has a main API, couldn't you just rewrite that code against your own API? Unless the code is the majority of your project, I see no reason why it couldn't simply be rewritten.
-Vic
Find a line or two of code that look non-standard.
Run them through Google Groups, etc. If it's from a popular project, web-based CVS is gonna have it and Google will have sucked up the source.
Other than that, I really don't know.
Rod Taylor
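The "find a line or two that look non-standard" step above can be partly automated. Below is a minimal sketch in Java (the language of the code posted later in this thread): it drops short lines, punctuation-only lines, and import/package lines, and wraps whatever remains in double quotes for exact-phrase searching. The class name `PhraseCandidates` and the length cutoff are hypothetical choices, not something any poster here described.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: picks lines from a suspect source file that are
// distinctive enough to paste into a search engine as exact phrases.
class PhraseCandidates {

    // True if the line consists only of braces, parentheses, semicolons and spaces.
    private static boolean isPunctuationOnly(String line) {
        return line.chars().allMatch(ch -> "{}(); ".indexOf(ch) >= 0);
    }

    // Returns whitespace-normalized lines of at least minLength characters,
    // each wrapped in double quotes for an exact-phrase query.
    static List<String> candidates(String source, int minLength) {
        List<String> out = new ArrayList<>();
        for (String raw : source.split("\n")) {
            String line = raw.trim().replaceAll("\\s+", " ");
            if (line.length() < minLength || isPunctuationOnly(line)) continue;
            // import/package lines are shared by far too many files to be useful
            if (line.startsWith("import ") || line.startsWith("package ")) continue;
            out.add("\"" + line + "\"");
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "import java.util.List;\n"
                + "    }\n"
                + "    c |= (arr[i] >> 4) & 0x3f;\n"
                + "    int c;\n";
        // Prints only: "c |= (arr[i] >> 4) & 0x3f;"
        for (String q : candidates(sample, 12)) {
            System.out.println(q);
        }
    }
}
```

The cutoff is a judgment call: too low and every `int i;` becomes a query, too high and short but telling lines are lost.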
Except it's not as easy as just feeding in the file and saying "find it", partly because Google only allows you to feed in a few search terms and partly because it sounds like the files have been modified from their original form.
Another problem is that it's very likely that the source files will only be stored within tarballs, which Google doesn't index (not that I've ever seen, at least; it would be a nice feature, though, seeing as how they do decode Office docs and the like). The key, then, will probably be to search by the names of source files, unique-looking variable names, or phrases from the comments. With luck, some of these will turn up in some sort of online discussion about the source, such as diffs posted to mailing lists.
Another thing to try: if you know the nature of the original program that the source was taken from, go to Freshmeat, look through projects of that type, and see if you can find a match.
-"Zow"
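Along the lines of searching by unique-looking variable names and comment phrases, and given that a search engine only takes a few terms, one rough heuristic is to collect the identifiers and comment words in the file, drop Java keywords and common library names, and keep the rarest few. This is a hypothetical sketch: `QueryTerms`, the stop-word list, the four-character minimum, and `maxTerms` are all assumptions to tune by hand. String-literal contents are scanned too, and short but distinctive names (such as `cvt` in the code posted further down) are dropped by the length cutoff.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects identifiers and comment words from a source
// file, drops common Java keywords and short words, and keeps the least
// frequent ones (a heuristic: rare names are more likely to be project-specific).
class QueryTerms {

    private static final Set<String> COMMON = new HashSet<>(Arrays.asList(
            "public", "private", "protected", "static", "final", "class", "void",
            "int", "byte", "char", "String", "StringBuffer", "return", "new",
            "for", "if", "else", "try", "catch", "Exception", "this", "null"));

    private static final Pattern WORD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static List<String> terms(String source, int maxTerms) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = WORD.matcher(source);
        while (m.find()) {
            String w = m.group();
            if (w.length() < 4 || COMMON.contains(w)) continue;
            counts.merge(w, 1, Integer::sum);
        }
        List<String> words = new ArrayList<>(counts.keySet());
        // Rarest first; ties broken alphabetically so the output is deterministic.
        words.sort(Comparator.comparing((String w) -> counts.get(w))
                .thenComparing(Comparator.naturalOrder()));
        return words.subList(0, Math.min(maxTerms, words.size()));
    }

    public static void main(String[] args) {
        String sample = "// convert using the cvt table\n"
                + "private static final String cvt = \"ABC\";\n"
                + "ret.append(cvt.charAt(c));\n";
        // Prints: append charAt convert table using
        System.out.println(String.join(" ", terms(sample, 10)));
    }
}
```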
You'd better speak to your corporate lawyer. If you don't have one, get one. I'd advise bringing a camera... it's gonna be a real Kodak(TM) Moment when he first understands what you're saying.
You didn't mention what license this is. Is it the GPL? If so, that means that you have actually managed to stumble on one of the rare situations where the GPL is actually viral! If you release this code, you will be legally obligated to provide source to any customer, just for the asking!
If it's not one of the 'viral' licenses, then you haven't got a problem anyhow.
This isn't even a copyright law issue per se; the onus is on you/your company to find the source of the code, and get permission to use it, or face the consequences of not doing so. This is a general principle in the law.
The law only rarely lets "I tried as hard as I could!" be an excuse. If you can't get permission, you can't use it, end of (legal) story.
You are asking for it. Hate to say it, but consult a lawyer! Consult a lawyer! Consult a lawyer!
Someone at Micro$oft actually admitting to stealing code? (Kidding.) But seriously, if you could tell us in very rough detail what the code does, we might be able to help. You already told us it's a web app (Apache sites?). You'll still get the kudos for trying to be a sport about it, without violating your NDA.
Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
Assuming the code hasn't been too modified, he can try searching for function or variable names.
"Another problem is that it's very likely that the source files will only be stored within tarballs,..."
True, but many open source projects have HTML front ends to their CVS trees, and Google sometimes indexes these. The same goes for mailing list archives; they sometimes contain patches or discussions that include parts of the code.
If your management believes this, they are just as guilty as the original stealer. Call the police on the original coder, and when the shit hits the fan he'll take the blame instead of your company. Either way, get that code out of your program ASAP!
"The individual that did this was dumb enough to leave the original license in one of the files,..."
:)
Did he leave on good terms? Was he angry at anyone when he left?
I just thought of a great way to mess with a company if I'm a coder who doesn't care about references: insert the GPL into a bunch of source files I spent a lot of time on. As long as I was working alone on that code, they wouldn't know I didn't swipe it from a GPL project. They might even spend a bunch of time looking for the original source. They might even post a Slashdot story about it.
I suppose you tried calling this guy and asking him.
Are you even sure that the code is open source in the first place? Did the moron put it there to set the company up before he left? He could have done so by 1) adding open source code to your product knowing it was wrong, or 2) simply adding the appropriate license to fsck with the company after he left.
This might be a dumb question, but how do you know the code was stolen? Maybe he just decided to stick a license at the top of some code he wrote in order to confuse people. Or maybe he wrote the code himself for a different project, and when asked to write the same thing just copied his work across intact.
There are any number of legal possibilities, and I can't see that they can be simply discarded based on the information provided.
Tarsnap: Online backups for the truly paranoid
Don't worry. I was the one who wrote it. Just deposit $50,000 in my PayPal account and you can do whatever you want with it.
(Blargh, it's 0430 and I made one "little" change after previewing my post. Here it is with the bold tag closed; sorry for the "yelling.")
"He has since quit the organization, so we (the developers) can't get to him to find out where he got this code from."
Okay, so you've tried to search for the code, and came up empty... Did he die? If not, then I'd suggest you try to search for him! There's not a lot of info in your post, so some of these may not be appropriate -- don't know if he's still in the same city, state, or country, for that matter.
That should be enough to get you started; I'm sure if you brainstorm you can come up with some other sources and/or techniques, too.
Also, you might paste a few lines into a comment on this thread and see if anyone recognizes it.
Several of us spoke with him before he left and got nowhere. He admitted that he didn't write the code and that he "borrowed it from the Internet". That is all he would tell us; he refused to say where he "borrowed" it from. He has since left the company, so we can't threaten him with disciplinary action. The main points of this search are 1) ethics and 2) making sure that we never hire this guy back as a contractor.
I just searched Yahoo with the search terms +Java +base64 +String, and I saw things that looked very much like what you're describing. Some hits had just the two methods you describe in your comments. Bear in mind that the ziphead who stole this code in the first place got it through a basic Internet search, so a repeat search has a high probability of success if it's done correctly. A slightly over-broad search that produces a hundred hits can still be winnowed by hand in a practical length of time, and it has a better chance of netting the desired target than a very narrow search.
Best of luck in your efforts.
"My strength is as the strength of ten men, for I am wired to the eyeballs on espresso."
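When a deliberately broad search returns a hundred hits, scoring each candidate against the suspect file can shorten the manual winnowing. The sketch below swaps in a standard near-duplicate measure, Jaccard similarity over k-token "shingles"; the class name, the token pattern, and the value of k are assumptions. It ignores whitespace, formatting, and comments (the comment stripping is crude and will also eat `//` inside string literals), but not renamed identifiers, so a low score does not by itself rule a candidate out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for ranking search hits: Jaccard similarity between
// the k-token shingle sets of two source files.
class ShingleSimilarity {

    // Identifiers, numeric literals (including hex such as 0x3f), or any
    // other single non-space character.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[0-9][A-Za-z0-9_]*|\\S");

    // Crudely strips /* */ and // comments, then tokenizes.
    static List<String> tokens(String source) {
        String noComments = source.replaceAll("(?s)/\\*.*?\\*/", " ")
                                  .replaceAll("//[^\n]*", " ");
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(noComments);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Every run of k consecutive tokens, joined by single spaces.
    static Set<String> shingles(List<String> toks, int k) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + k <= toks.size(); i++) {
            out.add(String.join(" ", toks.subList(i, i + k)));
        }
        return out;
    }

    // |intersection| / |union| of the shingle sets: 1.0 for token-identical
    // files, 0.0 when no k-token run is shared.
    static double similarity(String a, String b, int k) {
        Set<String> sa = shingles(tokens(a), k);
        Set<String> sb = shingles(tokens(b), k);
        if (sa.isEmpty() && sb.isEmpty()) {
            return 1.0;
        }
        Set<String> inter = new HashSet<>(sa);
        inter.retainAll(sb);
        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        String suspect = "c = (arr[i] >> 2) & 0x3f;\nret.append(cvt.charAt(c));";
        String candidate = "// from some project\nc = (arr[i]>>2) & 0x3f;   ret.append(cvt.charAt(c));";
        System.out.println(similarity(suspect, candidate, 4)); // 1.0
    }
}
```

Sorting the hits by this score and reading from the top down keeps the by-hand part of the search focused on the likeliest matches.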
http://java.sun.com/j2se/1.4/docs/api/java/net/URLEncoder.html
The difference between Theory and Practice is greater in Practice than in Theory.
No, no, no. YOU don't talk to him. YOUR LAWYER explains how providing illegal services is a breach of contract, and how you will be suing for damages, compounded by the damages to your customers.
Never confuse volume with power.
OK, I thought about it a bit and I think I can post some of the source without violating my NDA. Here are two methods from code that I know is stolen. It only does Base64 encoding and decoding, so it is not giving away any company secrets. I removed all comments and package names, so it is just the bare code. If anyone can locate its origins, please reply to this post. Remember that this particular code is about two years old. Thanks to all of those who put effort into giving ideas and opinions. I still haven't been able to locate the origins of this code, so if nothing more comes out of this last post, I suppose I will just accept the fact that sometimes sleazy people get away with thievery and walk away without a care. Thanks again.
public class Base64 {
    public static String encode(String data) {
        int c;
        StringBuffer ret = new StringBuffer();
        try {
            byte[] arr = data.getBytes("iso-8859-1");
            int len = arr.length;
            for (int i = 0; i < len; ++i) {
                c = (arr[i] >> 2) & 0x3f;
                ret.append(cvt.charAt(c));
                c = (arr[i] << 4) & 0x3f;
                if (++i < len)
                    c |= (arr[i] >> 4) & 0x3f;
                ret.append(cvt.charAt(c));
                if (i < len) {
                    c = (arr[i] << 2) & 0x3f;
                    if (++i < len)
                        c |= (arr[i] >> 6) & 0x3f;
                    ret.append(cvt.charAt(c));
                } else {
                    ++i;
                    ret.append((char) fillchar);
                }
                if (i < len) {
                    c = arr[i] & 0x3f;
                    ret.append(cvt.charAt(c));
                } else {
                    ret.append((char) fillchar);
                }
            }
        } catch (Exception e) {}
        return(ret.toString());
    }
    public static String decode(String data) {
        int c;
        int c1;
        StringBuffer ret = new StringBuffer();
        byte[] arr = data.getBytes();
        int len = arr.length;
        for (int i = 0; i < len; ++i) {
            c = cvt.indexOf(arr[i]);
            ++i;
            c1 = cvt.indexOf(arr[i]);
            c = ((c << 2) | ((c1 >> 4) & 0x3));
            ret.append((char) c);
            if (++i < len) {
                c = arr[i];
                if (fillchar == c)
                    break;
                c = cvt.indexOf((char) c);
                c1 = ((c1 << 4) & 0xf0) | ((c >> 2) & 0xf);
                ret.append((char) c1);
            }
            if (++i < len) {
                c1 = arr[i];
                if (fillchar == c1)
                    break;
                c1 = cvt.indexOf((char) c1);
                c = ((c << 6) & 0xc0) | c1;
                ret.append((char) c);
            }
        }
        return(ret.toString());
    }
    private static final int fillchar = '=';
    private static final String cvt = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        + "abcdefghijklmnopqrstuvwxyz"
        + "0123456789+/";
}
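A detail that may matter when comparing candidates against the `encode` method above: Java's `byte` is signed, and the masks in `(arr[i] >> 4) & 0x3f` and `(arr[i] >> 6) & 0x3f` do not clear the bits introduced by sign extension (they would need to be `0x0f` and `0x03`). Plain ASCII input is unaffected, but a character at or above 0x80 in the second or third position of a three-byte group can corrupt the output; for example, "a" followed by U+00E9 comes out as `Y+k=` instead of `Yek=`. Whether a candidate source shares this quirk might help confirm or rule out a match. The following is a restructured sketch of the same encoder with each byte widened by `& 0xff` before shifting, checked against `java.util.Base64` (Java 8 and later, so not something this code could have used). `Base64Fixed` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of the posted encode() with the sign-extension problem removed:
// each byte is widened with "& 0xff" before shifting, so bytes >= 0x80 no
// longer smear 1-bits into the neighbouring 6-bit group.
class Base64Fixed {
    private static final char FILL = '=';
    private static final String CVT = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            + "abcdefghijklmnopqrstuvwxyz"
            + "0123456789+/";

    static String encode(String data) {
        // Same byte conversion as the original (ISO-8859-1, one byte per char).
        byte[] arr = data.getBytes(StandardCharsets.ISO_8859_1);
        StringBuilder ret = new StringBuilder();
        for (int i = 0; i < arr.length; i += 3) {
            int b0 = arr[i] & 0xff;
            int b1 = i + 1 < arr.length ? arr[i + 1] & 0xff : 0;
            int b2 = i + 2 < arr.length ? arr[i + 2] & 0xff : 0;
            ret.append(CVT.charAt(b0 >> 2));
            ret.append(CVT.charAt(((b0 << 4) & 0x30) | (b1 >> 4)));
            // Pad with '=' when the group has fewer than three input bytes.
            ret.append(i + 1 < arr.length ? CVT.charAt(((b1 << 2) & 0x3c) | (b2 >> 6)) : FILL);
            ret.append(i + 2 < arr.length ? CVT.charAt(b2 & 0x3f) : FILL);
        }
        return ret.toString();
    }

    public static void main(String[] args) {
        String s = "a\u00e9";
        System.out.println(encode(s)); // Yek=
        System.out.println(Base64.getEncoder()
                .encodeToString(s.getBytes(StandardCharsets.ISO_8859_1))); // Yek=
    }
}
```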
Does your company have a proxy of some sort which keeps logs? Is it recent enough that his old computer would still have it in its history and or cache?
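If such logs exist, searching them is mostly a text-filtering job. A hypothetical sketch, assuming Squid's native `access.log` layout (client address in the third whitespace-separated field, requested URL in the seventh); other proxies use other formats, so the field indexes are assumptions to check against a real log line. The IP address, URLs, and keywords in the example are placeholders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: pull URLs out of a proxy access log that a given
// client IP requested and that mention any of the given keywords.
// ASSUMPTION: Squid's native log layout, where the client address is the
// 3rd whitespace-separated field and the URL is the 7th.
class ProxyLogSearch {

    static List<String> matchingUrls(BufferedReader log, String clientIp, String... keywords)
            throws IOException {
        List<String> hits = new ArrayList<>();
        String line;
        while ((line = log.readLine()) != null) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 7 || !f[2].equals(clientIp)) continue; // malformed or other client
            String url = f[6];
            String lower = url.toLowerCase(Locale.ROOT);
            for (String k : keywords) {
                if (lower.contains(k.toLowerCase(Locale.ROOT))) { // case-insensitive match
                    hits.add(url);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String sample =
            "1030000000.123 250 10.0.0.42 TCP_MISS/200 5120 GET http://example.org/src/Base64.java - DIRECT/192.0.2.1 text/plain\n"
          + "1030000001.456 120 10.0.0.7 TCP_MISS/200 900 GET http://example.org/other/Base64.java - DIRECT/192.0.2.1 text/plain\n"
          + "1030000002.789 80 10.0.0.42 TCP_HIT/200 300 GET http://example.org/index.html - NONE/- text/html\n";
        // Prints: [http://example.org/src/Base64.java]
        System.out.println(matchingUrls(new BufferedReader(new StringReader(sample)), "10.0.0.42", "base64"));
    }
}
```

For a real log, `java.nio.file.Files.newBufferedReader(path)` can take the place of the `StringReader`.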
Your country may be important.
In the UK, breaching copyright law for commercial gain is a criminal (theft by deception) as well as a civil offense, and it is the company's officers (directors) who are deemed responsible and who do the gaol (jail) time.
http://141.76.120.181/javadoc/acid-javadoc/de/acid / til/Base64.html