Outsourced Confidential Data On Children Posted
Kataire writes "MSNBC exposes a grievous blunder in which an outsourced programmer posts highly confidential data to a public website, concerning the daily whereabouts of hundreds of children in upstate New York. Yes, this person did this not once, or twice, but three times, with two different data sets. Even worse, the data was out there, publicly 'visible' for months. Just because RentACoder finally discovered and yanked it, after a coder 'stuck with a tricky formatting issue' posted the specific database he was working on to their message boards, doesn't mean the damage is undone. The ramifications reach beyond the painfully obvious privacy issues, touching on outsourcing and peer ethics."
Very creative. However, if you had read the whole article, you would have realized that the chain of contractors (the university that received the original contract, the programmer it subcontracted to, and the programmer that subcontractor hired in turn) consisted entirely of US citizens and/or organizations.
Just because a programmer is located in the US does not make him or her infallible and capable of doing perfect work.
Hunt your preferred prey at Aliens vs Predator MUD. Join the war at avpmud.com port 4000
Who the hell thought to give him REAL information about these children in the first place? A fake dataset would've worked just as well for development purposes.
You can't judge a book by the way it wears its hair.
And what will they do with what they know? They claim to be able to pinpoint every move you made, from college to getting tossed out of your duplex, etc.
MoFscker
Rather than mod you down, I'll just let you (and all the other knee-jerks) know that THIS WAS NOT AN INDIAN PROGRAMMER. This was a guy named Mark Dennis. Not a very Indian sounding name. Also, Mark Dennis actually subcontracted the job involving the database out to someone in New Jersey. Maybe IHBT, but the article summary could make you believe this had to do with offshore outsourcing, so that's a misconception we should clear up early.
It should be illegal to say that freedom of speech should be limited.
However, in this case, all the outsourcing was within US borders, as is evident from the contents of the article.
:%s/[A-Za-z]/X/g
:%s/[0-8]/9/g
Simple. Just obfuscate it, and you can pass it around for people to help with formatting issues all you want. I've done that with payroll data plenty of times.
Just two lines of vi commands could have saved this guy so much trouble...
Their logs only say how many people downloaded the file, not how many people actually unzipped it.
OK, take off the tinfoil hat and realize that NONE of this took place outside of the US. They DID hire a US contractor (actually a university), which hired a US subcontractor, who hired another US subcontractor. The guys lived in Nowhere, NY and Nowhere, NJ!
And was it India or Pakistan? And was the "Indian" really a Pakistani woman named Lubna Baloch? And was the problem really that UCSF required so little control over the custody of the medical records that it allowed them to be handed off to a chain of at least four levels of subcontractors before they ended up in Pakistan?
Oh, and was it really Sonya Newburn who paid off Baloch?
It's not so super-secret as you think. And the real issue (in your hospital's case) isn't that you couldn't bring the weight of American law enforcement to bear, it's that your organization completely lost control of the data that was entrusted to it.
Incidentally, UCSF has revised its contracts to require its transcription firms to reveal who they subcontract with.
P.S., if you click on the little "Post Anonymously" checkbox, your
In other words, this guy could not only have given a black-eye to the county, but he could even go to jail for it.
If the information lost can be linked to a crime against one of the kids (no matter what age), he'd better have a good attorney. Gross negligence and reckless endangerment come to mind.
Who do you trust? And who do you get to solve something like this?
In this particular case, you needn't trust anyone.
Nothing that Mark Dennis wanted to do (build the database structure, build the front end, or get help with his "tricky formatting problem") required that he supply real data to RentACoder or other subcontractors.
And furthermore, nothing the Livingston County Social Services Commission wanted required that Mark Dennis ever see live data.
This one's simple, folks -- sure, Mark (or someone) needed to do a requirements analysis, sure, somebody had to decide what data entities to capture -- but very little real data was needed.
First, make some dummy data for the developers' use: run through your real data (if you even need to base the dummy data on the real data at all) and replace every name with a random dictionary word. Do the same thing for addresses, and replace Social Security and other ID numbers with randomly chosen numbers. In all cases, maintain a consistent map of real to dummy values, to preserve relations within the data: "Mike Smith" is always translated to "Armchair Landowner" and "1450 Main Street" to "3321 Crumpet Sponge".
Once you've finished your translation, throw away the map.
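A minimal Python sketch of that translation pass, under some assumptions not in the post: records are dicts, the sensitive field names ("name", "address", "ssn") and the tiny word list are hypothetical stand-ins, and a real run would load a full dictionary file. The map is kept per field and forced to be one-to-one, so equal real values stay equal and distinct ones stay distinct; it is never returned or saved.

```python
import random

# Hypothetical stand-in for a real dictionary file; 12 words give only 132 names.
WORDS = ["Armchair", "Landowner", "Crumpet", "Sponge", "Lantern", "Thimble",
         "Walrus", "Pepper", "Gazebo", "Turnip", "Quarry", "Mitten"]


def fake_value(field, rng):
    """Produce one random dummy value for a (hypothetical) field name."""
    if field == "ssn":
        return f"{rng.randrange(10**9):09d}"          # random 9-digit ID
    words = " ".join(rng.sample(WORDS, 2))            # two distinct dictionary words
    if field == "address":
        return f"{rng.randrange(1, 10000)} {words}"   # e.g. "3321 Crumpet Sponge"
    return words                                      # names, e.g. "Armchair Landowner"


def pseudonymize(records, fields, seed=None):
    """Return copies of `records` with `fields` replaced by dummy values.

    The same real value always becomes the same dummy (relations survive),
    and no two real values share a dummy (the map is one-to-one).
    """
    rng = random.Random(seed)
    maps = {f: {} for f in fields}    # real -> dummy, per field
    used = {f: set() for f in fields} # dummies already handed out, per field
    out = []
    for rec in records:
        new = dict(rec)               # leave the input untouched
        for f in fields:
            real = rec[f]
            if real not in maps[f]:
                dummy = fake_value(f, rng)
                while dummy in used[f]:               # retry on a clash
                    dummy = fake_value(f, rng)
                maps[f][real] = dummy
                used[f].add(dummy)
            new[f] = maps[f][real]
        out.append(new)
    return out  # the maps are not returned or stored: "throw away the map"
```

For example, `pseudonymize(rows, ["name", "address", "ssn"])` gives two rows for "Mike Smith" the same dummy name while a "Jane Doe" row gets a different one. With the toy word list the retry loop would spin forever once all 132 name pairs are used, which is why a full dictionary is assumed.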
Now the coder has data that's exactly as diverse as the real data, shows the same frequencies and inter-relations as the real data, is as internally self-consistent as the real data, and yet is (nearly) completely meaningless in terms of the real world, and (nearly) impossible to link to any real persons, places, or identifying information.
(It's possible one could still do traffic analysis on the data and come up with aggregate data: either more male or more female (but which?) children are in the Social Services system; two zip codes out of six produce 70% of the cases (but which two?). If this is a problem, you have to take a weighted slice of the data and provide the developer with only this weighted slice; that (intentionally) skews your frequencies, but still preserves diverse data and any inter-relations among that data closely enough to be representative for almost all design and coding needs.)
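One possible way to take such a weighted slice, sketched in Python: cap every group (say, zip code) at the same number of units (say, children), and keep all records belonging to each chosen unit. The grouping and unit keys, and the equal-cap weighting itself, are assumptions for illustration; the post doesn't say how the weights should be chosen.

```python
import random
from collections import defaultdict


def weighted_slice(records, group_key, unit_key, per_group, seed=None):
    """Keep at most `per_group` units (e.g. children) from each group (e.g. zip code).

    Capping every group at the same count deliberately flattens the real
    frequencies, while every record of a chosen unit is kept, so relations
    among that unit's records survive. Assumes each unit is in one group.
    """
    rng = random.Random(seed)
    units = defaultdict(lambda: defaultdict(list))    # group -> unit id -> records
    for rec in records:
        units[rec[group_key]][rec[unit_key]].append(rec)

    result = []
    for group in sorted(units):
        ids = sorted(units[group])
        chosen = rng.sample(ids, min(per_group, len(ids)))
        for uid in sorted(chosen):
            result.extend(units[group][uid])          # whole unit, not single rows
    return result
```

If one zip code has ten children and another has two, a cap of two makes both look equally busy in the slice, which hides "which two zip codes produce 70% of the cases."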
No trust involved. Just a simple and mechanical translation process that has to take place only once.
(If you really have a situation where the developer must base his requirements and code on gradually accumulating real-world data, which you shouldn't if you've planned at all well, let one non-outsourced person hold the translation map and be held responsible for keeping it secret.)
And a process like I've outlined should be standard for any organization dealing with sensitive data.
Opinions on the Twiddler2 hand-held keyboard?
Note that I bolded the 16 and the date. There is a page somewhere on that monstrous site which states they have 40 billion. I've seen it a few times; unfortunately, I can't pinpoint the location right now.
There is one Mark Dennis listed in Google in Lima, New York. This same Mark Dennis is also listed as the webmaster and treasurer for the local democratic committee in NY (http://www.limademocrats.com/bios/mark.asp). From there he volunteers a wealth of information about himself, including his email address.
I'm sure the 1200 families affected by his decision wouldn't mind finding out how to contact him.
That md5sum may as well be the IP address itself.
2^4 bytes * 2^32 addresses means that only 2^36 bytes (64 GiB) would be required to store a copy of the hashes of the entire IP address space. Doing the lookup live and flagging all matches (you would have to search the entire space to make absolutely certain there are no collisions) would not take an unreasonable amount of time.
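The arithmetic holds: an MD5 digest is 16 = 2^4 bytes, so one digest per IPv4 address is 2^36 bytes. A minimal Python sketch of the reversal, assuming the hash was computed over the dotted-quad text (the thread doesn't say how the address was formatted) and demonstrated on the 192.0.2.0/24 documentation range rather than all 2^32 addresses. Each digest maps to a list so that any collisions would be flagged rather than silently overwritten.

```python
import hashlib
import ipaddress
from collections import defaultdict

# One 16-byte digest for each of the 2**32 IPv4 addresses = 2**36 bytes (64 GiB).
TABLE_BYTES = 16 * 2**32


def md5_ip(ip):
    # Assumption: the site hashed the dotted-quad string, e.g. "192.0.2.1".
    return hashlib.md5(ip.encode("ascii")).hexdigest()


def build_table(network):
    """Map each digest in `network` to every address that produces it."""
    table = defaultdict(list)
    for addr in ipaddress.ip_network(network):
        table[md5_ip(str(addr))].append(str(addr))
    return table


# Demo on one /24 (256 addresses) instead of the full address space.
table = build_table("192.0.2.0/24")
leaked = md5_ip("192.0.2.77")
print(table.get(leaked, []))  # ['192.0.2.77']
```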
It's pathetic that they even question whether or not to inform the parents. That's like publicly saying: "Hey, we know we screwed up BIG, we know the media knows, but we're not quite sure if we're going to try and cover our own asses yet or not."
Knowingly endangering a child in any form is a felony. This is simply more proof that allowing the government to act with relative impunity results in criminal acts against citizens. The county is responsible for the leaked information and should be responsible for securing the daily activities of those children, to ensure the leaked data does not allow any harm to come to them.
When I was seven years old, my day-care center 'accidentally' released confidential information about me and several other children in its care. The day-care center cared for somewhere around 70 children. The leaked information was found in the possession of a convicted child molester. By the next day, the day-care center had been shut down and the city had filed criminal charges against its owner and two employees at the facility.
Why is it that when the government does it, everything is not only OK, but they're not even sure they should bother wasting their time informing the parents/guardians that their children have been placed at risk?
This bogus trash needs to stop; the government has to be responsible for its actions. They violate laws on a regular basis as a part of their daily operations. Enron is almost perfect compared to our own government.
That's pitiful.
Actually, I think I heard about this incident. It's a good argument for compartmentalization.
I used to work for a healthcare transaction company, and we developers had absolutely no access to patient data. I had no access to production databases, just dev and staging. Those databases used fake test data only. We weren't likely to be sued by Ima Genius or Homer Simpson over the loss of their records.
He mapped all alphabetic characters to X and all numeric characters to 9. The data will look like this:
XXXXXXXXX XXXXXXXXXX 9999 XXXXX XXX 999-999-9999
XXXXXXXXX XXXXXXX 999 X XXXXX 999-999-9999
XXXXXXX XXXXXXXXXX 999 XXXXX XX 999-999-9999
Which is fairly obfuscated. Obviously it looks like a name, address and phone number, and a skilled logician might be able to extract information based on the lengths of the data fields, but it's pretty secure.
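For anyone not using vi, here is a minimal Python sketch of the same two substitutions (the sample record is made up). Note that `[0-8]` is enough because a 9 already maps to itself. Like the vi version, this only touches ASCII letters, so accented characters pass through unchanged, and field lengths still leak as described above.

```python
import re


def mask(text):
    # Same as ":%s/[A-Za-z]/X/g": every ASCII letter becomes X.
    text = re.sub(r"[A-Za-z]", "X", text)
    # Same as ":%s/[0-8]/9/g": every digit becomes 9 (9 is already 9).
    return re.sub(r"[0-8]", "9", text)


print(mask("Mike Smith 1450 Main St 585-555-0123"))
# XXXX XXXXX 9999 XXXX XX 999-999-9999
```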
Thank you, thank you, thank you. That is EXACTLY the kind of thinking that HIPAA et al. are supposed to foster. Real patient data should never be accessible except to people whose job it is to use that data. The people whose job it is to track and store the data have no need to see it. Now if only we could get an anti-PATRIOT Act passed that forbade the government from accessing any private database for the purpose of following its citizenry.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.