Slashdot Mirror


Outsourced Confidential Data On Children Posted

Kataire writes "MSNBC exposes a grievous blunder in which an outsourced programmer posts highly confidential data to a public website, concerning the daily whereabouts of hundreds of children in upstate New York. Yes, this person did this not once, or twice, but three times, with two different data sets. Even worse, the data was out there, publicly 'visible' for months. Just because RentACoder finally discovered and yanked it, after a coder 'stuck with a tricky formatting issue' posted the specific database he was working on to their message boards, doesn't mean the damage is undone. The ramifications reach beyond the painfully obvious privacy issues, touching on outsourcing and peer ethics."

21 of 438 comments

  1. Re:Hmmm by MaineCoon · · Score: 4, Informative

    Very creative; however, if you had read the whole article, you would have realized that the chain of contractors (the university that received the original contract, the programmer they subcontracted, and the programmer that the subcontractor contracted) was made up entirely of US citizens and/or organizations.

    Just because a programmer is located in the US does not make him or her infallible and capable of doing perfect work.

    --
    Hunt your preferred prey at Aliens vs Predator MUD. Join the war at avpmud.com port 4000
  2. the dumbasses... by SHEENmaster · · Score: 4, Informative

    Who the hell thought to give him REAL information about these children in the first place? A fake database would've worked just as well for development purposes.

    --
    You can't judge a book by the way it wears its hair.
    1. Re:the dumbasses... by SirSlud · · Score: 5, Informative

      actually

      1. It's bad to develop with real data, because you make assumptions about what kind of data you have to process. You should unit test the code by *trying* to break it, using known invalid formats or invalid data to ensure that your software handles such input inconsistencies gracefully. The only way to be sure your software won't dump core, fork bomb, or enter an infinite loop is to test it on test data, which should be created by the developer.

      2. You're right about going live, though. You'd never go live without a final QA pass against the real data, just to make sure you're not going to spend 2 hours upgrading a platform and 2 hours backing out.

      Neither of these points has any bearing on the fact that, as a developer, you will (most of the time) have/need access to the real data at some point, so it really is up to the developer and the contractor to set out rules for the usage of the data, and even to have the developer sign an NDA of sorts to put the accountability where it should belong.

      What stories like this really highlight is the sort of losses that can occur from outsourcing or contracting that don't often show up in a cost analysis of the project. The less control and supervision you have over your 'employees', the higher the likelihood that those employees will do something with their relationship with you that damages the company. I've had numerous higher-ups in other companies pass me sensitive data just because they needed something fixed as soon as possible and couldn't find the experience/ability in house, and I just think it's a completely irresponsible way of conducting business. But if I did something dumb with that data, it wouldn't be my ass on the line, because I was handed that data with no legal documentation concerning how I can use it and what I can do with it. Then again, lawyers might see that differently.

      All I know is that when it comes to outsourcing, it's usually a gain in labour flexibility and cost effectiveness at the expense of a higher risk of disclosing sensitive information, be it data or security rights. It's a cost that employers can willfully ignore if they so choose, but again, I think it's just bad business practice. Full employees have a far greater vested interest in the success of their employer and are far less likely to do the stupid things that one-off contractors have been known to do. That is, full-time employees are more likely to consider the legal and financial implications of how they go about providing solutions for product development. Employers hate to admit that, though, because it highlights the downside of their utopian flexible labour force, in which there is little job security for the people actually doing the grunt work.
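
      Point 1 above (break the code with developer-written bad input instead of real records) might look something like this minimal Python sketch. The `parse_record` function, its 'name,age,zip' line format, and the 0-21 age limit are all hypothetical, invented only to show the idea; the fake names borrow from a later comment in this thread.

```python
def parse_record(line):
    """Hypothetical parser for a 'name,age,zip' line; raises ValueError on bad input."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    name, age, zip_code = fields
    if not name:
        raise ValueError("empty name")
    if not (age.isdigit() and 0 <= int(age) <= 21):  # hypothetical age limit
        raise ValueError(f"bad age: {age!r}")
    if not (len(zip_code) == 5 and zip_code.isdigit()):
        raise ValueError(f"bad zip: {zip_code!r}")
    return {"name": name, "age": int(age), "zip": zip_code}


# Deliberately broken input, written by the developer rather than copied from production.
BAD_LINES = [
    "",                      # empty line
    "only,two",              # too few fields
    "a,10,12345,extra",      # too many fields
    ",10,12345",             # missing name
    "Ima Genius,-1,12345",   # negative age
    "Ima Genius,ten,12345",  # non-numeric age
    "Ima Genius,10,1234",    # short zip
    "Ima Genius,10,ABCDE",   # non-numeric zip
]


def test_valid_fake_record():
    assert parse_record("Homer Simpson,10,12345") == {
        "name": "Homer Simpson", "age": 10, "zip": "12345"}


def test_invalid_inputs_are_rejected():
    for line in BAD_LINES:
        try:
            parse_record(line)
        except ValueError:
            continue  # rejected cleanly, as intended
        raise AssertionError(f"accepted malformed line: {line!r}")


if __name__ == "__main__":
    test_valid_fake_record()
    test_invalid_inputs_are_rejected()
    print("all checks passed")
```

      The test functions are plain asserts, so they run as a script or under a test runner such as pytest. The point is that every failure mode is chosen by the developer, so no real child's record ever has to leave the building.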

      --
      "Old man yells at systemd"
    2. Re:the dumbasses... by orthogonal · · Score: 2, Informative

      The more interesting question is why he felt the need to post the real data. If I had a database formatting error, I would have written a fake database that was corrupted in a similar way and asked about it.

      I'm guessing it's because he was a lazy dumbass who just didn't give a rip about the confidentiality of low-income kids in foster care.

      The article mentions that he was informed he'd posted live data, responded that he'd made a mistake and wouldn't repeat it, and then re-posted the same data the very next day. I think that supports my assessment.

      As to why you would have gone to the trouble to substitute in fake data, well, you've got some equipment he apparently lacks: professional integrity and an ethical compass.

  3. Re:Who do you trust? by segment · · Score: 4, Informative

    Who gets to play Big Brother? That's an easy one... ChoicePoint gets to play Big Brother. They tout 40 billion records... 40 billion records on about 300 million Americans?...

    And what will they do with what they know? They claim to be able to pinpoint every move you made, from college to getting tossed out of your duplex, etc.

  4. Not outsourced overseas by crymeph0 · · Score: 5, Informative

    Rather than mod you down, I'll just let you (and all the other knee-jerks) know that THIS WAS NOT AN INDIAN PROGRAMMER. This was a guy named Mark Dennis. Not a very Indian-sounding name. Also, Mark Dennis actually subcontracted the job involving the database out to someone in New Jersey. Maybe IHBT, but the article summary could make you believe this had to do with offshore outsourcing, so that's a misconception we should clear up early.

    --
    It should be illegal to say that freedom of speech should be limited.
    1. Re:Not outsourced overseas by crushinghellhammer · · Score: 2, Informative

      LOL, you xenophobic freak!

      And here's some info that you can spout at your next Xenophobes R Us meeting:

      Mandara is not an Indian name, Mandira is (added bonus info: that's a woman's name)

      I've never heard of Deepthanshu (I'm part Indian, and though I live in the US, I know quite a bit about India), and even if it were a name, it would be a first name, not a last name.

  5. Re:Really, this is not OT by MaineCoon · · Score: 5, Informative

    However, in this case, all the outsourcing was within US borders, as is evident from the contents of the article.

    --
    Hunt your preferred prey at Aliens vs Predator MUD. Join the war at avpmud.com port 4000
  6. Simple... by Vrallis · · Score: 3, Informative

    :%s/[A-Za-z]/X/g
    :%s/[0-8]/9/g

    Simple. Just obfuscate it, and you can pass it around for people to help with formatting issues all you want. I've done that with payroll data plenty of times.

    Just two lines of vi commands could have saved this guy so much trouble....
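
    The same two substitutions work outside vi too. Here is a minimal Python sketch using `re.sub`; the sample line is invented.

```python
import re


def obfuscate(text):
    """Apply the vi commands :%s/[A-Za-z]/X/g and :%s/[0-8]/9/g to a string."""
    text = re.sub(r"[A-Za-z]", "X", text)  # every ASCII letter becomes X
    return re.sub(r"[0-8]", "9", text)     # every digit becomes 9 (9 is already 9)


if __name__ == "__main__":
    print(obfuscate("Homer Simpson 742 Evergreen Tce 555-123-4567"))
    # prints: XXXXX XXXXXXX 999 XXXXXXXXX XXX 999-999-9999
```

    Field lengths, spacing, and punctuation survive, which is what you need for a formatting question. Like the vi version, the character class only covers ASCII letters, so accented characters would pass through unchanged.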

  7. Re:Stupid coder, stupider company... by jas79 · · Score: 2, Informative

    Their logs only say how many people downloaded the file, not how many people actually unzipped it.

  8. Re:Really, this is not OT by The_K4 · · Score: 2, Informative

    Ok, take off the tinfoil hat and realize that NONE of this took place outside of the US. They DID hire a US contractor (actually a university), which hired a US subcontractor, who hired another US subcontractor. The guys lived in Nowhere, NY and Nowhere, NJ!

  9. Re:Who do you trust? by Anonymous Coward · · Score: 4, Informative

    C'mon, give it up! Do you work for UCSF, Sausalito Transcription Stat, Sonya Newburn or Tom Spires?

    And was it India or Pakistan? And was the "Indian" really a Pakistani woman named Lubna Baloch? And was the problem really that UCSF required so little control over the custody of the medical records that they could be handed off through a chain of at least four levels of subcontractors before they ended up in Pakistan?

    Oh, and was it really Sonya Newburn who paid off Baloch?

    It's not as super-secret as you think. And the real issue (in your hospital's case) isn't that you couldn't bring the weight of American law enforcement to bear; it's that your organization completely lost control of the data that was entrusted to it.

    Incidentally, UCSF has revised its contracts to require its transcription firms to reveal who they subcontract with.

    P.S. If you click the little "Post Anonymously" checkbox, your /. ID won't be revealed. That said, I don't think you'll be in much trouble, given that the whole business is splattered all over Google.

  10. Potential coppa violations, too by bugnuts · · Score: 3, Informative

    If the kids were under 13 years old, the programmer could have violated COPPA, the Children's Online Privacy Protection Rule.

    In other words, this guy could not only have given the county a black eye, but he could even go to jail for it.

    If the information lost can be linked to a crime against one of the kids (no matter what age), he had better have a good attorney. Gross negligence and reckless endangerment come to mind.

  11. Re:Who do you trust? by orthogonal · · Score: 5, Informative

    Who do you trust? And who do you get to solve something like this?

    In this particular case, you needn't trust anyone.

    Nothing that Mark Dennis wanted to do (build the database structure, build the front-end, or get help with his "tricky formatting problem") required that he supply real data to RentACoder or other subcontractors.

    And furthermore, nothing the Livingston County Social Services Commission wanted required that Mark Dennis ever see live data.

    This one's simple, folks -- sure, Mark (or someone) needed to do a requirements analysis, sure, somebody had to decide what data entities to capture -- but very little real data was needed.

    First, make some dummy data for the developers' use: run through your real data (if you even need to base the dummy data on the real data) and replace every name with random dictionary words. Do the same thing for addresses, and replace Social Security and other ID numbers with randomly chosen numbers. In all cases, maintain a constant map of real to dummy, to preserve relations within the data: "Mike Smith" is always translated to "Armchair Landowner" and "1450 Main Street" to "3321 Crumpet Sponge".

    Once you've finished your translation, throw away the map.
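
    A minimal Python sketch of that translation, assuming records are dicts and that `name`, `address`, and `ssn` are the sensitive fields (hypothetical field names). The short word list is a stand-in for a real dictionary file, and the map lives only inside the function, so it is gone once the function returns.

```python
import random

# Stand-in for a real dictionary file. It must be large enough to give every
# distinct real value its own dummy value, or the retry loop below never ends.
WORDS = ["armchair", "landowner", "crumpet", "sponge", "lantern", "walrus",
         "pebble", "quartz", "meadow", "trumpet", "velvet", "harbor"]


def pseudonymize(records, sensitive=("name", "address", "ssn"), seed=None):
    """Return copies of `records` with sensitive fields replaced consistently."""
    rng = random.Random(seed)
    mapping = {}  # (field, real value) -> dummy value, constant for the whole run
    used = set()  # (field, dummy value) pairs already handed out

    def fresh_dummy(field):
        if field == "ssn":
            return f"{rng.randrange(1000):03d}-{rng.randrange(100):02d}-{rng.randrange(10000):04d}"
        word1, word2 = rng.choice(WORDS).title(), rng.choice(WORDS).title()
        if field == "address":
            return f"{rng.randrange(1, 10000)} {word1} {word2}"
        return f"{word1} {word2}"

    def translate(field, value):
        if (field, value) not in mapping:
            dummy = fresh_dummy(field)
            while (field, dummy) in used:  # never map two real values to one dummy
                dummy = fresh_dummy(field)
            used.add((field, dummy))
            mapping[(field, value)] = dummy
        return mapping[(field, value)]

    translated = [
        {k: (translate(k, v) if k in sensitive else v) for k, v in rec.items()}
        for rec in records
    ]
    # `mapping` is discarded when this function returns: "throw away the map".
    return translated
```

    Because each real value always gets the same dummy value within one run, two records about the same child still point at the same (fake) name and address, which is the property the comment relies on.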

    Now the coder has data that's exactly as diverse as the real data, shows the same frequencies and inter-relations as the real data, is as internally self-consistent as the real data, and yet is (nearly) completely meaningless in terms of the real world, and (nearly) impossible to link to any real persons, places, or identifying information.

    (It's possible one could still do traffic analysis on the data, and come up with aggregate data: either more male or more female (but which?) children are in the Social Services system; two zip codes out of six produce 70% of the cases (but which two?). If this is a problem you have to take a weighted slice of the data, and provide the developer with only this weighted slice; that (intentionally) skews your frequencies, but still preserves diverse data and any inter-relations among that data, closely enough to be representative for almost all design and coding needs.)
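
    The comment doesn't spell out how to weight the slice. One possible reading, sketched minimally in Python: cap every group (here, a hypothetical `zip` field) at the same number of records, so the real proportions disappear while each sampled record stays intact.

```python
import random
from collections import defaultdict


def weighted_slice(records, key, per_group, seed=None):
    """Keep at most `per_group` randomly chosen records from each group of `key`.

    Capping every group at the same size flattens the real frequencies (e.g.
    which zip codes produce most cases) while keeping whole records, so the
    relationships between fields inside each record are preserved.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    sliced = []
    for members in groups.values():
        sliced.extend(rng.sample(members, min(per_group, len(members))))
    rng.shuffle(sliced)  # don't leak group order either
    return sliced
```

    Relationships that span records (siblings in different groups, say) can still be cut by the sampling, so this fits the "close enough to be representative" standard rather than an exact one.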

    No trust involved. Just a simple and mechanical translation process that has to take place only once.

    (If you really have a situation where the developer must base his requirements and code on gradually accumulating real-world data, and you shouldn't if you've planned at all well, let one non-outsourced person hold the translation map and be held responsible for keeping it secret.)
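
    A standard alternative to storing the map itself (a different technique from the one described above) is keyed hashing. Each pseudonym is derived with HMAC-SHA256 under a secret key that only the responsible in-house person holds. New batches of real data then translate consistently, and without the key nobody can recompute the pseudonyms, even by trying every plausible name. A minimal Python sketch, with key storage deliberately left out:

```python
import hashlib
import hmac
import secrets


def make_key():
    """Generate a secret key; only the designated in-house person should keep it."""
    return secrets.token_bytes(32)


def pseudonym(key, field, value, length=12):
    """Deterministic pseudonym for `value` in `field`, e.g. 'name:3f9a...'.

    The same key, field, and value always give the same output, so newly
    accumulated real data translates consistently without storing a map.
    """
    msg = f"{field}\x00{value}".encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return f"{field}:{digest[:length]}"
```

    The key holder can't run the mapping backwards either. They can only check whether a given candidate matches a pseudonym. Truncating to 12 hex digits (48 bits) keeps the values short, and collisions are very unlikely at the scale of a county case file.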

    And a process like I've outlined should be standard for any organization dealing with sensitive data.

  12. Re:Who do you trust? by segment · · Score: 3, Informative

    I sincerely hope you were kidding about that. In case you weren't, ChoicePoint is in the business of selling data... yours.

    Source: "ChoicePoint Acquires National Data Retrieval, Expands Presence in Public Records Field"

    ALPHARETTA, Ga. - January 2, 2003 - ChoicePoint (NYSE: CPS) today announced the acquisition of National Data Retrieval Inc. (NDR), one of the nation's leading providers of public records information for bankruptcies, civil judgments, and federal and state tax liens. Terms of the acquisition were not disclosed.

    National Data Retrieval, which also is based in Alpharetta, has 26 full-time employees, all of whom will be retained, plus a nationwide network of approximately 400 independent collection contractors. The privately held company was established in 1989.

    NDR's products, services and public records databases of nearly 43 million records will complement ChoicePoint's existing Court Research and Retrieval Group (CRRG), which processed approximately 5 million records requests in 2002. NDR's customers will gain access to ChoicePoint's CRRG technology and records collection facilities, supported by ChoicePoint's proprietary database of more than 16 billion public records.

    Note that I bolded the 16 and the date; there is a page somewhere on that monstrous site which states they have 40 billion. I've seen it a few times, but unfortunately I can't pinpoint the location right now.

  13. Mark Dennis by Anonymous Coward · · Score: 1, Informative

    There is one Mark Dennis listed in Google in Lima, New York. This same Mark Dennis is also listed as the webmaster and treasurer for the local Democratic committee in NY (http://www.limademocrats.com/bios/mark.asp). From there he volunteers a wealth of information about himself, including his email address.

    I'm sure the 1200 families affected by his decision wouldn't mind finding out how to contact him.

  14. Re:Confidential data on slashdot by Anonymous Coward · · Score: 1, Informative

    Unlike other forums, posting anonymously leaves nothing but an MD5 sum of your IP to be used in court.

    That MD5 sum may as well be the IP address itself.

    An MD5 hash is 16 (2^4) bytes, so 2^4 bytes * 2^32 addresses means that only 2^36 bytes (64 GiB) would be required to store the hashes of the entire IPv4 address space. Doing the lookup live and flagging all matches (you would have to search the entire space to make absolutely certain there are no collisions) would not take an unreasonable amount of time.
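
    To make the point concrete, here is a minimal Python sketch. It assumes the stored value is an unsalted MD5 of the dotted-quad IP string; the comment doesn't say exactly what gets hashed. Recovering the IP is then just a matter of hashing candidates until one matches. The sketch searches a single /24 documentation block, and covering the full space is the same loop over 2^32 candidates.

```python
import hashlib
import ipaddress


def md5_of_ip(ip):
    """Unsalted MD5 of the dotted-quad string, e.g. '192.0.2.7' (assumed format)."""
    return hashlib.md5(ip.encode("ascii")).hexdigest()


def reverse_md5_ip(target_hex, network="192.0.2.0/24"):
    """Brute-force every address in `network`; return the one whose MD5 matches."""
    for addr in ipaddress.ip_network(network):  # iterates all addresses in the block
        if md5_of_ip(str(addr)) == target_hex:
            return str(addr)
    return None
```

    A salt or secret key mixed into the hash would defeat this lookup, which is why the "only a hash" argument depends entirely on how the hash is made.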

  15. Ewwwwwwwwww by GoMMiX · · Score: 3, Informative

    "County officials have not yet determined if they will tell the families involved about the incident."

    It's pathetic that they even question whether or not to inform the parents. That's like publicly saying: "Hey, we know we screwed up BIG, we know the media knows, but we're not quite sure if we're going to try and cover our own asses yet or not."

    Knowingly endangering a child in any form is a felony. This is simply more proof that allowing the government to act with relative impunity results in criminal acts against citizens. The county is responsible for the leaked information and should be responsible for securing the daily activities of those children, to ensure the leaked data does not allow any harm to come to them.

    When I was seven years old, my day-care center 'accidentally' released confidential information about me and several other children in their care. The day-care center cared for somewhere around 70 children. The leaked information was found in the possession of a convicted child molester. By the next day, the day-care center had been shut down and the city had filed criminal charges against its owner and two employees at the facility.

    Why is it that when the government does it, everything is not only OK, but they're not even sure they should bother wasting their time informing the parents/guardians that their children have been placed at risk?

    This bogus trash needs to stop; the government has to be responsible for its actions. They violate laws on a regular basis as a part of their daily operations. Enron is almost perfect compared to our own government.

    That's pitiful.
  16. Re:Oops.... by cyclist1200 · · Score: 4, Informative

    Actually, I think I heard about this incident. It's a good argument for compartmentalization.

    I used to work for a healthcare transaction company, and we developers had absolutely no access to patient data. I had no access to production databases, just dev and staging. Those databases used fake test data only. We weren't likely to be sued by Ima Genius or Homer Simpson over the loss of their records.
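
    A minimal Python sketch of generating that kind of fake test data. The field names, name lists, and value ranges are invented for illustration, and the fixed seed makes the same data come out every time, so dev and staging can share it.

```python
import random

FIRST = ["Ima", "Homer", "Marge", "Lisa", "Bart", "Ned"]
LAST = ["Genius", "Simpson", "Flanders", "Testcase", "Placeholder"]


def fake_patients(n, seed=42):
    """Return `n` synthetic patient records; nothing here comes from real data."""
    rng = random.Random(seed)
    return [
        {
            "patient_id": f"TEST-{i:06d}",  # prefix marks every record as test data
            "name": f"{rng.choice(FIRST)} {rng.choice(LAST)}",
            "birth_year": rng.randint(1920, 2004),
            "zip": f"{rng.randrange(100000):05d}",
        }
        for i in range(n)
    ]
```

    Unlike the real-to-dummy translation described elsewhere in this thread, this data never touches production at all. Nothing links it to a real person, though it also won't reproduce the quirks of real input.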

  17. You misread the regular expression by Anonymous Coward · · Score: 1, Informative

    He mapped all alphabetic characters to X and all numeric characters to 9. The data will look like this:

    XXXXXXXXX XXXXXXXXXX 9999 XXXXX XXX 999-999-9999
    XXXXXXXXX XXXXXXX 999 X XXXXX 999-999-9999
    XXXXXXX XXXXXXXXXX 999 XXXXX XX 999-999-9999

    Which is fairly obfuscated. Obviously it looks like a name, address, and phone number, and a skilled logician might be able to extract information from the lengths of the data fields, but it's pretty secure.

  18. Re:Who do you trust? by afidel · · Score: 2, Informative

    Thank you, thank you, thank you. That is EXACTLY the kind of thinking that HIPAA et al. are supposed to foster. Real patient data should never be accessible except to people whose job it is to use that data. The people whose job it is to track and store the data have no need to see it. Now if only we could get an anti-PATRIOT Act passed that forbade the government from accessing any private database for purposes of following its citizenry.

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.