Slashdot Mirror


Databases and Privacy

A couple of stories made an interesting juxtaposition today. First read this story about information marketers scouring public records to compile personal information. Note the emphasis on cross-linking data from various sources to provide more information than any one source did - databases are synergistic. Now read this column about David Nelson, and its follow-up.

5 of 173 comments

  1. Good thing databases are perfect! by plopez · · Score: 5, Informative

    Seriously, I spend a large amount of my time working with gov't and private databases and info sources. Reconciling different views of the universe is nearly impossible. When I read about people cross-referencing databases, I think about the checking, QA and scrubbing required to have any confidence in the results: the amount is horrendous.

    Example: person A gives you a download from their database into a spreadsheet, person B (who may actually work for the same agency or company) supposedly gives you the same information, but the two versions do not match.

    And that is before you get to the other areas where they may or may not be in alignment (e.g. abbreviations, type of info gathered, spelling variations, etc.).
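
A minimal sketch of the mismatch described above, assuming both people hand over CSV exports that share an `id` column. The sample rows, the column names and the helper names `load_by_key` and `diff_exports` are all made up for illustration; real exports rarely share even a key.

```python
import csv
import io

def load_by_key(csv_text, key):
    """Index the rows of a CSV export by the given key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}

def diff_exports(a, b):
    """Report keys present on only one side and the fields that disagree."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    mismatched = {}
    for k in sorted(a.keys() & b.keys()):
        fields = sorted(f for f in a[k] if f in b[k] and a[k][f] != b[k][f])
        if fields:
            mismatched[k] = fields
    return only_a, only_b, mismatched

# Hypothetical exports that are "supposedly the same information".
EXPORT_A = "id,name,city\n1,David Nelson,Peoria\n2,Ann Lee,Boise\n3,Raj Patel,Austin\n"
EXPORT_B = "id,name,city\n1,David Nelson,Peoria\n2,Ann Lee,Boise ID\n4,Kim Cho,Tampa\n"

only_a, only_b, mismatched = diff_exports(
    load_by_key(EXPORT_A, "id"), load_by_key(EXPORT_B, "id")
)
print(only_a, only_b, mismatched)
```

Even this toy case turns up a record missing on each side and a field that differs only in how the city was typed, and every one of those needs a human decision.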

    Now take the combinatorics of tens of thousands of gov't and private DBs, and you will understand that:
    1) A good clean DB is horrendously expensive.
    2) Driven by the profit motive, most companies are unwilling to take the time and spend the money to properly QA and scrub their data.
    3) Much of the cross matching is therefore useless due to noise.
    4) TIA is totally bogus. See above.
    5) Having some anonymous DB of information tracking your life is very scary.

    --
    putting the 'B' in LGBTQ+
    1. Re:Good thing databases are perfect! by stanwirth · · Score: 4, Informative

      Actually, governments and corporations are very willing to spend tremendous amounts of money on:

      • data cleansing and QA
      • data warehousing
      • surrogate key generation
      • data correlation
      • data mining
      • geocoding (linking an address to a lat/lon, identifying the lat/lon with a neighborhood, municipality, county, state, country; linking a lat/lon to an address)
      • database integration
      • data migration
      • legacy systems
      • data audit trail generation
      • dataset purchases
      It's not "impossible" to reconcile different data on the same subjects; it's just a whole lot of work, much of it analysis and data discovery, and being able to do the work typically requires that you be familiar with a variety of RDBMSs, billing engines, debt engines, file formats and platforms. The combinations are almost endless.

      Take heart. You'll start seeing the same kinds of problems over and over: middle initial vs. middle name, spacing and capitalisation issues, address data entered as a small number of big long strings that needs to be parsed out into attributes, date/time format inconsistencies, record doubling, data integrity issues (1 supposedly unique key identifying multiple distinct records), data accuracy issues (data way out of range, data incorrect), null values with meaning, attributes used to identify a range of different things, "smart keys" that are not so smart being used to code everything about a customer in 8 characters, and so on and so forth. And you'll know to look for these "usual suspects" first, and develop some standard ways of dealing with them.
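
A rough sketch of a few of those "usual suspects" (middle initial vs. middle name, spacing and capitalisation, date format inconsistencies, one key identifying multiple distinct records). The function names, the list of date formats and the sample records are assumptions for illustration, and the sketch treats slash dates as month/day/year, which real data will not always honour.

```python
from collections import defaultdict
from datetime import datetime

def normalize_name(name):
    """Build a matching key: collapse spacing, lowercase, middle names to initials."""
    parts = name.split()
    if len(parts) >= 3:
        parts = [parts[0]] + [p[0] for p in parts[1:-1]] + [parts[-1]]
    return " ".join(p.strip(".").lower() for p in parts)

# Assumed formats; a real feed needs its own list, found by data discovery.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")

def normalize_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable: flag for review rather than guess

def conflicting_keys(records):
    """Find supposedly unique keys that identify more than one distinct record."""
    seen = defaultdict(set)
    for key, name, dob in records:
        seen[key].add((normalize_name(name), normalize_date(dob)))
    return {k: v for k, v in seen.items() if len(v) > 1}

# Hypothetical customer records: (key, name, date of birth).
records = [
    ("C001", "David  Allen NELSON", "03/04/1970"),
    ("C001", "david a. nelson", "1970-03-04"),
    ("C002", "Ann Lee", "1980-01-15"),
    ("C002", "Raj Patel", "19751231"),
]
conflicts = conflicting_keys(records)
print(sorted(conflicts))
```

After normalisation the two C001 rows collapse into one person (record doubling), while C002 is left identifying two different people (a data integrity issue).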

      Metadata management and ETL tools make the job easier, but as you say, data are imperfect. There are plenty of legitimate applications--every merger, acquisition and JV is yet another opportunity for some more mind-numbing, back-breaking, soul-destroying, spirit-crushing DB work. Oh goody. That's why they call it "work," I suppose. I'm surprised the work Neo was doing in The Matrix -- before he found his "calling" so to speak--was something as creative and interesting as software development. The real grind is the big databases. As you so aptly point out.

      Many industries have, as their primary asset, data and data only. Banking and insurance are the classic examples. Companies in these industries are certainly willing to invest in their most important asset, because just about all the money in the world is in databases.

      A database is like a gun. It can protect you, it can kill you. You can shoot yourself in the foot, somebody else can take you out in a 'hunting accident.'

      The difference between a database and a gun is that a gun needs someone behind it pulling the trigger. A database, OTOH, has triggers that can fire based on whatever criteria have been set--like when a 'David Nelson' tries to fly to Peoria. Yah, it's scary, all right.
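
To make the trigger point concrete, here is a small sketch using SQLite through Python's `sqlite3` module. The `watchlist`, `bookings` and `alerts` tables and the trigger name are invented for illustration; nothing here is meant to describe how any real screening system is built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE watchlist (name TEXT PRIMARY KEY);
    CREATE TABLE bookings  (id INTEGER PRIMARY KEY, name TEXT, destination TEXT);
    CREATE TABLE alerts    (booking_id INTEGER, name TEXT, destination TEXT);

    -- Fires on every new booking; nobody has to "pull" it.
    CREATE TRIGGER flag_watchlisted_booking
    AFTER INSERT ON bookings
    BEGIN
        INSERT INTO alerts
        SELECT NEW.id, NEW.name, NEW.destination
        FROM watchlist
        WHERE watchlist.name = NEW.name;
    END;
""")

conn.execute("INSERT INTO watchlist VALUES ('David Nelson')")
conn.executemany(
    "INSERT INTO bookings (name, destination) VALUES (?, ?)",
    [("David Nelson", "Peoria"), ("Ann Lee", "Boise")],
)
alerts = conn.execute(
    "SELECT booking_id, name, destination FROM alerts"
).fetchall()
print(alerts)
```

The match is on the name alone, so every David Nelson who books a flight lands in `alerts`, whether or not he is the one the list was meant for.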

  2. I work for a "Risk Management" company.. by booms · · Score: 5, Informative

    And honestly, you'd be surprised how many privacy laws we have to follow (which is a good thing). For instance, we only sell accounts to people who have a legitimate purpose for searching information (such as insurance companies when you apply for insurance, law enforcement agencies to track down criminals, collection agencies who are trying to track down people who skip payments, etc.). If I were to search for information about someone besides myself or others in the development team who have agreed to let me search their names, even when testing, I'd be fired within the hour. We have a compliance department that keeps track of all searches, has to report them to various authorities, etc. If someone searches for someone marked as a celebrity, their account is shut down within minutes and one of our compliance people is on the phone getting documentation about why they searched for that name. In fact, the applications to get to the data we sell are quite nasty, and we only have a very narrow scope of people that we can sell data to.

    I think in general, personal data is protected more than you would think (at least public records, credit agency data, etc)-- I really have no idea how these 'unscrupulous' companies get by with public data without having anyone come down on them. I'm a privacy & security advocate, and I don't feel what I do crosses my moral boundaries (at least at this point).

    1. Re:I work for a "Risk Management" company.. by booms · · Score: 4, Informative

      Like I said, I don't know how other companies get around all of the various laws. He also violated the FCRA by getting information about you that was used in a decision to "allow or deny credit" without being certified for that, which carries a pretty nasty penalty as I understand it. I don't know the specifics, as IANAL.

      I can see why the local police would probably not do much about it to be honest, but they are lazy for not pointing you in the right direction. If you want, I can ask around to see who the proper authorities would be to report this occurrence to.

  3. Re:Random Lies by Cygnusx12 · · Score: 5, Informative

    Anyone else? I Lie. Sometimes I'm a yak herder with a yearly income of ~$6000, other times I'm a "Decision Maker" with a yearly income of $800k+.

    As someone who used to work in database aggregation with this sort of data, I can tell you that we correlated income as a function of your home value (which is freely available right down at your local county courthouse in most states).

    You typically don't have 800k/yr decision makers living in 12k/yr apartments. There's a process in compilation here, they don't just enter this into a database and sell it.
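
A toy version of that kind of sanity check, purely as an illustration. The function name and the 2x to 10x home-value-to-income band are made-up assumptions, not the rule any aggregator actually used.

```python
def income_is_plausible(claimed_income, home_value, min_multiple=2, max_multiple=10):
    """True if the home is worth between min_multiple and max_multiple
    times the claimed yearly income (hypothetical thresholds)."""
    return min_multiple * claimed_income <= home_value <= max_multiple * claimed_income

# Hypothetical survey answers checked against one county-recorded home value.
HOME_VALUE = 250_000
claims = {"yak herder": 6_000, "decision maker": 800_000, "honest answer": 60_000}

flags = {who: income_is_plausible(income, HOME_VALUE) for who, income in claims.items()}
print(flags)
```

Both lies fall outside the band for the same house, so a compiler would likely discard them and fall back on an estimate from the public record.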