Databases and Privacy
A couple of stories made an interesting juxtaposition today. First read this story about information marketers scouring public records to compile personal information. Note the emphasis on cross-linking data from various sources to provide more information than any one source did - databases are synergistic. Now read this column about David Nelson, and its follow-up.
Seriously, I spend a large amount of my time working with gov't and private databases and info sources. Reconciling different views of the universe is nearly impossible. When I read about people cross-referencing databases, I know that the amount of checking, QA and scrubbing required to have any confidence in the results is horrendous.
Example: person A gives you a download from their database in a spreadsheet, and person B (who may actually work for the same agency or company) supposedly gives you the same information, but the two versions do not match.
And this is assuming that there are other areas where they may or may not be in alignment (e.g. abbreviations, type of info gathered, spelling variations etc.).
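To make the mismatch problem concrete, here is a minimal sketch of comparing two exports that are supposed to hold the same records. Everything in it is an assumption made for illustration: the record IDs, the field names, the sample rows and the tiny abbreviation table are all hypothetical, and real scrubbing needs far more rules than this.

```python
import re

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize(value: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace, expand abbreviations."""
    words = re.sub(r"[^\w\s]", " ", value.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def compare_exports(export_a, export_b):
    """Compare two {record_id: {field: value}} exports.

    Returns (only_in_a, only_in_b, conflicts). Each conflict is a tuple
    (record_id, field, value_a, value_b) for a shared field that still
    differs after normalization.
    """
    only_in_a = sorted(export_a.keys() - export_b.keys())
    only_in_b = sorted(export_b.keys() - export_a.keys())
    conflicts = []
    for record_id in sorted(export_a.keys() & export_b.keys()):
        row_a, row_b = export_a[record_id], export_b[record_id]
        for field in sorted(row_a.keys() & row_b.keys()):
            if normalize(row_a[field]) != normalize(row_b[field]):
                conflicts.append((record_id, field, row_a[field], row_b[field]))
    return only_in_a, only_in_b, conflicts


# Made-up sample data standing in for the downloads from person A and person B.
export_a = {
    "1001": {"name": "Pat Smith", "address": "12 Main St."},
    "1002": {"name": "Jane Roe", "address": "4 Oak Ave"},
    "1003": {"name": "John Doe", "address": "9 Elm Rd"},
}
export_b = {
    "1001": {"name": "pat smith", "address": "12 Main Street"},
    "1002": {"name": "Jane Row", "address": "4 Oak Avenue"},
    "1004": {"name": "Ann Poe", "address": "7 Pine Rd"},
}

only_in_a, only_in_b, conflicts = compare_exports(export_a, export_b)
print(only_in_a, only_in_b, conflicts)
```

Normalization absorbs the case and abbreviation differences, but the spelling variation ("Roe" vs. "Row") and the records that exist on only one side still need a human to decide which source is right. That manual step is where the cost comes from.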
Now take the combinatorics of tens of thousands of gov't and private DBs, and you will understand that:
1) A good clean DB is horrendously expensive.
2) Driven by the profit motive, most companies are unwilling to take the time and spend the money to properly QA and scrub their data.
3) Much of the cross matching is therefore useless due to noise.
4) TIA is totally bogus. See above.
5) Having some anonymous DB of information tracking your life is very scary.
And honestly, you'd be surprised how many privacy laws we have to follow (which is a good thing). For instance, we only sell accounts to people who have a legitimate purpose for searching information (such as insurance companies when you apply for insurance, law enforcement agencies tracking down criminals, collection agencies trying to track down people who skip payments, etc.). If I were to search for information about anyone besides myself or others on the development team who have agreed to let me search their names, even when testing, I'd be fired within the hour. We have a compliance department that keeps track of all searches, has to report them to various authorities, etc. If someone searches for a person marked as a celebrity, their account is shut down within minutes and one of our compliance people is on the phone getting documentation about why they searched for that name. In fact, the applications to get access to the data we sell are quite nasty, and we can only sell data to a very narrow range of people.
I think in general, personal data is protected more than you would think (at least public records, credit agency data, etc.). I really have no idea how these 'unscrupulous' companies get by with public data without having anyone come down on them. I'm a privacy & security advocate, and I don't feel what I do crosses my moral boundaries (at least at this point).
Anyone else? I lie. Sometimes I'm a yak herder with a yearly income of ~$6000, other times I'm a "Decision Maker" with a yearly income of $800k+.
As someone who used to work in database aggregation with this sort of data, I can tell you that we correlated income as a function of your home value (which is freely available right down at your local county courthouse in most states).
You typically don't have 800k/yr decision makers living in 12k/yr apartments. There's a process in compilation here; they don't just enter this into a database and sell it.
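As a rough illustration of that kind of sanity check, here is a minimal sketch that flags a self-reported income when it is wildly out of line with what the person pays for housing. This is not the method the aggregator actually used, which isn't described here. The function name and the ratio band are hypothetical, chosen only to show the idea.

```python
# Hypothetical plausibility band for (self-reported income / yearly housing cost).
# The numbers are invented for illustration, not taken from any real compiler.
MIN_RATIO = 1.0
MAX_RATIO = 20.0


def is_plausible(reported_income: float, annual_housing_cost: float) -> bool:
    """Return True if the reported income is roughly consistent with housing cost."""
    if annual_housing_cost <= 0:
        raise ValueError("annual_housing_cost must be positive")
    ratio = reported_income / annual_housing_cost
    return MIN_RATIO <= ratio <= MAX_RATIO


# The two personas from the comment above, both in a 12k/yr apartment.
print(is_plausible(800_000, 12_000))  # "Decision Maker"
print(is_plausible(6_000, 12_000))    # yak herder
print(is_plausible(40_000, 12_000))   # an unremarkable answer
```

Under these assumed thresholds both made-up personas fall outside the band and would be discarded or overridden by the housing-derived estimate, while the ordinary answer passes.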