Data Miners Scraping Away Our Privacy
Presto Vivace writes "Twig, writing for Corrente, reports on data scrapers. They are not looking for passwords and such; scrapers are looking at blogs and forums searching for material relevant to their corporate clients. We are assured that the information is 'anonymized' to protect the identities of forum participants. However, a tool called PeekYou permits users to connect online names with real world identities. No worries, though — if you have a week to spare, you can opt-out of some of the larger data banks."
The biggest issue with information on the internet has always been how to separate the crap from the good stuff. The fact that they're gathering data is uninteresting: what I'd be interested in is their signal-to-noise ratio.
ad logicam: Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
If it's posted in a public space - it's not private ... and so on.
If it's accessible via public records - it's not private
If it occurs in a public forum - it's not private
If, for legal reasons, it must be disclosed in public - it's not private
If someone were to compile that set of information in an easy-to-read form, complete with a table of contents and a nice index, that is also not an invasion of privacy.
Using a computer to do the heavy lifting and reduce the time required to match everything together is also not an invasion of privacy.
Listen, if you're talking about the privacy of your public information, and you're threatened by search engines, you are relying on security through obscurity. At least the people here on Slashdot should recognize the folly of that.
It is a bit unnerving to think of "opting out" of something that I never consented to in any form. I am going to guess that most people are not even aware of these companies.
Yes, I know, "Don't post data about yourself online!" That is not really the answer when most people think that Facebook is the way to be social. I do not have a Facebook profile, and I stay off of other social networking websites too; I am not going to pretend for a moment, though, that I am even close to representative of the norm. It is easy to make fun of all those "fools" out there who are undermining their own privacy, but in the end, that is not going to solve the problem, and eventually even people who want to have privacy will find that it is not possible to do so.
Palm trees and 8
It takes your help. It is not like they are sneaking into my house and going through my underwear.
In the one case they mention the guy feels "violated" because he linked from a pseudonymous depression message board to his blog where he used his real name.
Bloggers publicly blogging on the public internet have no expectation of privacy.
Nothing more than what anyone can find about someone else online. One time a contractor ripped off my in-laws for $15,000, and it took my wife and me 3-4 hours to find his home, his phone number, the fact that everything was in his wife's name, etc. It cost $40 or so.
For our privacy rights as individuals, it should ALWAYS be opt-IN for this, not opt-OUT!
He who knows best knows how little he knows. - Thomas Jefferson
If "Bob Smith" is a registered sex offender in a large urban area, another Bob Smith in the same area might have some difficulty getting hired for a job. Perhaps the scrapers might see some revenue in selling "whitelist" services.
I'm sure "SlashdotMedia" will improve on all the wonders that Dice Holdings blessed us all with
I'm quite careful about what I post online and have actually had positive comments in interviews about what people found after googling my name. I tried putting my screen name (which I believe is unique) into PeekYou, and it entirely failed to find my real name. Googling names is much better, to be honest.
Let's hope they're Chilean - they might get stuck in a shaft between our Blog and our Facebook.
Brain surgery - it's not rocket science!
Banks, insurance companies, etc may end up using this kind of data to inform their risk management decisions. Eventually, that may mean that if they don't have this kind of data, you are risky by default. Look at what's happened with the credit bureaus. Technically they are opt out. But if you actually opt out, you put yourself at such a tremendous disadvantage that you can't really do it. You are forced to let these people have all sorts of detailed personal information, if you just want to live your life.
Perhaps we need some sort of data mining fifth amendment, where refusing to provide information cannot be used against you. But that's wishful thinking. In reality, people who just want to be left alone are probably going to be better off not opting out, as that would draw more attention than just blending into the crowd.
Give me Classic Slashdot or give me death!
go walk on a beach so the directional microphones can't pick up what you say through the surf noise
but if you want it to be public, post it on the internet
because as the other story from yesterday about the government spying on facebook shows: you are in the absurd scenario of trusting the GOVERNMENT to make rules, and you are trusting the GOVERNMENT to enforce rules, about what? about what you put in wide open view on a public internet. to me, that expectation of yours is insane
why are you trusting the government to do this? even if they had the intent and the enforcement capacity to do so, you honestly think they will do a capable job? with what? the corporate subcontractors with the financial involvement with the corporations who are after your data? pffft
and say the government fails to protect your data. ok, they sue and prosecute the offending corporations. but your info is already in the database. the database that is now mirrored 50 times by 25 different entities! once it gets on the internet, IT NEVER DIES. so please, get real: if you don't want it to get in a database, DON'T PUT IT ON THE FREAKING INTERNET
it is that simple. all other points of view are, frankly, a form of absurdity in which
1. you distrust corporations and governments with your private info,
2. so you put that private info on a public internet,
3. trusting corporations and governments to keep that info safe from
4. the same corporations and governments!
(smacks forehead)
i have a hen house. to protect that hen house from the wolf in the woods, i will hire the wolf in the woods to guard the hen house. wtf?!
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Are these the 33 gold miners Chile saved from doom 2 days ago? Wow. Skills transfer in awesome ways these days.
We have no privacy on the internet. No matter how much we want it to be otherwise, no matter how much it ought to be that way, you can just forget it. Someone can reply to me all they want and nothing will change this. We do understand this principle in some ways - look at the recording industry and music sharing. We tell them that their old model of thinking is outdated because of the way the internet operates, and that they just need to learn to live with it because there is nothing they can do about it - the only recourse is legislation, and that will not work in the long run. We tell them that no matter how much they want otherwise, it just isn't the Way Things Work - same here.
Ultimately this type of privacy no longer exists - they aren't intruding into something you have protected (which is another matter - that is still viable for many things), they are mining things you said in public. If you say it in public then it isn't "privacy" - the only thing different now is that the information is persistent and index-able.
Such is life on the internet - your choice is among four options: accept it and act accordingly, rail against it and get run over, pretend it doesn't happen and get run over, or never use the internet. You have no other choices.
------- Sorry about the spelling, I suffer from two problems. Dyslexia makes it difficult to spell well, lazy makes it
If I flash my privates in my house but have the curtains open so that anyone on the street can see, I cannot complain about people looking and might indeed be arrested myself.
If I do the same in a house separated from the road by a high fence and you put a ladder on the street and use night-vision goggles to look at my dangler, YOU are going to be arrested.
What is privacy? Is it the absolute letter of the law OR does EXPECTATION of privacy come into play?
You can follow me night and day. BUT that is very expensive, and so you don't. So my actions in public are effectively private simply because logging them would be far too costly. So I have come to expect that my actions in public are not constantly logged. Should this now change just because it has become possible to log them all? Should it be legal to record my every movement just because total CCTV surveillance has become feasible?
I do NOT know the answer to this question. On the one hand, I think that if you misbehave in public you should not have the right to complain "but I didn't expect anyone to catch me, so I should be free" BUT I also think that private companies being able to trace everyone constantly would be a REALLY bad idea.
If I ask on a forum about a health issue, should my insurance company be able to use this? I think not. Sure, if I am breaking the law or making false claims. But to deny people coverage because they think they might have a problem? No, that is going way too far.
Privacy is about more than things being recorded; it is about the idea that NOT everyone should constantly want to check up on everyone else. Just because I wrote a poem to a girl does NOT mean it has to be recorded by every private company in the world and sold to the highest bidder.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
I saw no connection between my real identity and any of my online identities.
In fact, it barely had any information on my online identities anyway. The only information it had on my real identity was stuff I already knew was out there, mainly job-related stuff like LinkedIn.
Technoli
You know, it's hell being called Anonymous Coward. People keep accusing me of posting all sorts of things online. It's almost as bad as the trouble my brother-in-law Allan Specimen has buying things with a credit card.
you are trusting facebook to actually keep it private
in what world do you live, in which you have determined that facebook is worthy of that trust?
and since these entities simply cannot be trusted with keeping info safe, i think we are rapidly entering an age in which privacy simply doesn't exist
not out of any malevolence or malfeasance, but as a simple direct logical corollary to the growth of the internet and the unintended consequences of how things actually play out in the real world, regardless of anyone's intent
anything that gets in contact with the internet: it never dies
again slashdot is providing free PR to some bullshit idea.
this thing does not really work, as considerable context is required to correctly associate a username with its user.
likewise, as pointed out above, for real names.
automating a simple scrape search of 'social' networks is silly.
Surely you mean mining away our privacy? Hah hah hah hah hah hah hah hah hah, just like mining in Chile, or playing minecraft.
i read your post in its entirety. i was attacking the idea that trust is even possible in the situation. and now i see we are actually in agreement, because you also think such trust is nonexistent. your point simply seems to be that a lot of fools meanwhile still trust where there is none. therefore, we have no disagreement, because that is my point too. cheers
Just like the Do Not Call registry: all it really does is confirm for the companies that you really are you, and that they've successfully connected those dots.
He has a middle-of-the-road name - not exactly common, but not wildly inventive.
Just so happens that a man convicted of indecent assault against a minor has the same name and comes from the same county.
The worst thing to happen (so far) was that my friend's FB account was deleted, and he had to create a new one and fire a "WTF?" email at FB. It was all rather amusing and it didn't cause any lasting damage, but I haven't had the heart to take him to one side and say, "Dude, seriously, you were *lucky* that's all that happened..."
People are dumb, and computers are dumb, yet the two sets seem to trust each other far more than is warranted. *That's* where the problem lies.
Meta will eat itself
Copying music wasn't much of an issue until it became not only trivial to do but also trivial to share.
Once upon a time a third party would have had real work to do to find out how much I pay in property taxes, for example.
Yeah, it's public information, but it wasn't trivial to get.
I want accessibility of information about me to help me and make my life easier.
I don't want easy access to _my_ information to make it easy for other people to make my life more difficult.
I have no doubt that 'opting out' makes the problem dramatically worse, as the companies use the additional details (you have to fax your driver's license to the first one on the list) to increase the value of your portfolio and sell it off to a bunch of other databases while they are 'removing' you from their own. They probably don't even bother removing you from theirs, because honestly, what consequences are they going to suffer?
"They are not looking for passwords and such; scrapers are looking at blogs and forums searching for material relevant to their corporate clients."
Web scraping for passwords? Why would anyone have thought this in the first place? It's a bad comparison. If your passwords are already on a website to be scraped, your problem isn't data scrapers.
If the only thing I have to fear about is PeekYou, then I'm utterly anonymous.
http://dilbert.com/2010-12-13
that this is a real problem, as I have personally experienced problems with data scrapers scraping my data. However, the tool they are talking about, PeekYou, couldn't find a stripe in a pack of Fruit Stripe gum. I looked up several of my handles and several of my friends' handles and was not able to find anyone. Then I looked up real names and was still unsuccessful. So don't worry about PeekYou; worry about people doing actual data scraping the old-fashioned way.
... we'll be rescuing them from an underground data mine after they've been stuck there for 69 days.
Surely the way round this is for those who feel strongly about their privacy to post meaningless drivel, with no relationship to themselves or anyone else, at regular intervals. The data scrapers will be unable to tell truth from fiction, and their business model will fail.
There's a use for Twitter after all!
SteveB
There are only two ways to fight this - one is to push for data privacy laws, and the other is to pollute the data stream. When you're asked for a name, address, phone number or birthdate on a web site or form, lie. Just flat out lie. If you live in a town that borders another state (I'm originally from Kansas City, MO), say on forms that you live on the other side of the border. Mixing states REALLY confuses data aggregators. The more messed-up information you get into the data stream, the harder it is to put it back together in an accurate way.
Make throwaway email addresses at gmail or wherever on a regular basis to use for all this, btw. And keep using DIFFERENT fake data, too, otherwise it will still be a consistent identity of sorts, and will probably eventually be tracked back to you. And don't ever put any real data in Facebook, etc., or put a link between your Facebook account and anything else. Social networking sites are by far the biggest leakers of personal data.
I have a mailbox at a local UPS store where I have everything sent.
If anyone has the least bit of concern regarding this issue, then please use a pseudonym when you post to conceal your identity.
Using a simple pseudonym is a tried-and-true method to prevent any comments, monologues, rants, or theses from being linked to one's true identity.
Just like with the credit bureaus, you should be able to go to any of these companies at least once a year and get a free report of all the data they have on you! I know that some of the limited data they do provide for free is wrong, and I'd like to be able to at least correct it so my virtual identity is correct!
"Shrug?" You obviously haven't been burned. I was foolish enough to send emails to a mailing list for a chronic medical condition under my real name, and now if you search for it you get all those stupid sites with misspelled URLs that show the searchable full text. The list admin went bonkers hiring lawyers and everyone unsubscribed in a hurry. I guess people do visit those sites if they're looking at it from the perspective of a signal to noise ratio.
*types in his user name*
"What? That's not me!"
"... I wonder if it recognises my real name"
*types in real name*
*system now has the information it needs to link the user name to real name*
Looked myself up. It says I'm 40 years old! Because when someone insists on my birthday for no good reason, I use the birthday of Unix - or more precisely the birthday of the Unix clock.
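For anyone wondering about the arithmetic: the Unix clock counts seconds from timestamp 0, which corresponds to 1970-01-01 00:00:00 UTC, so a birthday set to that date makes a site report an age of 40 at any point in 2010. A minimal Python sketch, assuming a lookup date of mid-October 2010 (a hypothetical date chosen only for illustration); `age_on` is a hypothetical helper, not anything PeekYou is known to use:

```python
from datetime import date, datetime, timezone

# Timestamp 0 on the Unix clock (the "epoch"), expressed as a UTC calendar date.
epoch = datetime.fromtimestamp(0, tz=timezone.utc).date()

def age_on(birthday: date, today: date) -> int:
    """Hypothetical helper: whole years elapsed between birthday and today."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)

# Assumed lookup date (illustrative only).
lookup_date = date(2010, 10, 15)

print(epoch)                       # 1970-01-01
print(age_on(epoch, lookup_date))  # 40
```

Because the epoch falls on January 1, the computed age only ticks over at the start of each calendar year.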
Holy shit! *cuts internet cable*
I'm going to live in the woods. Err, shit... I'm going to go live in the desert! Yeah, I live in the desert, data aggregators!
After spending millions on data miners, surveys, demographic analysis, consumer panels, and consultants we've come to the conclusion that the target market for beer is young men.
It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
yeah but then they can use those laser devices that translate the vibrations in the window glass back into speech at long distances
I tend not to talk about myself in online forums. People may accuse me of being paranoid, people may even accuse me of being a fraud or a shill because I don't say who I am, but I act the way I do because I believe it's genuinely best to not care about such things, and I'd prefer others felt the same way.
But they won't be so sensible, so I do what I can to avoid giving them any justification to attack me. That's all they need, any reason at all will give them something to seek in the way of ammunition. And actual truth and honesty won't matter to them.
And since I just can't behave that unethically, I'm at something of a disadvantage compared to them because I just won't sink to their gutter level.
This is a very timely story. I just received an email from the university I graduated from years ago. They suddenly started sending the alumni newsletter to an email address that I have NEVER given them. In fact, it's an address I keep semi-private. I have no idea how they got it... maybe by scraping LinkedIn or my blog or something. Congrats: now I'm sure I'll never donate to you.
Why a double standard for public officials?
Disband corporations? Seriously? I want some of whatever you are smoking.
Good god man, get some perspective.
You clearly have no idea what makes the USA work and apparently, you don't understand why the USA is a superpower. Hint: our corporations are partly responsible.
I realize it is not a perfect system. Nobody has ever claimed it is. But it's the best one we know of. Out of curiosity, what the heck would you replace corporations with?
Mailinator is here to help with the fake address thing.
Or maybe we should build better systems to empower regular users towards a more "transparent society" like David Brin talks about and an improved collective IQ like Doug Engelbart talks about?
Or, as I said here about intelligence tools (though it applies equally well to the fascistic/plutocratic binding together of corporations and governance that is increasingly the norm in the USA):
http://pcast.ideascale.com/a/dtd/76207-8319
"Now, there are many people out there (including computer scientists) who may raise legitimate concerns about privacy or other important issues in regards to any system that can support the intelligence community (as well as civilian needs). As I see it, there is a race going on. The race is between two trends. On the one hand, the internet can be used to profile and round up dissenters to the scarcity-based economic status quo (thus legitimate worries about privacy and something like TIA). On the other hand, the internet can be used to change the status quo in various ways (better designs, better science, stronger social networks advocating for some healthy mix of a basic income, a gift economy, democratic resource-based planning, improved local subsistence, etc., all supported by better structured arguments like with the Genoa II approach) to the point where there is abundance for all and rounding up dissenters to mainstream economics is a non-issue because material abundance is everywhere. So, as Bucky Fuller said, whether it will be Utopia or Oblivion will be a touch-and-go relay race to the very end. While I can't guarantee success at the second option of using the internet for abundance for all, I can guarantee that if we do nothing, the first option of using the internet to round up dissenters (or really, anybody who is different, like was done using IBM computers in WWII Germany) will probably prevail. So, I feel the global public really needs access to these sorts of sensemaking tools in an open source way, and the way to use them is not so much to "fight back" as to "transform and/or transcend the system". As Bucky Fuller said, you never change things by fighting the old paradigm directly; you change things by inventing a new way that makes the old paradigm obsolete."
A 21st century issue: the irony of technologies of abundance in the hands of those still thinking in terms of scarcity.
I'd like to point out at this point that I own my name, any pseudonyms, characters, described events and anything else I post. Any information gleaned from anything I have ever posted to the net may be used for data at a fee of not less than $500 per use and perhaps more dependent on the phase of the moon. If such remuneration should not be paid, an equivalency, determined by me, will be extracted from corporate holdings at any time without notice. This warning is completely legal and enforceable by me at my convenience. Further, I am not responsible for any injury, death or loss of property, status, paternity or esteem suffered by anyone objecting or attempting to bar collection. This contract is irrevocable and binding even if you have not read it.
So, feel free to poke into my privacy, it comes at a price. Call the cops, lawyers, government agencies or Santa Claus, it doesn't matter, I will collect and you WILL pay. Others have, so will you. Try me, go on, I want you to. Be seein' you later.
*Repent! Quit Your Job! Slack Off! The World Ends Tomorrow and You May Die!
I'd happily give up superpower status if it meant no longer having lawless corporations that can get away with killing people or stealing money with no legal consequences (the executives take a golden parachute and never serve jail time).
>>>what the heck would you replace corporations with?
Proprietorships, the same thing we had before the "incorporation license" was invented. Then the owner would be directly responsible, without any way to escape punishment.
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
Effectively, that means that if middle or lower management did something bad in the name of the company, the owner would be responsible even without having authorized the act.
The way it works in the US is corrupt and bad. But here in Europe, corporations generally seem to play nicer and are held accountable more easily.
Pulsed Media Seedboxes
I know I'm a bit late, but... I'm not very worried. I use my name everywhere, am not at all secretive about who I am in real life, and I've seen it used by exactly one other person ever (who capitalizes it NemineM, anyway, while I don't capitalize it at all). A human could get from neminem to my real name in about 3 minutes if they were at all competent at googling. So, I clicked that link and entered my username: it thinks approximately 97% of the United States is associated with the name. That's some great detective work there.