Why Big Data Could Sink Europe's 'Right To Be Forgotten'
concealment tips this news from GigaOm:
"Europe's proposed 'right to be forgotten' has been the subject of intense debate, with many people arguing it's simply not practical in the age of the internet for any data to be reliably expunged from history. Well, add another voice to that mix. The European Network and Information Security Agency (ENISA) has published its assessment of the proposals (PDF), and the tone is skeptical to say the least. And, interestingly, one of the biggest problems ENISA has found has to do with big data. They say, 'Removing forgotten information from all aggregated or derived forms may present a significant technical challenge. On the other hand, not removing such information from aggregated forms is risky, because it may be possible to infer the forgotten raw information by correlating different aggregated forms.'"
Few ideas are more absurd. They will have to outlaw all recorded media and burn down the libraries. Make ignorance the law of the land. Or maybe the authorities will get flashy things.
“He’s not deformed, he’s just drunk!”
Let's say you meet the President or Prime Minister in real life. They say something that impacts you so greatly that it changes your entire life. Now, let's say that at the end of their life this law is in effect, but only available to the very wealthy. They decide they want all traces of their life erased to guarantee the ability to move on to an afterlife. Since energy/electricity and memories are reflections of each other, the impact the president had on you is subsequently gone. Erased from your head, against your will. Your memory of this event is gone, because the wealthy KNEW how it would impact memories, and the subsequent trajectory your life takes. Ever wonder why we get déjà vu? Maybe this explains why we get it. Maybe this legislation did pass before, and the world blew up when someone who did something miraculous never got a chance to be seen or heard, and died on the cross. Maybe he's coming back, now as we speak, to warn against making that same decision again, and of what it would mean should the wrong decision be made. Life can't happen until we let go of our need to control our image, and just accept who we are, and then have fun with wherever the journey takes us. It only becomes a cycle when we try to purge 'public records' of that information. You are the mind of god.
If customers want their data forgotten then maybe they didn't want it stored or shared in the first place. The rule should not so much be about data retention but data gathering. The rule should be quite simple. Any organization that gathers data can't share it at all with anyone not directly connected with the reason it was gathered. So my power company needs my address to know where the lights need to be turned on and enough info to bill me. But anyone beyond billing and switching should not have my data, not management, not marketing, and definitely not a "trusted" third party.
The same goes for my driver's license, which is needed by two small groups of people: the people who issue the license, and the police if they need to know that I am allowed to drive. It should literally be illegal for anyone else to copy anything from my license if it doesn't involve my ability to drive, so, say, a car rental place would be OK. Many bars have taken to scanning driver's licenses as you enter the bar. Then you start getting mail and crap from the bar and anyone else they sell the data to. I met a guy who rewrote the data on the magnetic strip to cause buffer overruns and crash their little handheld units. He regularly went to every bar downtown that had the scanners, because the crash wasn't fixed by a simple reboot of the unit: some remote server lost its mind, requiring someone to come in.
These organizations find this data valuable but somehow think they can take that valuable thing from us without negotiation. I say you want my data you can pay me $1,000,000 per byte plus royalties on resale.
What about my right to control my server? I look at this 'right to be forgotten' as the same sort of overreach which allows media companies to put DRM on my ebook reader or smartphone, then make it illegal for me to remove it. My equipment. My decision. You want to force me to keep or remove any software/data, then you get yourself a court order. I don't see why phantom imaginary property rights seem to keep trumping rights over real property. Sheesh.
Expunging data is undesirable for those who think of it as an expense, especially one that might interfere with a revenue stream. Greenhouse gas emissions are no different in this regard. For industry it's a nuisance to be held accountable to any concern that gets in the way of short term profitability. That's why carbon caps, reduction and the like have gone nowhere.
What is the problem with doing the same for people?
Facebook actually makes it hard for people to remove their content from the service, and it doesn't even say "delete", it says "remove from timeline" (but not from the whole system).
If I want my Facebook history wiped, it is my right to do that. It is *my* data, and Facebook and others shouldn't have an operating license unless they make it really simple for people to "be forgotten".
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
As long as I have my right to not care. In the unlikely event I stumble upon your embarrassing "e-foible", I do not judge, and will soon forget. Unless you "protest too much", which might spark a memory...
This issue is a bit more complicated than you think.
I don't know if you remember MacOS prior to OSX, but classically it had two "forks" - the data fork which compares to the typical flat file we all know, and a properties fork which is something like the metadata in a file system (time created, ownership, permissions, etc) but with a much richer syntax.
OSX lost that separation and now uses a Unix-y model.
If we wanted data to be trustably limited in scope, then we'd have to structure *all* our data everywhere so that it contains the literal data being saved, as well as another "properties" fork which could contain information about the scope of acceptable usability.
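As a rough illustration of the idea above, here is a minimal Python sketch in which every stored value carries its own scope metadata alongside the literal data. The class name ScopedValue, the allowed_purposes field and the purpose strings are all made up for this example; nothing here corresponds to an existing file system or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedValue:
    """A value bundled with the purposes it may be used for (hypothetical design)."""
    data: str
    allowed_purposes: frozenset

    def read(self, purpose: str) -> str:
        # Release the data only for a purpose listed in its scope metadata.
        if purpose not in self.allowed_purposes:
            raise PermissionError(f"use for {purpose!r} is out of scope")
        return self.data


# Assumed example: an address that may be used for billing and switching only.
address = ScopedValue("12 Example Street", frozenset({"billing", "switching"}))
billing_copy = address.read("billing")
```

A check like this only helps if every program that touches the data goes through it, which is where the cost comes from.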
It could be done, but it would be very, very, very expensive. I'm not sure it wouldn't be worth it: the right to privacy and personal rights count for quite a bit, and the court system in the USA is also very, very expensive and equally worth it.
Note that since we're talking about data, Moore's law means that the cost is about 1 or 2 years of actual growth. 1 or 2 years of no growth at all to accommodate this idea....
I have no problem with your religion until you decide it's reason to deprive others of the truth.
not any government policy or commercial entity
they call it disruptive technology for a reason. like the printing press, or the gun, or the atom bomb, it dramatically changes the status quo
it's simple: if you don't want it to live forever, don't put it on the internet. if you put it on the internet, it lives for ever
that's about the truth of it
but i suppose many people out there are like music company executives trying to impose legal constructs from the cassette tape age on the internet: unwilling to accept ugly reality on the subject
well i'm sorry, you need to accept this as reality, no matter your feelings
one other point: privacy is NOT dead
all you have to do is stop offering parts of your life to the internet
the insane part is feeding private parts of your life to the internet, and then whining about a lack of privacy
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
How do they update data with mistakes? They can delete things the same way. Sure the change might not hit every backup, but we should draw the line somewhere as good enough.
... de-identification is an area of active research, because we'd really like to be able to mine all that juicy medical record data without infringing on patients' privacy rights. The gold standard so far seems to be Vanderbilt's Synthetic Derivative, which cleverly alters individual records enough that they can't be traced back to the actual patient. If these records are then used to create aggregate data, then attempts to reconstruct patient records by "correlating different aggregated forms" won't work, because they'll just reconstruct the SD instead. It seems to me that a similar two-stage process could be applied in a number of realms, so Google or whoever could still do all the "Big Data analytics" they want without raising privacy problems.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
I think some would argue that there is a right to remember. The Wayback Machine, for instance, has been instrumental in proving corporate malfeasance. Do we really want to lose that?
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
They can delete your data but they don't want to so they tell you it's a nearly impossible task. Yet if a fellow corporation asked, the data would disappear. People need to have a backbone and stand up for themselves.
It's always a situation where we are using technological means to adapt to our human flaws, or to our inability as a society to accept embarrassing information. Another way of looking at it is that if everyone looks stupid and embarrassing, eventually that becomes the new normal and we evolve and adapt.
The rules have to change as to what a "good person" is to include far more people. "Good people" should be allowed to look bad without being ruined.
You can't guarantee that you've erased all data about a person unless there is a unique identifier attached to their data. Otherwise, you could plausibly say, "we didn't know that browsing history was referring to John Doe."
So the "right to be forgotten" carries the risk of inviting the requirement that you be tracked more closely before you are forgotten. It's a little bit like being told you have to provide your DNA so the authorities can be sure you aren't the criminal they're looking for – and of course they'll destroy the data afterward.
Advertisers and "big data" will love this. In the short term, they can complain about how much it's going to cost them to do this. The bureaucrats and the public will assume they've scored a victory because of the complaining, and they might even give the companies a tax break or subsidy to offset these costs. People let their guard down, thinking they have more control over their data, which means they're exposing more information. Meanwhile, the corporations have a government mandate and possibly even funding to do what they've always wanted: tie everything together to identify you as an individual and get a complete picture of you.
When the companies are told to forget you, they will say they are complying, but many companies will find a way around it, like selling a copy of your data to an off-shore subsidiary that they own.
This is one of the things I found truly visionary in BS: Caprica. The idea that your personality, even your entire being, can be inferred from all the data that's being kept about you. Just, in the real world it's more scary.
BTW, before Caprica there was Gibson's Neuromancer, which featured 'Constructs'. A bit the same, but didn't incorporate Big Data to help create the construct.
Way back, the US Military, probably under the guise of DARPA, wanted a new database written. The concept was that it could track, for example, the salaries of all civil servants. If someone queried for the salary for one particular civil servant, the database would refuse to return that data unless the requester had the specific security clearance required (the need to know).
A clever requester might know that one civil servant working in one particular division is the one and only manager. As such, they could request the total salary for the division, and then request the total salary of all the non-managers (secretaries) in that division. Then the manager's salary could be deduced by taking the difference between the two queries. This would be an obvious security violation, and as such the specification said that any sequence of operations that might give away classified data should also be prohibited (unless the requester had clearance).
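The two-query trick is easy to reproduce. Below is a small Python sketch, using an invented three-person division and an invented guard function, that shows why a per-query check is not enough: each query passes the check on its own, yet the difference between them reveals the protected value. The names STAFF, total_salary and naive_guard are hypothetical and are not taken from the database project described here.

```python
# Hypothetical salary table for one division: (name, role, salary).
STAFF = [
    ("alice", "manager", 120_000),
    ("bob", "secretary", 40_000),
    ("carol", "secretary", 42_000),
]


def total_salary(rows, role=None):
    """Aggregate-only query: returns a sum, never an individual row."""
    return sum(salary for _, r, salary in rows if role is None or r == role)


def naive_guard(rows, role=None, min_group=2):
    """Refuse any single query whose matching group is smaller than min_group."""
    group = [row for row in rows if role is None or row[1] == role]
    if len(group) < min_group:
        raise PermissionError("query too specific")
    return total_salary(rows, role)


division_total = naive_guard(STAFF)               # allowed: covers 3 people
non_managers = naive_guard(STAFF, "secretary")    # allowed: covers 2 people
inferred_manager_salary = division_total - non_managers
```

Asking for the manager's salary directly is refused, but the subtraction recovers it anyway. A guard that really met the specification would have to reason about every combination of past and future queries, not just the current one.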
This database project generated lots of press, because it was such a good idea. Last I heard, no one quoted on the project, because no one had the faintest idea how to build the database.
Anyone have any updates? What was the standard called?
"I was walking in the hills when I came across a man who looked as though he carried the cares of the world on his shoulders. I introduced myself and asked him what was wrong.
The man pointed to a bay in the distance and said: "look at all those ships down there. Do you know who built them?". "No", I replied. "I did", he replied. After a pause he said "but do you think they call me Dai the ship builder?.... no"
He then pointed to the city and said "look at all those houses down there. Do you know who built them?". "Was it you?" I asked. "Yes", said the man, "but do you think they call me Dai the house builder?.... no"
He then pointed at a fine new church building, saying "See that church.... I designed that myself... but they don't call me Dai the Architect either".
With a sigh he turned to me and said: "....... but you shag one sheep"
We come across this regularly in confidentiality agreements with totally inappropriate clauses in them, for instance:
Must remove all copies of and derivative works from all backup media, databases, disks, etc.
You can set up systems that allow for some of this, but I can think of many cases where expunging derivative works can be practically impossible without violating some other piece of keeping-records-post-Enron kind of legislation (and we are in Australia with NO US subsidiaries). For instance, I could create a space for this material so that it is never backed up, then delete it when the CA finishes, but I have no control over whether the staff copy bits of it to other places or forward it around on email while creating derivative works. And you can be damn sure I'm not destroying my backups (our policy is keep forever; it's amazing how far back you might need documentation for a lawsuit).
I know most of this discussion is weighted towards social media content, but the fact is if it is visible it may be copied, reposted, altered to be offensive to you or others and then it really is impossibly expensive to remove.
For instance, there is stuff I put on the web in the early 90s I wish to hell I hadn't. It's everywhere. Eventually it might be forgotten, but I'm pretty sure it'll outlive me.
So the argument is: because it is impractical, we cannot have such a rule?
So why exactly is sharing music illegal?
Either the law does not need to be practical, or we need to abolish laws that ask impossible things.
To techies the idea seems absurd, but it's not. Sure, your server, your rules. But what you pull into them is another matter entirely, and the American view that if it's not behind closed curtains, it must be public, doesn't scale.
Compare, of all places, Japan, where it is in fact customary to "not see" things that are pretty much out in the open out of sheer necessity because too many people are living too close together. In a sense, the internet is worse than Tokyo.
There's irony here, where the techies are deriding politicians for doing boneheaded things with far too much data. Well, this is part of that, but in reverse, and if they're doing it wrong it's up to us to find ways to do it right and nudge them in the right direction.
DRM became a bad word because big media deployed it to control their customers, who had thought they'd bought something, only for the seller to pull a legalised fast one afterward. David was losing to Goliath until DVD Jon came along.
Data protection in this case wouldn't include money passing hands in the reverse direction. It's more like, well, you put DRM on your SSN when you sign up (and pay) for something that requires it, and you can more or less reliably wipe your SSN out of their databases once they no longer need it.
No longer having to trust some faceless large entity on their wooly word salad assurances and their pretty face is a nice boon for the individual. Bit of a different power balance there.
Yet the only real fix is to not store all that data in the first place. This means that a lot of data that's being gathered now must not be gathered at all or perhaps some other data needs to be gathered. Zero-knowledge proofs will likely have a big place in that, say to prove you're old enough without showing your ID card with all that extra data you're forced to give out currently. This'll need new technology, but will prove necessary to really scale out our data use without building databases of ruin.
I wonder how much of this is genuine and how much is distortion of the news, sponsored by corporations with a vested interest in not having this legislation passed?
I don't think one can fault the intentions behind this legislation: A citizen should have the final say concerning any personal data. This is important when it comes to things like credit card information and other personal information, and it is perfectly feasible for a company to delete a person's information from their systems, when that data is held in a suitable format, like a database.
It is, of course, not as easy if the data is part of a huge, unformatted stream of data - what they now call "Big Data"; but I wouldn't say it is impossible. After all, there are several technologies that target exactly personal information held in huge, unformatted datasets; if they can find it, they can also delete it - or blank it out.
It is idiotic to talk about "having to change history and going through backup tapes ...". Even an EU bureaucrat wouldn't demand that; this is just typical FUD, and the purpose is easy for everyone to see: people's personal information is worth money.
So now that someone actually wants to protect your rights and the control of your information you are against it?
And when you have no control of your private information that is also wrong.
In the UK (I don't know about the rest of the EU) an individual can send a subject access request to a company or organisation and that organisation has 40 days to send you all the information they have on you. Companies have been doing this for years now. It doesn't seem so hard to change the query from a SELECT to a DELETE.
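To make the SELECT-to-DELETE point concrete, here is a small Python sketch using the standard sqlite3 module and an in-memory database. The table name customer_data, its columns and the e-mail addresses are invented for the example; a real company's schema will be spread over many tables and systems.

```python
import sqlite3

# Hypothetical schema: one table holding everything stored about customers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_data (id INTEGER PRIMARY KEY, email TEXT, note TEXT)"
)
conn.executemany(
    "INSERT INTO customer_data (email, note) VALUES (?, ?)",
    [
        ("jane@example.org", "billing address"),
        ("jane@example.org", "support ticket"),
        ("john@example.org", "billing address"),
    ],
)


def subject_access(conn, email):
    """Return every row held about one person (the SELECT side)."""
    query = "SELECT id, email, note FROM customer_data WHERE email = ?"
    return conn.execute(query, (email,)).fetchall()


def erase_subject(conn, email):
    """Delete the same rows (the DELETE side) and report how many were removed."""
    cursor = conn.execute("DELETE FROM customer_data WHERE email = ?", (email,))
    conn.commit()
    return cursor.rowcount


found = subject_access(conn, "jane@example.org")
deleted = erase_subject(conn, "jane@example.org")
remaining = subject_access(conn, "jane@example.org")
```

The same WHERE clause serves both requests. What it does not reach is anything outside the live table, such as backups, logs or aggregates already derived from these rows.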
Now the paper in the article talks about how publicly available information may be copied (via the web) without the original author/organisation knowing, e.g. you could copy this post and store/publish it elsewhere and neither Slashdot nor I would know, so you can't guarantee that the data will be completely deleted. But personally I don't think this is that big of a deal. If I want company Foo to remove all the information they have on me, for whatever reason, what do I care that company Bar also has information on me?
I think, to a point, an individual should be responsible for tracking all the information that they want removed, and companies/organisations should be responsible for acting on legitimate requests to remove the information.
A few years ago, there was a lawsuit where a guy who had been convicted of murder sued Wikipedia to get his name and crime details removed. This was based on some German privacy laws, but could this fall under a Right To Be Forgotten as well? Could we get people suing individuals who post information about them (especially true information) because those people would rather the incidents be forgotten? Could posting "I just saw X and Y getting quite cozy over lunch together" on Facebook lead to a lawsuit from X and Y because their spouses didn't approve?
My sci-fi novel, Ghost Thief, is now available from Amazon.com.
In the unlikely event I stumble upon your embarrassing "e-foible", I do not judge, and will soon forget.
That doesn't work past the point (which we've already reached) where it's not real human beings considering personal data and causing significant consequences to the subject of that data, but automated systems that can essentially check everyone for everything they have data about.
If there are no systems in place to limit the decisions that can be made by such automated systems without human review, or inadequate checks and balances to put things right when the machines make a mistake, real people wind up suffering real consequences, yet mysteriously no other real people are ever responsible any more so there's no-one to hold to account, to compel to fix what they broke (if that's possible), or to force to change their ways.
Letting computers make these important decisions automatically just because we can leads to a world where no-one under 30 can afford insurance to drive, no-one with a family history of some unfortunate medical condition can enjoy any activity where it might be risky if they have the same condition, you can't travel if you have the same name as a wanted criminal and once lived in the same city, no-one who put a photo on-line of themselves drunk as a student can get a good job when they graduate, all the Muslims get shot because they're obviously terrorists, you're likely to get several months of your life wasted complying with a tax investigation because you once filed the wrong form (or you filed the right form but the OCR software misread it), and so on. Do you really want to live in that kind of world?
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
true before the internet, true now
that's a separate point from mine
Even if you could opt out and even make it the law, one second after the government wanted that data they would instantly violate the law, even if they were subject to it to begin with, which is highly doubtful, and the one area most important for "being forgotten", criminal sanction would be bypassed. The sheer variety and rate of increase of felonies these days is astounding.
and assume you aren't purposefully twisting the words i say, just not understanding them
if you don't want everyone to know about your large dildo collection, don't store it on your front porch
it's not "if you have nothing to hide, you've got nothing to fear", it's "if you don't want the world to know about it, hide it"
it has nothing to do with corporate entities or evil governments, it has to do with you managing your own social existence
you don't get to put something on a public network and then complain about that thing not being private
manage your social existence. and certainly don't complain when YOU decide to trust your private details with a corporation on a public network
As a former Oracle DBA, I'm well aware that developers do a good job getting data into the database but a terrible job of purging stale data from the DB. It seems they never think of that.
There are band websites I have saved in my Favorites from bands that were active in the late 90's but even an exact text search won't match in Google, Bing, or Yahoo. (Some other engines will gladly match the websites). Lyrics of the band no longer show up in the search, although I can manually navigate to the bands' lyrics pages from popular lyrics websites. So, clearly these websites were forgotten, or I am forced to admit that Google's search algorithm is horrendously inaccurate. That suggests big data CAN forget "aggregated" information (like the lyrics and such).
Synthetic Derivative is fundamentally broken. It rests on the "de-identification" idea that removing certain data renders the remaining data untraceable. Good *old* k-Anonymity.
In the spy world they call that "redaction" and there are good reasons that some classified documents are only released after decades instead of releasing redacted copies right away. http://www.cs.cornell.edu/~vmuthu/research/ldiversity-TKDD.pdf The paper on l-Diversity offers some clear examples and explanation of why the deidentification proceeds from a false assumption.
The only reason Synthetic Derivatives can get a gold star classification is because the medical lawyers who wrote HIPAA carved out a safe harbor exception to allow re-identifiable personal medical information to be traded.
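A toy example may help show what goes wrong with k-anonymity. The Python sketch below builds a made-up table that is 3-anonymous on its quasi-identifiers, yet anyone known to fall in the first group is revealed to have heart disease because the whole group shares one diagnosis. The records, column choices and function names are invented for illustration, and the diversity measure is only the simplest "distinct values" variant.

```python
from collections import defaultdict

# Made-up "de-identified" records: (zip_prefix, age_band, diagnosis).
RECORDS = [
    ("130**", "<30", "heart disease"),
    ("130**", "<30", "heart disease"),
    ("130**", "<30", "heart disease"),
    ("148**", ">=40", "cancer"),
    ("148**", ">=40", "flu"),
    ("148**", ">=40", "heart disease"),
]


def groups(records):
    """Group sensitive values by their quasi-identifier combination."""
    by_quasi_id = defaultdict(list)
    for zip_prefix, age_band, diagnosis in records:
        by_quasi_id[(zip_prefix, age_band)].append(diagnosis)
    return by_quasi_id


def k_anonymity(records):
    """Size of the smallest group sharing the same quasi-identifiers."""
    return min(len(values) for values in groups(records).values())


def distinct_l_diversity(records):
    """Fewest distinct sensitive values found in any one group."""
    return min(len(set(values)) for values in groups(records).values())


k = k_anonymity(RECORDS)
l = distinct_l_diversity(RECORDS)
```

Here k is 3 while l is 1: the table passes the anonymity test and still leaks the diagnosis of everyone in the first group.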
I wonder what will happen when the people who are teenagers now grow up and want to run for politics. Most of them have spent the last few years doing extremely embarrassing things online (from an adult's perspective) and all this is going to get dug up by their opponents when they run for office.
That's when you might start to see a push for these "rights to be forgotten".
"I have never let my schooling interfere with my education." - Mark Twain
The summary makes me laugh: "significant technical challenge". I can think of many things that present significant technical challenges, none of which are relevant when they are impeding a company's income. How hard can deleting data be?
IMO significant technical challenge = this impacts my perceived business model, so I don't want to play ball.
Man, you would think they would have learned when they required every website operator in the world to flash a sign about cookies that they don't know enough about the internet to govern it adequately. These people are intentionally ignorant, and it would be annoying, if it wasn't so adorable.
This signature intentionally left blank.
Jesus Christ!
It's called a Robinson list!
We use it in Pharma every other day! Just match the inferred info against that list and if the person being referred to is in the list, you cannot use the info. Period.
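The matching step can be sketched in a few lines of Python. The record layout, the function names and the sample names below are invented for the example, and real lists would match on something sturdier than a normalised name.

```python
def normalise(identifier):
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(identifier.lower().split())


def usable_records(inferred_records, robinson_list):
    """Drop every inferred record that refers to someone on the suppression list."""
    blocked = {normalise(entry) for entry in robinson_list}
    return [r for r in inferred_records if normalise(r["person"]) not in blocked]


# Made-up data for illustration only.
robinson_list = ["Jane Doe"]
inferred = [
    {"person": "jane  doe", "inference": "likely diabetic"},
    {"person": "John Roe", "inference": "likely smoker"},
]
allowed = usable_records(inferred, robinson_list)
```

Only the John Roe record survives the filter. Note that the list itself has to hold an identifier for each person who wants to be left alone.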
Instead of trying to chase every other bit of information to delete it (ludicrous and utterly impossible), use opt-in.
It's not so hard...