Amazon Launches Public Data Sets To Spur Research
turnkeylinux writes "Amazon just launched its Public Data Sets service. The project encourages developers, researchers, universities, and businesses to upload large (non-confidential) data sets to Amazon (things like census data, genomes, etc.) and then let others integrate that data into their own AWS applications. AWS is hosting the public data sets at no charge for the community, and as with all AWS services, users pay only for the compute and storage they consume with their own applications. Data sets already available include various US Census databases, 3-D chemical structures provided by Indiana University, and an annotated form of the Human Genome from Ensembl."
Now I have somewhere I can store the index of my massive porn collection. Thanks, Amazon!
This is true. But the easier it is to obtain datasets like these, the easier it is for anyone to do data mining and correlate the public (presumably non-identified) datasets with any private data they do happen to have.
This just looks like a way to sell their cloud computing services. They provide the free data and you provide the monthly service fee.
The US Census Bureau charges for access to many of its datasets.
If you want news from today, you have to come back tomorrow.
Note that on Amazon's website they say that you can only access the data if you're paying them to crunch numbers on their cloud computers.
That is, you can't just download the data off their sites, which would be the nice thing to do.
As such, this article is nothing more than a slashvertizement.
Expect a new slew of Amazon patents...
"1-Sick" -- Health Data
"1-Mick" -- Irish Census Data
"1-Dick" -- Porn Movies Database
"1-Lick" -- Lesbian Porn Movies Database
"1-Fick" -- German Porn Movie Database
"1-Hick" -- The George W. Bush Presidential Library catalog.
"1-Kick" -- Pharmaceutical Index
"1-Nick" -- Crime Data
"1-Prick" -- Copyright Law Legal database
"1-Trick" -- List of iKea-nu Reeves Movies.
"1-Tick" -- Camping Places Data set.
"1-Brick" -- The Lego Catalog.
"1-Thick" -- Obesity Index.
One more step to a non-private world: CHECK
Privacy, as we have experienced it in the last hundred years, is on its way out anyway. The sheer volume, immortality, and interconnection of even publicly available datasets inadvertently reveal information most of us would rather keep private. It is much like how most people don't have a problem with beat cops regularly patrolling an area, but feel threatened by cameras monitoring, recording, analyzing, and storing information about the same public area.
That said, it's here to stay. The data's here as long as we use credit cards for most purchases, use I-Pass (or similar) toll paying systems, carry GPS-enabled cell phones, and expect the police to protect us from 100% of terrorist and criminal bogeymen. We might as well get some private research done, rather than leave it all to the government and big business.
... and that's when the C.H.U.D.'s came at me.
The less privacy we have, the less we have to worry about our privacy. That sounds flip, and along the lines of "if you have nothing to hide..." but it isn't.
We want privacy primarily due to shame.
We have shame because we wear masks almost 100% of the time.
We wear masks because we don't want people to realize who we 'really are', either mentally or physically.
We don't want people to really know us because we have been convinced to hold ourselves to standards that no one actually meets.
We hold ourselves to these standards because everyone else is wearing masks, and while we can tell ourselves that 'they are just like us', it's hard to grasp that cognitively without actual proof.
If there were no privacy, no one could wear a mask. If no one were wearing a mask, we would realize that the standards we hold ourselves to are unrealistic. If we realize the standards we hold ourselves to are unrealistic, we are freed from shame. If we are freed from shame, we no longer find privacy necessary.
We (or at least some of us) also want privacy to prevent annoyances and for protection.
I certainly don't want to have to answer to the government anytime I say the word "bomb" or "terrorist" on the telephone, in email, or in an IM.
I also don't want some company complaining anytime they see me buy a product from one of their competitors.
I also don't want to have everyone on the internet knowing my social security number, address, license plate number, or telephone number.
That isn't because of "shame"; that's because people can be assholes, and some people will abuse information. I don't care if people that I trust know these things, but I don't think shame or masks or whatever has anything to do with getting one's identity stolen, or having the government ensure you don't say anything bad about them.
That said, I don't think this public dataset business really affects individual privacy. This is more a database of already public, but hard to find, data that doesn't contain anything personally identifiable.
Let's just hope they keep it that way.
If the uploaded data is not available for download, but is only available to AWS applications running on Amazon's (paid for) compute service, then Amazon deserves nothing but contempt and an "Up yours" for this.
It seems that working for a living is out of fashion at Amazon. They expect people to supply them with resources so that they can charge them and others for their use. It's creative business bullshit, and not even remotely funny.
Amazon, how about you PAY BACK for the privilege of having the datasets uploaded to you by hosting them freely for the Internet community, and only on the back of that you charge for local, higher-speed access by AWS applications? Or would that be too "fair" for an Amazon business practice?
"The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra
Which is very different from a large society in which some people know everybody else's business.
Even if this stuff is public, the time and money and knowledge necessary to use it will not be evenly distributed.
Information has never been evenly distributed. In small communities it was the neighborhood gossip, the corner pharmacist, the village priest, or the county sheriff who knew everybody's business. The replacement of social capital with monetary capital is the only difference.
Those small communities had, however, a fast-acting, closely monitored feedback system. If someone abused their position of power and trust, it was caught quickly and it was easy to remove them from the loop. A similar system is needed now, only on a national or worldwide scale. I think the only way to accomplish this, without going back to a pre-computer society, is to make sure that as much information about the watchers as possible is publicly accessible. Hopefully, the same spirit that makes the OSS community so vibrant and quick to act will transfer to this new domain.