800 Megs of Data Per Person Last Year?
Ant writes "Growing net, computer and phone use is driving a huge rise in the amount of information people generate and use.
US researchers estimate that every year 800MB of information is produced for every person on the planet.
Their study found that information stored on paper, film, magnetic and optical disks has doubled since 1999.
Paper is still proving popular though. The amount of information stored in books, journals and other documents has grown 43% in three years."
I bet half this increase is due to the number of Slashdot reposts also increasing over the same timespan.
The article fails to address the issue of redundant information
You mean stuff like this?
Karma: Excellent (In Soviet Russia, karma pimps YOU)
Funny you should mention redundant information, as THIS IS A DUPE. Woo! 2.8% of the 800 megs per person average comes from Slashdot dupes alone.
Your brain is not a computer.
...all warm and fuzzy inside doesn't it? All that data about you and yours? Think it'll get any better? Think again.
It amazes me how much people don't think about privacy anymore. How the concept of supermarket sales has given way to 'Bonus Cards' which track what you buy. Few understand how this information can be used to piece together a bigger picture.
Some Wachovia Bank branches are now requiring a FINGERPRINT before you can cash a check. The situation is this: If you are not a customer, you are now required to give them an electronic finger scan to cash a check made out under Wachovia.
Where does it end? Should I just give them a hair sample now or wait until my implant is required?
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
An audio producer may lay down gigs of tracks for one song. In my research lab we burn a DVD almost full of new data each day. In the hospital we record more and more detailed information into our systems.
Sadly, an assload of this information is useless, useless, useless. I spend more time detailing information in the medical chart than I actually spend with the patient.
In my lab, more data is better... however, when it's just useless information to keep the shark lawyers off my back it's a bad thing.
Davak
Also, if everything is posted twice, like on Slashdot, and like THIS story, that 800 MB is really 400 MB.
Bravo.
Is a cracked version of some latest software package new data?
Honestly, only about 25k in the *.exe has been changed... but this would count as a doubling of information, hard drive space, whatever.
Likewise, when we chart medical information, we often duplicate the information from note to note to remind ourselves and others about the important aspects of the patient's history. Really...it's just data duplication.
Davak
Personally, I find that figure insulting.
I mean, I'm worth at least a gig...
When a person rips a CD to MP3, does s/he create new data? And when s/he copies existing MP3s, is that new data? I think it's impossible to define the term "new information", unless you go with the strictest definition: before the information was digitally generated, it did not exist in any other form. Only then is it new.
If only they could discount the extremes in generating the average.
Some of us go through a truly silly amount of data. There's a nontrivial number of people reading this discussion who exhaust their dorm's 1 GB bandwidth cap every day.
On the other hand, somewhere there's a barefoot Palestinian refugee child for whom not so much as a piece of paperwork has been generated since he was born.
These two extremes would probably tend to distort things. It would be interesting to find out whether the study was based on usage of data storage, as it appears, with these extremes included, or whether they just (being Americans) couldn't be bothered when compiling their study to talk to geeks and starving African children. If the former, I'd be curious how their results would change if they could somehow just, like, chop off the ends of the bell curve.
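Chopping off the ends of the distribution before averaging is usually called a trimmed mean. A minimal sketch in Python, assuming made-up per-person figures: the `trimmed_mean` helper and every number in `sample_mb` are hypothetical and are not taken from the study. The largest value stands in for someone using 1 GB a day for a year, the smallest for someone who generates nothing at all.

```python
def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping trim_fraction of the values from each end."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    k = int(len(ordered) * trim_fraction)  # how many items to drop per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical yearly data volumes in MB for ten people (illustrative only).
sample_mb = [0, 5, 20, 50, 80, 120, 200, 400, 900, 365_000]

plain_mean = sum(sample_mb) / len(sample_mb)
trimmed = trimmed_mean(sample_mb, 0.1)  # drops the 0 and the 365,000

print(f"plain mean:   {plain_mean:.1f} MB")
print(f"trimmed mean: {trimmed:.1f} MB")
```

With these invented numbers a single heavy user drags the plain mean far above what anyone else in the sample uses, while the trimmed mean stays close to the middle of the pack.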
The thing about information is that it's not quite so easy to count as the article suggests. If the question is solely one of how many magtapes to buy, sure the exabyte thing is interesting enough. But in a "human" sense, that's not all that interesting.
For example, the article cites 18 exabytes of what is basically analog data--sound and images--over telephone, radio, TV. It claims that 98% of that is in telephone calls, essentially all, in other words.
First thing is that most telephone calls are not recorded. Well, I dunno, maybe Carnivore and Echelon are even worse than I think. So mostly this is just a question of how much bandwidth AT&T and MCI need to buy; I'm sure they care about that question, but most people have no reason to. Maybe how many tape drives the NSA needs to buy too.
Just how much information *IS* there in a telephone call, though? At a certain level, ten million calls about the same snowstorm aren't really that information rich. But I understand that you want to hear YOUR sister complain about shoveling the snow, not somebody else's sister do so. But just at a technological level, how much is there to a phone call?
If I record the call as CD-Audio WAV format it comes to something like 9 MB a minute. But then, if I compress it to MP3, or Ogg Vorbis, or AAC, I'm down to something more like 1 MB a minute. In fact, if I go for a 56k bandwidth, or something along those lines, I can probably get it down to less than half a MB... and that's not really much different from what I could discern originally on my cell phone on a noisy street, or over the old wiring in my house. So far, we've reduced the "information content" by 20 times by purely technical means. Then again, it's not clear if this is fair... in those cop shows where they reconstruct background noises to filter the gunshot or car crash in the background, they probably want the full original data... but do *I* care about that when I talk to my sister?
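The per-minute figures can be checked with a few lines of arithmetic. A rough sketch in Python, under stated assumptions: the 128 kbit/s MP3 rate is an assumption (the post names no bitrate), `mb_per_minute` is a hypothetical helper name, and a megabyte is taken as 10^6 bytes. By this count uncompressed CD audio comes out nearer 10.6 MB a minute than 9, and the drop to 56 kbit/s is about 25 times, in the same ballpark as the 20 quoted above.

```python
def mb_per_minute(bits_per_second):
    """Convert a bit rate to megabytes (10**6 bytes) per minute."""
    return bits_per_second / 8 * 60 / 1_000_000


# CD audio: 44,100 samples/s, 16 bits per sample, 2 channels.
cd_audio = mb_per_minute(44_100 * 16 * 2)

# Assumed MP3 rate; the post does not say which bitrate it has in mind.
mp3_128k = mb_per_minute(128_000)

# The "56k bandwidth" case, read here as 56 kbit/s.
low_56k = mb_per_minute(56_000)

for label, size in [("CD audio", cd_audio), ("MP3 128k", mp3_128k), ("56 kbit/s", low_56k)]:
    print(f"{label:10s} {size:6.2f} MB/min")

print(f"CD audio vs 56 kbit/s: about {cd_audio / low_56k:.0f}x smaller")
```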
Moreover, audio compression is just the start. There's this old thing called TRANSCRIPTION that compresses quite a bit more. A stenographer (or maybe a computer program, at least at the NSA) can type up our conversation perfectly well. How much information is lost by reducing the "data" to:
Lulu's Sister: We got over 10" of snow, and it took me an hour to shovel it.
Even at the highest audio compression I can find, I need tens of kilobytes to encode this remark... as text we're down to a couple tens of BYTES. Maybe I've lost a little of Sis's inflection, but how much INFORMATION was there really, to start with? Some probably, but is it worth a thousand words? Moreover, I expect some lossy compression to reduce that text by at least another half.
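To put rough numbers on the transcript example, here is a small Python sketch. It assumes the remark takes about five seconds to say aloud and that the audio is encoded at the 56 kbit/s rate mentioned earlier; both values are assumptions chosen for illustration, not measurements.

```python
remark = "Lulu's Sister: We got over 10\" of snow, and it took me an hour to shovel it."

# Size of the transcript as plain text.
text_bytes = len(remark.encode("utf-8"))

SPOKEN_SECONDS = 5           # assumption: time needed to say the remark
AUDIO_BITS_PER_SEC = 56_000  # assumption: low-bandwidth speech encoding

# Size of the same remark as compressed audio.
audio_bytes = SPOKEN_SECONDS * AUDIO_BITS_PER_SEC // 8

ratio = audio_bytes / text_bytes
print(f"text:  {text_bytes} bytes")
print(f"audio: {audio_bytes} bytes")
print(f"audio is roughly {ratio:.0f}x larger than the transcript")
```

Under these assumptions the audio lands in the tens of kilobytes and the text in the tens of bytes, a gap of a few hundred times before any compression of the text itself.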
Depending on just what you think is information, perhaps 300,000 times compression is possible. That brings exabytes down to terabytes. Given some automated transcription technology, maybe I can store the whole last year of family chats on my local hard disk!