800 Megs of Data Per Person Last Year?
Ant writes "Growing net, computer and phone use is driving a huge rise in the amount of information people generate and use.
US researchers estimate that every year 800MB of information is produced for every person on the planet.
Their study found that information stored on paper, film, magnetic and optical disks has doubled since 1999.
Paper is still proving popular though. The amount of information stored in books, journals and other documents has grown 43% in three years."
I bet half this increase is due to the number of Slashdot reposts also increasing over the same timespan.
The article fails to address the issue of redundant information
You mean stuff like this?
Karma: Excellent (In Soviet Russia, karma pimps YOU)
Funny you should mention redundant information, as THIS IS A DUPE. Woo! 2.8% of the 800 megs per person average comes from Slashdot dupes alone.
Your brain is not a computer.
...all warm and fuzzy inside doesn't it? All that data about you and yours? Think it'll get any better? Think again.
It amazes me how much people don't think about privacy anymore. How the concept of supermarket sales has given way to 'Bonus Cards' which track what you buy. Few understand how this information can be used to piece together a bigger picture.
Some Wachovia Bank branches are now requiring a FINGERPRINT before you can cash a check. The situation is this: If you are not a customer, you are now required to give them an electronic finger scan to cash a check drawn on Wachovia.
Where does it end? Should I just give them a hair sample now or wait until my implant is required?
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
An audio producer may lay down gigs of tracks for one song. In my research lab we burn a DVD almost full of new data each day. In the hospital we record more and more detailed information into our systems.
Sadly, an assload of this information is useless, useless, useless. I spend more time detailing information in the medical chart than I actually spend with the patient.
In my lab, more data is better... however, when it's just useless information to keep the shark lawyers off my back it's a bad thing.
Davak
Damn... Go read the first article from like four days ago...
United States of America, good ol' backers of world peace.
Also, if everything is posted twice, like on Slashdot, and like THIS story, that 800 MB is really 400 MB.
Bravo.
Is a cracked version of some latest software package new data?
Honestly, only about 25k in the *.exe has been changed... but this would count as a doubling of information, hard drive space, whatever.
Likewise, when we chart medical information, we often duplicate the information from note to note to remind ourselves and others about the important aspects of the patient's history. Really...it's just data duplication.
Davak
Are you an idiot, or just trying to be funny?
Ethiopian refugees are counted in the "total number of people in the world", yet they probably don't own a hard drive.
Personally, I find that figure insulting.
I mean, I'm worth at least a gig...
As discussed in this previous thread - following an article ABOUT THE SAME SUBJECT - the new unit is the Great Pyramid.
Someone should program a calculator to convert all these units from one to another. Elephants to Great Pyramids, Great Pyramids to K-Marts, K-Marts to Libraries of Congress... Now that'd be innovation for ya!
When a person rips a CD to MP3, does s/he create new data? And when s/he copies existing MP3s, is that new data? I think it's impossible to define the term "new information", unless you go with the strictest definition: before the information was digitally generated, it did not exist in any other form. Only then is it new.
It's not like we have to read it all. Most of it is as important as receipts for toilet paper (and production, shipping and marketing data for said ass-wipe).
... and if we look for patterns we'll see the forest and know what are the important bits. Plus we'll have the ability to search for individual trees instantaneously.
The medium is the message
world: USING LINUX since 1991!!
sco: SUING LINUX til 2011!!
If in generating the average they could discount the extremes.
Some of us go through a truly silly amount of data. There's a nontrivial number of people reading this discussion who exhaust their dorm's 1 GB bandwidth cap every day.
On the other hand, somewhere there's a barefoot Palestinian refugee child for whom not so much as a piece of paperwork has been generated since he was born.
These two extremes would probably tend to distort things. It would be interesting to find out whether the study was based on actual storage usage, as it appears, with these extremes included, or whether they just (being Americans) couldn't be bothered when compiling their study to talk to geeks and starving African children. If the former, I'd be curious how their results would change if they could somehow just, like, chop off the ends of the bell curve.
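For what it's worth, "chopping off the ends of the bell curve" has a name: a trimmed mean. Here is a minimal sketch of the idea. The sample figures are invented purely for illustration (they are not from the study), and `trimmed_mean` is just a name picked for this example.

```python
def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the values from each end."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # how many values to drop per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical megabytes per person per year, made up for this sketch:
# one person with nothing on record, one dorm dweller at about 1 GB a day.
sample_mb = [0, 50, 200, 400, 600, 800, 900, 1200, 2000, 365000]

plain_mean = sum(sample_mb) / len(sample_mb)
trimmed = trimmed_mean(sample_mb, 0.1)  # drop the lowest and highest 10%

print(f"plain mean:   {plain_mean:.2f} MB")
print(f"trimmed mean: {trimmed:.2f} MB")
```

With data this lopsided a single heavy user drags the plain mean far above what anyone in the middle actually generates, which is the distortion the post is worried about.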
Irritable, left-wing and possibly humorous bumper stickers and t-shirts
I thought the pr0n I was in amounted to WAY more than 800 MB last year...
For your security, this post has been encrypted with ROT-13, twice.
With information growing exponentially, one must wonder if we're on the edge of the Singularity as anticipated by Vernor Vinge.
Shh.
This is the article a few days ago ...
Men are born ignorant, not stupid; they are made stupid by education. Bertrand Russell
That's a lot of porn...
The game of Scrabble is a good illustration. Common letters (a, e, i, o, u, s) are worth 1 point. Uncommon letters (z, q) are worth 10 points. Each letter's score is based on its frequency of use in the English language. (At least for the English version of the game; I know the scores on the letters are different for the German version.)
All of the modern compression algorithms work on this principle. They detect the parts of your "signal" that contain the least information, and convert them to a smaller form. Of course you have to know a bit about your signal before you can be any good at predicting what is common and what is not.
LZW compression, for instance, is great at compressing text. Images, OTOH, LZW is not so good at, at least full-color photographic ones. (GIF actually uses LZW.)
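To make the LZW part concrete, here is a rough sketch of the textbook algorithm. It is only an illustration: it assumes every character has a code point below 256, and it returns a list of integer codes rather than packing them into variable-width bit fields the way a real GIF encoder does. The function names are made up for this example.

```python
def lzw_compress(text):
    """Return the list of integer codes produced by textbook LZW."""
    # Start with one table entry per single character (code points 0-255).
    table = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate           # keep growing the match
        else:
            codes.append(table[current])  # emit the longest known string
            table[candidate] = next_code  # and learn the new, longer one
            next_code += 1
            current = ch
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the text, reconstructing the same table on the fly."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the entry that is about to be created.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)

sample = "egg and spam; egg bacon and spam; egg bacon sausage and spam"
codes = lzw_compress(sample)
restored = lzw_decompress(codes)
print(len(sample), "characters ->", len(codes), "codes")
```

Repetitive text like the menu above shrinks because later occurrences of "egg " and " and spam" are emitted as single codes. Noisy photographic pixels rarely repeat exactly, so the table has much less to work with.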
For full color images we use JPEG, which breaks the image into 8x8 tiles and runs each tile through the discrete cosine transform. So instead of storing the actual RGB information it stores the quantized coefficients of the transform needed to reconstruct a close approximation of the tile.
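A small sketch of the transform step, for the curious. This is just a naive 8x8 DCT-II written out longhand, not the JPEG codec: it leaves out the color conversion, quantization and entropy coding that do the actual size reduction. `dct_2d` and the flat test tile are made up for this example.

```python
import math

N = 8  # JPEG works on 8x8 tiles

def dct_2d(tile):
    """Naive orthonormal 2-D DCT-II of an N x N tile (list of rows)."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (tile[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out

# A perfectly flat grey tile: every sample has the same value.
flat_tile = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat_tile)

# Count the coefficients that are not (numerically) zero.
significant = sum(1 for row in coeffs for c in row if abs(c) > 1e-6)
print("significant coefficients:", significant)
```

A flat tile needs only the single "average brightness" coefficient, so 64 samples collapse to one number. A tile full of fine detail spreads energy across many coefficients, which is why busy images compress poorly.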
MPEG uses a JPEG-like compression for key frames, and then simply stores which pixels change in between frames. Some implementations also attempt to compensate for motion, which is starting to get beyond what I can explain in the space provided.
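The "store only what changed" idea can be sketched in a few lines. This is a toy: real MPEG encoders predict whole blocks from neighbouring frames and encode the leftover difference, rather than keeping a list of individual pixels. The frames here are flat lists of made-up brightness values, and both function names are hypothetical.

```python
def frame_delta(prev_frame, next_frame):
    """List (index, new_value) for every pixel that differs between frames."""
    return [(i, new)
            for i, (old, new) in enumerate(zip(prev_frame, next_frame))
            if old != new]

def apply_delta(prev_frame, delta):
    """Rebuild the next frame from the previous one plus the stored changes."""
    frame = list(prev_frame)
    for i, value in delta:
        frame[i] = value
    return frame

# A bright two-pixel blob moving one position to the right.
key_frame = [0, 0, 0, 0, 5, 5, 0, 0]
next_frame = [0, 0, 0, 0, 0, 5, 5, 0]

delta = frame_delta(key_frame, next_frame)
rebuilt = apply_delta(key_frame, delta)
print(len(delta), "changed pixels out of", len(key_frame))
```

Only two of the eight pixels need to be stored for the second frame. When most of the picture sits still, that saving adds up quickly.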
Suffice to say information is the level of surprise inside a signal. It doesn't really matter what form of signal it is.
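"Level of surprise" has a standard measure: a symbol that shows up with probability p carries -log2(p) bits. Here is a quick sketch that estimates this from the symbol counts in a string. Treating each character as independent is a simplifying assumption, and the function names are made up for this example.

```python
import math
from collections import Counter

def surprisal_bits(text):
    """Map each symbol to -log2(p), its information content in bits."""
    counts = Counter(text)
    total = len(text)
    return {sym: -math.log2(n / total) for sym, n in counts.items()}

def entropy_bits_per_symbol(text):
    """Average surprise per symbol (Shannon entropy of the symbol counts)."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

for sample in ("aaaa", "aaab", "abab"):
    print(sample, entropy_bits_per_symbol(sample))
```

A string of nothing but "a" carries no surprise at all, while the rare "b" in "aaab" is worth two bits. It is the same logic that makes the rare Scrabble letters worth more points.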
"Learning is not compulsory... neither is survival."
--Dr.W.Edwards Deming
Images with complex shapes compress terribly. I was out at a botanical garden trying to photograph the ends of tree branches as they fork off into millions of buds. It looks crappy in JPEG form.
The thing about information is that it's not quite so easy to count as the article suggests. If the question is solely one of how many magtapes to buy, sure the exabyte thing is interesting enough. But in a "human" sense, that's not all that interesting.
For example, the article cites 18 exabytes of what is basically analog data--sound and images--over telephone, radio, TV. It claims that 98% of that is in telephone calls, essentially all, in other words.
First thing is that most telephone calls are not recorded. Well, I dunno, maybe Carnivore and Echelon are even worse than I think. So mostly this is just a question of how much bandwidth AT&T and MCI need to buy; I'm sure they care about that question, but most people have no reason to. Maybe how many tape drives the NSA needs to buy too.
Just how much information *IS* there in a telephone call though. At a certain level, ten million calls about the same snowstorm aren't really that information rich. But I understand that you want to hear YOUR sister complain about shoveling the snow, not somebody else's sister do so. But just at a technological level, how much is there to a phone call?
If I record the call as CD-Audio WAV format it comes to something like 10 MB a minute. But then, if I compress it to MP3, or Ogg Vorbis, or AAC, I'm down to something more like 1 MB a minute. In fact, if I go for a 56k bandwidth, or something along those lines, I can probably get it down to less than half a MB... and that's not really much different from what I could discern originally on my cell-phone on a noisy street, or over my old wiring in my house. So far, we've reduced the "information content" by 20 times by purely technical means. Then again, it's not clear if this is fair... in those cop shows where they reconstruct background noises to filter the gunshot or car crash in the background, they probably want the full original data... but do *I* care about that when I talk to my sister?
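The per-minute figures are easy to check with a bit of arithmetic. A rough sketch: the CD parameters are the standard 44.1 kHz, 16-bit, stereo, while the 128 kbit/s MP3 rate is just an assumed typical setting, not something stated in the post.

```python
SECONDS_PER_MINUTE = 60

def bytes_per_minute(bits_per_second):
    """Convert a bitrate into bytes of audio per minute."""
    return bits_per_second * SECONDS_PER_MINUTE / 8

cd_bps = 44_100 * 16 * 2   # CD audio: 44.1 kHz, 16-bit samples, two channels
mp3_bps = 128_000          # assumption: a common MP3 bitrate
modem_bps = 56_000         # the "56k" figure

cd = bytes_per_minute(cd_bps)
mp3 = bytes_per_minute(mp3_bps)
modem = bytes_per_minute(modem_bps)

print(f"CD audio: {cd / 1e6:.2f} MB/min")
print(f"MP3:      {mp3 / 1e6:.2f} MB/min")
print(f"56k:      {modem / 1e6:.2f} MB/min")
print(f"CD to 56k reduction: {cd / modem:.1f}x")
```

Under these assumptions the ballpark holds: roughly a megabyte a minute for MP3, under half a megabyte at 56k, and a reduction of about 25 times from the CD-quality original.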
Moreover, audio compression is just the start. There's this old thing called TRANSCRIPTION that compresses quite a bit more. A stenographer (or maybe a computer program, at least at the NSA) can type up our conversation perfectly well. How much information is lost by reducing the "data" to:
Lulu's Sister: We got over 10" of snow, and it took me an hour to shovel it.
Even at the highest audio compression I can find, I need tens of kilobytes to encode this remark... as text we're down to a couple tens of BYTES. Maybe I've lost a little of Sis's inflection, but how much INFORMATION was there really, to start with? Some probably, but is it worth a thousand words? Moreover, I expect some lossy compression to reduce that text by at least another half.
Depending on just what you think is information, perhaps 300,000 times compression is possible. That brings exabytes down to terabytes. Given some automated transcription technology, maybe I can store the whole last year of family chats on my local harddisk!
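A back-of-the-envelope check on those numbers, taking the 18 exabytes from the article and the 300,000x guess at face value. This is only the division written out. Decimal units are assumed (1 EB = 10^18 bytes, 1 TB = 10^12 bytes).

```python
EXABYTE = 10**18
TERABYTE = 10**12

broadcast_bytes = 18 * EXABYTE    # the article's figure for phone, radio, TV
compression_ratio = 300_000       # the "perhaps" figure, not a measurement

scaled_bytes = broadcast_bytes // compression_ratio
scaled_tb = scaled_bytes / TERABYTE

# Size of the one-line transcript quoted above, as plain ASCII text.
remark = 'We got over 10" of snow, and it took me an hour to shovel it.'
text_bytes = len(remark.encode("ascii"))

print(f"18 EB at {compression_ratio:,}x is about {scaled_tb:.0f} TB")
print(f"the transcribed remark is {text_bytes} bytes")
```

So the worldwide total lands in the tens of terabytes, while one family's share of transcribed chatter would be a very small slice of that.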
Buy Text Processing in Python
But I pirated 80000 MB of data.