800 Megs of Data Per Person Last Year?

Posted by CmdrTaco on Sunday November 2, 2003 @05:12AM from the now-thats-a-lot-of-data dept.

Ant writes "Growing net, computer and phone use is driving a huge rise in the amount of information people generate and use. US researchers estimate that every year 800MB of information is produced for every person on the planet. Their study found that information stored on paper, film, magnetic and optical disks has doubled since 1999. Paper is still proving popular though. The amount of information stored in books, journals and other documents has grown 43% in three years."

15 of 177 comments

umm..repost by Anonymous Coward · 2003-11-02 05:13 · Score: 4, Funny

i bet half this increase is due to the number of slashdot reposts also increasing over the same timespan.
Re:What about redundant information? by t0rnt0pieces · 2003-11-02 05:16 · Score: 5, Funny

The article fails to address the issue of redundant information

You mean stuff like this?

--
Karma: Excellent (In Soviet Russia, karma pimps YOU)
Re:What about redundant information? by FiloEleven · 2003-11-02 05:17 · Score: 3, Funny

Funny you should mention redundant information, as THIS IS A DUPE. Woo! 2.8% of the 800 megs per person average comes from Slashdot dupes alone.

--
Your brain is not a computer.
Makes you feel... by Chordonblue · 2003-11-02 05:17 · Score: 4, Interesting

...all warm and fuzzy inside doesn't it? All that data about you and yours? Think it'll get any better? Think again.

It amazes me how much people don't think about privacy anymore. How the concept of supermarket sales has given way to 'Bonus Cards' which track what you buy. Few understand how this information can be used to piece together a bigger picture.

Some Wachovia Bank branches are now requiring a FINGERPRINT before you can cash a check. The situation is this: If you are not a customer, you are now required to give them an electronic finger scan to cash a check drawn on Wachovia.

Where does it end? Should I just give them a hair sample now or wait until my implant is required?

--
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
1. Re:Makes you feel... by Anonymous Coward · 2003-11-02 05:53 · Score: 3, Funny
  
  Dude, lighten up and stop being paranoid.
  
  The NSA doesn't care which Mary Kate & Ashley video is your favorite.
2. Re:Makes you feel... by Chordonblue · 2003-11-02 05:58 · Score: 3, Insightful
  
  What is it about a 'slippery slope' that you don't understand?
  
  You're a great example of who I mean. No consideration at all...
  
  --
  "...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
More and more data by Davak · 2003-11-02 05:18 · Score: 3, Interesting

An audio producer may lay down gigs of tracks for one song. In my research lab we burn a DVD almost full of new data each day. In the hospital we record more and more detailed information into our systems.

Sadly, an assload of this information is useless, useless, useless. I spend more time detailing information in the medical chart than I actually spend with the patient.

In my lab, more data is better... however, when it's just useless information to keep the shark lawyers off my back it's a bad thing.

Davak
1. Re:More and more data by Richard_at_work · 2003-11-02 05:25 · Score: 3, Funny
  
  How many Libraries of Congress is an "Assload"? Is the data easily retrievable?
2. Re:More and more data by Davak · 2003-11-02 05:39 · Score: 3, Funny
  
  Definition of assload
  
  Assload is a relative term... like "a lot"
  
  Normally one wouldn't ask "how many Libraries of Congress is a lot?"
  
  /end of my stupid point
It's really 400. by Eric_Cartman_South_P · 2003-11-02 05:22 · Score: 4, Funny

If everything is posted twice, like on slashdot, and like THIS story, that 800 MB is really 400 MB.
Also, if If everything is posted twice, like on slashdot, and like THIS story, that 800 MB is really 400 MB.
Re:What about redundant information? by Davak · 2003-11-02 05:22 · Score: 4, Interesting

Bravo.

Is a cracked version of some latest software package new data?

Honestly, only about 25k in the *.exe has been changed... but this would count as a doubling of information, hard drive space, whatever.

Likewise, when we chart medical information, we often duplicate the information from note to note to remind ourselves and others about the important aspects of the patient's history. Really...it's just data duplication.

Davak
Re:Only 800Mb per year by bersl2 · 2003-11-02 05:26 · Score: 5, Funny

Personally, I find that figure insulting.

I mean, I'm worth at least a gig...
Re:What about redundant information? by lanswitch · 2003-11-02 05:31 · Score: 3, Interesting

how much new information is being generated per person, per year
When a person rips a CD to MP3, does s/he create new data? And when s/he copies existing MP3s, is that new data? I think it's impossible to define the term "new information", unless you go with the strictest definition: before the information was digitally generated, it did not exist in any other form. Only then is it new.
This would be more interesting by mcc · 2003-11-02 05:36 · Score: 5, Insightful

If in generating the average they could discount the extremes.

Some of us go through a truly silly amount of data. There's a nontrivial number of people reading this discussion who exhaust their dorm's 1 GB bandwidth cap every day.

On the other hand, somewhere there's a barefoot Palestinian refugee child for whom not so much as a piece of paperwork has been generated since he was born.

These two extremes would probably tend to distort things. It would be interesting to find out whether the study was based on usage of storage media, as it appears, with these extremes included, or whether they just (being Americans) couldn't be bothered to talk to geeks and starving African children when compiling their study. If the former, I'd be curious how their results would change if they could somehow just chop off the ends of the bell curve.
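Chopping off the ends of the distribution is what a trimmed mean does. Here is a minimal sketch, assuming a made-up sample of ten people's yearly data use in megabytes. The numbers are hypothetical and not taken from the study, and `trimmed_mean` is just an illustrative helper. The largest value stands in for someone hitting a 1 GB cap every day of the year.

```python
def trimmed_mean(values, trim_fraction):
    """Mean after dropping the lowest and highest trim_fraction of the values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # how many values to drop at each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

# Hypothetical yearly usage in MB: one person with no records at all,
# one heavy user at roughly 1 GB per day, and eight people in between.
usage_mb = [0, 5, 20, 40, 60, 80, 120, 200, 400, 365_000]

plain_mean = sum(usage_mb) / len(usage_mb)
trimmed = trimmed_mean(usage_mb, 0.10)

print(f"plain mean:   {plain_mean:.1f} MB")
print(f"trimmed mean: {trimmed:.1f} MB")
```

With this invented sample a single heavy user drags the plain average into the tens of gigabytes, while the trimmed figure stays near what the middle eight people actually use.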

--
Irritable, left-wing and possibly humorous bumper stickers and t-shirts
Seems like significant overcount by Lulu+of+the+Lotus-Ea · 2003-11-02 07:21 · Score: 3, Interesting

The thing about information is that it's not quite so easy to count as the article suggests. If the question is solely one of how many magtapes to buy, sure the exabyte thing is interesting enough. But in a "human" sense, that's not all that interesting.

For example, the article cites 18 exabytes of what is basically analog data--sound and images--over telephone, radio, TV. It claims that 98% of that is in telephone calls, essentially all, in other words.

First thing is that most telephone calls are not recorded. Well, I dunno, maybe Carnivore and Echelon are even worse than I think. So mostly this is just a question of how much bandwidth AT&T and MCI need to buy; I'm sure they care about that question, but most people have no reason to. Maybe how many tape drives the NSA needs to buy too.

Just how much information *IS* there in a telephone call, though? At a certain level, ten million calls about the same snowstorm aren't really that information rich. But I understand that you want to hear YOUR sister complain about shoveling the snow, not somebody else's sister do so. But just at a technological level, how much is there to a phone call?

If I record the call as CD-Audio WAV format it comes to something like 9 MB a minute. But then, if I compress it to MP3, or Ogg Vorbis, or AAC, I'm down to something more like 1 MB a minute. In fact, if I go for a 56k bandwidth, or something along those lines, I can probably get it down to less than half a MB... and that's not really much different from what I could discern originally on my cell phone on a noisy street, or over the old wiring in my house. So far, we've reduced the "information content" by 20 times by purely technical means. Then again, it's not clear if this is fair... in those cop shows where they reconstruct background noises to filter the gunshot or car crash in the background, they probably want the full original data... but do *I* care about that when I talk to my sister?
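A back-of-the-envelope check of those per-minute figures, as a rough sketch only. The bitrates are assumptions: 44.1 kHz 16-bit stereo for CD audio, 128 kbit/s for the MP3, and 56 kbit/s for the low-rate stream. Only the "56k" figure appears in the comment itself.

```python
def mb_per_minute(bits_per_second):
    """Convert a bit rate to megabytes (10**6 bytes) per minute of audio."""
    return bits_per_second * 60 / 8 / 1_000_000

# Assumed bitrates, for illustration only.
CD_AUDIO_BPS = 44_100 * 16 * 2  # sample rate * bits per sample * channels
MP3_BPS = 128_000               # a common MP3 bitrate
LOW_BPS = 56_000                # the "56k" stream mentioned above

cd_mb = mb_per_minute(CD_AUDIO_BPS)
mp3_mb = mb_per_minute(MP3_BPS)
low_mb = mb_per_minute(LOW_BPS)
reduction = cd_mb / low_mb

print(f"CD audio: {cd_mb:.2f} MB/min")
print(f"MP3:      {mp3_mb:.2f} MB/min")
print(f"56k:      {low_mb:.2f} MB/min")
print(f"CD to 56k reduction: about {reduction:.0f}x")
```

Under these assumptions uncompressed CD audio comes to about 10.6 MB a minute and the 56k stream to about 0.42 MB, a reduction of roughly 25 times, in the same ballpark as the 20 times quoted above.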

Moreover, audio compression is just the start. There's this old thing called TRANSCRIPTION that compresses quite a bit more. A stenographer (or maybe a computer program, at least at the NSA) can type up our conversation perfectly well. How much information is lost by reducing the "data" to:

Lulu's Sister: We got over 10" of snow, and it took me an hour to shovel it.

Even at the highest audio compression I can find, I need tens of kilobytes to encode this remark... as text we're down to a couple tens of BYTES. Maybe I've lost a little of Sis's inflection, but how much INFORMATION was there really, to start with? Some probably, but is it worth a thousand words? Moreover, I expect some lossy compression to reduce that text by at least another half.

Depending on just what you think is information, perhaps 300,000 times compression is possible. That brings exabytes down to terabytes. Given some automated transcription technology, maybe I can store the whole last year of family chats on my local harddisk!
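To put rough numbers on the transcription argument, here is a small sketch. The five-second duration and the 56 kbit/s audio rate are assumptions made for illustration. The 300,000 factor and the 18 exabyte total are the figures quoted in this comment.

```python
# The transcribed remark, exactly as quoted above.
remark = 'Lulu\'s Sister: We got over 10" of snow, and it took me an hour to shovel it.'
text_bytes = len(remark.encode("ascii"))

# Assumption: the spoken remark lasts about 5 seconds at 56 kbit/s.
ASSUMED_SECONDS = 5
ASSUMED_BPS = 56_000
audio_bytes = ASSUMED_BPS * ASSUMED_SECONDS // 8

# Figures quoted in the comment: 18 exabytes, 300,000x compression.
EXABYTE = 10**18
TERABYTE = 10**12
scaled_bytes = 18 * EXABYTE // 300_000

print(f"text:  {text_bytes} bytes")
print(f"audio: {audio_bytes} bytes (assumed duration and bitrate)")
print(f"audio/text ratio: about {audio_bytes / text_bytes:.0f}x")
print(f"18 EB at 300,000x: {scaled_bytes // TERABYTE} TB")
```

On these assumptions the text is a few tens of bytes against tens of kilobytes of audio, and 18 exabytes divided by 300,000 comes to about 60 terabytes in total.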

--
Buy Text Processing in Python