800 Megs of Data Per Person Last Year?
Ant writes "Growing net, computer and phone use is driving a huge rise in the amount of information people generate and use.
US researchers estimate that every year 800MB of information is produced for every person on the planet.
Their study found that information stored on paper, film, magnetic and optical disks has doubled since 1999.
Paper is still proving popular though. The amount of information stored in books, journals and other documents has grown 43% in three years."
i bet half this increase is due to the number of slashdot reposts also increasing over the same timespan.
That's a whole lotta dupe right there...
...but didn't we see this the day before yesterday?
Given a choice between free speech and free beer, most people will take the beer.
Imagine the hordes of sweaty basement-dwelling Linux Zealots furiously masturbating to their 800MB of porn! It's not a pretty thought.
because who wants to keep their personal thoughts in an outlook application...
mwahaha, flame away!
--------
Free your mind.
The article fails to address the issue of redundant information, so I'm sure this number is inflated. It raises an interesting question, though: to what extent are we becoming more redundant in our data storage? Once we answer that, we also answer exactly how much new information is being generated per person, per year.
__________________________________________
Take comfort in your ignorance.
Grandmaster Plague
DUPE-idy-do
It's no shock since this message is data that I'm generating, my dentist appointment generates data, my email generates data, etc.
I wonder how much of that data is redundant?
:)
US researchers estimate that every year 800MB of information is produced for every person on the planet.
Their study found that information stored on paper, film, magnetic and optical disks has doubled since 1999.
Paper is still proving popular though. The amount of information stored in books, journals and other documents has grown 43% in three years.
Data deluge
The researchers from the University of California, Berkeley, last carried out a study of how much information was being generated and where it was kept three years ago, based on data from 1999.
The most recent study has revealed that every year since then the amount of information generated has grown about 30%.
But these percentages belie the vast mountains of information involved.
Most new information is captured on computer hard disks
Study authors Prof Peter Lyman and colleagues found that in 2002 alone about five exabytes of new information was generated by the world's print, film, magnetic and optical storage systems.
By comparison the US Library of Congress print collection, comprising 19 million books and 56 million manuscripts, equates to about 10 terabytes of information.
It would take 500,000 Libraries of Congress to equal five exabytes.
But even this figure is dwarfed by the gargantuan amount of information flowing through electronic channels such as the telephone, radio, television and internet.
Same old TV
In 2002 the study estimates that 18 exabytes of new information flowed through these channels. The vast majority of this, 98%, was in the form of person-to-person phone calls.
It also found that most of the information transmitted via radio and TV is not new information; the vast majority is repeats.
Of the 320 million hours of radio shows only 70 million hours are actually original shows. On TV only 31 million hours of the total 123 million hours of broadcast programmes count as new information.
Prof Lyman said he was surprised that paper was still proving popular as a storage medium but put its resilience down to the fact that a lot of the information generated on computer is printed out. He was also surprised by the amount of gay anal sex Rob Malda and Jeff Bates engaged in.
One area that is gradually losing out to digital media is film. Prof Lyman said the increasing popularity of digital cameras and cameras was driving people away from the older format.
In the years since the last study, the amount of images captured on film has declined by 9%.
The study also revealed an image of the average amount of time people spend with different sorts of media.
It showed that the average American adult spends 16.17 hours on the phone a month, listens to 90 hours of radio and watches 131 hours of TV. The 53% of the US population that uses the net spends more than 25 hours online a month at home and more than 74 hours on the net at work.
The researchers point out that this means we are accessing information media 46% of the time.
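A few of the article's figures can be sanity-checked with quick arithmetic. This is a minimal Python sketch, not the researchers' method; the 730-hour month (365 * 24 / 12) is my assumption, and adding the net hours of the 53% who are online to everyone's phone/radio/TV averages is a simplification that just reproduces the article's sum.

```python
# Sanity checks on figures quoted in the article. The month length is an
# assumption: an average month of 365 * 24 / 12 = 730 hours.
HOURS_PER_MONTH = 365 * 24 / 12

def compound_growth(rate: float, years: int) -> float:
    """Total growth factor after `years` of growth at `rate` per year."""
    return (1 + rate) ** years

def share_of_time(monthly_hours: dict[str, float]) -> float:
    """Fraction of an average month spent with the listed media."""
    return sum(monthly_hours.values()) / HOURS_PER_MONTH

# Monthly hours per the article (net figures are for the 53% who are online).
media_hours = {
    "phone": 16.17,
    "radio": 90,
    "tv": 131,
    "net_home": 25,
    "net_work": 74,
}

if __name__ == "__main__":
    # About 30% a year for three years is roughly the "doubled since 1999" claim.
    print(f"30%/year over 3 years: x{compound_growth(0.30, 3):.2f}")
    print(f"share of time with media: {share_of_time(media_hours):.0%}")
```

The numbers hang together: 1.3 cubed is about 2.2, and the listed hours add up to about 336 of a month's 730, which is the 46% quoted.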
With data/hard drives so cheap nowadays, most people don't even take notice of what they are filling up. I can only see this number growing, since there is very little reason for people not to keep data (since the data is so cheap)
but I'm no expert.
except that half of that is Slashdot dupes
Maybe ./ wants to create 800 Meg of dupes per year..
Anyway, my personal information count was very low for a couple of years (accumulating email, and 90% of that is my spam archive), but once I started taking photos, I created about 8 Gig of information - in 6 months! (And that's with JPEG; I wonder what would happen to my HD if I shot in RAW...)
Cheers,
Tels
Let me reduce your 800 MB/person data storage down to a handful of bytes:
Who cares.
The Slashdot editors are trying to achieve 900 megs of space per user by duplicating the article over and over.
:)
Then they can duplicate the 900 megs article.
...all warm and fuzzy inside doesn't it? All that data about you and yours? Think it'll get any better? Think again.
It amazes me how much people don't think about privacy anymore. How the concept of supermarket sales has given way to 'Bonus Cards' which track what you buy. Few understand how this information can be used to piece together a bigger picture.
Some Wachovia Bank branches are now requiring a FINGERPRINT before you can cash a check. The situation is this: if you are not a customer, you are now required to give them an electronic finger scan to cash a check drawn on Wachovia.
Where does it end? Should I just give them a hair sample now or wait until my implant is required?
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
800 megabytes per person, and yet most people have 40+ gig hard drives... Is there something wrong here?
eclecti.cc
This is a duplicate story; see previous one here.
That doesn't sound like a lot. Slightly more than 1 CD, or about 12 hours on the phone every year.
And how much of that data was a duplicate?
An audio producer may lay down gigs of tracks for one song. In my research lab we burn a DVD almost full of new data each day. In the hospital we record more and more detailed information into our systems.
Sadly, an assload of this information is useless, useless, useless. I spend more time detailing information in the medical chart than I actually spend with the patient.
In my lab, more data is better... however, when it's just useless information to keep the shark lawyers off my back it's a bad thing.
Davak
http://developers.slashdot.org/article.pl?sid=03/10/29/1355259 : Info Glut - Five Exabytes of Data Created in 2002
Damn... Go read the first article from like four days ago...
United States of America, good ol' backers of world peace.
How about we talk about something else?
How about them sporting events?
all of the duped slashdot articles I read each year?
It would take 500,000 Libraries of Congress to equal five exabytes. Since when does a 'Library of Congress' qualify as a unit of storage? Yes, I realize that they were trying to give a comparison, but it looked very odd to me.
The Braying and Neighing of Barnyard Animals Follows.
News for nerds, news that repeats....
"WebTV: bringing the Internet into the shallow end of the gene pool since 1995" - Martin Bishop
Where do I subscribe again?
Do you even lift?
These aren't the 'roids you're looking for.
Big ass dupe!!!
Also, if everything is posted twice, like on Slashdot, and like THIS story, that 800 MB is really 400 MB.
to find out how many mb of data people write on paper per year. I suppose you'd have to take a sample of about 10000 people, then enter all the things they write in a year into some handwriting recognition program. Of course it'd take less time than normal handwriting recognition because the program would only have to scan for the number of letters, rather than what the letters actually were.
When anger rises, think of the consequences.
Confucius (551 BC - 479 BC)
If it's intelligent, reflective work ppl produce, I would think it's less. If it's blogging, grocery lists, Slashdot articles (yes, dupes probably make the number rise quite a lot), then that's probably a lot more than that.
...
Oh well, besides, I don't really know what that amounts to, the official Internet storage unit being the Library of Congress
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
As discussed in this previous thread - following an article ABOUT THE SAME SUBJECT - the new unit is the Great Pyramid.
Someone should program a calculator to convert all these units from one to another. Elephants to Great Pyramids, Great Pyramids to K-Marts, K-Marts to Libraries of Congress... Now that'd be innovation for ya!
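Here's a minimal sketch of such a calculator in Python. The only novelty unit defined is the Library of Congress print collection at the article's ~10 TB; Elephants, Great Pyramids and K-Marts would need an agreed byte equivalent, which nobody in this thread has given, so they're left out. Decimal (SI) prefixes are assumed.

```python
# Hypothetical storage-unit calculator: every unit is defined by its size in
# bytes, using decimal (SI) prefixes. "LoC" is the Library of Congress print
# collection at the ~10 TB quoted in the article.
BYTES_PER_UNIT = {
    "B": 1,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "EB": 10**18,
    "LoC": 10 * 10**12,
}

def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert `amount` between units by going through bytes."""
    return amount * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]

if __name__ == "__main__":
    print(convert(5, "EB", "LoC"))    # five exabytes in Libraries of Congress
    print(convert(800, "MB", "LoC"))  # one person's yearly 800 MB
```

Five exabytes comes out to the article's 500,000 Libraries of Congress. Adding a new unit is one more dictionary entry, once someone agrees how many bytes a K-Mart holds.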
800 Megs? I grab that in pr0n every day!
this and the article before are just relevant to people who don't have a "photographic memory".
i for one haven't noticed an increase AT ALL (but maybe i'm just getting senile?)!
still amazing how much redundant information one can create on the second level of the data sphere, i.e. not the first level (a,b,c,...1,2,3...): just the same b#llshit on IRC drives me to e.rage daily!
but i "blame" the no-future feeling and mass human multiplication for these figures. maybe if they sprayed some insecticide on those book pages a la "name of the rose" they would last longer. or maybe genetically modify paper/trees to resist mold/insects longer?
but waste is still still ... (repeat 1'000'000 times) the driving force of economy (and egos).
"so get lost! go repeat something!"
Ha, I bet no one else noticed that this article was a dupe! I'll be the first to make a witty comment on dupes accounting for half of all the data! ...oh, wait...
Hmmmm...
:)))
800 megs of data... Would that be the server logs of the FTP sites from which I get my stuff?
Hmmm. Well, that and probably the 20+megs of spam I get to have every month would make it easy to put it up to 800 megs/year...
Koos
Doesn't that seem ... I dunno, a bit low? Granted, I have hundreds of gigabytes of hard drive storage filled, but I didn't create any of it ... the movie studios, TV studios, and game studios did.
However, the stuff I do create are digital pictures. Lots of them. I take everything in 1600x1200 resolution, so each image is about 800KB, and my camera has a 128MB flash card. I fill it up quite often. I'd say I take on average 20 pics a day (which averages out to around 6 GB per year), and that's just in pictures!
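The photo numbers check out. A back-of-envelope sketch in Python using the post's figures; decimal units (1 KB = 1,000 bytes) are my assumption, and with 1024-based units the answers shift by a few percent.

```python
# Back-of-envelope photo storage, using the post's numbers. Decimal units
# are assumed (1 KB = 1,000 bytes, 1 GB = 10**9 bytes).
KB = 1_000
MB = 1_000_000
GB = 1_000_000_000

def yearly_photo_bytes(image_kb: float, shots_per_day: float, days: int = 365) -> float:
    """Bytes of photos produced in a year at a steady shooting rate."""
    return image_kb * KB * shots_per_day * days

def shots_per_card(card_mb: float, image_kb: float) -> int:
    """How many images of `image_kb` fit on a card of `card_mb`."""
    return int(card_mb * MB // (image_kb * KB))

if __name__ == "__main__":
    total = yearly_photo_bytes(image_kb=800, shots_per_day=20)
    print(f"{total / GB:.2f} GB per year")
    print(f"{total / (800 * MB):.1f}x the 800 MB average")
    print(shots_per_card(card_mb=128, image_kb=800), "shots per 128 MB card")
```

That works out to about 5.84 GB a year, a bit over seven people's worth of the study's average from photos alone, and 160 shots per card.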
What about save game files - do those count? And I also create text files, but those probably don't total over a few megabytes per year.
I don't shoot videos or record songs or anything, yet I do enough data creation for 10 other people. I shudder to think how much information people who shoot digital videos are creating.
Cyde Weys Musings - Scrutinizing the inscrutable
i downloaded enough music for 31.43875 people...
it's not like we have to read it all. Most of it is as important as receipts for toilet paper (and production, shipping and marketing data for said ass-wipe).
... and if we look for patterns we'll see the forest and know what are the important bits. Plus we'll have the ability to search for individual trees instantaneously.
The medium is the message
world: USING LINUX since 1991!!
sco: SUING LINUX til 2011!!
No way, I get that much per movie.
come on now, don't you guys check?
;)
This is an obvious duplicate...
must be damn slow cuz there's no SCO news
If in generating the average they could discount the extremes.
Some of us go through a truly silly amount of data. There's a nontrivial number of people reading this discussion who exhaust their dorm's 1 GB bandwidth cap every day.
On the other hand, somewhere there's a barefoot Palestinian refugee child for whom not so much as a piece of paperwork has been generated since he was born.
These two extremes would probably tend to distort things. It would be interesting to find out whether the study was based on storage usage as it appears, with these extremes included, or whether they just (being Americans) couldn't be bothered, when compiling their study, to talk to geeks and starving African children. If the former, I'd be curious how their results would change if they could somehow just chop off the ends of the bell curve.
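"Chopping off the ends of the bell curve" is what statisticians call a trimmed mean. Note that the 800 MB figure looks consistent with a plain mean (five exabytes divided by a world population of roughly 6.3 billion), so trimming would need per-person data the article doesn't give. A minimal Python sketch; the usage numbers below are invented purely for illustration.

```python
def trimmed_mean(values: list[float], proportion: float) -> float:
    """Mean after discarding `proportion` of the sorted values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)  # how many values to cut from each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

# Invented yearly figures in MB: one person with no records at all, and one
# student maxing out a 1 GB/day cap (365,000 MB/year).
usage_mb = [0, 50, 400, 600, 800, 900, 1200, 365_000]

if __name__ == "__main__":
    print(f"plain mean:   {sum(usage_mb) / len(usage_mb):,.0f} MB")
    print(f"trimmed mean: {trimmed_mean(usage_mb, 0.125):,.0f} MB")
```

With one heavy user in the made-up sample, the plain mean is dominated by that single value, while trimming one value off each end brings it back down to something like the typical person.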
Irritable, left-wing and possibly humorous bumper stickers and t-shirts
I thought the pr0n I was in amounted to WAY more than 800 MB last year...
For your security, this post has been encrypted with ROT-13, twice.
Please don't read my journal
You have to wonder how much storage business data would take if you could compress PowerPoint/Word docs into ASCII.
The stuff in this post was already discussed in Info Glut - Five Exabytes of Data Created in 2002
With information growing exponentially, one must wonder if we're on the edge of the Singularity as anticipated by Vernor Vinge.
Shh.
Too bad the number of trees has NOT increased 43% in the past 3 years. At this rate we'll have a naked planet in no time.
Banjo - The more I know about Windoze, the more I love *nix
I'm pretty sure we'll have at least 800MB worth of "dupe article" posts by people who think such posts get funnier every time they do it! :\
Join the TWIT army now!
They keep talking about all this information, but data would probably be more accurate. My guess is the actual information created in 2002 amounts to less than 1 KB/person, and that's probably being optimistic about humanity.
(My contribution to the glut: long-shrift.blogspot.com)
This is the article a few days ago ...
Men are born ignorant, not stupid; they are made stupid by education. Bertrand Russell
Only after the last tree has been cut down
Only after the last river has been poisoned
Only after the last fish has been caught
Only then will you find you cannot eat data
Does all the virus/worm-generated mail that I get count against my 800M since it's sent to me, or against the poor Microsoft user who didn't patch their machine in time?
And does eliminating spam/virus email make a noticeable hit in the numbers, or is it not even counted?
"At this rate we'll have a naked planet in no time."
Planet porn.
Woo Hoo! Jupiter, take it off.
or
Child: Mom! The moon is mooning me. Make it stop.
or
Uranus: The first planetary shots are in, and they all look like the Goate.cx guy.
Pluto: Well blow me down!
Only after the last oil rig has been sunk
Only after the last supertanker has called port
Only after the last gas station has closed
Only then will you find Greenpeace doesn't sell beer at night
Any sufficiently advanced libertarian utopia is indistinguishable from government.
They do much more than 800MB.
;-)
Perhaps they are the academics of the future?
I'm gunna marry that genious I saw the other day.
fauxking data is required to disempower unprecedented evile?
truth is, despite terabytes of phonIE payper liesense MiSinformation being suppLIEd buy felonious corepirate nazi ?pr? ?firm? hypenosys peddling execrable, almost any moron can figure out which way the wwwind is bullowing, without much more than a just few lines of pateNTdead kode.
get ready to see the light.... both literally, & (meta)physically?
400 MB per person
The rest were dupes
"The vast majority of the global population neither owns computers nor uses the internet. (Check statistics.) So what, exactly, is the relevance of this estimate?"
The half with the Internet, and computers is collecting information on the other half.
Be afraid.
800 Megs of Data Per Person Last Year - is this the amount of surveillance data the Department of Homeland Security was generating per person ?
My pr0n collection is growing at a MUCH faster rate...
No, it's less. No duplicate is really a duplicate.
The first time around, the article gets a bunch of discussion.
The second time around, the article gets the same discussion, same arguments. Then it gets an equal amount of "dupe" talk.
X + 2X = 800. "Content" = 800/3.
(insane reference) That's like 17 Volkswagen Beetles.
...how much info is destroyed each year to offset these numbers. I mean shredded files, stuff thrown in trash, bills, deleted data files, discarded/lost storage media, etc... In the end (of each year), I wonder, what is the actual increase in stored information?
Interested in open source engine management for your Subaru?
Yes, and it wasn't true then either. Perhaps 800MB/person of noise was generated, but it contained almost no information. Just look at /. How much is information and how much is data and how much is just bytes?
That's a believable number. Consider the amount of published data on Kazaa, or that 45 minutes of raw DV video is roughly 12.5 GB. Move 100 of your CDs to MP3s and you're consuming/creating roughly 3.5 GB (or more if you're using higher than 128kbps MP3s). And I'm not even commenting on pr0n.
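These media figures are easy to redo from bitrates. A rough Python sketch; the DV rate of about 3.6 MB/s (25 Mbit/s of video plus audio and overhead) and the 45-minute average CD are my assumptions, not the poster's.

```python
# Rough media-size arithmetic. Assumptions (not from the post): DV runs at
# about 3.6 MB/s in total, and an average CD holds 45 minutes of music.
DV_BYTES_PER_SEC = 3.6e6
AVG_CD_MINUTES = 45
GB = 1e9

def mp3_bytes(minutes: float, kbps: int = 128) -> float:
    """Size of an MP3 at a constant bitrate of `kbps` kilobits per second."""
    return minutes * 60 * kbps * 1000 / 8

def dv_bytes(minutes: float) -> float:
    """Size of raw DV footage at the assumed data rate."""
    return minutes * 60 * DV_BYTES_PER_SEC

if __name__ == "__main__":
    print(f"45 min of DV: {dv_bytes(45) / GB:.1f} GB")
    print(f"100 CDs at 128 kbps: {100 * mp3_bytes(AVG_CD_MINUTES) / GB:.1f} GB")
    print(f"100 CDs at 192 kbps: {100 * mp3_bytes(AVG_CD_MINUTES, 192) / GB:.1f} GB")
```

Under these assumptions, 45 minutes of DV comes to about 9.7 GB, somewhat under the post's 12.5 GB, and 100 CDs land between 3.5 and 6.5 GB depending on disc length and bitrate. Either way it's the same ballpark.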
"Who cares though? As data expands, it gets cheaper and cheaper to store it..."
Fat farms.
Those numbers are soo far off as to be laughable.
There's no way the 4 billion people of the planet that don't have internet access, or that live in countries without computerized governments, have that amount of information stored about them. Most countries can't perform a census accurately.
The reality is that those people who belong to Echelon-partner countries have gigabytes of data about them being stored everywhere, and people who live in less technically advanced countries can relish their anonymity, because no computer knows they exist.
While I used Pngcrush to squeeze out the last few %, even the moderate compression offered by Photoshop was enough to make her happy for a week.
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
This is just wrong!
I can _assure_ you I've gotten more than 800MB of pr0n a week for the past years.
This is FUD. Liars.
That's a lot of porn...
"Under US law the cops don't need a warrant for anything you willfully disregard..."
I agree, too little too late, but I wasn't talking about law enforcement; I was talking about a corporation! Big difference, or at least I think so.
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
If you store the same image in two different resolutions, does the high-res version contain more information?
Even if you cannot see the difference?
The game of Scrabble is a good illustration. Common letters (a,e,i,o,u,s) are worth 1 point. Uncommon letters (q,z) are worth 10 points, and x is worth 8. Each letter's score is based roughly on how often it is used in the English language. (At least for the English version of the game. I know the scores on the letters are different for the German version at least.)
All of the modern compression algorithms work on this principle. They detect the parts of your "signal" that contain the least information, and convert them to a smaller form. Of course you have to know a bit about your signal before you can be any good at predicting what is common or not.
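Huffman coding is the textbook version of the Scrabble idea: count how often each symbol occurs and give frequent symbols short bit strings. A minimal Python sketch; Huffman coding is my choice of example, since the post doesn't name a specific algorithm.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(text: str) -> dict[str, str]:
    """Map each symbol in `text` to a prefix-free bit string (frequent = short)."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # one distinct symbol still needs a 1-bit code
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two rarest subtrees; their symbols get one more bit.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]

if __name__ == "__main__":
    sample = "e" * 10 + "t" * 5 + "a" * 3 + "z"
    codes = huffman_code(sample)
    for sym in sorted(codes, key=lambda s: len(codes[s])):
        print(sym, codes[sym])
    bits = sum(len(codes[ch]) for ch in sample)
    print(bits, "bits vs", 8 * len(sample), "bits as plain 8-bit text")
```

In the sample, "e" gets a 1-bit code and the lone "z" gets 3 bits, the same way Scrabble makes the rare letters "expensive".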
LZW compression, for instance, is great at compressing text. Images, OTOH, LZW is not so good at, at least color images. (GIF actually uses LZW.)
For full color images we use JPEG, which breaks the image into 8x8 blocks and runs each one through the discrete cosine transform. So instead of storing the actual pixel values, it stores the transform's coefficients, rounded off (quantized) so that the fine detail the eye is least likely to miss is stored most coarsely. The decoder uses the inverse cosine transform to reconstruct each block.
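The transform step can be sketched in a few lines of Python. This is only the 8x8 two-dimensional DCT-II (orthonormal scaling assumed) that JPEG applies to each block; color conversion, the level shift, quantization tables, zigzag ordering and the entropy coding that real JPEG also does are all left out.

```python
import math

N = 8  # JPEG transforms 8x8 blocks

def _scale(k: int) -> float:
    """Orthonormal DCT-II scale factor for frequency index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct_2d(block: list[list[float]]) -> list[list[float]]:
    """2-D DCT-II of an N x N block: out[u][v] is the weight of frequency (u, v)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = _scale(u) * _scale(v) * total
    return out

if __name__ == "__main__":
    flat = [[100.0] * N for _ in range(N)]  # a featureless gray tile
    coeffs = dct_2d(flat)
    big = [(u, v) for u in range(N) for v in range(N) if abs(coeffs[u][v]) > 1e-9]
    print(big, round(coeffs[0][0], 6))  # only the DC term survives
```

For a smooth tile nearly all the weight lands in a few low-frequency coefficients; quantization then rounds the small ones to zero, and that's where most of the savings come from.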
MPEG uses JPEG-like compression for key frames, and in between mostly stores what changes from frame to frame. Some implementations also attempt to compensate for motion, which is starting to get beyond what I can explain in the space provided.
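The "store what changed" idea can be sketched naively in Python. This is a hypothetical per-pixel delta, far simpler than real MPEG, which codes block-based, motion-compensated differences rather than lists of individual pixels.

```python
Frame = list[int]              # hypothetical: a grayscale frame as a flat pixel list
Delta = list[tuple[int, int]]  # (pixel index, new value) pairs

def encode_delta(prev: Frame, curr: Frame) -> Delta:
    """Record (index, new_value) for every pixel that differs from `prev`."""
    if len(prev) != len(curr):
        raise ValueError("frames must be the same size")
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]

def apply_delta(prev: Frame, delta: Delta) -> Frame:
    """Rebuild the next frame from the previous one plus its delta."""
    frame = list(prev)
    for i, value in delta:
        frame[i] = value
    return frame

if __name__ == "__main__":
    key = [0] * 64                     # key frame: stored in full
    nxt = list(key)
    nxt[9] = nxt[10] = nxt[11] = 255   # three pixels change
    delta = encode_delta(key, nxt)
    print(len(delta), "changed pixels out of", len(nxt))
    print(apply_delta(key, delta) == nxt)
```

When little moves, the delta is tiny compared to the frame, which is why static shots compress so much better than action scenes.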
Suffice to say information is the level of surprise inside a signal. It doesn't really matter what form of signal it is.
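Surprise can be put in numbers: an event of probability p carries log2(1/p) bits, and Shannon entropy averages that over a source's symbols. A minimal Python sketch; it only looks at single-character frequencies, so it ignores context and overestimates the information in real text.

```python
import math
from collections import Counter

def surprise_bits(p: float) -> float:
    """Self-information of an event with probability p, in bits."""
    return math.log2(1 / p)

def entropy_bits_per_symbol(message: str) -> float:
    """Shannon entropy of the message's character frequencies, in bits/char."""
    if not message:
        return 0.0
    total = len(message)
    return sum((n / total) * math.log2(total / n) for n in Counter(message).values())

if __name__ == "__main__":
    print(surprise_bits(0.5))  # a fair coin flip: 1 bit
    for msg in ["aaaaaaaa", "abababab", "abcdefgh"]:
        print(msg, entropy_bits_per_symbol(msg))
```

Note that "abababab" scores 1 bit per character here even though it's perfectly predictable; a model that looks at context (which is what real compressors do) would find almost no surprise in it.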
"Learning is not compulsory... neither is survival."
--Dr.W.Edwards Deming
The thing about information is that it's not quite so easy to count as the article suggests. If the question is solely one of how many magtapes to buy, sure the exabyte thing is interesting enough. But in a "human" sense, that's not all that interesting.
For example, the article cites 18 exabytes of what is basically analog data--sound and images--over telephone, radio, TV. It claims that 98% of that is in telephone calls, essentially all, in other words.
First thing is that most telephone calls are not recorded. Well, I dunno, maybe Carnivore and Echelon are even worse than I think. So mostly this is just a question of how much bandwidth AT&T and MCI need to buy; I'm sure they care about that question, but most people have no reason to. Maybe how many tape drives the NSA needs to buy too.
Just how much information *IS* there in a telephone call though. At a certain level, ten million calls about the same snowstorm aren't really that information rich. But I understand that you want to hear YOUR sister complain about shoveling the snow, not somebody else's sister do so. But just at a technological level, how much is there to a phone call?
If I record the call in CD-Audio WAV format it comes to roughly 10 MB a minute. But then, if I compress it to MP3, or Ogg Vorbis, or AAC, I'm down to something more like 1 MB a minute. In fact, if I go for a 56k bandwidth, or something along those lines, I can probably get it down to less than half a MB... and that's not really much different from what I could discern originally on my cell phone on a noisy street, or over the old wiring in my house. So far, we've reduced the "information content" by more than 20 times by purely technical means. Then again, it's not clear if this is fair... in those cop shows where they reconstruct background noises to filter the gunshot or car crash in the background, they probably want the full original data... but do *I* care about that when I talk to my sister?
Moreover, audio compression is just the start. There's this old thing called TRANSCRIPTION that compresses quite a bit more. A stenographer (or maybe a computer program, at least at the NSA) can type up our conversation perfectly well. How much information is lost by reducing the "data" to:
Lulu's Sister: We got over 10" of snow, and it took me an hour to shovel it.
Even at the highest audio compression I can find, I need tens of kilobytes to encode this remark... as text we're down to a couple tens of BYTES. Maybe I've lost a little of Sis's inflection, but how much INFORMATION was there really, to start with? Some probably, but is it worth a thousand words? Moreover, I expect some lossy compression to reduce that text by at least another half.
Depending on just what you think is information, perhaps 300,000 times compression is possible. That brings exabytes down to terabytes. Given some automated transcription technology, maybe I can store the whole last year of family chats on my local hard disk!
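The arithmetic above, redone in Python as a quick check. The CD-audio rate (44.1 kHz, 16-bit, stereo) is standard; 128 kbps and 56 kbps stand in for "MP3" and "a 56k bandwidth"; the 18 EB is the article's figure for phone/radio/TV/internet traffic, and the 300,000x factor is the post's guess.

```python
CD_BYTES_PER_SEC = 44_100 * 2 * 2  # 44.1 kHz samples, 2 bytes each, 2 channels

def bytes_per_minute(bits_per_second: int) -> float:
    """Bytes in one minute of audio at a constant bitrate."""
    return bits_per_second * 60 / 8

def shrink(total_bytes: int, ratio: int) -> int:
    """Size left after compressing `total_bytes` by a factor of `ratio`."""
    return total_bytes // ratio

remark = 'We got over 10" of snow, and it took me an hour to shovel it.'

if __name__ == "__main__":
    cd = CD_BYTES_PER_SEC * 60
    print(f"CD audio: {cd / 1e6:.1f} MB/min")
    print(f"128 kbps: {bytes_per_minute(128_000) / 1e6:.2f} MB/min")
    print(f"56 kbps:  {bytes_per_minute(56_000) / 1e6:.2f} MB/min")
    print(f"CD audio is {cd / bytes_per_minute(56_000):.0f}x the 56 kbps size")
    print(f"the remark as text: {len(remark.encode())} bytes")
    # The article's 18 exabytes squeezed by the post's 300,000x guess
    print(f"{shrink(18 * 10**18, 300_000) / 10**12:.0f} TB")
```

That comes to about 10.6 MB a minute for CD audio and roughly 25x less at 56 kbps, while the sister's remark is a few dozen bytes as text. Squeezing 18 exabytes by 300,000x leaves about 60 TB.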
Buy Text Processing in Python
obviously they didn't count the software makers, or the people who encode movies and redistribute them (illegally, but still, that doesn't mean it doesn't happen)
I wonder how much of that report they left out..
I know 800 mb is an average, but it still seems a bit small for what most people do these days.
Look at how many more of them we need just to tell us how to store things on magnetic and optical media.
KFG
I mean, the thesis that a friend and I wrote, a full year's work (half a year x 2), was 2.2 megabytes, including front page, illustrations, graphs and tables. Yet when I digitize a short video clip from my video camera, it's literally gigabytes, at least until I compress it.
Unless you're a hard disk manufacturer, does its size have any relation whatsoever to its value? If the figure is correct, I can store all the data I create, over my entire life (say 100 years to make it simple) on a 80GB hdd.
Which is, of course, complete rubbish. I could set up a DV cam (hey, even a webcam would do fine) and record everything that happens here, and I'd create terabytes of information per year. Why don't I? Because it'd have no value to me.
I'd rather wager that the reason the information "amount" increases is better tools for collection, not much else. Like getting a 6 Megapixel digital camera over your 2 Megapixel. Maybe that counts as "more" information, but is it really those extra pixels that have value, or the picture itself reminding you of the occasion?
Kjella
Live today, because you never know what tomorrow brings
Could a large part of this "growth" be file size bloating as opposed to more information being stored? An office XP *.doc file is 3 times as large as a *.txt file (In my quick test of a two page document).
Simple: I live in a university residence, and all my floormates have a Safeway card. Every few weeks or so we'll swap our cards just to screw with Safeway's market research. We all have some pretty different purchase habits, so I can imagine what kind of info the store gets from our cards :P
--Mike--
but I pirated 80000 MBs of data.
From an information theory perspective, if you make a copy of a perfectly compressed 100 MB file, the amount of information you have created is just a few bytes, the copy command and the paths of the files, and even then there's a lot of redundancy. I have a large portion of my 60 GB hard drive filled with oggs, but you could just as well describe them all by listing all the albums I've ripped. That would just take a couple k, and would also be highly redundant. I've probably created well over 100 GB of data this year, but I wouldn't say that much of it was new information.
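The copy argument is easy to demonstrate with a general-purpose compressor standing in for "perfect" compression. A sketch in Python using zlib (DEFLATE); random bytes play the part of an already-compressed file, and the data is deliberately small because DEFLATE only looks back 32 KB, so it can't spot a copy that sits further away than that.

```python
import random
import zlib

def compressed_len(data: bytes) -> int:
    """Size of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))

rng = random.Random(2003)         # fixed seed so runs are repeatable
original = rng.randbytes(10_000)  # incompressible stand-in (Python 3.9+)
with_copy = original + original   # the same file stored twice

if __name__ == "__main__":
    a, b = compressed_len(original), compressed_len(with_copy)
    print(f"raw: {len(original)} -> {len(with_copy)} bytes")
    print(f"compressed: {a} -> {b} bytes (the copy adds {b - a})")
```

The raw size doubles, but the compressed size grows by only a few hundred bytes: the compressor encodes the second copy as "repeat what came before", which is about as close as a real tool gets to "a copy command and a path".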
WARNING: there is a trojan on your
...but the thesis was in English. Though it couldn't have been all that bad since we got an A (about 40% get an A, typically). And yes, I proofread that one a bit more, in fact the abstract I think has the lowest words/manhour ratio of anything I've ever written.
$5 / month hosted VPS on linux = awesome!
I mean 800mb can't really hold much porn.
pooped in the toilet. I make more data than that sleeping. Call me when it's over half a terabyte per person...
I've just had to archive the web logs for last month, 12GB. Come on, you guys, why do I have to do all the work!
Generating 800 Megs is easy. And it says nothing. The researchers had better count the amount of 'information', or entropy, in bits. But I guess that would be too hard, because a lot of the data on my hard disk(s) is duplicated on other disks (for example OSes and programs) or has lots of overhead (think of the huge size of a Word document with nothing in it, or the recurring use of brackets in an XML document).
-- The Internet is a too slow way of doing things, you'd never do without it.
Seems to me there's a lot of redundant information 'round.
SIG: TAKE OFF EVERY 'CAPTAIN'!!
Seems to me there's a lot of redundant information 'round.
SIG: TAKE OFF EVERY 'CAPTAIN'!!
-1 Redundant.
SIG: TAKE OFF EVERY 'CAPTAIN'!!
... when you go to cash a check with your tin-foil beanie on?
I'm being facetious, of course. Or is it fascist? I do worry about it a little, what with the Patriot act, etc.
I wonder, have you tried putting some Elmer's glue or something on your finger before letting it scan your fingerprint? I bet this could be a fun thing to mess with. I'd be willing to try a gelatin fingerprint transplant like we heard of for foiling finger scanners before. I'll give someone my fingerprint to cash a check at Wachovia. Would this be considered some type of fraud?
I love how commentaries on the volume of data generated/transferred on the 'net inevitably provide a handy reference to how many Libraries of Congress it would fill, or how many tonnes of paper it would take to print. Has anyone else noticed that libraries aren't full of pointless posts like this, journals nobody reads, and, well, any of these? It's not information, damnit. Archaeologists 2000 years from now are not going to dig up some kid's Angelfire site looking for documentation of our race. (Heh, at least I hope not.)
Hey guys, does anyone know a resource where you can see the estimated size of the information on the internet? By the way, the Berkeley research is quite interesting, but wouldn't it be more interesting to know how the value of this information might be/is measured? It's easy for everyone to create information, but what about good information, eh?