800 Megs of Data Per Person Last Year?
Ant writes "Growing net, computer and phone use is driving a huge rise in the amount of information people generate and use.
US researchers estimate that every year 800MB of information is produced for every person on the planet.
Their study found that information stored on paper, film, magnetic and optical disks has doubled since 1999.
Paper is still proving popular though. The amount of information stored in books, journals and other documents has grown 43% in three years."
I bet half this increase is due to the number of Slashdot reposts also increasing over the same timespan.
...but didn't we see this the day before yesterday?
Given a choice between free speech and free beer, most people will take the beer.
The article fails to address the issue of redundant information, so this number I'm sure is inflated. It raises an interesting question though, to what extent are we becoming more redundant in our data storage? Once we answer that, we also answer exactly how much new information is being generated per person, per year.
__________________________________________
Take comfort in your ignorance.
Grandmaster Plague
It's no shock since this message is data that I'm generating, my dentist appointment generates data, my email generates data, etc.
With data/hard drives so cheap nowadays, most people don't even take notice of what they are filling up. I can only see this number growing, since there is very little reason for people not to keep data (it is so cheap to store)
but I'm no expert.
The Slashdot editors are trying to achieve 900 megs of space per user by duplicating the article over and over.
:)
Then they can duplicate the 900 megs article.
...all warm and fuzzy inside doesn't it? All that data about you and yours? Think it'll get any better? Think again.
It amazes me how much people don't think about privacy anymore. How the concept of supermarket sales has given way to 'Bonus Cards' which track what you buy. Few understand how this information can be used to piece together a bigger picture.
Some Wachovia Bank branches are now requiring a FINGERPRINT before you can cash a check. The situation is this: If you are not a customer, you are now required to give them an electronic finger scan to cash a check drawn on Wachovia.
Where does it end? Should I just give them a hair sample now or wait until my implant is required?
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
800 megabytes per person, and yet most people have 40+ gig hard drives... Is there something wrong here?
eclecti.cc
That doesn't sound like a lot. Slightly more than 1 CD, or about 12 hours on the phone every year.
And how much of that data was a duplicate?
Too much coffee this morning?:P
An audio producer may lay down gigs of tracks for one song. In my research lab we burn a DVD almost full of new data each day. In the hospital we record more and more detailed information into our systems.
Sadly, an assload of this information is useless, useless, useless. I spend more time detailing information in the medical chart than I actually spend with the patient.
In my lab, more data is better... however, when it's just useless information to keep the shark lawyers off my back it's a bad thing.
Davak
Damn... Go read the first article from like four days ago...
United States of America, good ol' backers of world peace.
How about we talk about something else?
How about them sporting events?
It would take 500,000 Libraries of Congress to equal five exabytes. Since when does a 'Library of Congress' qualify as a unit of storage? Yes, I realize that they were trying to give a comparison, but it looked very odd to me.
The Braying and Neighing of Barnyard Animals Follows.
News for nerds, news that repeats....
"WebTV: bringing the Internet into the shallow end of the gene pool since 1995" - Martin Bishop
Where do I subscribe again?
Do you even lift?
These aren't the 'roids you're looking for.
Also, if everything is posted twice, like on slashdot, and like THIS story, that 800 MB is really 400 MB.
to find out how many mb of data people write on paper per year. I suppose you'd have to take a sample of about 10000 people, then enter all the things they write in a year into some handwriting recognition program. Of course it'd take less time than normal handwriting recognition because the program would only have to scan for the number of letters, rather than what the letters actually were.
When anger rises, think of the consequences.
Confucius (551 BC - 479 BC)
If it's intelligent reflected work ppl produce, I would think it's less. If it's blogging, grocery lists, Slashdot articles (yes, dupes make the number rise quite a lot probably), then that's probably a lot more than that.
...
Oh well, besides, I don't really know what that amounts to, the official Internet storage unit being the Library of Congress
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
The parent post is an obvious example of the cause of the information explosion. :)
Who cares though? As data expands, it gets cheaper and cheaper to store it...
Just woke up, still in bed actually - not all that ergonomic... :)
25mg of dexedrine is a bit better than coffee
Here is the link if someone hasn't posted it already:
http://developers.slashdot.org/article.pl?sid=03/10/29/1355259&mode=thread&tid=137&tid=188&tid=198
As discussed in this previous thread - following an article ABOUT THE SAME SUBJECT - the new unit is the Great Pyramid.
Someone should program a calculator to convert all these units from one to another. Elephants to Great Pyramids, Great Pyramids to K-Marts, K-Marts to Libraries of Congress... Now that'd be innovation for ya !
United States of America, good ol' backers of world peace.
800 Megs? I grab that in pr0n every day!
Ha, I bet no one else noticed that this article was a dupe! I'll be the first to make a witty comment on dupes accounting for half of all the data! ...oh, wait...
Hmmmm...
:)))
800 megs of data... Would that be the server logs of the FTPs from which I get my stuff?
Hmmm. Well, that and probably the 20+megs of spam I get to have every month would make it easy to put it up to 800 megs/year...
Koos
Doesn't that seem ... I dunno, a bit low? Granted, I have hundreds of gigabytes of hard drive storage filled, but I didn't create any of it ... the movie studios, TV studios, and game studios did.
However, the stuff I do create is digital pictures. Lots of them. I take everything in 1600x1200 resolution, so each image is about 800KB, and my camera has a 128MB flash card. I fill it up quite often. I'd say I take on average 20 pics a day (which averages out to around 6 GB per year), and that's just in pictures!
What about save game files - do those count? And I also create text files, but those probably don't total over a few megabytes per year.
I don't shoot videos or record songs or anything - but yet I do enough data creation for 10 other people. I shudder to think how much information people who shoot digital videos are creating.
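For what it's worth, the back-of-the-envelope figure above holds up. A quick sketch in Python, using only the numbers quoted in the comment and assuming decimal units (1 GB = 1,000,000 KB); the function name is made up for illustration:

```python
# Rough yearly volume of photos, using the figures quoted above.
PHOTOS_PER_DAY = 20      # "on average 20 pics a day"
KB_PER_PHOTO = 800       # "each image is about 800KB"
DAYS_PER_YEAR = 365
KB_PER_GB = 1_000_000    # assumption: decimal units


def yearly_gigabytes(photos_per_day, kb_per_photo, days=DAYS_PER_YEAR):
    """Return the yearly data volume in (decimal) gigabytes."""
    return photos_per_day * kb_per_photo * days / KB_PER_GB


photo_gb = yearly_gigabytes(PHOTOS_PER_DAY, KB_PER_PHOTO)
print(f"{photo_gb:.2f} GB per year")                       # 5.84 GB per year
print(f"{photo_gb * 1000 / 800:.1f}x the 800 MB average")  # 7.3x the 800 MB average
```

So "around 6 GB" is a fair rounding, and it is a bit over seven times the per-person average from the study.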
Cyde Weys Musings - Scrutinizing the inscrutable
i downloaded enough music for 31.43875 people...
it's not like we have to read it all. Most of it is as important as receipts for toilet paper (and production, shipping and marketing data for said ass-wipe).
... and if we look for patterns we'll see the forest and know what are the important bits. Plus we'll have the ability to search for individual trees instantaneously.
The medium is the message
world: USING LINUX since 1991!!
sco: SUING LINUX til 2011!!
No way, I get that much per movie.
come on now, don't you guys check?
;)
This is an obvious duplicate...
must be damn slow cuz there's no SCO news
If in generating the average they could discount the extremes.
Some of us go through a truly silly amount of data. There's a nontrivial number of people reading this discussion who exhaust their dorm's 1 GB bandwidth cap every day.
On the other hand there's somewhere a barefoot Palestinian refugee child for whom not so much as a piece of paperwork was generated since he was born.
These two extremes would probably tend to distort things. It would be interesting to find out if the study was based on usage of storage data, as it appears, and these extremes were included in the study, or if they just (being Americans) couldn't be bothered when compiling their study to talk to geeks and starving African children. If the former, I'd be curious how their results would change if they could somehow just, like, chop off the ends of the bell curve.
Irritable, left-wing and possibly humorous bumper stickers and t-shirts
I thought the pr0n I was in amounted to WAY more than 800 MB last year...
For your security, this post has been encrypted with ROT-13, twice.
Please don't read my journal
You have to wonder how much storage business data would take if you could compress powerpoint/word docs into ascii.
The stuff in this post was already discussed in Info Glut - Five Exabytes of Data Created in 2002
Please enlighten me as to what your sig means, I tried decoding it myself but couldn't, and don't want to try all the decryption methods I know :)
When anger rises, think of the consequences.
Confucius (551 BC - 479 BC)
With information growing exponentially, one must wonder if we're on the edge of the Singularity as anticipated by Vernor Vinge.
Shh.
Too bad the number of trees has NOT increased 43% in the past 3 years. At this rate we'll have a naked planet in no time.
Banjo - The more I know about Windoze, the more I love *nix
I'm pretty sure we'll have at least 800MB worth of "dupe article" posts by people who think such posts get funnier every time they do it! :\
Join the TWIT army now!
They keep talking about all this information, but data would probably be more accurate. My guess is the actual information created in 2002 amounts to less than 1 kb/person, and that's probably being optimistic about humanity.
(My contribution to the glut: long-shrift.blogspot.com)
This is the article a few days ago ...
Men are born ignorant, not stupid; they are made stupid by education. Bertrand Russell
Only after the last tree has been cut down
Only after the last river has been poisoned
Only after the last fish has been caught
Only then will you find you cannot eat data
Does all the virus/worm-generated mail that I get count against my 800M since it's sent to me, or against the poor Microsoft user who didn't patch their machine in time?
And does eliminating spam/virus email make a noticeable dent in the numbers, or is it not even counted?
Only after the last oil rig has been sunk
Only after the last supertanker has called port
Only after the last gas station has closed
Only then will you find Greenpeace doesn't sell beer at night
Any sufficiently advanced libertarian utopia is indistinguishable from government.
They do much more than 800MB's.
;-)
Perhaps they are the academics of the future?
I'm gunna marry that genious I saw the other day.
800 Megs of Data Per Person Last Year - is this the amount of surveillance data the Department of Homeland Security was generating per person ?
No, it's less. No duplicate is really a duplicate.
The first time around, the article gets a bunch of discussion.
The second time around, the article gets the same discussion, same arguments. Then it gets an equal amount of "dupe" talk.
X + 2X = 800. "Content" = 800/3.
(insane reference) That's like 17 Volkswagen Beetles.
...how much info is destroyed each year to offset these numbers. I mean shredded files, stuff thrown in trash, bills, deleted data files, discarded/lost storage media, etc... In the end (of each year), I wonder, what is the actual increase in stored information?
Interested in open source engine management for your Subaru?
That's a believable number. Consider the amount of published data on Kazaa, or that 45 minutes of raw DV video is roughly 12.5 GB. Move 100 of your CD's to MP3s and you're consuming/creating roughly 3.5 GB (or more if you're using higher than 128kbps MP3s). And I'm not even commenting on pr0n.
Interested in open source engine management for your Subaru?
While I used Pngcrush to squeak out the last few %, even the moderate compression offered by Photoshop was enough to make her happy for a week.
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
That's a lot of porn...
"Under US law the cops don't need a warrant for anything you willfully disregard..."
I agree, too little too late, but I wasn't talking about law enforcement; I was talking about a corporation! Big difference, or at least I think so.
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
There are some exhibitionists out there, I'm sure.
"Learning is not compulsory... neither is survival."
--Dr.W.Edwards Deming
If you store the same image in two different resolutions, does the high-res version contain more information?
Even if you cannot see the difference?
The game of Scrabble is a good illustration. Common letters (a, e, i, o, u, s) are worth 1 point. Uncommon letters (z, q) are worth 10 points. Each letter's score is based on its frequency of use in the English language. (At least for the English version of the game. I know the scores on the letters are different for the German version at least.)
All of the modern compression algorithms work on this principle. They detect the parts of your "signal" that contain the least information, and convert them to a smaller form. Of course you have to know a bit about your signal before you can be any good at predicting what is common or not.
LZW compression, for instance, is great at compressing text. OTOH, LZW is not so good at images, at least full-color ones (GIF actually uses LZW, but it is limited to a 256-color palette).
For full color images we use JPEG, which breaks the image into 8x8 tiles and applies a discrete cosine transform to each one. So instead of storing the actual RGB information, it stores quantized coefficients of the transform, which are enough to reconstruct a close approximation of the tile.
MPEG uses a JPEG-like compression for key frames, and then stores what pixels change in between frames. Some implementations also attempt to compensate for motion, which is starting to get beyond what I can explain in the space provided.
Suffice to say information is the level of surprise inside a signal. It doesn't really matter what form of signal it is.
"Learning is not compulsory... neither is survival."
--Dr.W.Edwards Deming
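To put a number on "information is the level of surprise inside a signal": the sketch below computes Shannon's surprise and entropy for a string, based purely on how often each symbol shows up. It is only an illustration of the idea in the comment above, not what any particular compressor does, and the function names are made up for the example.

```python
import math
from collections import Counter


def surprise_bits(probability):
    """Information content (in bits) of an event with the given probability."""
    return -math.log2(probability)


def entropy_bits_per_symbol(text):
    """Shannon entropy of the symbol frequencies in text, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return sum((n / total) * surprise_bits(n / total) for n in counts.values())


predictable = "aaaaaaaaaaaaaaaa"   # always the same symbol: no surprise at all
varied = "abcdefghijklmnop"        # 16 equally likely symbols

print(entropy_bits_per_symbol(predictable))  # 0.0
print(entropy_bits_per_symbol(varied))       # 4.0
```

A rare letter like "z" has a high surprise value, which is the same intuition as its high Scrabble score; a signal made of nothing but common, expected symbols can be squeezed down a lot.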
The thing about information is that it's not quite so easy to count as the article suggests. If the question is solely one of how many magtapes to buy, sure the exabyte thing is interesting enough. But in a "human" sense, that's not all that interesting.
For example, the article cites 18 exabytes of what is basically analog data--sound and images--over telephone, radio, TV. It claims that 98% of that is in telephone calls, essentially all, in other words.
First thing is that most telephone calls are not recorded. Well, I dunno, maybe Carnivore and Echelon are even worse than I think. So mostly this is just a question of how much bandwidth AT&T and MCI need to buy; I'm sure they care about that question, but most people have no reason to. Maybe how many tape drives the NSA needs to buy too.
Just how much information *IS* there in a telephone call though. At a certain level, ten million calls about the same snowstorm aren't really that information rich. But I understand that you want to hear YOUR sister complain about shoveling the snow, not somebody else's sister do so. But just at a technological level, how much is there to a phone call?
If I record the call as CD-Audio WAV format it comes to something like 9 MB a minute. But then, if I compress it to MP3, or Ogg Vorbis, or AAC, I'm down to something more like 1 MB a minute. In fact, if I go for a 56k bandwidth, or something along those lines, I can probably get it down to less than half a MB... and that's not really much different from what I could discern originally on my cell-phone on a noisy street, or over my old wiring in my house. So far, we've reduced the "information content" by 20 times by purely technical means. Then again, it's not clear if this is fair... in those cop shows where they reconstruct background noises to filter the gunshot or car crash in the background, they probably want the full original data... but do *I* care about that when I talk to my sister?
Moreover, audio compression is just the start. There's this old thing called TRANSCRIPTION that compresses quite a bit more. A stenographer (or maybe a computer program, at least at the NSA) can type up our conversation perfectly well. How much information is lost by reducing the "data" to:
Lulu's Sister: We got over 10" of snow, and it took me an hour to shovel it.
Even at the highest audio compression I can find, I need tens of kilobytes to encode this remark... as text we're down to a couple tens of BYTES. Maybe I've lost a little of Sis's inflection, but how much INFORMATION was there really, to start with? Some probably, but is it worth a thousand words? Moreover, I expect some lossy compression to reduce that text by at least another half.
Depending on just what you think is information, perhaps 300,000 times compression is possible. That brings exabytes down to gigabytes. Given some automated transcription technology, maybe I can store the whole last year of family chats on my local harddisk!
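The per-minute audio figures above are easy to sanity-check. A rough sketch, assuming standard CD audio parameters (44,100 samples per second, 16 bits, stereo), a 128 kbps MP3 as the "typical" compressed case, and decimal megabytes; none of these assumptions come from the study itself:

```python
def mb_per_minute(bits_per_second):
    """Megabytes (decimal) needed for one minute of audio at a given bit rate."""
    return bits_per_second * 60 / 8 / 1_000_000


# CD audio: 44,100 samples/s, 16 bits per sample, 2 channels.
cd_bps = 44_100 * 16 * 2
mp3_bps = 128_000      # assumption: a typical 128 kbps MP3
modem_bps = 56_000     # the "56k" figure from the comment

cd = mb_per_minute(cd_bps)
mp3 = mb_per_minute(mp3_bps)
low = mb_per_minute(modem_bps)

print(f"CD audio: {cd:.2f} MB/min")             # CD audio: 10.58 MB/min
print(f"128 kbps: {mp3:.2f} MB/min")            # 128 kbps: 0.96 MB/min
print(f"56 kbps:  {low:.2f} MB/min")            # 56 kbps:  0.42 MB/min
print(f"CD vs 56 kbps: about {cd / low:.0f}x")  # CD vs 56 kbps: about 25x
```

That lands close to the rough numbers in the comment: about 10 MB a minute uncompressed, about 1 MB as MP3, under half a MB at 56k, and a reduction in the neighborhood of 20 to 25 times before transcription even enters the picture.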
Buy Text Processing in Python
obviously they didn't count the software makers, or the people who encode movies and redistribute them (illegally, but still, it doesn't mean it doesn't happen)
I wonder how much of that report they left out..
I know 800 mb is an average, but it still seems a bit small for what most people do these days.
Look at how many more of them we need just to tell us how to store things on magnetic and optical media.
KFG
I mean, my thesis that I and a friend wrote, a full year's work (half year x 2) was 2.2 megabytes, including front page, illustrations, graphs and tables. Yet when I digitize a short video clip from my video camera, it's literally gigabytes, at least until I compress it.
Unless you're a hard disk manufacturer, does its size have any relation whatsoever to its value? If the figure is correct, I can store all the data I create, over my entire life (say 100 years to make it simple) on an 80GB hdd.
Which is of course, complete rubbish. I could set up a DV cam (hey, even a webcam would do fine) and record everything that happens here, and I'd create terabytes of information per year. Why don't I? Because it'd have no value to me.
I'd rather wager that the reason the information "amount" increases is better tools for collection, not much else. Like getting a 6 Megapixel digital camera over your 2 Megapixel. Maybe that counts as "more" information, but is it really those extra pixels that have value, or the picture itself reminding you of the occasion?
Kjella
Live today, because you never know what tomorrow brings
Could a large part of this "growth" be file size bloating as opposed to more information being stored? An office XP *.doc file is 3 times as large as a *.txt file (In my quick test of a two page document).
--Mike--
but I pirated 80000 MBs of data.
It doesn't seem that anyone is aware of ROT13 "encryption".
.... some more junk ... and here it goes... so how's the weather today? If you are seriously reading this, please just stop now. I told you to stop ... You are still reading this aren't you .... ok ... I give up!
LBH NER VA IVBYNGVBA BS GUR QZPN
YOU ARE IN VIOLATION OF THE DMCA
Here is some more stuff to overcome slashdot's "yelling" lameness filter. Sure, it's like I am actually yelling
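For anyone who does want to check the ROT13 line rather than take the translation on faith, Python's standard library ships ROT13 as a text codec. A minimal sketch:

```python
import codecs

ciphertext = "LBH NER VA IVBYNGVBA BS GUR QZPN"

# ROT13 shifts each letter 13 places around the alphabet.
plaintext = codecs.decode(ciphertext, "rot_13")
print(plaintext)  # YOU ARE IN VIOLATION OF THE DMCA

# Because 13 + 13 = 26, applying it a second time gives back the original text.
assert codecs.encode(plaintext, "rot_13") == ciphertext
```

Which is also why a post "encrypted with ROT-13, twice" reads exactly like plain text.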
- - - - - - -
Orppf urp mf y.ppcxn. yflcbi otcnnov C am yflcbi yr n.apb Ekrpatv (Dvorak -> Qwerty)
From an information theory perspective, if you make a copy of a perfectly compressed 100 MB file, the amount of information you have created is just a few bytes, the copy command and the paths of the files, and even then there's a lot of redundancy. I have a large portion of my 60 GB hard drive filled with oggs, but you could just as well describe them all by listing all the albums I've ripped. That would just take a couple k, and would also be highly redundant. I've probably created well over 100 GB of data this year, but I wouldn't say that much of it was new information.
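A small experiment along these lines, as a sketch only: random bytes stand in for the "perfectly compressed" file, and zlib stands in for an ideal compressor. The file is kept to 10,000 bytes on purpose, because zlib only looks back about 32 KB, so it would not notice a duplicate of a much larger file.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
# Random bytes are a stand-in for an already perfectly compressed file.
original = bytes(rng.getrandbits(8) for _ in range(10_000))

single = len(zlib.compress(original))
with_copy = len(zlib.compress(original + original))

print(f"one copy:   {single} bytes compressed")
print(f"two copies: {with_copy} bytes compressed")
print(f"extra cost of the copy: {with_copy - single} bytes")
```

The first copy does not shrink at all, while the second copy costs only a small fraction of the original size, since all the compressor has to record is "repeat what came 10,000 bytes ago". Twice the data, almost no new information.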
WARNING: there is a trojan on your
...but the thesis was in English. Though it couldn't have been all that bad since we got an A (about 40% get an A, typically). And yes, I proofread that one a bit more, in fact the abstract I think has the lowest words/manhour ratio of anything I've ever written.
Kjella
Live today, because you never know what tomorrow brings
$5 / month hosted VPS on linux = awesome!
Actually not a bad idea. Collecting data may be invaluable to them, but inaccurate data is worthless.
pooped in the toilet. I make more data than that sleeping. call me when it's over half a terabyte per person...
doesn't look like it...
Advanced users are users too!
Generating 800 Megs is easy. And it says nothing. The researchers had better count the amount of 'information' or entropy in bits. But I guess that would be too hard, because a lot of data on my harddisk(s) is duplicated on other disks (for example OS's and programs) or has lots of overhead (think of the huge size of a Word document with nothing in it, or the recurring use of the brackets in an XML document).
-- The Internet is a too slow way of doing things, you'd never do without it.
Seems to me there's a lot of redundant information 'round.
SIG: TAKE OFF EVERY 'CAPTAIN'!!
Seems to me there's a lot of redundant information 'round.
SIG: TAKE OFF EVERY 'CAPTAIN'!!
-1 Redundant.
SIG: TAKE OFF EVERY 'CAPTAIN'!!
... when you go to cash a check with your tin-foil beanie on?
I'm being facetious, of course. Or is it fascist? I do worry about it a little, what with the Patriot act, etc.
I wonder, have you tried putting some Elmers glue or something on your finger before letting it scan your fingerprint? I bet this could be a fun thing to mess with. I'd be willing to try a gelatin fingerprint transplant like we heard of for foiling finger scanners before. I'll give someone my fingerprint to cash a check at Wachovia. Would this be considered some type of fraud?
I love how commentaries on the volume of data generated\transferred on the 'net inevitably provide a handy reference to how many Libraries of Congress it would fill, or how many tonnes of paper it would take to print. Has anyone else noticed that libraries aren't full of pointless posts like this, journals nobody reads, and well, any of these? It's not information, damnit. Archaeologists 2000 years from now are not going to dig up some kid's Angelfire site looking for documentation of our race. (Heh, at least I hope not).
Hey guys, does anyone know a resource where you can see the estimated size of the information on the internet? By the way, the Berkeley research is quite interesting, but wouldn't it be more interesting to know how the value of this information might be (or is) measured? It's easy for everyone to create information, but what about good information, eh?