Info Glut - Five Exabytes of Data Created in 2002
securitas writes "If you had any doubts that you are overwhelmed by the volume of information in your life, a new Berekley study (PDF) shows that five exabytes of data were created in 2002, twice the 1999 total. That's five million terabytes of data, or 500,000 Libraries of Congress, which works out to about 800 MB of data for each of the 6.3 billion people on the planet. Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future. The study was conducted by University of California-Berkeley's School of Information Management and Systems professors Peter Lyman and Hal Varian. More at CNet, Infoworld, ByteAndSwitch and The Register."
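The per-person figure in the summary is easy to sanity-check. A minimal sketch, assuming decimal (SI) units, which the summary does not state explicitly; the constant names are only illustrative:

```python
# Rough check of the summary's arithmetic.
# Assumption: SI units (1 EB = 10**18 bytes, 1 TB = 10**12, 1 MB = 10**6).
EXABYTE = 10**18
TERABYTE = 10**12
MEGABYTE = 10**6

total_bytes = 5 * EXABYTE   # data created in 2002, per the summary
population = 6.3e9          # population figure used in the summary

terabytes = total_bytes / TERABYTE
mb_per_person = total_bytes / population / MEGABYTE

print(f"{terabytes:,.0f} TB in total")
print(f"{mb_per_person:.0f} MB per person")
```

Under those assumptions the total is five million terabytes and the per-person share lands just under 800 MB, in line with the rounded figure quoted above.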
It looks like they are counting every tiny email about "going to lunch". Lots of DATA, little INFORMATION.
That's a believable number. Consider the amount of published data on Kazaa, or that 45 minutes of raw DV video is roughly 12.5 GB*. Move 100 of your CDs to MP3s and you're consuming/creating roughly 3.5 GB* (or more if you're using higher than 128 kbps MP3s). And I'm not even commenting on pr0n.
(*I said roughly...comment on the comment, not the mathematical precision of the statement.)
"Draco dormiens nunquam titillandus."
...and most of it is still sitting in my Inbox at work right now.
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
With all the time I spend at work, it seems like I've created about half of that.
That's a lot of porn. Though I think their stats are off a bit, as I have 800 GB of porn, not MB. Oh well, better luck next year!
Looking for hardware (Currently need: Large Etch-a-Sketch) Have one? See my journal!
a new Berekley study (PDF) shows that five exabytes of data were created in 2002,
:-)
Shoot, it felt like my doctoral dissertation was responsible for at least 2 of those 5 exabytes.
Visit Jonesblog and say hello.
Here is the article
Geminatron
Subject says it all
The ultimate network admin tool needs HELP!
Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future.
In 70, 60, maybe even 50 years it might be difficult to access today's hard disks with the future's technology. And of course (as always) it brings up the problem of how long the data lasts before it's corrupted.
When anger rises, think of the consequences.
Confucius (551 BC - 479 BC)
Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future.
Well, why won't they just print it? Sheesh...
United States of America, good ol' backers of world peace.
Hooray for exponential curves! It is daunting, though. As an illustration of this, I read that the White House has already turned over 2 million pages of documents relating to 9/11 to the independent investigation panel.
How about temporary and ephemeral data, like SSH keys and data passed through X11, used for short point-to-point transfers? It might be just me, but if this doesn't take that data into account, the total could be much higher...
I'm the Devil the Windows users warned you about.
As I just received another couple of letters asking for assistance from the war-torn regions of Africa, how much of this is spam and related garbage?
oddly enough the most useful information is often the most concise. duck!
Hmmmmm.... I think I might know where all that 'new data' came from.
"Lawyers are for sucks."
- Doug McKenzie
Subject says it all for me but since this requires a body...
For those curious the dictionary's definition of data is as follows.
Factual information, especially information organized for analysis or used to reason or make decisions.
Computer Science. Numerical or other information represented in a form suitable for processing by computer.
Values derived from scientific experiments. Plural of datum.
I have a Cig, but do you have a light?
But how many {VW Beetles, encyclopedias, football fields, Coke cans, DVDs, hours of porn} is that?
All of the books in the world contain no more information than is broadcast as video in a single large American city in a single year. Not all bits have equal value. --Carl Sagan
I hope that Varian et al. realize that by publishing this study, they are adding to the problem.
In the long run, the second law of thermodynamics will take care of this.
From the article, Varian (an economist) states:
"We're producing all this information, but we don't necessarily have the tools to use it most effectively," he said.
What does it mean to use data "effectively", and is the "We" producing the data the same "We" using it? My first instinct on not having the tools to use this data most effectively is "that's good". My second instinct tells me that data is already being used TOO effectively. Personally, I hope that cross-reference of mass data stores containing personal information does NOT become more effective.
But if these data were recorded on floppies, and stacked up to the moon n times, how many VWs would it take to carry those floppies to the stack site?
sulli
RTFJ.
So what the writeup is saying is that there's a whole lotta data, which is a problem, and that 92% of that data probably won't survive that long, which is a problem. It sounds like these two problems cancel each other out! (That is, as long as the 8% that does survive is the useful stuff.)
I think more needs to be done to preserve the important e-mails of government for posterity. The DoD and other agencies do not back up or retain e-mails in any meaningful way, nor does the White House or the National Archives have any kind of e-mail policy, AFAIK. Hard disks and, by extension, e-mail suffer from the time limit of magnetic media...eventually all those ions disappear and there is no *magnetic* in the media.
CDs have the translation problem...what happens in 150 years when the standards are corrupted or lost and nobody can interpret the binary code in any meaningful format?
I work at EMC, and this fact (along with projections for similar growth in the future) is a big marketing strategy for the company, especially toward investors. The storage market grows with the amount of information produced... it's gotta be stored somewhere!
-3Suns
~~~~
The Revolution will be Slashdotted
Is that 5 Exabyte 8505's or did they use 8505XL's?
This page accidentally left blank
People ALWAYS have prospects somehow. You just have to think about it some more and get some help from friends or professionals if you have problems figuring out what to do.
...of course, if you still wanna kill yourself, jumping off of some very high thing is the most beautiful way out... but still, don't do it :)
United States of America, good ol' backers of world peace.
I gave my 200 GB. What did you give.
--"Sorry for the inconvience." Gods Last Words to his Creation
DNA, So Long and Thanks for all the Fish
- Many large companies are building VERY large data warehouses, to capture and analyze every iota of information about every transaction. In a year or two, much of today's data will be largely irrelevant, and will likely be summarized and deleted.
- People send a lot of email, and post a lot of messages, about day-to-day stuff that has no long-term value.
- Surveillance video is used more than ever. This is not going to be stored long-term, except perhaps in the most security-sensitive areas.
Either way, I highly commend the article's author for using both "Libraries of Congress" and "feet of books" as measurement units.
You only get to count data you have generated yourself; anything you got from somewhere else (99% of porn, everything on P2P apps) doesn't count.
As such, I think I'm under my one-CD-per-person (800 MB) limit for the year, but I do know a few friends (artists) that would definitely be over :P
Another interesting question is whether data conversion counts - if I copy a CD to Oggs, or a DVD to DivX, does that count as new data created for the purposes of this study?
http://www.wired.com/wired/archive/11.09/full.html
If Slashdot were chemistry it would look like this:Cadaverine
How much of that was in kids' artwork for the refrigerator door? Cause that would store a lot better in a vector file format...
the major advances in civilization are processes which all but wreck the societies in which they occur - A.N. White
But how many football fields long is that?? Let's try to put that in some context that Joe Sixpack like me can understand!
There's a Mercedes gap too. I want one and can't afford one, but it's not government's job to do anything about it.
I think the more interesting thing to study would be to determine how much unique data is being generated. I mean, who cares if two million people have the latest Britney Spears song in MP3 format? And that's not even talking about "information", but just simply raw "data". I also wonder if they took into account "data in transit" (being transmitted over the Ethernet) and temporary data (caches, etc).
...how much info is destroyed each year to offset these numbers. I mean shredded files, stuff thrown in trash, bills, deleted data files, discarded/lost storage media, etc... In the end (of each year), I wonder, what is the actual increase in stored information?
At Fermilab where I work, the larger experiments are expecting to generate 1PB/year of data in around 2005, up from somewhere around 300TB/year currently.
- "History shows again and again how nature points out the folly of men" -- Blue Oyster Cult, 'Godzilla'
Wow, that sounds even more than Gazillion!
My new harddrive will be no less than 1.2EB...
Tera, Giga, Exa, Don't give it to me in those terms. Put it in terms I can understand!
Just how much of that was porn?
-Goran
Carpe Scrotum - The only way to deal with your competition.
500,000 Libraries of Congress, huh? I've always had several problems (SI questions aside) with this unit of measurement. The Library of Congress is constantly expanding & adding new material. What year Library of Congress do they mean? I imagine they aren't working w/ up to the minute data and that the library is expanding much faster now. Not to mention the fact that everyone always makes exabytes ~13% smaller than they really are by counting 10^18 bytes instead of 2^60 (and with numbers this big, it actually makes a difference!)... So call me the new number nazi troll already and get it over with...
Webmaster Wanted - Entropic Reactions
Why is it that everything that is data is related to either x Libraries of Congress or y Encyclopedia Britannicas, as if either of those is actually an approachable figure? I want to lobby for a new measure, such as x two-hour porn DVDs or y illegally downloaded songs.
The truth about Scientology, Xenu, and you: Operation Clambake
pr0n + spam + kazaa
!(^((ri)|(mp))aa$)
It repeatedly calls malloc() and free(), storing information in RAM, which may create an interesting problem for historians and archaeologists of the future.
Then think of how many bytes of that number are actually backed up, if they are irreplaceable?
I'd bet not much. And what is backed up may only have a shelf life of about 20 months if on poor CD-R or floppies.
Saskboy's blog is good. 9 out of 10 dentists agree.
Damn- that puts some stuff in perspective... 800 MB per person is really not that much... just over one CD per person on the planet.
I personally burned over 500 CDs last year, filled a couple of hard drives, and sent God knows how much email...
I think this goes to show what a wealthy little world we computer people live in.
... how much of it was porn? :)
Hey, way to add another 800k to the glut with this pdf file!!
I just did another backup, so the figures are right at hand.
I'm a news photographer, shooting digital.
In 2002 I saved 78,742 photos to disk. (Bad images were not saved.)
That worked out to 122 gig. The output was transferred from the CF cards and archived to DVDs.
But how much of that 122 gig is really information? The image file saved by the Canon 1D is mostly empty air, as far as I can tell. There is also EXIF data and IPTC, and who knows how much hidden BS is included à la Microsoft Word documents?
Simple compression was able to whittle that down to 33.2 gig. So that's my contribution.
The main beneficiaries are the DVD-R blank disc makers and Western Digital, I guess.
It really makes you wonder how much of that data is just redundant waste.
How many other sysadmins out there are tired of hearing this? Every time I go to a company and even suggest quotas on the file server, the engineering group always says, "Disk space is cheap," or "You can get an 80GB disk for cheap."
Of course, this never takes into account backup media and the whole backup infrastructure (anyone price decent commercial backup software recently?).
I'm surprised it's only five exabytes. The admins of the world should go ahead and put a 400MB Quota on all 6.3 Billion people. That way, we'd be down to 1999's storage levels....
It's about 6 Exabytes.
It's a joke..
Friends don't help friends install M$ junk.
If we used analog computers instead of digital, how would this be measured?
tasks(723) drafts(105) languages(484) examples(29106)
Doesn't just one experiment produce 45 zillion megabytes? (Don't quote me on that.)
An MP3 is usually about 1 meg a minute. But a raw WAV file is several times more. The same goes for raw video versus MPEG-2 or QuickTime.
I suppose the number could be much larger if you expand data before counting it.
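To put a rough number on "several times more", here is a small sketch. It assumes a 128 kbit/s MP3 and CD-quality WAV (44.1 kHz, 16-bit, stereo); neither figure comes from the post, so treat the result as illustrative only:

```python
# Assumptions (not from the post): 128 kbit/s MP3, CD-quality PCM audio.
MP3_BITRATE_BPS = 128_000   # bits per second
SAMPLE_RATE_HZ = 44_100     # samples per second per channel
BYTES_PER_SAMPLE = 2        # 16-bit samples
CHANNELS = 2                # stereo

mp3_bytes_per_min = MP3_BITRATE_BPS / 8 * 60
wav_bytes_per_min = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * 60
ratio = wav_bytes_per_min / mp3_bytes_per_min

print(f"MP3: {mp3_bytes_per_min / 1e6:.2f} MB/min")
print(f"WAV: {wav_bytes_per_min / 1e6:.2f} MB/min")
print(f"WAV is about {ratio:.0f}x larger")
```

With those settings the MP3 comes to a little under 1 MB per minute and the uncompressed audio to roughly 11 times that, so counting expanded data really would inflate the total by about an order of magnitude for music.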
I don't understand, how many elephants does an exabyte weigh?
Looks like 599, assuming said motion picture is a complete rotting turd. Thanks for gems like this one, MPAA!
-Looking for a job as a materials chemist or multivariat
:: Either way, I highly commend the article's author for using both "Libraries of Congress" and "feet of books" as measurement units.
Even though it knows the Answer to Life, the Universe, and Everything and number of feet in 10 metres, it can't convert 10 libraries of congress into feet of books.:(
I demand that this be fixed immediately!;)
Is this a sigs-optional kind of place? 'Cause I am totally down with that if you know what I mean.
log2(5 exabytes) is a little over 62!
(62.3 for RAM style exabytes or 62.1 for HD style exabytes).
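Both figures can be reproduced with the standard library. This sketch assumes "RAM style" means 2^60 bytes per exabyte and "HD style" means 10^18 bytes:

```python
import math

# Assumption: "RAM style" = binary exabyte (2**60 bytes),
# "HD style" = decimal exabyte (10**18 bytes).
ram_style_bytes = 5 * 2**60
hd_style_bytes = 5 * 10**18

ram_bits = math.log2(ram_style_bytes)
hd_bits = math.log2(hd_style_bytes)

print(round(ram_bits, 1))  # 62.3
print(round(hd_bits, 1))   # 62.1
```

In other words, a 63-bit address would be enough to index every byte, whichever definition of exabyte you prefer.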
Thoughts on tech, Software Engineering, and stuff
Not least for those historians who want to know what my Amazon.com session ID was on the day that my Runescape character hit mining level 33.
Shop as usual. And avoid panic buying.
What's the big deal? That's only five 8mm tapes, isn't it?
Call me old fashioned, but I like a dump to be as memorable as it is devastating - Bender
...how many golf balls falling on said stack it would take to knock it over. And if you laid all the bits in the data side by side, I wonder how many times it would go around the earth?
-Looking for a job as a materials chemist or multivariat
I'm a library science student. I'll have my MLS in December, and I've found a lot about this topic. In fact, I'm sitting in the library science library right now.
For books, the standard is that any book should last for at least 500 years (Though this is a problem, what with all the acidic wood pulp paper publishers have used since the mid-1800s). The much-hated microfilm has that same lifespan.
But we are nowhere close to finding a viable archival format for electronic information.
This is a problem. There is so much important stuff, but digital formats change so fast we can't keep up. And the reliability of computer hardware is another can of worms.
Libraries and Archives would bow down to anyone who found a format that remains viable, readable, and usable for perhaps the next century.
Now, here's a little math for you
United States of America, good ol' backers of world peace.
Don't worry about being able to read old legacy data formats. If there's any interest in the data, there's somebody somewhere who will write an interpreter / converter / emulator for it. Just look at the 8-bit emulation scene.
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
5 billion files are created every day.
3 billion of them will never be found again.
Poor files...
When men used to be men
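long x;
for (;;) {                  /* loop forever */
    x = rand();             /* rand() is declared in <stdlib.h> */
    send_to_info_glut(x);   /* hypothetical function, not a real API */
}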
Please send the data created to Info Glut, and while you're at it, send it to all the spammers and to SCO. With some luck, you might DDOS them off the internet.
Wh47 d1d j00 541, 31337 15n't t3h r0xor5 ne m0r3???
> ...it's gotta be stored somewhere!
/dev/null is the prime choice of storage medium. This should really be an opportunity for companies producing high speed, high capacity null-devices.
For most of it
Where are the VC when one needs them?
And,
Floppy disk volume: 0.0889m * 0.0889m * 0.015875m = 0.00012546345875m^3
VW Jetta Cargo capacity: 368.119 liters = 0.368119m^3 (assuming all seats in place, and NOT the wagon model)
So, 763549741511.11 floppies * 0.00012546345875m^3 = 95797591.4976523121517125m^3
divide that by the 0.368119m^3 per Jetta, and we get:
~2.6 x 10^8 Jettas required to ferry the floppy disks to the dump site!
And all I want is a VW minibus. makes me seem quite modest..
free speach
Did you mean: free speech
I'm 173.205 percent sure these numbers are not very accurate. I'm 314.159 percent sure that they won't affect how I sleep. And I'm 628.318 percent sure that the funding for this kind of "research" has an upper bound.
WWJD for a Klondike Bar?
I'm still attempting to figure out how to hook up my 20MB hard drive from my first computer (it's not IDE) and get one very small (less than 100k) file.
:) :)
Being the usual procrastinator, it gets more and more difficult to retrieve this file.
The hard drive was hooked up to a 286 through two cables (one small, one large, not including power of course) and went to a daughter board.
If anyone has any suggestions on how to retrieve this data that would be super
-Steve
Candle burns its brightest in the dark
I just downloaded a WinXP "patch" - better chalk up another exabyte.
Well, common papyrus wasn't exactly the most durable material. That's why most of our information on Egypt comes from carved stone sources, and remnants of painted symbols on stone. Copying and re-copying literature was an important job in the middle ages, which is one thing that gave the Church so much power... the monopoly on reading data.
Check out Frederik Pohl's Gateway series... humans find a remnant of an alien outpost on Venus, and a ship on autopilot that takes them to a hangar of spaceships on an asteroid. On the stations, they find that the aliens were in an awful rush to abandon the place, and they left behind all these metallic folding fans and other widgets. The humans said "wow, neat, these must have been their toys" and went and sold them as novelty items to the public. It wasn't until the third book or so that the humans "discover" that the fans are solid state storage devices, and that the Heechee had left behind all of the manuals to their machines when they left, and that the fans were newspapers, books, videos, art, etc... everything about their culture... but because the humans had nothing to read the data, they had no concept of what the devices represented.
I just want to point out that 800 MB per person works out to 1,600 slices of 512x512 CT data (the standard size of CT slices at 16 bits per voxel) - which means that this amount of data is roughly the same thing as about a 1mm * 1mm * 1mm CT scan of every human on the planet.
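The slice count is straightforward to verify. A small sketch, assuming "800 MB" is read as 800 x 2^20 bytes (with decimal megabytes the count comes out somewhat lower, around 1,500):

```python
# One CT slice: 512 x 512 voxels at 16 bits (2 bytes) per voxel.
SLICE_BYTES = 512 * 512 * 2

# Assumption: 800 MB taken as 800 * 2**20 bytes (binary megabytes).
per_person_bytes = 800 * 2**20

slices = per_person_bytes // SLICE_BYTES
print(slices)  # 1600
```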
Education is the silver bullet.
in 2002 I personally created about 400-500GB of data.
sometimes, I really have to wonder about studies like these and where they get their info from. . .
the history of the world
Statistics like this only serve to amaze and astound pointy haired boss types. Oh my God! They shriek. Do we REALLY??? Meanwhile, the world keeps turning, we all keep getting up in the morning, and I keep wishing I could get laid. Just once. I mean, REALLY!
Seriously, though, I bet the breakdown is something like this:
1. Most of the "information" is probably composed of music and film. We all know how much bandwidth and disk space music and film take up. Here's another thing: different sites might have different copies of a film, so there's probably a lot of duplication. Not to mention the zillion copies of any given song that are being passed around. I really don't think of this stuff as "information". It's more "entertainment" than anything else. Some of it may be interesting for archival purposes (news footage, for instance) but the news companies already do this. THIS AIN'T A PROBLEM, FOLKS.
2. Another large chunk of the "information" they're kvetching about is probably (almost certainly) composed of transitory messages like emailed messages and IM. This stuff was never meant to be hoarded. And it doesn't matter. It's used, it disappears, that's it.
3. Yet another large chunk of this "info" is probably control messages passed around the web as internal controls (ICMP, etc). Again, this stuff is transitory, like emailed memos. Who cares?
4. Getting into the "real stuff", you have all the ecommerce going on. But each company handles its own backup and storage. This is not a societal problem, this is an individual problem. Companies can deal with their own information storage problems. If they design their applications well, they won't have to store so much. But this isn't even that serious a problem there; it's just part of doing business.
5. Then you have informational web sites, and personal sites, and blogs, etc. They come and go -- they always have. Everything interesting gets cached or mirrored anyway. This isn't much of a problem either.
6. Finally, you have real paper documents, like those used by the bank and the government. Ok, some of this might add up. But they've got procedures in place (and they've had them for hundreds of years) to deal with this. Digital technology is actually making this easier, not harder, so that's a good thing, right?
Overall, who cares how much information is generated? It's a useless statistic, like the tonnage of toilet paper people use annually. It might work as filler for, say, a "Ripley's Believe it or Not" strip in the sunday paper, but that's about it. Who cares? If someone started screaming "OH MY GOD, do you know how many TONS of TOILET PAPER America uses in a single YEAR??? IT'S A CRISIS!" wouldn't you slap that person? I would. Unless she was a hot chick (see paragraph 1).
Farewell! It's been a fine buncha years!
Maybe more research could be done into a marketable multi-century (millenial?) storage. For corporate purposes, several decades of fidelity, perhaps a century or two, would be fine - but government will need better than that.
Yeah right. The government wants all historical data destroyed as soon as it is created.
Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future.
They fail to mention that also of note is that 99% of that information is in the form of pr0n! That's a lot!
If I say zettabyte and yottabyte did I just create new measurement terms?
Silly reporters!
I design user interfaces for a free network management application,
Dangit, Cowboyneal! I told you to turn off that packet sniffer at MAE East!
Now look what you've done.
-Adam
What if you take a page with text and scan it? It can take a size anywhere between 30-1000 KB. The same text can be written in a text editor in 5-6 KB. In MS Word in 60 KB.
2 years back, CD-Rs were the in thing. Everyone and anyone was storing data on them. Since the size was 700 MB, files were generally smaller and compressed. Now that faster broadband connections and DVD recorders (along with faster processors) are becoming common, people don't care so much about file sizes.
Regarding duplicate data- ask five people to compare what files take up how much of their hard disk.
Maybe slashdot could do a poll on this, asking what percentage of space do movies, music etc take up on the hard disk. This would give a rough guide as to how much data duplication takes place.
If you go to IRC servers, you will see bots with uploading speeds of 2-5-10 Mb/s..
Lots of people download files from there.
Stuff that is interesting to one might be interesting to millions of others on the net.
Similarly, if you check the files downloaded from download.com, you might see a 15 MB application downloaded millions of times.
That is a lot of data duplication.
If the data on the web is say 1 exabyte, then there must be a corresponding amount on the hard drives/backups of people, organisations... who put this stuff on the web in the first place.
If the poster had read the report carefully, they would have seen that the comparison is to the print collection of the Library of Congress. If you add in their audio and film collections they have at least two orders of magnitude more data. Even the LOC doesn't seem to be sure how much their entire collection is.
No electrons were harmed creating this post, though some may have been subjected to electrical and/or magnetic fields.
In a weird way, this reminds me of the Jumping Jesus Phenomenon
Five exabytes of data is a meaningless figure if you consider that probably 52% of that was pr0n. The other 35% was source code (non-human readable data). And the remaining 13% was made up of spam, web logs, and e-mail to grandmaw.
Un-news
Regarding web pages:
You read that right, 28% of the internet sampled appears to be porn. Anyone surprised? Read on...
Regarding P2P networks:
This follows my general idea that part of the reason that the internet is as large as it is, is due to the fact that it allows anonymous connection to taboo material.
Clinton made me a Republican. Bush made me a Libertarian. Trump is making me question reality.
just wait until the HDTV porn files start swapping...200 exabytes here we come (no pun intended)
Most optical media does not have any better longevity than magnetic media, and in many cases is actually worse. There are a multitude of problems. For stamped discs, the most insidious is oxidation of the aluminum reflective layer, which reduces the contrast ratio between the pits and lands to a level too low for normal drives to read the discs.
For dye-based writable discs (e.g. CD-R) there is the same problem (though with regard to the pregroove and general reflectivity rather than data pits and lands), and the dye will eventually undergo the same chemical reaction used to write the disc due to ambient temperature and aging.
For phase-change discs (e.g. CD-RW) I expect the temperature and aging problems to be reduced due to the higher activation energy needed for the phase change. However, I am not aware of any actual studies on longevity of phase-change media.
Discs with a gold reflective layer are basically immune to the oxidation problem, but how much of the 8% of data that is not on magnetic media is actually on gold phase-change discs? Probably only a trivial percentage of it.
hmmm....p0rn.
reminds me of that one ep of the Simpsons where Bart starts drawing Angry Dad cartoons and Lenny says "It's the number 1 non-porn site on the web; 1 trillionth overall"
it's the other way around.
harmonious design
I admit I take more pictures than most, but I haven't gotten a video camera yet... just think of the Terabytes I'll consume with that bad boy.
--Mike--
Helium is the preferred method.
Search for hemlock society.
No GF is no reason to kill oneself anyway.
From the 400 or so years that are classed as the Old English period (up to about 1150 AD), we have a total of 5 million words in texts. That would probably fit on fewer floppy disks than Windows 3.11 and its DOS. Or in my telephone. It's true that not all bits are equal.
Now, are you using the current Library of Congress Measurement, or are you using an old one? I mean, new books must be coming in. I presume that's not just the ASCII, but scans of the pictures at a decent resolution.
How will I ever do the proper conversions if you aren't using the up-to-date standards?
=Brian
There is nothing so good that someone, somewhere, will not hate it.
Holy crap! There's a lot of everything in the world. Why is data much more exciting?
Dividing 95,797,591m^3 of floppies by 0.368119m^3 per Jetta, the requirement is 260,235,389 Jettas to transport them all there. Or one Jetta, preferably one more reliable than my old thing, 260,235,389 times.
(Is the cargo capacity really that little? I would think it's over a cubic meter. Maybe they reduced the capacity in newer models.)
sulli
RTFJ.
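The Jetta arithmetic can be reproduced in a few lines. This is only a sketch: the floppy count, floppy dimensions, and cargo capacity are taken as quoted in the thread rather than checked against any spec sheet, and the variable names are illustrative:

```python
# All inputs are the figures quoted in the thread, not verified values.
floppies = 763_549_741_511.11                  # floppy count quoted in the thread
floppy_volume_m3 = 0.0889 * 0.0889 * 0.015875  # width * depth * thickness, in metres
jetta_cargo_m3 = 368.119 / 1000                # 368.119 litres converted to cubic metres

total_volume_m3 = floppies * floppy_volume_m3
jettas = total_volume_m3 / jetta_cargo_m3

print(f"{total_volume_m3:,.0f} m^3 of floppies")
print(f"{jettas:,.0f} Jetta loads")
```

The result is roughly 260 million Jetta loads, i.e. on the order of 10^8 rather than 10^11.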
That there can't be an accurate data representation of the data in the Library of Congress because THEY don't know how much stuff they have. My cousin worked there this past summer, and he said they still have a large portion of the basement filled up with (unorganized, mind you) stacks of CD's that they haven't even put into their database yet. Same goes for books. It'll be awhile until anybody knows how much data the LoC has.
I belong to the ______ generation.
Cuz I know the guy hosting this file is going to have a huge bandwidth bill.
You want the truthiness? You can't handle the truthiness!
Okay, call me... A dork, but wouldn't a film reel technically be media and not data?
I mean, come on, why not count all the stuff kids write on paper... Oh wait... Nevermind that comment.
How about the little, itsy-bitsy electric impulses running around in my brain? That's data.. Kinda-sorta... Okay, okay... Most of it is cobwebs, but still.. If a duplicated film reel (aka MEDIA) is counted, then you have to start adding other things to the mix.
Historians/anthropologists/archaeologists are interested in the ways in which the past created its future.
They're not interested in analyzing every lump of dung a past civilization created.
If they have 3 lumps of dung from a million individuals, it's something they'll study. If they have a million lumps of dung from 3 individuals, no.
Just how many copies of the goatse.cx picture do you need to archive, anyway?
The incredibly long thin strip of plastic with the tiny holes running along the edges is the media. The sequence of pictures is the data. What they did was figure out how big an MPEG-2 file would be needed to encode the movie. A lot of what this study measures is not so much how much data was generated, but how much new data storage capacity was generated. For example, if the industry produced 1 million blank CDs, the study would show 700 million megabytes of new data.
"I'm not impatient. I just hate waiting." - My Dad
Rambling in my head: "Maybe I should have read the article first.... Boy do I look stupid!"
They built fudge factors in for this. I read through some of the methods they used. For their internet figures, for example, they sampled 9800 websites of the supposed 61 million URLs compiled by the Internet Archive (enough to get a 95% confidence level), wget/mirrored them to their own servers (dropping links to other domains), and then analyzed the files for creation date, size, and uniqueness. For television, the report says: "We estimate about 1/4 of the programs are 'original'." For CDs, they estimate that 1 in 20 gets trashed. Presumably, these figures are statistically based.
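For a feel of what a sample of that size buys, here is a rough margin-of-error sketch. It assumes simple random sampling and a proportion-type estimate at the worst case p = 0.5; the comment does not say which estimator the study actually used, so this is illustrative only:

```python
import math

# Assumptions (not from the study): simple random sample, proportion estimate,
# worst-case p = 0.5, normal approximation with z = 1.96 for 95% confidence.
n = 9_800                 # sampled websites, as quoted above
population = 61_000_000   # URLs in the sampling frame, as quoted above
z = 1.96
p = 0.5

margin = z * math.sqrt(p * (1 - p) / n)

# Finite population correction; nearly 1 because n is tiny next to the population.
fpc = math.sqrt((population - n) / (population - 1))
margin_fpc = margin * fpc

print(f"margin of error: +/-{margin * 100:.2f} percentage points")
```

Under those assumptions the margin works out to about one percentage point, which is consistent with the sample being "enough" for a 95% confidence level on simple proportions.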
"I'm not impatient. I just hate waiting." - My Dad
Couldn't you know how many hard drives have been shipped, their capacities, estimate how long they last, and then take some random samples of how full people's hard drives are and then make an estimate?
Is that what they did?
Avoid Missing Ball for High Score
Of course there are people running dataless. A fifth of the planet gets by on less than $2 per day... Those people are not storing much data... To make up for it, the rest are actually storing several gigs, at least... Likely much more if you're reading /.
It seems wherever you go these days we're taking photos of it - these days usually in digital. Having become the (proud) owner of a Canon 300D 6MP camera in the last few days I am amazed that in the good old days of the 8086, WordPerfect 4.2 and DOS 3.0 didn't quite fill a 10MB drive - today that same drive would hold only ~ 3 JPG photos from the camera ...
and then there is the old saying that junk *will* fill the space provided.
Jon - TheSpork
... is that 2 of those exabytes was just data created by the researchers to discern the amount of data made in 2001.
does that count?
Some days at my job I create gigs of test data every few minutes.
... 5000 years from now?
in their eyes, this century will hardly exist.
But only a fraction of that will make it onto my web site - I have maybe 60 megabytes of photos (cut-down to around 100k each) online and 10 megabytes of text on my web sites, and would be adding less than 40 megabytes a year to that.
Maybe I'll get a video camera, though, or put up some MP3s of my gamelan group...
Danny.
I have written over 900 book reviews
and how much was lost due to people not creating backups?
I've experiments to run, there is research to be done on the people who are still alive.
1. Get 500 TB RAID array
2. Mount at /data
3. cat /dev/urandom > /data/file.dat
4. Wait a while
Does this count as "creating" 500 TB of data? I don't think so. Similarly, many of these comments about Kazaa and P2P are stupid... just because there's 500 TB of data on Kazaa doesn't mean there's 500 UNIQUE TB... probably over 90% of it is duplicates of other data; after all, that's how P2P functions.
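One way to separate "total" from "unique" is to hash each file and count each distinct digest only once. A minimal sketch using in-memory byte strings as stand-ins for shared files; `unique_bytes` is a made-up helper for illustration, not something from the study:

```python
import hashlib

def unique_bytes(blobs):
    """Return (total_bytes, unique_bytes) for an iterable of bytes objects."""
    seen = set()
    total = 0
    unique = 0
    for blob in blobs:
        total += len(blob)
        digest = hashlib.sha256(blob).digest()
        if digest not in seen:   # count each distinct content only once
            seen.add(digest)
            unique += len(blob)
    return total, unique

# Toy "P2P network": nine copies of one popular file plus one rare file.
shared = [b"song" * 1000] * 9 + [b"rare" * 1000]
total, unique = unique_bytes(shared)
print(total, unique)  # 40000 8000
```

In this toy case only 20% of the shared bytes are unique. Note that exact hashing only catches byte-identical copies; re-encoded versions of the same song would still count as distinct.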