The Ultimate All-In-One Storage Solution
karnifex writes "Filled up your LaCie Bigger Disk already, and looking for a little more storage space? Good news! The Petabox is ready! 'The petabox by the Internet Archive is a machine designed to safely store and process one petabyte of information (a petabyte is a million gigabytes).' And luckily, as the Internet Archive notes, it's shipping-container friendly (20' x 8' x 8'). So save on delivery costs and order two!"
Will we find one of these things on eBay in 10 years, selling for $10, and feel all nostalgic about the days when that much storage was the size of a room?
If you have to ask, you can't afford it. Just remember that. It might come in handy again someday. :)
From the site:
PILOT STATUS 5/2004
* The first 100TB Rack is up and running!
* The second 100TB Rack will be up by the end of May
* Thermal Targets have been met
* Systems Booted from USB Dongle
* Reiser FS running
* PC-based Router running
Maybe I'm missing something, but it looks to me like they don't really have a petabyte of storage working yet. They plan to build up to a petabyte, with only 100 TB up and running now. Not that 100 TB is anything to brush off.
I know the pull is to make these things as big as you can, but I would love to see hard drives that work forever. I know everything breaks eventually, but in 400 years, how is anyone going to know what we were like if all the data about us slowly disappears because hard drives and CDs don't last very long?
just because you're a schizophrenic doesn't mean people aren't really out to get you
Assuming dual-layer disks, that is 10 GB per disk (feeling generous).
100 disks -> 1 TB
15000 disks -> 150 TB.
Netflix has a "mere" collection of 15000 disks. Your petabyte disk is only about 15% full.
You upload all music CDs: 1 GB per disk (feeling generous).
How many CDs can be in print? Maybe 500,000?
That is only 500 TB. Now your disk is about 2/3 full.
Let's upload all printed material. It may or may not fit in the rest.
Then again, if you want to archive the internet: ~6G pages at 10 kB each is 60 TB per crawl. Store the last 16 versions -> about 1 PB.
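For anyone who wants to redo the napkin math, here is a minimal sketch of the estimate above. It uses decimal units (1 TB = 1,000 GB) and the per-item sizes guessed in the comment; the function and variable names are made up for illustration.

```python
# Back-of-envelope capacity check using the guesses from the comment.
# Decimal units throughout: 1 TB = 1,000 GB = 1,000,000,000 kB.
GB_PER_TB = 1_000
KB_PER_TB = 1_000_000_000
PETABOX_TB = 1_000  # one petabyte

def collection_tb(items, gb_per_item):
    """Total size in TB of `items` objects of `gb_per_item` GB each."""
    return items * gb_per_item / GB_PER_TB

dvd_tb = collection_tb(15_000, 10)            # Netflix-sized DVD library
cd_tb = collection_tb(500_000, 1)             # guessed number of CDs in print
crawl_tb = 6_000_000_000 * 10 / KB_PER_TB     # 6G pages at 10 kB each
crawls_16_tb = 16 * crawl_tb                  # last 16 crawls kept

for label, size_tb in [
    ("DVDs", dvd_tb),
    ("DVDs + CDs", dvd_tb + cd_tb),
    ("16 web crawls", crawls_16_tb),
]:
    print(f"{label}: {size_tb:,.0f} TB = {size_tb / PETABOX_TB:.0%} of one petabox")
```

All the inputs are guesses, so the percentages are only as good as the "feeling generous" sizes.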
Code poet, espresso fiend, starter upper.
...just mount /dev/random as a petabyte drive. Admittedly it might be hard to find your data in there - but chances are it is in there somewhere.
Doesn't it make you feel good to know that our freedoms are protected by politicians, lawyers and journalists?
So, about $1.3M (10 racks)
What would be interesting is to know the estimated maintenance costs as well. With that many drives, I imagine you'd be changing them like light bulbs, especially as time passes and the probability of each drive failing gets higher and higher.
If one were really clever, one could use the failure rate of a typical hard disk and Moore's Law to estimate monthly replacement costs for the next 100 years or so. I would expect them to rise in the short term as the drives age, but fall in the long term as Moore's Law catches up.
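A very rough sketch of that estimate, with every input an assumption rather than a number from the Petabox site: 4000 drives, a flat 3-year expected lifetime, $100 per replacement drive today, and the price of an equivalent drive halving every 2 years as a stand-in for "Moore's Law". It ignores the short-term rise from ageing drives, so it only shows the long-term decline.

```python
def monthly_replacement_cost(years_from_now, drives=4000, lifetime_years=3.0,
                             price_today=100.0, halving_years=2.0):
    """Estimated replacement spend per month, in dollars.

    All defaults are illustrative assumptions, not measured values.
    """
    # With a constant failure rate, 1/lifetime of the drives fail per year.
    failures_per_month = drives / (lifetime_years * 12)
    # Price of an equivalent drive decays exponentially over time.
    price = price_today * 0.5 ** (years_from_now / halving_years)
    return failures_per_month * price

for year in (0, 2, 5, 10, 20):
    print(f"year {year:>2}: ${monthly_replacement_cost(year):,.2f} per month")
```

Swapping in a real failure curve (higher rates for old drives) would add the short-term bump the comment expects.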
Life is too short to proofread.
The power requirements are also quite hefty. It shouldn't be necessary to run all those drives (and the computers behind them) unless the unit is near capacity and access is random (which I'm sure would rarely be the case). Instead, they should be dynamically powering drives and computers up and down, and migrating data to a reasonably small 'working set' of drives.
On the hardware front, the device in this article also incorporates 800 "low-end PCs." IOW it's a big cluster that happens to be heavy on storage. If all you want is the storage, surely there is some way to get rid of all those motherboards and CPUs with their fault-prone, power-hungry fans. They need to develop a controller that can directly handle, say, 64 hard drives, analogous to a big network switch.
Anyways, it sounds like a fun project!
You're complaining that these hard drives won't run forever and you're right. Neither will CDs. However, I would also like to point out that the vast majority of ancient Egyptian papyrus isn't around today. Also, don't start going off on using clay or stone tablets, because they break (even the Rosetta stone is broken).
Honestly, computers are still far superior to what we were using before. It's not like we've got Homer's original version of the Iliad sitting in a museum somewhere; we just have many duplicated copies that have been reproduced over the years. You're right that hard drives fail and CDs break, but we can keep updating onto new media. Besides, when a monk drops an iota when transcribing the Bible, Jesus goes from being God to godlike. When a computer adds an iota, the check bit fails and the data is resent.
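To make the "check fails, data is resent" point concrete, here is a small sketch. It uses CRC-32 from Python's zlib as a stand-in for whatever check the transport actually performs, and the helper names and the simulated "channel" are invented for the example.

```python
import zlib

def frame(payload: bytes):
    """Pair the payload with its CRC-32 so the receiver can verify it."""
    return payload, zlib.crc32(payload)

def is_intact(payload: bytes, checksum: int) -> bool:
    """True if the payload still matches the checksum computed by the sender."""
    return zlib.crc32(payload) == checksum

def flip_bit(payload: bytes, index: int) -> bytes:
    """Return a copy of the payload with one bit flipped, simulating damage."""
    damaged = bytearray(payload)
    damaged[index // 8] ^= 1 << (index % 8)
    return bytes(damaged)

message = b"In the beginning was the Word"
payload, checksum = frame(message)

# Simulated channel: the first copy arrives damaged, the resend arrives clean.
attempts = [flip_bit(payload, 3), payload]

received = None
for number, copy in enumerate(attempts, start=1):
    if is_intact(copy, checksum):
        received = copy
        print(f"attempt {number}: checksum ok, accepted")
        break
    print(f"attempt {number}: checksum mismatch, asking for a resend")
```

A monk has no equivalent step, which is the whole point: the copy error is caught at copy time instead of centuries later.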
Somebody is also going to point out that, as systems change, data can become unreadable. Heck, I had a professor who couldn't update his lab instructions because the software that read the lab printouts wouldn't run on new machines and the file format wasn't understood by any other software. So, want to stop our data from becoming unreadable? Well, let's just do what the Etruscans did! Of course, we don't have a clue what they did because nobody can read Etruscan. For a more familiar example, think of hieroglyphics before the Rosetta stone. It's pretty common for data to become lost and unreadable. This brings us back to the solution: along with the data, include the source code for the software that can read it. If you really want to be anal, you could even include the source to an emulator for the machine it was designed to run on.
Still, you might point out, 400 years from now, we'll still lose 99% of that due to failures of whatever nature. Once again, you would be right. However, do you honestly believe that we have 1% of all the data that was collected in 1604? Hell, most people couldn't even write, so we don't know ANYTHING about their lives. I'm sorry that we can't digitally preserve our wondrous society for all of eternity, but it's completely blind to believe that this makes us in ANY way different from any other culture. Read Percy Shelley's Ozymandias before complaining about how people in the future won't know what our lives were like.
If you expect a hard drive to fail after three years (I'm guessing) but these occurrences are randomly distributed (an assumption that will be true after running this thing for a year or two), you can then expect that the 4000 hard drives in this array would have about 3 to 4 failures per day. This thing would never be at full speed! It would be constantly rebuilding its RAID. Also, it would cost about $300 a day just in replacement hard drives (not to mention controllers, power supplies, et cetera).
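The failure estimate is easy to check. This sketch assumes, as the comment does, a 3-year expected lifetime with failures spread evenly over time; the $100 per drive is a guess at what the $300 figure implies, not a quoted price.

```python
def failures_per_day(drives, lifetime_years):
    """Average daily failures if each drive lasts `lifetime_years` on average
    and failures are spread evenly over time (steady state)."""
    return drives / (lifetime_years * 365)

rate = failures_per_day(4000, 3)   # 4000 / 1095, roughly 3.65
price_per_drive = 100.0            # assumed replacement price in dollars
daily_cost = rate * price_per_drive

print(f"{rate:.2f} failures per day, about ${daily_cost:,.0f} per day in drives")
```

Change the lifetime guess and the result scales inversely: a 5-year lifetime drops it to a little over 2 failures per day.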
The last thing you want with a setup like this is having to haul hardware around or disconnect stuff if for any reason you can't boot off the disks anymore. And you certainly don't want to reduce density by filling space that could hold disks with other stuff.