PetaBox: Big Storage in Small Boxes
An anonymous reader writes "LinuxDevices.com is reporting that a Linux-based system comprising more than a petabyte of storage has been delivered to the Internet Archive, the non-profit organization that creates periodic snapshots of the Internet. The PetaBox products, made by Capricorn Technologies, are based on Via mini-ITX motherboards running Debian or Fedora Linux. The IA's PetaBox installation consists of about 16 racks housing 600 systems with 2,500 spinning drives, for a total capacity of roughly 1.5 petabytes, according to the article. Now to strap one of those puppies to my iPod!" The Internet Archive continues to astound.
The Internet represents a great historical tool. Case in point is what happened on 9/11. Being able to go back and see the progression, paranoia, patriotism, and early Iraq/Afghanistan/bin Laden/Hussein posts and opinions on various news sites is amazing. CNN, Fox, the NY Times, all are archived several times on 9/11 on archive.org.
I for one think that archive.org should turn into some UN effort, with a mission to chronicle and store daily/timely snapshots of the Internet and the culture at the time, preserving it for future generations. What a tool for future historians!
The ability to look at a large representation of society at one single critical moment in time, and to have first-hand sources for all that information, is something that can truly change the way history is recorded (and not in the bad newspeak ingsoc way either). In fact, a holistic archive of what happens day-to-day, in an easily accessible format, might well help written history to be more representative of actual history (instead of, say, the history Bush wants us to believe: that the Iraq war was for human rights and not WMDs). I love Foucault.
The Internet Archive rocks... really hope this project continues full blast.
- Peace
'Truth' is linked in a circular relation with systems of power which produce and sustain it...
Yeah, but the thing is that the storage is spread out across lots of separate 1U units, each holding either 1 or 1.6 TB. So to build a RAID 5 array larger than 1.6 TB, you'd have to span multiple machines, which adds serious overhead, especially since parity then has to be calculated and written across machines. On the other hand, if you only ran RAID 5 inside the individual units, it'd be pretty pointless, because with that many units you'd be crazy to count on never losing an entire machine.
So yes, if it really were just one giant supercomputer with a bajillion hard drives in it, RAID 50 would be an ideal solution (as long as the stripes were large enough to keep too many accesses from crossing too many drives, which is the one big advantage of JBOD here). But that's not what's really in use here.
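For anyone wondering what the parity argument above boils down to, here is a rough Python sketch of RAID 5 style XOR parity for a single stripe. It is only an illustration, not how the PetaBox actually stores data: the function names and the sample blocks are made up, each "unit" is just a byte string standing in for a block held on a separate machine, and the parity sits in the last slot instead of rotating between members the way real RAID 5 does.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block (hypothetical layout)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recompute the block at lost_index from every surviving block."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Hypothetical example: three data blocks, each imagined to live on its own 1U unit.
data = [b"unit-one", b"unit-two", b"unit-3!!"]
stripe = make_stripe(data)

# Pretend the machine holding block 1 died; rebuild it from the others.
recovered = rebuild(stripe, 1)
```

Note that `rebuild` has to read every surviving block in the stripe. When those blocks sit on different machines, each rebuild (and each parity update on write) means traffic between units, which is the overhead being complained about. Keeping the stripe inside one unit avoids that traffic but loses everything in the stripe if that whole unit fails.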