PetaBox: Big Storage in Small Boxes
An anonymous reader writes "LinuxDevices.com is reporting that a Linux-based system comprising more than a petabyte of storage as been delivered to the Internet Archive, the non-profit organization that creates periodic snapshots of the Internet. The PetaBox products, made by Capricorn Technologies, are based on Via mini-ITX motherboards running Debian or Fedora Linux. The IA's PetaBox installation consists of about 16 racks housing 600 systems with 2,500 spinning drives, for a total capacity of roughly 1.5 petabytes, according to the article. Now to strap one of those puppies to my iPod!" The Internet Archive continues to astound.
For all the jokes out there about people 'downloading the internet' it's good to know someone is actually doing it.
If, If only I could get a hold of one of those, I could Rival GOOGLE! Yes! I can become the next internet craze with my super, duper search engine crawling the web! I have the space, now I just need a connection in the middle of Alaska fast enough to rival google...
Michael Jackson was heard breathing a sigh of relief. He thought it was where they sent Petafiles.
R. Kelly was scrambling to find the company's phone number.
Wait, is a petabyte sized file called a petafile?
If so, then this storage must be for all the recent Michael Jackson coverage.
Show me on the doll where his noodly appendage touched you.
Imagine a beowulf cluster of these babies... ...oh, it already is one. nevermind, I'll get my coat.
They do a lot more than that! I've just been downloading some Warren Zevon shows from their Live Music Archive.
LOAD "SIG",8,1
Not to sound like an advocate or anything... But how is it that the Internet Archive project resists claims of copyright infringement and the like when it has copies of entire websites in its records?
"I'm a philosophy major. That means I can think deep thoughts about being unemployed." -- Bruce Lee
Isn't that what naked girls climb out of to protest fur coats?
Thank you, I'll be here all week.
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
Everyone, please feel free to chime in so I don't feel like such a goon for saying this, but damn, these big systems are a reason to live, aren't they? I mean, I saw the rack of red on the link, and it just makes me drool. It's not so much the storage, but the logistics of the thing. I mean, I get the same feeling when I watch Jurassic Park, and whats-her-name is pumping up the electrical charger to get the main switch going. Charging a switch? Welcome to flavour country.
Anywho, just wanted to expose my hard-on for hardware, my raison d'etre. Someone, give me a job at a datacenter, or a power plant. I beg you.
Forget the jokes. That setup kicks the ass out of any beowulf cluster. Heh.
UTF-8: There and Back Again
Haven't read TFA yet, but what are they doing with regard to redundancy? With that many drives whirring around more than a couple are likely to go bad over time. Do they have a set of dedicated redundant drives to serve as backups?
What kind of power bill are those guys getting and is their service really worth it?
1.5 petabytes? That's hardly enough to hold a decent porn collection.
Right, sure, like anyone believes that you want that much storage for music. You just want to use it for pr0n.
Seriously, I think archive.org deserves such a storage system. I have very often wanted to go back to view an archive of a website from a while ago, but the cache on Google was from yesterday. It also gives multiple archives of the website based on day, which can be quite handy, especially for news-related sites. I think they quite well deserve it.
Didn't realize the moderators were Michael Jackson supporters.
Which reminds me of why Michael Jackson likes twenty eight year-olds. Because there's twenty of them.
Not "periodic", continuous. Own a website? Check your logs for the user-agent "ia_archiver".
Free of Flash! Free of Flash!
this would actually be useful!
Sorry, I'm just bitter after almost a decade of Sistina's promises to get their global file system working 100%. We were one of their victims, err, customers.
I give 72 hours tops before one of those fetish case modders makes a 'peta' case. Oh shit, I was thinking chia.
Stop invalid scientific research. Ask your local scientists to feed their lab rats with a phytoestrogen-free chow.
So the inventor of the microprocessor dies and suddenly the definition of 'small box' for computer components is again reduced to 'fits in a big room'....
fish and pipes
"nobody needs more than a perabyte of storage"
I am more than slightly concerned about the lack of RAID in the system. They said that they had some sort of painful experience with RAID 5 not scaling to petabyte-size storage and therefore recommend JBOD. I wouldn't expect RAID 5 to scale to petabyte-size storage, because the parity is all being done at once and in the same place, but there has to be a way around this that still allows for redundancy. Take RAID 50: a lot of RAID 5 arrays in the hundred-terabyte range, with a RAID 0 array striping over them, would still provide redundancy with only slightly greater inefficiency, while dividing up the parity process among the smaller RAID 5 arrays. Also, $2/GB seems kind of high to me, given that hard drive prices are down to $0.33/GB and you're putting 4 in each mass-produced box.
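To put rough numbers on the RAID 50 idea, here is a minimal Python sketch. It assumes the 2,500 drives from the story summary and the 400GB drive size mentioned elsewhere in this discussion; the group sizes and the function name are hypothetical, chosen only for illustration (250 drives per group is roughly a hundred-terabyte RAID 5 array).

```python
def raid50_usable_tb(total_drives, drive_tb, drives_per_group):
    """Usable capacity of RAID 5 groups striped together (RAID 50).

    Each RAID 5 group gives up one drive's worth of space to parity.
    Drives that do not fill a whole group are left out.
    """
    if drives_per_group < 3:
        raise ValueError("a RAID 5 group needs at least 3 drives")
    groups = total_drives // drives_per_group
    return groups * (drives_per_group - 1) * drive_tb


# Assumed figures: 2,500 drives of 0.4 TB each; group sizes are hypothetical.
raw_tb = 2500 * 0.4
for group_size in (5, 25, 250):
    usable = raid50_usable_tb(2500, 0.4, group_size)
    print(f"{group_size:>3} drives/group: {usable:.0f} TB usable of {raw_tb:.0f} TB raw")
```

Larger groups lose less space to parity, but each group then tolerates only one failed drive among many more, which is the usual trade-off with wide RAID 5 sets.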
So if the storage is JBOD then what about redundancy when a drive fails?
Wow - have they calculated how much the running cost per day is? I might just stay with my iPod instead for the time being~
Haha~
Where can you purchase 600GB drives these days? (1.5PB / 2500 drives)
The math doesn't work when you multiply the number of systems out either: 600 systems * 1.6TB/system = 960TB. That's just under a petabyte, or am I missing something?
Also, if you've got those in a RAID5 setup, you're 'only' talking about approx 800TB of usable space. That's far less than the 1.5 petabytes claimed.
800TB is a lot of space, but there must be a cheaper/easier way than purchasing 600 systems to do it.
You have enemies? Good. That means you've stood up for something, sometime in your life. --Winston Churchill
1. According to the specs this thing is 600 1.6TB JBOD arrays. They must handle redundancy on top of the storage mechanism, but they don't mention it anywhere.
2. The blurb says that they have roughly 1.5PB of storage space, but by my calculations it comes out to roughly 1 petabyte: 40 servers per rack * 15 racks of systems = 600 servers, and 600 servers * (4 * 400GB per server) = 960TB, or about 1 PB.
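For anyone who wants to check the capacity figures being thrown around here, a quick Python sketch follows. It uses only numbers quoted in this discussion (600 systems, 4 drives of 400GB each, 2,500 drives, 1.5PB claimed) and assumes decimal units; the variable names are made up for illustration.

```python
# Figures quoted in the thread (decimal units: 1 TB = 1000 GB, 1 PB = 1000 TB).
SYSTEMS = 600
DRIVES_PER_SYSTEM = 4
DRIVE_GB = 400
CLAIMED_PB = 1.5
CLAIMED_DRIVES = 2500

# Raw capacity implied by the per-system spec.
raw_tb = SYSTEMS * DRIVES_PER_SYSTEM * DRIVE_GB / 1000

# Drive size that the headline numbers would require.
implied_drive_gb = CLAIMED_PB * 1_000_000 / CLAIMED_DRIVES

print(f"600 systems x 4 x 400 GB = {raw_tb:.0f} TB ({raw_tb / 1000:.2f} PB)")
print(f"1.5 PB over 2,500 drives implies {implied_drive_gb:.0f} GB per drive")
```

The two results do not agree with each other, which is the discrepancy these comments point out; the sketch does not settle which quoted figure is off.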
Mirrordot to the rescue :)
http://mirrordot.org/stories/83ede29a5f303f8c47d1d70a9ef91f0d/index.html
I've actually read TFA. They recommend JBOD configurations to their clients. One drive goes titsup and you've lost 400GB of data. Do they at least offer some kind of mirroring/redundancy solution to back the data up to another array?
The Internet represents a great historical tool. Case and point is what happened on 9/11. Being able to go back and see the progression, paranoia, patriotism, and early Iraq/Afghanistan/bin Laden/Hussein posts and opinions on various news sites is amazing. CNN, Fox, the NY Times, all are archived several times on 9/11 on archive.org.
I for one think that archive.org should turn into some UN effort, with a mission to chronicle and store daily/timely snapshots of the internet and the culture at the time, preserving it for future generations. What a tool for future historians!
The ability to look at a large representation of society at one single critical moment in time, and to have first-hand sources for all that information, is something that can truly change the way history is recorded (and not in the bad newspeak ingsoc way either). In fact, a holistic archive of what happens day-to-day, in an easily accessible format, might well help written history to be more representative of actual history (instead of, say, the history Bush wants us to believe: that the Iraq war was for human rights and not WMDs). I love Foucault.
The internet archive rocks... really hope this project continues full blast.
- Peace
'Truth' is linked in a circular relation with systems of power which produce and sustain it...
are going to make a killing off the IA when they have finished. It isn't like they haven't made enough money off others as it is, so they may let this one slide in the name of conserving data. On that note, is the IA downloading EVERYTHING or selectively downloading to prevent such issues as copyright infringement?
Go ahead. Try Slashdot in the wayback machine.
Slashdot has looked virtually identical since 1998!
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Once peak oil arrives there will be a total economic collapse. Companies like Capricorn will go bankrupt as Americans just try to save enough for food.
I remember seeing this box at the Univ. of San Francisco Flashmobcomputing event. Brewster Kahle (founder: IA) was showing it off. I saw it a few weeks later running at IA's Presidio office. This was a while ago...
1 http://www.doskir.de.vu/
Since when can 16 racks be described as small?
Okay great achievement and all, but the title is simply not pedantic enough....
...a beowolf cluster of these.
Sorry, it had to be done.
This is not the sig you are looking for...
Namaste
"We experimented with hot-swap, but found it caused as many problems as it solved. It actually induced failures, so we backed away."
Why the hell are the reports of these guys so far from what the accepted industry practice is, according to IT magazines? (We) "tried then backed away from RAID, instead opting to recommend JBOD"
"We had a painful experience with RAID 5, which does not scale well to petabyte-level storage."
A friend of mine used to work for Sony... he swears this is a true story:
:)
Sony had a petabyte tape backup system they wanted to sell into North America... called the "Peta-file". Thankfully, Sony NA managed to have the name changed prior to its introduction here.
So, PetaBox is slightly better... slightly.
MadCow.
I used to have a sig, but I set it free and it never came back.
Built a VIA-based storage cluster as a test some time back. Surprised that they have decided to go into production. 2 hard drives on an IDE channel is not a good idea, VIA boards are not highly reliable, and 100Mb ethernet is just too slow if you want to copy the contents of one machine.
Also they haven't really worked on the software side - it's just a bunch of machines you have to rsync to, which really gets to be a pain to manage when you have that many.
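To get a feel for how slow copying one machine is, here is a small Python sketch. It assumes the 1.6TB per node quoted elsewhere in this discussion and a link running at its full nominal rate; the function name and the efficiency parameter are hypothetical, and real rsync throughput would be lower.

```python
def copy_hours(capacity_tb, link_mbps, efficiency=1.0):
    """Hours to move capacity_tb (decimal TB) over a link of link_mbps.

    efficiency is the fraction of the nominal rate actually achieved
    (1.0 means a perfectly saturated link, which is optimistic).
    """
    bits = capacity_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


# Assumed node size: 1.6 TB, as quoted elsewhere in the thread.
print(f"{copy_hours(1.6, 100):.1f} hours at 100 Mb/s")
print(f"{copy_hours(1.6, 1000):.1f} hours at 1 Gb/s")
```

Even at the ideal rate, draining a full node over 100Mb ethernet takes well over a day, so re-replicating after a failure is not quick.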
http://hardware.slashdot.org/article.pl?sid=04/05/11/2050247&tid=198&tid=137
Also, the article says they don't like RAID, due to bad experiences with RAID5, and the system is configured as JBOD (Just a Bunch Of Disks). It doesn't say why, or what users should do to get equivalent protection. My guess is that depending on RAID within a box means you're still vulnerable if the box's CPU or disk controller decides to scribble the disks, or the power supply decides to catch fire or short out and deliver 240VAC on the +5V line or whatever. So if you want a RAID-like set of redundancy, set up your applications or file system mounting or something to calculate the protection disk in software and hand it off to another 1U box for storage.
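As a minimal sketch of what "calculate the protection disk in software and hand it off to another box" could look like, here is a Python example using plain XOR parity. This is only an illustration of the general technique, not anything the article says Capricorn or the Archive actually does; the node contents and function names are hypothetical.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the bytewise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and equal in length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the one missing block from the survivors plus the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])


# Three hypothetical data boxes, each holding one block; a fourth box keeps the parity.
node_blocks = [b"box-one!", b"box-two!", b"box-3!!!"]
parity = xor_parity(node_blocks)

# Pretend the second box died and rebuild its block from the others.
recovered = rebuild([node_blocks[0], node_blocks[2]], parity)
print(recovered)
```

Because the parity lives on a different box, losing a whole machine (CPU, controller, or power supply) costs no more than losing a single disk would in an ordinary RAID 5 set, though only one loss per group can be survived.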
The overhead of the motherboards here is not that high - they're about $150-200, and support 4 disks that probably cost $200-300 each, so they're only about 20% of the cost, which is not bad. The article didn't say they're using SATA, and it sounded like it's some IDE variant instead, but if you're only using 100 Mbps Ethernet to connect to the box and not the optional GigE, it's not the bottleneck anyway. If you wanted an alternative design, you could probably do something with a couple of 4-way SATA controllers per CPU, with a lot of disks stacked vertically in a 3-4U box looking like an X-serve or something. But that wouldn't necessarily have much of an advantage.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
I read the article, and the website of the company, but I couldn't find out how you're supposed to access all this data. It's hardly practical that every node exports its own NFS, is it? Is it supposed to use some kind of cluster file system such as (Open)GFS?
Or is the user expected to do some kind of in-house thingy, like google or (presumably) the internet archive?
I can't think of a single reason to use a JBOD setup when you could just as easily use RAID 0.
If you don't need redundancy, great, fine, you can be redundant elsewhere. I'm down with that. But RAID 0 is so easy to implement as opposed to a JBOD setup and works so much better that there's essentially no reason to ever use JBOD except pure laziness.
I mean, with either one, if you lose a drive, you lose the array, but at least with RAID 0 you get the benefits of striping in both read and write operations, basically doubling your throughput speed.
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
I saw the topic and I thought what the hell those animal guys have to do with slashdot...?
Manojar - pronounced like Manager
1. Buy a petabyte system
2. Backup the internet
3. ????
4. PROFIT!!!
It was 3 or 4 years ago when I saw a 600 terabytes (0.6 petabytes) tape-based storage system at CERN.
this post contains no useful information, no need to mod it down
looks like generic mini-itx, but who makes the 1u? custom built?
2,500 spinning drives!!! These folks are located in San Francisco... if there's ever an earth quake the gyroscopic effects could flip the building over! Perhaps they should mount every other drive upside down to cancel out the effect to prevent serious injury ;)
Signatures are a waste of bandwi (buffering...)
Depends heavily on your purpose of the system, of course.
If you need something that is highly available and has good performance, then RAID is wonderful. But archives don't need to be highly available, they just need to be highly redundant and backed up to several places.
For instance, if you have a RAID 5 array, then a single hard drive failing couldn't take it out. But a single controller failing could. If one drive starts spewing out nonsense, then that corruption could be replicated automatically between hard drives in an array before anybody notices or hardware monitors shut down everything.
So in this sense simply having multiple copies on different computers on different disks is actually preferable to a RAID setup. It is simpler, and as long as you have high-quality distributed filing systems, it's easier to restore material. It'll be easier to access down the line.
It just won't have the higher performance or high availability that RAID will provide... but then again it doesn't really need it.
And remember:
RAID != backups.
First off, this isn't quite an example of a company suddenly deciding to donate stuff to the Archive. As can be seen on their own website, Capricorn was spun off from the Archive on July 1, 2004. To a large extent, Capricorn exists for the specific purpose of providing storage to the Archive, and if that same storage can be sold to others so much the better.
Second, what about interconnects and performance? The product descriptions say nothing about SCSI or FC or other storage-oriented connectivity, so one must assume that the connection to these boxes is through a network. That would mean each node is an NFS server (or similar), serving up 1.6TB using a 1GHz C3 processor, a maximum of 1GB of memory (for caching etc.) and what appears to be a single GigE link. Can you say unbalanced? The Internet Archive might be the only system with an access pattern so sparse that the ratio between capacity and performance wouldn't be crippling. Don't try using one of these with any other kind of application if performance is a concern...and BTW they don't seem to say anything about high availability or other storage functionality (e.g. integrated backup or snapshots) either. Capricorn's big play seems to be power consumption, but there are other players that can beat them on density (e.g. Copan with 224TB per rack) and multitudes who can offer better performance/functionality. I hate to sound negative, but this is a product so specialized as to be uninteresting.
Disclaimer: I think I met some of the Copan guys once and they seemed cool enough, but there's no other relationship between me and them. That just happened to be the first name I thought of in this space.
Slashdot - News for Herds. Stuff that Splatters.
What kind of metastructure do they put on the disks to achieve that kind of large filesystem, and improve reliability?
The first version will be called Capricorn One.
"Reality is that which, when you stop believing in it, it doesn't go away." - Philip K. Dick
it mentions that they backed away from raid in lieu of jbod? obviously there must be some redundancy, though. a few of your 2500 sata disks will certainly die...maybe there's redundancy between nodes somehow?
-notext-
Please, for the good of Humanity, vote Obama.
But that doesn't make any sense. They talk about needing to replace drives, and opting out of the use of hotswap, saying that it caused more problems than it solved... but JBOD means no redundancy at all. (And with that many drives, there will be frequent failures, as TFA also stated.) So how do they deal with data failure? There has to be some solution for redundancy; what is it? Okay, so RAID 5 didn't scale---I'd think they'd use a sort of hierarchical RAID, but JBOD isn't any sort of enterprise-level solution.
--grendel drago
Laws do not persuade just because they threaten. --Seneca
If they delivered the 1.5 petabytes in about a week (7.5 days) that's about .2 petabytes a day or 1.6 petabits a day... which is roughly 18.5 Gb/s
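A quick way to check this kind of unit conversion is a few lines of Python. The sketch below assumes decimal units (1 PB = 10^15 bytes) and simply spreads the delivery evenly over the period; the function name is made up for illustration.

```python
def sustained_gbps(petabytes, days):
    """Average rate in Gb/s for moving `petabytes` (decimal PB) in `days`."""
    bits = petabytes * 1e15 * 8
    seconds = days * 86_400
    return bits / seconds / 1e9


print(f"{sustained_gbps(1.5, 7.5):.1f} Gb/s over 7.5 days")
print(f"{sustained_gbps(1.5, 7):.1f} Gb/s over 7 days")
```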
Impressive!
Shouldn't have used those backup tapes for streamers, I guess.
Or was it backup CDs for coasters/frisbees?
(CDs don't work well for frisbees. In my experience they break after just a few brick walls, and it costs a stroke, and makes it harder to get par.)
Exam 4/C again. Maybe I'll do better this time.
That's "case in point". Like "under scrutiny" or "off topic". Which is what I should be modded.
Sorry.
Raise your children as if you were teaching them to raise your grandchildren, because you are.
I was driving to work. It wasn't a long drive, but more than 5 minutes.
"Macarena" was on the radio when I started the car. A few minutes later "Macarena" was still on, and I was thinking that the song must be longer than I thought, or something. About then the DJ came on and said "We're playing 'Macarena' until you vomit." Then played the song again.
After that iteration of the song the DJ came back and played some phone calls of people begging him to change the song, but he just said that it was "Macarena" until you vomit.
I don't know when the thing started, but by the time I got to work it was the 17th or so "Macarena" in a row.
Exam 4/C again. Maybe I'll do better this time.
...so come on, tell us - how many?
"Despite its large size, the IA's PetaBox installation draws only about 50kW of power."
Hell, hydro's included in my apartment. I'll take two.
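For the people asking about the power bill, here is a back-of-envelope Python sketch based on the quoted 50kW figure. The drive count and capacity come from the story summary; the electricity price is purely a hypothetical placeholder, so substitute your own rate.

```python
POWER_KW = 50          # from the quoted article
DRIVES = 2500          # from the story summary
CAPACITY_TB = 1500     # roughly 1.5 PB, from the story summary
PRICE_PER_KWH = 0.10   # hypothetical rate in dollars; substitute your own

kwh_per_day = POWER_KW * 24
# Per-drive figure includes each drive's share of the boards, fans, and so on.
watts_per_drive = POWER_KW * 1000 / DRIVES
watts_per_tb = POWER_KW * 1000 / CAPACITY_TB
cost_per_day = kwh_per_day * PRICE_PER_KWH

print(f"{kwh_per_day} kWh per day")
print(f"{watts_per_drive:.0f} W per drive, {watts_per_tb:.1f} W per TB")
print(f"${cost_per_day:.2f} per day at ${PRICE_PER_KWH:.2f}/kWh (assumed rate)")
```

This ignores cooling overhead, which the quoted figure may or may not include.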
I am a biomedical engineer for Cardiology at a top 25 research university medical center. One of my primary responsibilities is maintaining the cardiac PACS for the medical images we create. We generate about 2TB of data a year, and Radiology does probably ten times that amount. Our data, stored in DICOM format, is static; by law, we cannot change it (the patient demographic information is included in the file header and if a nurse misspells a patient name, etc, we only update the image location database, not the image file itself). Once created, the images are accessed several times a day until the patient goes home, after which they might not be retrieved for weeks or months at a time. However, we have a legal obligation to keep the images available for seven years (for kids, it is until they turn 21), so cheap storage is a good thing for us. The current DICOM standard archive media is DVD-R and we use 200-disc rack-mountable changers. We researched going with an EMC Centera NAS unit but our cardiac PACS vendor wouldn't certify it because data flows through and is altered by a gateway server. If we had direct access to cheap storage, we wouldn't be affected by the performance imbalance.
that's the only thing that's missing from the current internet archives, here's hoping they devote some resources to indexing
Yeah, but how many of them would you want to see naked? Unless you have a chub fetish, you're unlikely to find the US demographic pool particularly attractive.
On the other hand, you could just go grab a Livejournal account, join the communities "kaizersoze125" and "show_your_boobs", and marvel at the quantity of amateur porn folks throw out there for free.
Seriously. There's some high quality out there. Some of it's not even members-locked (earningtails, for instance).
--grendel drago
Laws do not persuade just because they threaten. --Seneca
Funny thing about your sig---I just noticed that, as your wishlist is on Amazon.co.uk, the items say things like "Usually dispatched within 24 hours". In US English, we say 'shipped' instead of 'dispatched'. I never knew that was a UK-ism.
Learn something new every day, I suppose.
--grendel drago
Laws do not persuade just because they threaten. --Seneca
About ten years ago, driving through California's central valley, a DJ announced "KGARTH---all Garth, all the time."
I figured it for a gag, making fun of the overplaying of the singer of the week as he played a couple in a row.
At about the fourth song, the joke was old, and I found another station (for crying out loud, he only had 2 or 3 albums at the time!).
An hour or two later I checked (morbid curiosity). They were *still* playing Garth Brooks.
hawk
This is simply a large collection of commodity-quality hardware. There is no RAID and no hotswap, so a hardware failure results in large chunks of data being unavailable for extended periods while data is restored. Useless for truly critical data.
Gamingmuseum.com: Give your 3D accelerator a rest.
"AS been".. i see... the notion of petabytes must have increased the illiteracy factor of the editor.
Copying music isn't necessary to listen to it on the radio.
;-)
Yes, it is. You, by playing your radio, create a copy of the sounds that were made in the recording studio.
The singer doesn't just shout really, really loud, you know.
Copying a movie isn't required to see it on HBO.
The television, again, creates a copy of the movie from the signal it reads from its antenna, or over the cable it's connected to.
You can try this experiment if you don't believe me. Get two TVs. Turn them to the same channel. They'll both work at the same time, and they'll each independently display a copy of the movie. Seriously. Trust me. It really works!
Making a copy of html from the internet IS part of getting your computer to display it.
The computer makes a copy from the signal it gets over the phone line or other cable that it's connected to.
In what material way is translating a signal sent to a computer screen different than translating a signal sent to a television screen or radio speaker? In all three cases, a new copy of the work is made from the signal received.
As for your last point, books, yes, we do keep a copy of the text in our minds as we read. The law, however, chooses to pretend that the copy in our head doesn't exist, or at least, doesn't apply copyright law to it.
However, if you read a book out loud, you do violate copyright, at least in my country. There's even a special exemption in the Copyright Act that says that teachers can read "reasonable portions" of a book out loud for purposes of academic study. Yay, copyright!
So, reading books to a class makes a copy. Listening to the radio makes a copy. Watching TV makes a copy. And so does downloading HTML to a computer.
In other words, a copy is a copy is a copy. The internet isn't a special case. Google probably is risking prosecution under copyright law, but will probably win if prosecuted, because judges like Google. Judges are ex-lawyers, and as such, are quite good at protecting their own interests: and everyone uses google, even judges.
Short answer: in this case, the good guys will probably win, because the bad guys find them useful.
--
AC
to keep a copy of every forked distro with source and commentary on same going back to the beginning and onward for at least the next five... minutes. Too bad most of my floppies are history. Anyone still got a complete set of Yggdrasil files?
If my grammar and spelling are off, I am [distracted/tired/careless] (take your pick)
I wonder how redundant the archives actually are.
For example, if they backed up Google cache, that would be absolutely redundant, they would waste space.
On the same note, does google cache cache its own cache ? (stupid tongue twisters)
"Petabyte" just sounds so dirty. :)
Anybody know how this thing works? I mean, I understand you can use LVM to bind together the 4 HDs in the same computer as a logical volume, but how do you put together devices from many nodes (computers) as a single logical volume?
Last I checked I can get 200gb drives for about $100. That's me, just some guy buying one drive, paying 50 cents a gb, so I'm guessing they're paying a lot less buying a petabyte at a time, so what's with the 4x pricing? I understand the need for profit, but 4x?
my karma will be here long after I'm gone
Has nobody noticed that this solution isn't really any better than competing solutions?
Take Apple XServe for example. Whereas the PetaBox is $2/GB, Apple's much more advanced (much more reliable and redundant) solution costs only $2.27/GB... And I bet that if you were buying XServes in groups of 12 to match the 64TB PetaBox offering, Apple would give you a bulk discount taking it down to $2 or under.
I say more advanced because Apple's solution supports 2gbit fibre channels, hardware RAID, redundant PSUs, cache battery backup, redundant cooling, and the PSUs and hard drives are hotswap.
Oh, and while the Petabox is 1.6TB per 1U, Apple's solution is ~1.9TB per 1U (5.6 in 3U).
So, it would seem that in all respects, Apple's XServe is immensely superior. Price is only slightly higher and the bulk discounts that you could probably get in order to match the PetaBox offerings would probably make it no more expensive.
So what is special about this PetaBox stuff? Why would anybody buy it? How can they claim to be the best when they are clearly not? Why do we CARE about it? Why did archive.org choose it?
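To line the quoted numbers up side by side, here is a short Python sketch. It uses only the figures given in this comment ($2/GB and 1.6TB per 1U for the PetaBox, $2.27/GB and 5.6TB per 3U for the Apple unit, 64TB as the comparison size) and assumes decimal units and list pricing with no bulk discount; the function and variable names are made up for illustration.

```python
def cost_usd(capacity_tb, dollars_per_gb):
    """Price of capacity_tb (decimal TB) at a flat per-gigabyte rate."""
    return capacity_tb * 1000 * dollars_per_gb


# Figures as quoted in the comment; no bulk discount assumed.
petabox_cost = cost_usd(64, 2.00)
xserve_cost = cost_usd(64, 2.27)
premium_pct = (xserve_cost / petabox_cost - 1) * 100

petabox_tb_per_u = 1.6
xserve_tb_per_u = 5.6 / 3

print(f"64 TB at $2.00/GB: ${petabox_cost:,.0f}")
print(f"64 TB at $2.27/GB: ${xserve_cost:,.0f} ({premium_pct:.1f}% more)")
print(f"Density: {petabox_tb_per_u:.2f} vs {xserve_tb_per_u:.2f} TB per rack unit")
```

Note that the Apple figure covers storage enclosures only, so the cost of the servers needed to front them is not in this comparison.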
Apple has announced the new 1.5 Petabyte iPod, holding 250,000,000 songs encoded in 96kbps AAC, or 3 songs in Intel's new High Definition Audio format. Although it is the size of a small delivery truck, Apple has not increased the battery size from previous iPods to "conserve battery power". Even with Apple's newly implemented power conservation schemes, the iPod 1.5p gets approximately .0034 seconds of battery life before the battery melts in a spray of toxic superheated lithium and acid. Several have already been purchased by Michael Jackson, Sting, and Ben Affleck.
For this application, the performance differences aren't significant, and any CPU utilization differences don't matter either, but SATA's a lot more convenient mechanically, which is important when you're trying to cram thousands of parts into a manageable space, especially if you want the drives to be removeable.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Here is a link to their storage.. and so what, you still need to buy a computer to do anything with it.
http://store.apple.com/1-800-MY-APPLE/WebObjects/AppleStore?family=XserveRAID
Insane? No... Without considering extras it is not that much more expensive than the PetaBox solution.
I realize that you still need a computer; however, you can hook up multiple XServe RAIDs to a single server via a fibre switch. I guess it comes down to bandwidth; if you load up a rack with 13 XServe RAIDs, 1 fibre switch, and then 1 server, you'll have 72.8TB in the rack, yes, but you'd be limited to whatever the max bandwidth of that one server is. Probably 1 or 2 gigabits, though you could go for something more customized.
Regardless, the XServes would still be immensely more reliable than the PetaBox nodes. If you want maximum uptime and reliability, then PetaBox is a disaster waiting to happen, unless they use software mirroring. And you'd have higher admin time for replacements, and worse management/monitoring tools.
Better hope that that La Femme Nikita chick doesn't have rabies.
The higher the technology, the sharper that two-edged sword.
Wouldn't something like Coraid's ATA-over-Ethernet based EtherDrive product make more sense for building a massive storage array like this?
There's good stuff on there. If you don't like the chub, don't look behind the cut when you see thenewwavechick or whatever; wait for i_like_sharks to post again. Not to mention that they have a policy now about wang-warnings on the cuts. Or, if you're allergic to even the possibility that wang may be lurking behind an unclicked cut, there's always show_your_pussy.
--grendel drago
Laws do not persuade just because they threaten. --Seneca
If you made a document that filled the whole storage unit, would it be a "petafile"?
See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year