Build Your Own $2.8M Petabyte Disk Array For $117k

Posted by Soulskill on Wednesday September 2, 2009 @02:09AM from the we-know-exactly-what-you'd-do-with-that-much-storage dept.

Chris Pirazzi writes "Online backup startup BackBlaze, disgusted with the outrageously overpriced offerings from EMC, NetApp and the like, has released an open-source hardware design showing you how to build a 4U, RAID-capable, rack-mounted, Linux-based server using commodity parts that contains 67 terabytes of storage at a material cost of $7,867. This works out to roughly $117,000 per petabyte, which would cost you around $2.8 million from Amazon or EMC. They have a full parts list and diagrams showing how they put everything together. Their blog states: 'Our hope is that by sharing, others can benefit and, ultimately, refine this concept and send improvements back to us.'"
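The per-petabyte figure in the summary can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 PB = 1,000 TB) and counts nothing but the quoted material cost per pod. The variable names are illustrative.

```python
import math

POD_COST_USD = 7_867       # material cost per 4U pod, from the summary
POD_CAPACITY_TB = 67       # raw capacity per pod, from the summary
TB_PER_PB = 1_000          # assumption: decimal units

cost_per_tb = POD_COST_USD / POD_CAPACITY_TB
cost_per_pb = cost_per_tb * TB_PER_PB
pods_per_pb = math.ceil(TB_PER_PB / POD_CAPACITY_TB)

print(f"${cost_per_tb:,.2f} per TB")
print(f"${cost_per_pb:,.0f} per PB")
print(f"{pods_per_pb} pods to reach 1 PB of raw capacity")
```

The result lands a little above $117,000 per petabyte, which matches the headline; any RAID overhead would push the cost per usable petabyte higher.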

Not ZFS? by pyite · 2009-09-02 02:10 · Score: 2, Insightful

Good luck with all the silent data corruption. Shoulda used ZFS.

--
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
1. Re:Not ZFS? by anilg · 2009-09-02 02:41 · Score: 4, Interesting
  
  Get both Debian and ZFS.. Nexenta. Links in my sig.
  
  --
  http://dilemma.gulecha.org - My philosophical short film.
2. Re:Not ZFS? by Lord+Ender · 2009-09-02 02:59 · Score: 2, Insightful
  
  Are you saying that with the more expensive system, disks never fail and nobody ever has to get up in the night?
  
  --
  A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
3. Re:Not ZFS? by chudnall · 2009-09-02 03:13 · Score: 3, Interesting
  
  What do you mean by more expensive? OpenSolaris with ZFS costs the same as Linux. And yes, you'll have to get up a lot less often in the middle of the night, since a few bad sectors aren't going to force a fail of the entire disk.
  
  --
  Disclaimer: Evolution comes with NO WARRANTY, except for the IMPLIED WARRANTY of FITNESS FOR A PARTICULAR PURPOSE.
4. Re:Not ZFS? by ajs · 2009-09-02 03:15 · Score: 5, Interesting
  
  Are you saying that with the more expensive system, disks never fail and nobody ever has to get up in the night?
  Well... yes and no. When you've worked with high-end arrays, you learn that storage is only the beginning. NetApp and EMC provide far, far more. I was damned impressed when I first heard a presentation from NetApp about their technology, but the day that they called me up and told me that the replacement disk was in the mail and I answered, "I had a failure?" ... that was the day that I understood what data reliability was all about.
  Since that time (over 10 years ago), the state of the art has improved over and over again. If you're buying a petabyte of storage, it's because you have a need that breaks most basic storage models, and the average sysadmin who thinks that storage is cheap is going to go through a lot of pain learning that he's wrong.
  Someday, you'll have a petabyte disk in a 3.5" form-factor. At that point, you can treat it as a commodity. Until then, there are demands placed on you when you administrate that much storage which demand a very different class of device than a Linux box with a bunch of raid cards.
  As evidence of that, I submit that dozens of companies like the one in this article have existed over the years, and only a handful of them still exist. Those that still do have either exited the storage array business, or have evolved their offerings into something that costs a lot more to build and support than a pile of disks.
5. Re:Not ZFS? by TooMuchToDo · 2009-09-02 03:17 · Score: 1
  
  Your bad sectors should be automatically reassigned via the drive's firmware anyway. And the RAID cards should be doing surface scans on a fairly regular basis. So no, you don't need to worry about individual errors on storage nodes when you're storing a chunk of data across several servers. Google for Hadoop and read how it handles storing chunks of data (two copies in one rack on different servers, one copy on a node in a different rack).
6. Re:Not ZFS? by mollog · 2009-09-02 03:20 · Score: 4, Insightful
  
  I have worked in disk storage design. This was a very cool project. This looks like a promising start and in some ways represents the future of storage; COTS parts. Others have pointed out some areas of improvement, cooling and the like.
  
  And I think I would use dual micro ATX motherboards, perhaps in their own cases to make them replaceable in case of failure.
  
  I realize that the layout of the drives was done with an eye toward airflow, but I personally don't like to see drives set on their edges. It's probably a personal bias, but I like to see drives set flat. The bearings seem to last longer that way. Just my personal experience.
  
  And, one final point, storage density is reaching the point where we can jam a lot of storage into a small space. Perhaps we have reached the point where we can start to spread things out and do things like put the drives in a separate enclosure or multiple enclosures. It makes designing, installing, and servicing easier. Use eSATA ports on the SATA cards to make external storage easier.
  
  --
  Best regards.
7. Re:Not ZFS? by ImprovOmega · 2009-09-02 03:43 · Score: 2, Informative
  
  I was damned impressed when I first heard a presentation from NetApp about their technology, but the day that they called me up and told me that the replacement disk was in the mail and I answered, "I had a failure?" ... that was the day that I understood what data reliability was all about.
  Agreed. We've had similar experiences with HP EVA systems here at work with things like that, it's wonderful =)
  
  Someday, you'll have a petabyte disk in a 3.5" form-factor. At that point, you can treat it as a commodity.
  As much as I want to believe this, I know that just as in the past the business will find a way to fill an array of such drives. They'll decide to do something silly like 24/7 recording of 1000 different cameras, or hourly snapshots of critical systems going back 3 months "just in case", or something. If you have seemingly unlimited amounts of cheap storage, the business *will* find a way to fill it.
8. Re:Not ZFS? by pyite · 2009-09-02 04:04 · Score: 1
  
  Google for Hadoop and read how it handles storing chunks of data (two copies in one rack on different servers, one copy on a node in a different rack).
  I admit I don't know much about Hadoop, but how do you know a block wasn't corrupted? It very well might store checksum data at the object level, but ZFS stores at the block level. Without checksumming, even if you have multiple copies of your data, it's hard to know which is the right one if one copy gets corrupted silently.
  
  --
  "Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
9. Re:Not ZFS? by NotBornYesterday · 2009-09-02 04:05 · Score: 3, Insightful
  
  As evidence of that, I submit that dozens of companies like the one in this article have existed over the years, and only a handful of them still exist. Those that still do have either exited the storage array business, or have evolved their offerings into something that costs a lot more to build and support than a pile of disks.
  Or they have been bought by one of the bigger storage companies.
  
  --
  I prefer rogues to imbeciles because they sometimes take a rest.
10. Re:Not ZFS? by TooMuchToDo · 2009-09-02 04:14 · Score: 1
  
  http://my.safaribooksonline.com/9780596521974/ch04
  
  4.1.1. Data Integrity in HDFS
  HDFS transparently checksums all data written to it and by default verifies checksums when reading data. A separate checksum is created for every io.bytes.per.checksum bytes of data. The default is 512 bytes, and since a CRC-32 checksum is 4 bytes long, the storage overhead is less than 1%.
  Datanodes are responsible for verifying the data they receive before storing the data and its checksum. This applies to data that they receive from clients and from other datanodes during replication. A client writing data sends it to a pipeline of datanodes (as explained in Chapter 3), and the last datanode in the pipeline verifies the checksum. If it detects an error, the client receives a ChecksumException, a subclass of IOException.
  When clients read data from datanodes, they verify checksums as well, comparing them with the ones stored at the datanode. Each datanode keeps a persistent log of checksum verifications, so it knows the last time each of its blocks was verified. When a client successfully verifies a block, it tells the datanode, which updates its log. Keeping statistics such as these is valuable in detecting bad disks.
  
  Aside from block verification on client reads, each datanode runs a DataBlockScanner in a background thread that periodically verifies all the blocks stored on the datanode. This is to guard against corruption due to "bit rot" in the physical storage media. See Section 10.1.4.3 for details on how to access the scanner reports.
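The checksum scheme in the quoted passage can be illustrated outside Hadoop. What follows is a rough sketch, not HDFS code: it uses Python's zlib.crc32 to checksum each 512-byte chunk, and the function names are made up for illustration.

```python
import zlib
from typing import List, Optional

BYTES_PER_CHECKSUM = 512   # default chunk size quoted above
CHECKSUM_BYTES = 4         # a CRC-32 value is 4 bytes long

def chunk_checksums(data: bytes, chunk_size: int = BYTES_PER_CHECKSUM) -> List[int]:
    """Return one CRC-32 value per chunk_size bytes of data."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def first_corrupt_chunk(data: bytes, expected: List[int],
                        chunk_size: int = BYTES_PER_CHECKSUM) -> Optional[int]:
    """Return the index of the first chunk whose CRC-32 differs, or None."""
    actual = chunk_checksums(data, chunk_size)
    for index, (got, stored) in enumerate(zip(actual, expected)):
        if got != stored:
            return index
    return None

# Storage overhead: 4 checksum bytes for every 512 data bytes.
overhead = CHECKSUM_BYTES / BYTES_PER_CHECKSUM

block = bytes(range(256)) * 8      # 2048 bytes, so 4 chunks
stored = chunk_checksums(block)

damaged = bytearray(block)
damaged[700] ^= 0x01               # flip one bit inside chunk 1

print(f"overhead: {overhead:.2%}")
print("clean block:", first_corrupt_chunk(block, stored))
print("damaged block:", first_corrupt_chunk(bytes(damaged), stored))
```

Because each chunk has its own checksum, a reader holding several replicas can tell which copy is bad instead of guessing, which is the point raised in the parent comment.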
11. Re:Not ZFS? by obarthelemy · 2009-09-02 04:29 · Score: 1
  
  a petabyte should be enough for everyone.
  
  --
  The Cloud - because you don't care if your apps and data are up in the air.
12. Re:Not ZFS? by Anonymous Coward · 2009-09-02 05:02 · Score: 1, Interesting
  
  That is scary as hell. You didn't know the drive failed??? Why?? How the heck did they know? Do you really provide them access to your data 24/7?? That's crazy!
  The biggest argument against the large storage companies, is that large, dynamic companies don't use them. Amazon doesn't. Google doesn't. Facebook doesn't. Think smarter, not more expensive.
13. Re:Not ZFS? by ajs · 2009-09-02 05:06 · Score: 1
  
  Someday, you'll have a petabyte disk in a 3.5" form-factor. At that point, you can treat it as a commodity.
  As much as I want to believe this, I know that just as in the past the business will find a way to fill an array of such drives.
  Being able to treat it as a commodity doesn't mean you won't have a need for arrays of them. We'll store full video of every doctor's appointment you've ever had. We'll store the annotated DNA of everyone on the planet. We'll store a feed of the entire sky 24/7 at 50x zoom in order to track events in our solar system.
  Certainly there are some services that are probably already in need of multiple petabytes (I'm looking at you, Flickr and YouTube).
14. Re:Not ZFS? by lukas84 · 2009-09-02 05:19 · Score: 1
  
  Amazon and Google are both IT companies at the core.
  Facebook, well, too, to some extent.
  However, most IT deployments are not in IT centric companies. They don't build their own software, like Amazon, Google and Facebook do.
15. Re:Not ZFS? by codeguy007 · 2009-09-02 05:26 · Score: 1
  
  They aren't using RAID cards. Just SATA interface cards.
16. Re:Not ZFS? by iphayd · 2009-09-02 05:28 · Score: 2, Interesting
  
  On a similar note, they claim that they will back up any one computer for $5/month. Well, my one computer happens to be the backup node for my SAN, so they're going to need about 15 TB (it's a small SAN) to have 30 day backups for me. Please note that all of the files on my SAN are under 4GB and I have a SAN, not a NAS, so my servers see it as a native hard drive.
17. Re:Not ZFS? by TooMuchToDo · 2009-09-02 05:34 · Score: 1
  
  My mistake. In that case you need to ensure checksums are being done, and you have the data stored on more than 1 box.
18. Re:Not ZFS? by Zerth · 2009-09-02 05:35 · Score: 1
  
  You can get 5-bay esata enclosures for around $150 now. 2-bay have dropped a ton in price, $20-$30.
  I got a 5-bay silicon image-based one(same family of backplanes as the article) when they were closer to $200 and it has been awesome except for the fan noise. But that's why it's external, so I can stick it behind a wall.
19. Re:Not ZFS? by FoolishBluntman · 2009-09-02 05:40 · Score: 3, Interesting
  
  I have news for you. The high end boxes from EMC, NetApp and the like have silent data corruption too!
20. Re:Not ZFS? by codeguy007 · 2009-09-02 05:48 · Score: 1
  
  Actually they are RAID cards just really cheap ones. I doubt they scan the drives periodically.
21. Re:Not ZFS? by FoolishBluntman · 2009-09-02 05:50 · Score: 3, Informative
  
  > That is scary as hell. You didn't know the drive failed??? Why?? How the heck did they know? Do you really provide them access to your data 24/7?? That's crazy!
  No moron, high end disk arrays "phone home" either by dedicated phone line or email when a disk failure occurs. The disk array immediately starts rebuilding a RAID set using a hot spare. The disk you receive in the mail or from an on-site call is to replace the failed drive. They don't need access to your data, just the status of the array subsystem.
  > The biggest argument against the large storage companies, is that large, dynamic companies don't use them. Amazon doesn't. Google doesn't. Facebook doesn't.
  The only company in your list that doesn't use a large storage company is Google. Most companies don't have the in-house expertise to keep track of their data. They outsource a lot of the work so they can concentrate on their core business.
22. Re:Not ZFS? by DamnStupidElf · 2009-09-02 05:59 · Score: 1
  
  15 TB is a drop in the bucket to them. Only a third of the capacity in their $8000 4U storage box.
23. Re:Not ZFS? by fiji · 2009-09-02 06:01 · Score: 1
  
  They explicitly mentioned mdadm, so they are not using the "raid" on the cards (anyway, at that price it almost always is software raid implemented in the driver).
  With the linux md software raid you can kick off background scans. Read the md(4) man page and linux/Documentation/md.txt for the gory details. There is a lot of good stuff in there.
  I love Linux software raid. It has been fast and robust.
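A background scan of the kind described above is started by writing to the array's sync_action file in sysfs, which the md documentation covers. This is a minimal sketch, assuming an array named md0 and root privileges; the helper names (start_md_check, read_mismatch_count) are purely illustrative.

```python
from pathlib import Path

def start_md_check(array: str = "md0", sysfs_root: str = "/sys/block") -> Path:
    """Ask the md driver to read-verify an array by writing 'check' to sync_action.

    Needs root on a real system. sysfs_root is a parameter so the function
    can be pointed at a scratch directory for a dry run.
    """
    action_file = Path(sysfs_root) / array / "md" / "sync_action"
    action_file.write_text("check\n")
    return action_file

def read_mismatch_count(array: str = "md0", sysfs_root: str = "/sys/block") -> int:
    """Return the mismatch count that md reports after a check pass."""
    counter = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(counter.read_text().strip())
```

On a live system this would typically be run from cron, for example once a week per array, with the mismatch count read back after the pass finishes.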
24. Re:Not ZFS? by anegg · 2009-09-02 06:04 · Score: 2, Informative
  
  NetApp provides a function in the storage servers that they sell whereby significant events such as drive failures as well as general health check information can be sent to NetApp if you choose. The information is sent via e-mail or an HTTP POST (if I recall correctly). If you have support services, they monitor your installation via these messages, and will automatically send out a new drive if you have drive replacement services, for example. They do not have remote command access to your storage server (unless you chose to give them that by making the interface available outside of your firewall).
25. Re:Not ZFS? by iphayd · 2009-09-02 06:12 · Score: 2, Informative
  
  So you are saying that they're happy to get their return of investment on their hardware alone in 44 years? I doubt it.
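The 44-year figure appears to follow directly from the parent's numbers. This is a quick sketch, assuming the 15 TB is charged a third of the $8,000 hardware cost and the customer pays a flat $5 per month; power, bandwidth and staff are ignored.

```python
BOX_COST_USD = 8_000        # parent comment's figure for one 4U box
SHARE_OF_BOX = 1 / 3        # parent comment: 15 TB is "a third of the capacity"
MONTHLY_FEE_USD = 5         # flat price per backed-up computer

hardware_share = BOX_COST_USD * SHARE_OF_BOX
months_to_recoup = hardware_share / MONTHLY_FEE_USD
years_to_recoup = months_to_recoup / 12

print(f"hardware attributable to this customer: ${hardware_share:,.0f}")
print(f"payback time: {months_to_recoup:.0f} months (~{years_to_recoup:.1f} years)")
```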
26. Re:Not ZFS? by jisom · 2009-09-02 06:13 · Score: 1
  
  I was thinking similarly. A 15-drive enclosure with a built-in power supply and eSATA. Call it a drive brick :)
27. Re:Not ZFS? by nxtw · 2009-09-02 06:24 · Score: 1
  
  Actually they are RAID cards just really cheap ones.
  They are using Silicon Image SATA controllers, which do not have hardware RAID. They provide a software RAID implementation as a Windows driver and a BIOS on the card that can configure arrays and boot Windows off of a fake RAID array.
28. Re:Not ZFS? by rnturn · 2009-09-02 07:10 · Score: 2, Insightful
  
  Because, you know, ZFS cures cancer and stops bad breath, too. Not to be too snarky but jeez... what did everybody do before ZFS came along?
  
  --
  CUR ALLOC 20195.....5804M
29. Re:Not ZFS? by phoenix_rizzen · 2009-09-02 07:14 · Score: 1
  
  http://forums.freebsd.org/showthread.php?t=3689
  We did something similar using off-the-shelf hardware, FreeBSD, and ZFS. Ours came out to just under $10,000 CDN, but we used 3Ware RAID controllers instead of generic SATA controllers, Tyan server-class motherboards, server-grade redundant/modular PSU, and Intel multi-port gigabit NICs.
  It's still a lot less expensive than some of the backup systems I've seen school districts "convinced" to use.
30. Re:Not ZFS? by greg1104 · 2009-09-02 08:11 · Score: 1
  
  It's not silent when those errors get caught by their "deduplicate" step. You can certainly still get corruption with ZFS (imagine bad RAM flipping a bit after the checksum is verified but before it's sent over the network), so you can't rely just on it to keep your data safe. Since you need a higher-level integrity check anyway, it's completely reasonable to design a system such that all of the integrity checks are at the application level instead, particularly if you plan to store duplicate copies of the data.
31. Re:Not ZFS? by ToasterMonkey · 2009-09-02 08:16 · Score: 1
  
  You seem to be laboring under the mistaken impression that this company is in the storage array business. It is not. It is an online backup service.
  Low-end hackjob My First SAN (TM) arrays, and backup service provided on My First SAN (TM) arrays are both equally concerning in my book. One just lets you hide that fact better, but these guys are out bragging about it. Not smart.
32. Re:Not ZFS? by psm321 · 2009-09-02 08:28 · Score: 1
  
  SAS is way too expensive. Remember, they were trying to do things cheap. (I looked into SAS myself for personal use, and while SAS controllers are affordable, expanders, which you need for any large setup, are not.)
33. Re:Not ZFS? by Cramer · 2009-09-02 08:39 · Score: 1
  
  And if you had it set up to send you those email alerts, you would've known about the failure before they did.
  What BackBlaze leaves out is the quality of equipment from the "expensive" vendors. They built their toy with the cheapest shit they could find -- except for the power switch, that was $30! NetApp, EMC, et al. don't go down to Best Buy and get a shopping cart full of drives (I've done that, for the record.) The big boys use much more stable and reliable SCSI hardware -- Fibre Channel and SAS. Lemme tell ya', today's SATA/IDE drives are complete shit compared to those FC/SAS drives. I personally have 10 year old FC drives still in service today. I don't have a single IDE drive that has lasted more than about 3 years; and if you leave one sitting on a shelf, your data is decaying by the nanosecond.
  There's a very good reason why some things are cheap and some things are expensive.
  If their infrastructure can support multiple drive failures per day, then it might be worth it. I'm not in that business, so I've not taken a detailed look at the numbers. I'm an admin who wants my data to be there when I wake up every morning. And I don't want the headache of having to keep multiple copies on multiple servers for the times drives and servers fail.
34. Re:Not ZFS? by Anonymous Coward · 2009-09-02 09:47 · Score: 1, Informative
  
  As long as there is enough intelligence in their software that runs higher up in the stack that it can accommodate for the expected device failures, they should be good to go.
  Having read their blog post on the hardware, I'm going to just go ahead and assume that their software stack is every bit as half-assed, cheap, and hackish as their hardware.
  I mean, these guys are pinching pennies to an unbelievable extent. Did you see how rickety their mechanical support system for the drives is? Real hardware vendors build things which can actually be shipped. You could never ship these guys' box anywhere -- it is so fragile looking that I'd be afraid to do so much as tip it on its side once it's assembled. They probably only get away with it by hand building each box on site.
  They used a consumer grade Core 2 Duo motherboard, the Intel DG43NB. By chance, this is the board I used to build my girlfriend's gaming PC earlier this year. It is a reasonable (but not great - I was rushed and it was what Fry's had in stock) board to use for a medium performance gaming/email/web PC. It is NOT, however, a server board. Not even close. Not even if you're trying to build storage on the cheap.
  As a consequence of this motherboard choice, they had to use shitty SATA cards to talk to the drives. Three consumer grade 1x PCIe 2-port cards, and - get this - one 4-port PCI card. Yes, that's right, they bottleneck 40% of the SATA ports through a single 32-bit 33 MHz PCI slot. Noooo, it's just too spendy to buy a $500 board actually designed for server purposes with lots of PCIe slots... they just HAD to have the $85 consumer grade motherboard!
  They used two gamer ATX power supplies instead of selecting a server grade 4U power supply.
  On and on. Everything about it screams "we have no actual experience building hardware". In fact, everything also screams "We don't even have any experience operating decent server grade hardware", because they'd run away screaming from their own hardware design if they had said experience.
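The bottleneck described above can be put in rough numbers. This is a sketch under stated assumptions: the port counts come from the comment, and the 32-bit/33 MHz PCI figure of about 133 MB/s is the theoretical bus peak shared by everything on that bus, so real throughput is lower.

```python
PCI_BUS_WIDTH_BITS = 32
PCI_CLOCK_HZ = 33_333_333          # nominal 33 MHz PCI clock

PCIE_CARDS = 3                     # consumer PCIe x1 cards, per the comment
PORTS_PER_PCIE_CARD = 2
PCI_CARD_PORTS = 4                 # the single 4-port PCI card

# Theoretical peak of the shared PCI bus, in MB/s.
pci_peak_mb_s = PCI_BUS_WIDTH_BITS / 8 * PCI_CLOCK_HZ / 1_000_000

total_ports = PCIE_CARDS * PORTS_PER_PCIE_CARD + PCI_CARD_PORTS
pci_share = PCI_CARD_PORTS / total_ports

# Best case per port if all four PCI-attached ports are busy at once.
per_port_mb_s = pci_peak_mb_s / PCI_CARD_PORTS

print(f"PCI bus peak: {pci_peak_mb_s:.0f} MB/s")
print(f"share of SATA ports behind PCI: {pci_share:.0%}")
print(f"best case per PCI-attached port: {per_port_mb_s:.1f} MB/s")
```

Whether this matters depends on the workload: a backup service limited by a gigabit network link may never push the bus that hard, while an array rebuild would.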
35. Re:Not ZFS? by eharvill · 2009-09-02 09:57 · Score: 1
  
  Umm....JBOD, anyone?
  
  --
  At night I drink myself to sleep and pretend I don't care that you're not here with me
36. Re:Not ZFS? by plover · 2009-09-02 10:04 · Score: 3, Interesting
  
  They're betting on the MTTF of the drives, on RAID, and on redundant system backups.
  Yes, it's cheap hardware. Yes, cheap hardware fails more often than expensive hardware. Yes, cheap hardware is slower than expensive hardware. But you have to look at the offsets: they are building a backup service, where they don't need "instant" data access speeds. As for drive failures, I have some experience there. I have 57,000 cheap-ass consumer drives in service, and over 10,000 of them are 11 years old. They're dying at the rate of about ten failures per day. The key is to build your processes to tolerate and handle failures.
  As long as your redundant systems are keeping copies of the data, and you understand exactly what the impact is of a failed component as well as have a recovery plan in place, why not use cheap hardware? Let's do a bit of math. The guy had a photo of himself standing behind about 18 of these boxes. That's 810 drives. If we lowball cheap drives at 300,000 hours MTBF, he'll see an average of two failures per month. It might take him $200 and an hour to recover each failed drive. We could keep doing the math on each component, but I suspect this is still a complete and total bargain that will meet his business needs very well.
  It may not be as shiny as EMC or NetApp, and you have to do the legwork yourself, but why spend the extra money on a system that would provide him with "too much service"? From an ROI perspective, this guy is probably going to do very well, even though he may drive a few sysadmins crazy in the process.
  
  --
  John
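The failure estimate in the comment above can be reproduced as follows. It is a back-of-the-envelope sketch that assumes a constant failure rate (so expected failures = drive-hours / MTBF) and 45 drives per box, which is what 810 drives across 18 boxes implies.

```python
BOXES = 18
DRIVES_PER_BOX = 45                 # 810 drives / 18 boxes
MTBF_HOURS = 300_000                # the comment's lowball figure
HOURS_PER_MONTH = 24 * 365 / 12     # 730 hours in an average month

drives = BOXES * DRIVES_PER_BOX
failures_per_month = drives * HOURS_PER_MONTH / MTBF_HOURS

COST_PER_FAILURE_USD = 200          # the comment's estimate per failed drive
monthly_replacement_cost = failures_per_month * COST_PER_FAILURE_USD

print(f"{drives} drives")
print(f"expected failures per month: {failures_per_month:.2f}")
print(f"expected replacement spend: ${monthly_replacement_cost:,.0f} per month")
```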
37. Re:Not ZFS? by SETIGuy · 2009-09-02 10:44 · Score: 1
  
  Are you saying that with the more expensive system, disks never fail and nobody ever has to get up in the night?
  Essentially yes. When our primary file server was a NetApp that was on a service contract, the way we were notified of a disk failure was that someone from the loading dock would come in and hand us a disk drive that arrived via overnight FedEx or a same day courier, depending upon when the failure had happened. Our job was to go to the filer, pull out the drive with the red light, and pop in the new drive (which arrived already installed in the drive tray.) The new drive would automatically become a hot spare without operator intervention.
  Given infinite money, I would definitely go back to NetApp. Therein lies the rub.
  
  --
  Support SETI@home
38. Re:Not ZFS? by poopdeville · 2009-09-02 11:57 · Score: 1
  
  They ate breath mints and died of cancer.
  
  --
  After all, I am strangely colored.
39. Re:Not ZFS? by therufus · 2009-09-02 12:10 · Score: 2, Interesting
  
  You need to look at the grand scheme of things. Sure, you may get 5-10% of customers using massive amounts of data (over 500 GB) but when 90-95% of your customers are home users and small businesses who don't have their own data centers, and they may only have a 50 MB backup, their lack of use offsets the heavy users.
  Imagine if in a 1 PB server, 750 TB of data was used by 10,000 individuals paying $5/mth and the other 250 TB was used by 50 individuals paying $5/mth. I failed at mathematics at school, but I'm sure the 10k will pay the data center costs that would be incurred by the 50.
  
  --
  You moved your mouse. Please restart Windows for changes to take effect.
40. Re:Not ZFS? by Onthax · 2009-09-02 12:43 · Score: 1
  
  It's like Wilson's law. Wilson's law states that expenses expand to fill the income available. Kinda the same thing here: storage needs expand to fill the disk space available.
41. Re:Not ZFS? by isorox · 2009-09-03 02:36 · Score: 1
  
  Essentially yes. When our primary file server was a NetApp that was on a service contract, the way we were notified of a disk failure was that someone from the loading dock would come in and hand us a disk drive that arrived via overnight FedEx or a same day courier, depending upon when the failure had happened. Our job was to go to the filer, pull out the drive with the red light, and pop in the new drive (which arrived already installed in the drive tray.) The new drive would automatically become a hot spare without operator intervention.
  On our HP servers we have onsite spares. Detect the drive has failed (nagios goes red), walk to server, replace drive, buy new spare.
  For our sun servers, we have a gold support contract. We phone up Sun, and if we're lucky we get an idiotic engineer turn up within a few hours. Assuming there's a spare drive in London, which there wasn't last time. They replace the drive and watch the light blink lots, which requires just as much manpower on our end, takes longer, but does save the cost of a replacement drive.
  By the time drives are no longer available, we have replaced the system.
42. Re:Not ZFS? by chrish · 2009-09-03 02:49 · Score: 1
  
  Requesting links to $20-30 2-bay enclosures, please. Can't seem to find any in Canada that are under about $50...
  
  --
  - chrish
43. Re:Not ZFS? by Zerth · 2009-09-03 04:44 · Score: 1
  
  Here's one.
  $20 but that's USD. Shipping and import fees might bump that up to 35-40 CA.
  I've never used that model, so no clue about the reliability.
44. Re:Not ZFS? by hardwarefreak · 2009-09-03 09:33 · Score: 1
  
  I have worked in disk storage design. This was a very cool project. This looks like a promising start and in some ways represents the future of storage; COTS parts. Others have pointed out some areas of improvement, cooling and the like.
  These guys will either become a Nexsan or NetApp. Or, they'll starve and die. Anyone, _anyone_ can build out a huge set of COTS parts, sell it, and leave the management burden to the customer. Unless this customer is a U.S. nuclear weapons lab or university environment with a ton of essentially cheap doctoral candidate talent and 'free' man-time, this kind of solution fails every time because it's not manageable at scale. Once you add the management capability to the hardware and write the management software interface, you ARE the Nexsan of about 10 years ago. Currently, Nexsan is the closest thing you can get to "COTS" prices while still getting some value add capability and single source warranty contact.
45. Re:Not ZFS? by drsmithy · 2009-09-03 19:09 · Score: 1
  
  If we lowball cheap drives at 300,000 hours MTBF, he'll see an average of two failures per month. It might take him $200 and an hour to recover each failed drive.
  Personally, I'd be _extremely_ interested in seeing those numbers, because from my back-of-the-envelope calculations, it would take anywhere from 2-4 weeks to rebuild the arrays in each node (ie: in case of disk failure, or as part of initial commissioning), and on the order of 1-2 weeks to replicate all the data on a single node (in case they decided to write off an entire node).
  My gut feeling, based on the above, is that the only thing keeping these guys from a catastrophic data loss event is luck (or they've already had some, but the customers weren't high profile enough to matter). Either that or each pod's worth of data costs a hell of a lot more than ~$180k to store (the power and cooling costs alone would be a small fortune).
46. Re:Not ZFS? by plover · 2009-09-08 10:51 · Score: 1
  
  I meant an hour of labor cost to recover the failed drive, spent swapping the failed hardware and kicking off the rebuild. A human doesn't have to sit there for $75 per hour babysitting a rebuild, whether the rebuild takes one hour or five hundred hours. (One hour might be lowballing it, because there will obviously be followup checks to make sure the rebuilds are completing.)
  Where it's going to get interesting for him is when he loses two drives in an array simultaneously. At that time he's on the hook -- can he get both drives recovered before he loses a third, thus losing data? And I didn't go over the SATA expander to RAID map, but I'm assuming he's spread his risk across multiple expansion boards. It'd truly suck if one of those boards went out, corrupting three or more drives from the same array simultaneously!
  I don't know. The guy says he has a lot of custom software to help with building and rebuilding the systems. Maybe he's got exactly what it takes.
  
  --
  John
You know why Amazon charges that much? by Nimey · 2009-09-02 02:12 · Score: 4, Insightful

Support.

--
Hail Eris, full of mischief...

E pluribus sanguinem
1. Re:You know why Amazon charges that much? by bytethese · 2009-09-02 02:31 · Score: 5, Funny
  
  For the 2.683M difference, that support better come with a "happy ending" for the entire staff...
2. Re:You know why Amazon charges that much? by drooling-dog · 2009-09-02 02:41 · Score: 4, Funny
  
  Damn. I was going to offer support for half of that price until I saw this new requirement...
3. Re:You know why Amazon charges that much? by machine321 · 2009-09-02 02:46 · Score: 3, Funny
  
  For 2.683M, you can probably afford to outsource that part.
4. Re:You know why Amazon charges that much? by Richard_at_work · 2009-09-02 02:47 · Score: 5, Insightful
  
  And backup, redundancy, hosting, cooling etc etc. The $117,000 cost quoted here is for raw hardware only.
5. Re:You know why Amazon charges that much? by johnlcallaway · 2009-09-02 02:51 · Score: 4, Insightful
  
  It's great having someone tell you they will be there in three hours to replace your power supply, that you then have to dedicate a staff person to be with when they go out on the shop floor because some moron in security requires it. If they had just left a few spare parts you could do it yourself because everything just slides into place anyway.
  
  That 2.683M also pays for salaries, pretty building(s), advertising, research, conventions, and more advertising.
  
  I could hire a couple of dedicated staff to have 24x7 support for far less than 2.683M, plus a duplicate system worth of spare parts.
  
  This stuff isn't rocket science. Most companies don't need high-speed, fiber-optic disk array subsystems for a significant amount of their data, only for a small subset that needs blindingly fast speed. The rest can sit on cheap arrays. For example, all of my network accessible files that I open very rarely but keep on the network because it gets backed up. All of my 5 copies of database backups and logs that I keep because it's faster to pull it off of disk than request a tape from offsite. And it's faster to back up to disk, then to tape.
  
  BackBlaze is a good example of someone that needs a ton of storage, but not lightning-fast access. Having a reliable system is more important to them than one that has all the tricks and trappings of an EMC array that probably 10% of all EMC users actually use, but they all pay for.
  
  --
  I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
6. Re:You know why Amazon charges that much? by interval1066 · 2009-09-02 02:56 · Score: 5, Insightful
  
  Backup: depends on the backup strategy. I could make this happen for less than an additional 10%. But ok, point taken.
  Redundancy: You mean as in plain redundancy? These are RAID arrays are they not? You want redundancy at the server level? Now you're increasing the scope of the project which the article doesn't address. (Scope error)
  Hosting: Again, the point of the article was the hardware. That's a little like accounting for the cost of a trip to your grandmother's, and factoring in the cost of your grandmother's house. A little out of scope.
  Cooling: I could probably get the whole project chilled for less than 6% of the total cost, depending on how cool you want the rig to run.
  I think you're looking for a wrench in the works where none exist.
  
  --
  Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
7. Re:You know why Amazon charges that much? by MrNaz · 2009-09-02 02:56 · Score: 5, Insightful
  
  Redundancy can be had for another $117,000.
  Hosting in a DC will not even be a blip in the difference between that and $2.7m.
  EMC, Amazon etc are a ripoff and I have no idea why there are so many apologists here.
  
  --
  I hate printers.
8. Re:You know why Amazon charges that much? by geekprime · 2009-09-02 02:58 · Score: 1
  
  So you're saying EMC provides all that?
  Please think before you post.
9. Re:You know why Amazon charges that much? by MoonBuggy · 2009-09-02 02:59 · Score: 4, Interesting
  
  The lowest cost of an (apparently) comparable solution on their site is from Dell, at $826,000 per PB. That includes hardware and support but still requires hosting, cooling and so on at extra cost. To quote backup and redundancy as part of the cost seems misleading, since none of the solutions appear to include that.
  Basically, comparing favourably to the Dell units simply requires getting support for less than $709,000. If you want to throw in backup and redundancy, then buy twice as many units - you've still got change from half a million compared to the single Dell unit in order to cover the extra power, support and cooling costs, not to mention that support costs don't necessarily scale linearly.
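The arithmetic in the comment above can be checked with a minimal Python sketch. It uses only the per-petabyte hardware figures quoted in this thread ($117,000 for the pods, $826,000 for Dell); the variable names are illustrative, and support, power and cooling are deliberately left out rather than guessed at.

```python
# Per-petabyte hardware prices as quoted in the thread (USD).
POD_COST_PER_PB = 117_000
DELL_COST_PER_PB = 826_000

# Budget available for support before the pods cost as much as Dell.
support_headroom = DELL_COST_PER_PB - POD_COST_PER_PB

# Buying every pod twice, as a crude stand-in for backup and redundancy.
doubled_pods = 2 * POD_COST_PER_PB
left_vs_dell = DELL_COST_PER_PB - doubled_pods

print(f"Support headroom vs Dell: ${support_headroom:,}")
print(f"Two full sets of pods:    ${doubled_pods:,}")
print(f"Still under Dell by:      ${left_vs_dell:,}")
```

Doubling the hardware still leaves most of the price gap available for power, cooling and staff, which is the point the comment is making.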
10. Re:You know why Amazon charges that much? by MadKeithV · 2009-09-02 03:21 · Score: 5, Funny
  
  Just make sure the wife doesn't catch you unit testing the outsourced part.
11. Re:You know why Amazon charges that much? by Richard_at_work · 2009-09-02 03:53 · Score: 1
  
  The wrench already exists, because the summary expanded the scope to comparisons with Amazon, not myself.
12. Re:You know why Amazon charges that much? by NotBornYesterday · 2009-09-02 04:07 · Score: 4, Funny
  
  "Sorry, I have to stay late tonight honey, ... I'm hard at work."
  
  --
  I prefer rogues to imbeciles because they sometimes take a rest.
13. Re:You know why Amazon charges that much? by Anonymous Coward · 2009-09-02 04:17 · Score: 2, Funny
  
  Our support model is close to that. We give you the lube, then we tell you to $&%* yourselves.
14. Re:You know why Amazon charges that much? by adisakp · 2009-09-02 04:18 · Score: 1
  
  Not just ongoing support.... there are large initial costs involved besides "material expense" -- especially when you're talking Petabyte data centers. For example, the cost to assemble the hardware, interconnect everything, install the software on hundreds of servers (even cloning takes time), testing the configuration, etc. Plus buying or renting the real estate to house the unit. Oh, and making sure the building has adequate filtered power and air conditioning.
15. Re:You know why Amazon charges that much? by treat · 2009-09-02 05:15 · Score: 1
  
  For the 2.683M difference, that support better come with a "happy ending" for the entire staff...
  EMC during the dot-com boom would indeed see to this for you if you bought a few million in storage.
16. Re:You know why Amazon charges that much? by Slashdot+Parent · 2009-09-02 05:21 · Score: 1
  
  What if I need to store a petabyte of data right fricken' now? As in, not after you cobble together the parts for 2 of these Backblaze storage pods and get them set up at geographically diverse colo facilities (we need geographic redundancy, remember). With S3, I can FedEx Amazon as many eSATA or USB 2.0 devices as I want, and they will load the data into S3 for me over their 500Mbps links.
  What if I need this data to be immediately accessible to millions of users at edge locations that are geographically close to them for low-latency?
  What if I want to delete a petabyte of data tomorrow? Do I get to stop paying for the Backblaze?
  Is your Backblaze solution subject to bit rot? Can you be sure?
  Amazon S3 is not the solution to every storage use case, but neither is Backblaze's solution. I'm not trying to argue that Backblaze would be better served by Amazon S3--only that S3 has its place in the world.
  
  --
  They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
17. Re:You know why Amazon charges that much? by AMuse · 2009-09-02 05:26 · Score: 1
  
  It's great having someone tell you they will be there in three hours to replace your power supply, that you then have to dedicate a staff person to be with when they go out on the shop floor because some moron in security requires it.
  Not to pick apart your comments too much, but I wouldn't allow a support (sub) contractor unrestricted access to the floor of our datacenter; there's too much they can accidentally screw up and then claim it wasn't them because no one was looking. If they're given permission to be on the floor with an open rack to do maintenance, someone should be watching them. I don't think that qualifies as moronic.
18. Re:You know why Amazon charges that much? by Score+Whore · 2009-09-02 06:32 · Score: 4, Interesting
  
  Redundancy can be had for another $117,000.
  Hosting in a DC will not even be a blip in the difference between that and $2.7m.
  EMC, Amazon etc are a ripoff and I have no idea why there are so many apologists here.
  First these aren't even storage arrays in the same sense that EMC, Hitachi, NetApp, Sun, etc. provide. The only protocol you can use to access your data is https? WTF! Second the Hitachi array in my data center doesn't put 67 TB storage behind half a dozen single points of failure the way this thing does. Third the Hitachi array in my data center doesn't put 67 TB behind a dinky gigabit ethernet link. My Hitachi will provide me with 200,000 IOPS with 5 ms latency. I can hook a whole slew of hosts up to my SAN. I can take off-host, change-only copies of my data so backups don't bog down my production work. I can establish replication between the Hitachi here in this building and the second array four hundred miles away with write order fidelity and guaranteed RPOs.
  Comparing this thing to enterprise class storage is like some sixteen year old adding a cold air intake and a coat of red paint to his Honda civic then running around bragging that his car is somehow comparable to a Ferrari ("look they're both red!") Every time I see something like this the only thing I learn is that yet another person doesn't actually "Get It" when it comes to storage.
  HelloWorld.c is to the Linux kernel as this thing is to the Hitachi USP-V or EMC Symmetrix.
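Two of the figures in the comment above can be put in perspective with a rough Python sketch. It assumes decimal units (1 TB = 10^12 bytes), a gigabit link running at full line rate with no protocol overhead, and uses Little's law (requests in flight = throughput x latency) for the IOPS figure; the function names are made up for illustration.

```python
def drain_time_days(capacity_tb: float, link_gbps: float) -> float:
    """Days needed to read the whole capacity once over a single link."""
    bits = capacity_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 86_400


def outstanding_ios(iops: float, latency_s: float) -> float:
    """Little's law: average number of requests in flight."""
    return iops * latency_s


# One 67 TB pod behind one gigabit Ethernet port.
print(f"{drain_time_days(67, 1):.1f} days to read a full pod")

# 200,000 IOPS at 5 ms latency, as quoted for the Hitachi array.
print(f"{outstanding_ios(200_000, 0.005):.0f} I/Os in flight on average")
```

Under these assumptions a full pod takes about six days to read end to end, which is tolerable for at-rest backup data and a poor fit for random-access workloads.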
19. Re:You know why Amazon charges that much? by profplump · 2009-09-02 06:33 · Score: 1
  
  If you're going to FedEx drives to Amazon, couldn't you just FedEx the Backblaze around instead?
  I'm also wondering how S3 gets you data at geographically diverse locations any faster than overnight shipping a box full of disks -- if you're moving large amounts of data from the US to Russia FedEx is probably the fastest, cheapest solution no matter how you host the data once it gets there.
20. Re:You know why Amazon charges that much? by Sandbags · 2009-09-02 06:51 · Score: 1
  
  That would be equally required under either solution. It's a wash.
  What is NOT a wash is all the "higher level" dedupe, replication, high availability, and indexing "magic." That's what you're paying for when you buy a Tier 1 parallel array system from EMC/HDS/IBM (not so much NetApp).
  If you're deploying a few hundred TB, don't have an internal team to write that software for you, and don't have the patents or licensed IP to use such technology, then you spend big bucks acquiring it as part of the array solution.
  If you're deploying petabytes, and have a dedicated storage team and on-staff people who can write a unique, highly available, high-performance dedupe and encryption system for you without getting sued by a patent troll, then you can stand up $100K PB arrays...
  
  --
  There is no contest in life for which the unprepared have the advantage.
21. Re:You know why Amazon charges that much? by Sandbags · 2009-09-02 06:55 · Score: 2, Informative
  
  "Redundancy can be had for another $117,000."
  ...plus the inter-SAN connectivity
  ...plus the SAN fabric-aware write-splitting hardware and licensing
  ...plus the redundancy-aware server connected to that SAN fabric
  ...plus the multipath HBA licensing for the servers
  ...plus multiple redundant HBAs per server and twice as many SAN fabric switches
  ...plus journaling and rollback storage, and block-level deduplication within it (having a real-time copy is useless if you get infected with a virus)
  ...plus another real-time, asynchronously replicated SAN at an offsite location at least 100 miles away
  ...plus the ISP connection to the offsite
  ...plus the staff to support an additional site and all the complex software and clusters
  ...plus cluster-aware operating systems
  This is why Tier 0 arrays cost in the millions...
  
  --
  There is no contest in life for which the unprepared have the advantage.
22. Re:You know why Amazon charges that much? by Nimey · 2009-09-02 07:12 · Score: 1
  
  Your assumptions aren't even wrong.
  
  --
  Hail Eris, full of mischief...
  
  E pluribus sanguinem
23. Re:You know why Amazon charges that much? by cffrost · 2009-09-02 07:15 · Score: 1
  
  EMC, Amazon etc are a ripoff and I have no idea why there are so many apologists here.
  They're customers.
  
  --
  Thank you, Edward Snowden.
  
  "Arguments from authority are worthless." —Carl Sagan
24. Re:You know why Amazon charges that much? by Huge_UID · 2009-09-02 07:18 · Score: 1
  
  Profit is one of the "etc etc" items.
25. Re:You know why Amazon charges that much? by Vancorps · 2009-09-02 07:30 · Score: 1
  
  You'd be surprised how cheap long haul fiber can be. Back-end networks for data-centers that are geographically diverse do not transmit over the Internet so bandwidth charges are drastically reduced. Additionally depending on your latency requirements you could easily and fairly cheaply use satellite communication services for bulk uploading. I could see a system where the initial sync is done with satellite and then maintained using remote differential compression over a private fiber link.
  With gigabit fiber between locations you could easily transmit 45tb of data before FedEx could arrive. When you get into larger numbers I could see that becoming feasible again. Of course 40gig Satellite repeaters could be setup which dramatically reduces the synchronization window. When you don't care about 2-6 seconds of latency satellite is great for burstable transfers.
  I suspect shipping costs for a 100 lb box would add up, along with the associated hardware failures due to mistreatment, and it's cheaper to just light up some long haul fiber. Vegas to Scottsdale is about a grand for 30meg throughput with some places even charging less depending on who you know.
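Whether fiber beats FedEx depends heavily on the units. Here is a small Python sketch of the transfer times involved, assuming the link runs at full line rate with no overhead and decimal units throughout; since "45tb" can be read as terabits or terabytes, both readings are computed, and the names are illustrative only.

```python
def transfer_hours(size_bits: float, link_bps: float) -> float:
    """Hours to move size_bits over a link running at link_bps."""
    return size_bits / link_bps / 3600


GIGABIT = 1e9      # bits per second
THIRTY_MEG = 30e6  # bits per second, the Vegas-Scottsdale figure

# "45tb" read as 45 terabits, then as 45 terabytes.
hours_if_terabits = transfer_hours(45e12, GIGABIT)
hours_if_terabytes = transfer_hours(45e12 * 8, GIGABIT)

# The same 45 terabytes over the 30 Mbit/s link.
days_on_30_meg = transfer_hours(45e12 * 8, THIRTY_MEG) / 24

print(f"45 Tb over 1 Gbit/s:  {hours_if_terabits:.1f} hours")
print(f"45 TB over 1 Gbit/s:  {hours_if_terabytes:.1f} hours")
print(f"45 TB over 30 Mbit/s: {days_on_30_meg:.0f} days")
```

Read as terabits, the gigabit link finishes in about half a day, ahead of an overnight courier; read as terabytes it takes roughly four days, and the 30 Mbit/s link is only practical for incremental sync after the initial load.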
26. Re:You know why Amazon charges that much? by Vancorps · 2009-09-02 07:38 · Score: 1
  
  You don't have security cameras inside the datacenter? That's scary!
27. Re:You know why Amazon charges that much? by AMuse · 2009-09-02 07:44 · Score: 1
  
  It's not an issue of not having cameras in the datacenter (we do), but an issue of having someone right there to -prevent- the tech from doing anything stupid. "Hey, watch out, don't plug your cell phone recharger into that protected outlet" for instance. Security cameras can help you write the incident report after your outage and possibly prosecute malice, but that's not much comfort. The outage has been had, the damage has been done.
  The technicians sent out by support contractors aren't familiar with the potential pitfalls of your datacenter, and often aren't even competent at anything beyond the exact task they're being sent out to do. Electrical guys repairing a UPS battery could easily decide that shutting off the power "just for a second" is an acceptable thing to do, and security cameras are no substitute for supervision.
28. Re:You know why Amazon charges that much? by turbidostato · 2009-09-02 08:21 · Score: 1
  
  "Redundancy: You mean as in plain redundancy? These are RAID arrays are they not?"
  They don't even seem to be looking for redundancy at "the brick level". It's obvious from the design that those boxes either run or don't run; the only redundancy is software RAID6, with neither ECC memory nor redundant power supplies... even if it's only a malfunctioning disk, it's obvious they wouldn't try to correct it and get the machine recovered (did you see how the disks are packaged?): they'll decommission the node, take it to the lab, refurbish, reflash and redeploy it as a new one a few days later. They even say that data integrity is handled above and beyond the "brick" and that it is not a solution 'per se' but a part of one.
  "Cooling: I could probably get the whole project chilled for less than 6% of the total cost, depending on how cool you want the rig to run"
  From the look of the design, with those disks so densely packed, they'll need almost chilling temperatures off the rack to cool them, especially the disk rank in the middle. The only rack-level photo shows them deployed on open racks; they'd probably save quite a lot of money by forcing air directly onto the boxes.
29. Re:You know why Amazon charges that much? by jon3k · 2009-09-02 08:32 · Score: 1
  
  I couldn't agree more. If you post some expensive system everyone here will immediately tell you how they could build the same thing in their garage over the weekend for pennies. Post a cheap solution and they're so quick to point out that "great service and support" you wouldn't be getting with the commercial (and almost always proprietary) solution. What the hell is going on here?
30. Re:You know why Amazon charges that much? by ToasterMonkey · 2009-09-02 08:32 · Score: 2, Informative
  
  My Hitachi will provide me with 200,000 IOPS with 5 ms latency.
  While that is just a TAD overkill for disk backup, these guys' $.11/GB is not something I'd trust my backups on.
  
  HelloWorld.c is to the Linux kernel as this thing is to the Hitachi USP-V or EMC Symmetrix.
  You nailed it.
  Service Time/IOPS is less important here than trustworthy and proven controller hardware & software, and built in goodies like replication. That's why I would trust disk backups to Sun, NetApp, Hitachi, EMC, and not these people. Possibly home systems I guess, but bragging about homemade storage is a real turnoff.
31. Re:You know why Amazon charges that much? by johnlcallaway · 2009-09-02 08:35 · Score: 1
  
  So ... I'm trusting a tech to work on something that has data and hardware worth millions of dollars because I don't have the knowledge of the hardware that he has, but I can't leave him alone because he might mess something up?? And how am I supposed to stop him from unplugging the wrong component at the wrong time when I don't know which is the right component?
  
  I agree that I wouldn't let someone I don't know walk around in the data center. But I've worked at companies where I've had the same tech for a couple of years. One company said we had to be with them at all times. I'm not talking about letting him in the door and walking him to the rack after discussing what he is going to do, I'm talking about not being able to leave his side the entire time he is working on a machine.
  
  It is moronic to have a highly paid staff member waste hours watching over someone when that staff member knows less about the hardware than the tech does and when the operations staff is just on the other side of the freakin' window and can see him at all times too.
  
  I'd rather take 1/5th of the 2.2M dollars and have my own full time support staff to manage the disks that I don't have to watch over, and get immediate response times.
  
  --
  I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
32. Re:You know why Amazon charges that much? by Vancorps · 2009-09-02 09:01 · Score: 1
  
  I would argue that plugging a cell phone charger into any accessible outlet should not cause down time. Cabinets should be locked. Most techs are usually pretty careful especially if you have big support agreements with them.
  I still agree that it would be best to have someone there with the tech if for no other reason than to know for sure what part went bad. For the times when this isn't possible though the risk should be minimal and there is a reason we maintain off-site business continuity.
  For those with smaller shops that can afford that level of redundancy then yes, by all means, supervise the hell out of the tech because any failure can cause a lot more damage in smaller networks.
33. Re:You know why Amazon charges that much? by AMuse · 2009-09-02 09:18 · Score: 1
  
  Perhaps it's moronic to have a key and highly paid staff member watching over a tech in the datacenter, but I never said we had to have it be a highly paid senior person. We generally have our interns and junior folks supervise tech staff. They're not there to supervise the tech in their area of expertise, they're there to keep watch over the tech so they STICK to that area and don't accidentally muck up the rest of the datacenter.
34. Re:You know why Amazon charges that much? by robateastridge · 2009-09-02 09:39 · Score: 1
  
  "I don't know how long this will take, honey. Tech support is helping me take my equipment up and down a few times..."
35. Re:You know why Amazon charges that much? by johnlcallaway · 2009-09-02 10:02 · Score: 1
  
  I must have worked for the only moronic security department that required the person who made the tech call to be the one that watched the tech. The excuse was that anyone else (i.e. junior, intern, operator, etc.) wouldn't know enough or have enough incentive to watch over them. Security departments in some companies don't have to make sense because they make the rules and tell anyone who doesn't like it to stuff it.
  
  My own opinion is that if someone can't trust a Sun or EMC rep that they have known for awhile to only work on the machine they are being asked to work on and not muck up the rest of the data center, then why are they buying their equipment and paying them support to fix it.
  
  Of course, this is the same security department that let anyone that walked in the door and was given sys admin privs access to the floor and all machines as soon as the paperwork cleared. Even though the reality was that a sys admin rarely needed access to the data center floor and a new tech probably would not be allowed to work on a production machine for awhile. That is what labs are for.
  
  I guess as long as we could fire them and make them a scapegoat, it was OK to give them access even if they didn't know what they were doing.
  
  --
  I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
36. Re:You know why Amazon charges that much? by sloth+jr · 2009-09-02 10:03 · Score: 1
  
  You both don't "get it", if you can't see that different types of storage serve radically different needs. The system described in the article is ideal for near-line, at-rest data. Sweet spots would include data warehousing, log storage and aggregation, HDFS backing, bulk content distribution store, even a small-to-medium business LAN file server (if coupled with a redundant server).
  
  You wouldn't want to be putting OLTP systems on this environment - you could, but it's not its sweet spot, for many of the reasons you mention. Some of the products you mention do a better job of serving that market's needs, and do so at exorbitant prices that for most companies are significant barriers to adoption.
  
  The article is quite clear that the system is a COMPONENT that does not address redundancy, in and of itself, but is used as a building block coupled with their glue that addresses their market's needs for availability, reliability, and accessibility. Use the right tool for the right job, and make it fit into the budget.
37. Re:You know why Amazon charges that much? by AMuse · 2009-09-02 10:11 · Score: 1
  
  Ouch! Generally we use interns and junior staff to watch over the techs on the floor. This policy stands mainly because it's not just Sun coming in to maintain Sun equipment, it's a vast range of vendors and suppliers. A/C guys to come change the A/C filters, fire guys to check the fire system, electrical guys, safety guys, structural guys for earthquake checks... you get the picture! Quite a lot of those folks are NOT at all capable of knowing not to (for example) lay a big plastic sheet across the air intake to a cooling system while they're trying to inspect the fire sprinklers.
  Even our junior staff may not know the specifics of the board being replaced in the E4k by the Sun guy - but they've had datacenter care and respect driven into their skulls by the time they've been there a month, so they can keep watch.
38. Re:You know why Amazon charges that much? by Score+Whore · 2009-09-02 10:20 · Score: 1
  
  You both don't "get it", if you can't see that different types of storage serve radically different needs.
  I get it. At my house I have terabytes of data using a similar type of system (SATA drives, single paths to controllers, commodity mobo, etc.) I know what it can do. At my job I have Hitachi & NetApp & Sun. I know what they can do. I know the differences between the types of systems. And unless your goal is to make a glorified tape system, I know that random IOPS matter. A lot.
  I wasn't the one comparing my one dimensional line to Hitachi's three dimensional cube. I was merely pointing out the absurdity of comparing this science fair project to enterprise class storage.
39. Re:You know why Amazon charges that much? by turbidostato · 2009-09-02 13:06 · Score: 1
  
  "Comparing this thing to enterprise class storage is like some sixteen year old adding a cold air intake and a coat of red paint to his Honda civic"
  Maybe, but they are the ones successfully running their business on operating costs an order of magnitude cheaper than your "enterprise class" solution. Pufff! How unprofessional of them!
  "Every time I see something like this the only thing I learn is that yet another person doesn't actually "Get It" when it comes to storage."
  Yes, that's what I think too... about you, of course. You obviously don't "Get" that there's more than one class of enterprise needs, even regarding storage, and that choosing the one that best fits the needs is indeed quite "enterprisey".
40. Re:You know why Amazon charges that much? by turbidostato · 2009-09-02 13:12 · Score: 1
  
  "If you post some expensive system everyone here will immediately tell you how they could build the same thing in their garage over the weekend for pennies."
  Maybe. But in this case they are telling how they have in fact *done* it, comparatively for pennies and fully covering their business needs. Given that it works as per the needed specs, are you really trying to tell us that "the clever thing" is paying 10x in order to get the same business value?
41. Re:You know why Amazon charges that much? by turbidostato · 2009-09-02 13:20 · Score: 1
  
  " I wouldn't allow a support (sub) contractor unrestricted access to the floor of our datacenter"
  That's quite true.
  But then you need to have somebody around in case a (sub) contractor needs to go in there. Given overall operating costs, weigh the difference between the guy on almost minimum wage (or maybe not, if he needs to be trained and carry weapons) and the technician who could do the job himself, times the risk of lowering the availability of your solution because of the four-hour reaction time (if it is indeed four hours, instead of three fifty on the phone, then three fifty for the technician to appear, then three fifty again to go for the proper spare part) versus the guy already being there, and in-housing your support may be quite a sensible choice. These guys obviously think so. And you?
42. Re:You know why Amazon charges that much? by turbidostato · 2009-09-02 13:26 · Score: 1
  
  "I wonder if this $117k solution will do (...)"
  No, it doesn't.
  "then there is the support of having someone onsite in less than 4 hours repairing the problems."
  They are clever enough not to need such kind of support.
  "These are just a couple of advantages these EMC / NetApp / HDS / IBM provide"
  They might be advantages *if* you need them, which this company doesn't. If you don't need all those "advantages", then they are only money paid for nothing, which usually is not a sensible business choice.
43. Re:You know why Amazon charges that much? by jon3k · 2009-09-02 16:37 · Score: 1
  
  My problem is people comparing this to solutions like Amazon's S3 or even EMC/NetApp/3PAR.
  
  Amazon is a service. Multiple data centers, power, racks, cooling, sales, marketing, etc etc etc. It's like comparing a cash register to walmart. I also don't see that they offer service similar to S3 (eg a storage API). It's one very tiny component, you're not looking at the total cost of ownership. Compared to the major SAN vendors, the feature sets couldn't even begin to be comparable. Anyone who's purchased storage from major SAN vendors knows all the bells and whistles you get that you don't get from a box of disks.
  
  For their needs, it's fantastic, brilliant and as a techie I absolutely love it. But the argument they're making is a red herring.
44. Re:You know why Amazon charges that much? by turbidostato · 2009-09-02 20:30 · Score: 1
  
  "For their needs, it's fantastic, brilliant and as a techie I absolutely love it. But the argument they're making is a red herring."
  Well, you have a point. But see, there is not a single message in all this saying something along the lines of "you got such misplaced price tags because you were not looking in the right places: [disk cabinet vendor X] provides 1PB of storage for [price tag], which is in your own league". I thought someone would mention e.g. Coraid. This probably means that they went to the most obvious providers, so saying those are 10x more expensive than the solution they finally came up with (disregarding all the cool features they didn't need) becomes a fair comparison.
A Very Shortsighted Article by eldavojohn · 2009-09-02 02:12 · Score: 3, Insightful

Before realizing that we had to solve this storage problem ourselves, we considered Amazon S3, Dell or Sun Servers, NetApp Filers, EMC SAN, etc. As we investigated these traditional off-the-shelf solutions, we became increasingly disillusioned by the expense. When you strip away the marketing terms and fancy logos from any storage solution, data ends up on a hard drive.
That's odd, where I work we pay a premium for what happens when the power goes out, what happens when a drive goes bad, what happens when maintenance needs to be performed, what happens when the infrastructure needs upgrades, etc. This article left out a lot of buzzwords but they also left out the people who manage these massive beasts. I mean, how many hundreds (or thousands) of drives are we talking here?

You might as well add a few hundred thousand a year for the people who need to maintain this hardware and also someone to get up in the middle of the night when their pager goes off because something just went wrong and you want 24/7 storage time.

We don't pay premiums because we're stupid. We pay premiums so we can relax and concentrate on what we need to concentrate on.

--
My work here is dung.
1. Re:A Very Shortsighted Article by SatanicPuppy · 2009-09-02 02:23 · Score: 4, Informative
  
  The focus of the article was only on the hardware, which was extremely low cost to the point of allowing massive redundancy...This is not an inherently flawed methodology.
  If you can deploy cheap 67 terabyte nodes, then you can treat each node like an individual drive, and swap them out accordingly.
  I'd need some actual uptime data to make a real judgment on their service vs their competitors, but I don't see any inherent flaws in building their own servers.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
2. Re:A Very Shortsighted Article by Desler · 2009-09-02 02:26 · Score: 5, Insightful
  
  The point is that the costs of services like Amazon or NetApp, etc include the costs for support, server maintenance, upgrades, etc. That they are only comparing this to just the bare minimum price for this company to construct their server is highly misleading.
3. Re:A Very Shortsighted Article by staeiou · 2009-09-02 02:27 · Score: 4, Informative
  
  We don't pay premiums because we're stupid. We pay premiums so we can relax and concentrate on what we need to concentrate on.
  They actually do talk about that in the article. The difference in cost for one of the homegrown petabyte pods from the cheapest suppliers (Dell) is about $700,000. The difference between their pods and cloud services is over $2.7 million per petabyte. And they have many, many petabytes. Even if you do add "a few hundred thousand a year for the people who need to maintain this hardware" - and Dell isn't going to come down in the middle of the night when your power goes out - they are still way, way on top.
  
  I know you don't pay premiums because you're stupid. But think about how much those premiums are actually costing you, what you are getting in return, and if it is worth it.
4. Re:A Very Shortsighted Article by TheLinuxSRC · 2009-09-02 02:28 · Score: 1
  
  In the article he does mention that this solution is not for everyone and that failover and other features are outside the scope of the article. However, for his particular usage this is a nice solution.
  
  My question is, where does one acquire the case he uses? My company currently stores a lot of video and the 10TB 4U machines I have been building are quickly running out of space. This would be an ideal solution for my needs.
5. Re:A Very Shortsighted Article by Tx · 2009-09-02 02:28 · Score: 4, Informative
  
  We don't pay premiums because we're stupid. We pay premiums because we're lazy.
  There, fixed that for you ;).
  Ok, that was glib, but you do seem to have been too lazy to read the article, so perhaps you deserve it. To quote TFA, "Even including the surrounding costs (such as electricity, bandwidth, space rental, and IT administrators' salaries), Backblaze spends one-tenth of the price in comparison to using Amazon S3, Dell Servers, NetApp Filers, or an EMC SAN." So they aren't ignoring the costs of IT staff administering this stuff as you imply, they're telling you the costs including the admin costs at their datacentre.
  
  --
  Oh no... it's the future.
6. Re:A Very Shortsighted Article by parc · 2009-09-02 02:37 · Score: 3, Interesting
  
  At 67T per chassis and 45 drives documented per chassis, they're using 1.5T drives. 1 petabyte would then be 667 drives.
  The worst part of this design that I see (and there's a LOT of bad to see) is the lack of an easy way to get to a failed drive. When a drive fails you're going to have to pull the entire chassis offline. Google did a study in 2007 of drive failure rates (http://labs.google.com/papers/disk_failures.pdf) and found the following failure rates over drive age (ignoring manufacturer):
  3mo: 3% = 20 drives
  6mo: 2% = 13 drives
  1yr: 2% = 13 drives
  2yr: 8% = 53 drives
  Their logic is probably along the lines of "we're already paying someone to answer the pager in the middle of the night," but jeez, you're going to have to take a node offline every 2-3 days for the first year and then almost 2 a day after that!
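  The drive counts in the list above can be reproduced with a few lines of arithmetic. This is only a sketch: the 67 TB and 45-drive figures and the per-age failure rates are taken from the comment as quoted (not re-checked against the Google paper), and treating a petabyte as 1000 TB is an assumption.

```python
import math

# Figures from the comment above: 67 TB and 45 drives per chassis.
POD_TB = 67
POD_DRIVES = 45
PETABYTE_TB = 1000  # decimal petabyte (assumption)

tb_per_drive = round(POD_TB / POD_DRIVES, 1)           # nominal drive size in TB
drives_per_pb = math.ceil(PETABYTE_TB / tb_per_drive)  # drives needed for 1 PB raw

# Failure rates by drive age, as quoted in the comment.
rates_by_age = {"3mo": 0.03, "6mo": 0.02, "1yr": 0.02, "2yr": 0.08}
failed_by_age = {
    age: round(drives_per_pb * rate) for age, rate in rates_by_age.items()
}

for age, failed in failed_by_age.items():
    print(f"{age}: {failed} drives")
```

  The sketch only covers the drive counts; how often a node actually has to go offline depends on how those percentages are spread over time, which the comment does not spell out.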
7. Re:A Very Shortsighted Article by joelmax · 2009-09-02 02:38 · Score: 1
  
  FTA The Cases were custom built.
8. Re:A Very Shortsighted Article by fulldecent · 2009-09-02 02:49 · Score: 2, Informative
  
  >> You might as well add a few hundred thousand a year for the people who need to maintain this hardware and also someone to get up in the middle of the night when their pager goes off because something just went wrong and you want 24/7 storage time.
  >> We don't pay premiums because we're stupid. We pay premiums so we can relax and concentrate on what we need to concentrate on.
  Or... you could just buy ten of them and use the left over $1m for electricity costs and an admin that doesn't sleep
  
  --
  -- I was raised on the command line, bitch
9. Re:A Very Shortsighted Article by IGnatius+T+Foobar · 2009-09-02 02:54 · Score: 1
  
  I know you don't pay premiums because you're stupid. But think about how much those premiums are actually costing you, what you are getting in return, and if it is worth it.
  It's called a "Cost/Benefit Analysis" and every PHB in the world knows how (and when!) to do one.
  
  --
  Tired of FB/Google censorship? Visit UNCENSORED!
10. Re:A Very Shortsighted Article by Anarke_Incarnate · 2009-09-02 02:57 · Score: 4, Insightful
  
  You will more than likely NOT have to take a node offline. The design looks like they place the drives into slip down hot plug enclosures. Most rack mounted hardware is on rails, not screwed to the rack. You roll the rack out, log in, fail the drive that is bad, remove it, hot plug another drive and add it to the array. You are now done.
  They went RAID 6, even though it is slow as shit, for the added failsafe mechanisms.
11. Re:A Very Shortsighted Article by zcold · 2009-09-02 03:00 · Score: 1
  
  Still cheaper than 2.8 million... 111 000$ plus your 300 000 for maintenance... hmmm... close to 2.8 mill yet?
  
  --
  you know you can fry stuff putting things into things that dont like the things you put into it...
12. Re:A Very Shortsighted Article by NotQuiteReal · 2009-09-02 03:03 · Score: 1
  
  swap them out accordingly
  
  I hope you don't mind twiddling your thumbs for days, while transferring your data to your backup drive...
  
  --
  This issue is a bit more complicated than you think.
13. Re:A Very Shortsighted Article by machine321 · 2009-09-02 03:05 · Score: 1
  
  That doesn't really answer his question... who custom-built the cases?
14. Re:A Very Shortsighted Article by SatanicPuppy · 2009-09-02 03:09 · Score: 3, Insightful
  
  Why would you bother? Just start off by writing the data to three nodes, and then you can swap new ones in and out silently. If your space really is cheap, then that's not a problem.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
15. Re:A Very Shortsighted Article by QuantumRiff · 2009-09-02 03:11 · Score: 1
  
  That's very true, and if you are saving a few million a petabyte, a few hundred thousand sounds darn cheap.. heck, you could buy 5 petabytes, have massive redundancy, so anytime you have a problem, you just yank the power out of the machine, and replace the drive or part.. and you would still save millions.. this is why google kicked the crap out of MS and yahoo in search.. Cheap, disposable systems that are managed by very smart software, with massive redundancy..
  
  --
  
  What are we going to do tonight Brain?
16. Re:A Very Shortsighted Article by parc · 2009-09-02 03:13 · Score: 1
  
  At first glance it didn't look like they were using rails in the picture, but now that I really stare at it, it does seem possible they're hiding some rails in there.
  So I take it back: the drives DO seem hot-swappable. That doesn't mean I'd want to (or be confident in) doing it :)
17. Re:A Very Shortsighted Article by MrNaz · 2009-09-02 03:15 · Score: 1
  
  a) Your math is wrong, as they are using RAID6 and you failed to account for redundant drives.
  b) In the article they state that they use internal software to break up data amongst their many pods. Presumably, redundancy in their system means that all data is still accessible even when a pod goes offline, kind of like the way a RAID6 array can still be online while a dead drive is being replaced. They're using software to eliminate the need for high priced hot-swap gear.
  
  --
  I hate printers.
18. Re:A Very Shortsighted Article by Anarke_Incarnate · 2009-09-02 03:18 · Score: 2, Informative
  
  The hardest part will be identifying the bad drives. That is ANOTHER feature that you pay for on expensive disk systems. The controllers will alert you to where the failed drive is, as well as often times alerting the manufacturer of the failure. There have been times I have been called by a vendor to let me know a part and on site engineer was being dispatched for a failure my users were not even aware of yet due to it being off hours (and ops were asleep at the wheel).
19. Re:A Very Shortsighted Article by TooMuchToDo · 2009-09-02 03:22 · Score: 1
  
  Yes. You get a beer. Cheap resiliency is better than expensive redundancy.
20. Re:A Very Shortsighted Article by Anonymous Coward · 2009-09-02 03:25 · Score: 1
  
  The focus of the article was only on the hardware, which was extremely low cost to the point of allowing massive redundancy...This is not an inherently flawed methodology.
  If you can deploy cheap 67 terabyte nodes, then you can treat each node like an individual drive, and swap them out accordingly.
  I'd need some actual uptime data to make a real judgment on their service vs their competitors, but I don't see any inherent flaws in building their own servers.
  Good luck keeping 67 terabytes synched up.
  At gigE speeds - which you're NOT going to get on any WAN - it'll take you a week to transfer 67 TB.
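  The "a week" estimate is easy to check. A rough sketch, assuming decimal terabytes and a link running at its full nominal rate with no protocol overhead (`transfer_days` and its `efficiency` parameter are illustrative names, not anything from the article):

```python
def transfer_days(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_tb terabytes over a link of link_gbps Gbit/s.

    efficiency is the fraction of the nominal link rate actually achieved;
    1.0 means an ideal link (an assumption, real links do worse).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400


gige_days = transfer_days(67, 1.0)         # one full pod over ideal gigabit Ethernet
wan_days = transfer_days(67, 0.1)          # the same pod over a 100 Mbit/s link
print(f"gigE: {gige_days:.1f} days, 100 Mbit/s: {wan_days:.1f} days")
```

  Under those assumptions an ideal gigE link needs a little over six days, so the comment's "a week" is about right, and a slower WAN link scales the figure up proportionally.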
21. Re:A Very Shortsighted Article by Dare+nMc · 2009-09-02 03:26 · Score: 1
  
  you're going to have to take a node offline every 2-3 days for the first year and then almost 2 a day after that!
  They would have one node become degraded by one drive every 2-3 days. 45 drives per chassis, if you had 4 pure spares in the 45 then the 1.2 drive failures per node/year would give 3* the life per node before maintenance desired. I think using your numbers they would then start averaging 2.1 node failures (lack of a hot spare) per year (with 4 hot spares to start with). So actual drive failure node maintenance shouldn't be that horrible. Granted that means they would have an extra node to make the same storage, to make up for the hot spares...
22. Re:A Very Shortsighted Article by ianpatt · 2009-09-02 03:36 · Score: 3, Informative
  
  From the credits list: "Protocase for putting up with hundreds of small 3-D case design tweaks", which I assume is http://www.protocase.com/.
23. Re:A Very Shortsighted Article by binarylarry · 2009-09-02 03:39 · Score: 1
  
  http://www.protocase.com/
  Pretty cool service (no, I'm not affiliated in any way)
  
  --
  Mod me down, my New Earth Global Warmingist friends!
24. Re:A Very Shortsighted Article by bhima · 2009-09-02 03:49 · Score: 1
  
  A few weeks / months ago, Nature Magazine had a small series of articles on gigantic data-sets, data mining, and the new studies based on meta investigations of those data-sets. So when I read this article I read it in that context. So I had assumed that most people needing to store a few petabytes have most of the infrastructure (power, personnel, policies) you are describing... only they are facing budget shortfalls. I think that shortfall could be the motivating factor to employ these things.
  Though truthfully, I don't have petabyte requirements, so I have no real idea.
  
  --
  Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
25. Re:A Very Shortsighted Article by rijrunner · 2009-09-02 03:50 · Score: 4, Interesting
  
  Having a couple decades of working both sides of the Support Divide, I am now of the opinion that the sole purpose of a Support Contract is to have someone at the other end of the phone to yell at. It makes people feel better and have a warm fuzzy. But, having had to schedule CE's to come onto site to replace failed hardware, I have generally found that that adds hours to any repair job. I would guess that you could power off this array, remove every single drive, move them to a new chassis, reformat them in NTFS, then back to JFS and still finish before a CE shows up on site. I recall that in the winter of 1994, *every* Seagate 4GB drive in our Sun boxes died.
  What happens now when a drive goes bad is that a drive goes bad. You spot it through some monitoring software. You pick up the phone and call a 1-800 number. Someone asks a few questions like "What is your name? What is your quest? What is your favorite color?", then you hear typing in the background. After a bit, if you're lucky, they have you in the system correctly and can find your support contract for that box. Then, they give you a ticket number and put you on hold. Then, after a bit, an "engineering" rep will appear and say "What is the nature of the emergency" and you then tell them the same stuff, except you get to add words like "var adm messages" or something. They'll tell you to send them some email so they can do some troubleshooting. You send them what they ask for. About an hour or so later, you get an email or call back saying that the drive has gone bad and needs to be replaced, which is pretty much the same thing you told them when you called in. They then tell you that you are on a Gold Contract with 24/7 support and that the CE has a 4 hour callback requirement from the time the call is dispatched to the CE. By this point, you are about 3-4 hours after the disk drive failed in the first place. Finally, the CE will call back after some amount of time to schedule a replacement. And here comes the real kicker.... In almost every instance for the last 10 years, we have had to do all maintenance during a scheduled window. At 1AM.
  What happens now when something breaks is that someone fixes it.
  Any business is faced with a Buy-It-Or-Build-It dilemma for any service or equipment. Since this was their core business, it certainly makes sense. And, it makes sense for any business of a certain size or set of skills. The reality is that the math is favoring consumer electronics for most applications because they are good enough for 85% of the business needs out there. The whole Cost-Benefit analysis must be periodically re-addressed. If you do not have $1 million a year in billed repair from a Support contract, is it worth $1 million a year for the contract? Seriously.. Even if you have a support contract, you're probably going to get billed time and materials on top of everything else.
  With the math on this unit, you can build in massive layers of redundancy to greatly reduce even the possibility of the data being inaccessible and still come in far, far cheaper than any support contract and you can schedule downtime because you have redundancy across multiple chassis.
26. Re:A Very Shortsighted Article by PiSkyHi · 2009-09-02 04:02 · Score: 1
  
  I must admit, their design using Silicon Image cards is exactly what I would have done with an array that large.
  I use the same hardware on a smaller scale.
  A lot of people here claiming the maintenance costs would justify the price from commercial storage vendors, however, they are mostly still using ancient tech to manage their storage and they don't realise that by adding managed redundancy, using consumer grade components combined with an open source OS and some management scripts makes a lot of sense - a single hard drive is dispensable.
27. Re:A Very Shortsighted Article by TooMuchToDo · 2009-09-02 04:09 · Score: 1
  
  I don't think RAID6 is going to be all that slow across 15 SATA drives. Bonus if the controller has acceleration to speed up parity calculations for faster writes.
28. Re:A Very Shortsighted Article by mpapet · 2009-09-02 04:14 · Score: 1
  
  Bullsh!t
  costs for support
  Well, they have to support their application/infrastructure anyway. The worst case scenario is this is a marginal cost.
  server maintenance
  Ditto.
  upgrades
  Hardware is still getting faster and cheaper every year. Storage is getting cheaper by the terabyte every year too. So, this cost goes down AND they get more storage if they choose to do nothing to their solution.
  There are clearly situations where EMC has something that cannot be done for less or done better than EMC's products. But that's not the situation here. Why pay more for something you can do for far less? Is it the luxury of getting to blame someone else? I don't get your way of thinking, please provide some facts to support your ideas.
  
  --
  http://www.maxineudall.com/2010/02/should-economists-be-sued-for-malpractice.html
29. Re:A Very Shortsighted Article by Anarke_Incarnate · 2009-09-02 04:17 · Score: 1
  
  They discussed this was being done in software, and while that is great, you still will have a slowdown with RAID 6. Will it be unbearable? surely not. is it RAID 10? Not a chance, but I think they care more about storage than performance for this. Also, with RAID 6 in software with mdadm they did not discuss the raid stripe configuration and whether it was left or right sync or async, etc.
30. Re:A Very Shortsighted Article by furby076 · 2009-09-02 04:21 · Score: 1
  
  Oh I agree. There is a difference between the hard drive we buy at newegg and the hard drive from EMC. It is a matter of quality and reliability. You pay more for EMC because they guarantee better reliability.
  
  As for paying a premium paying someone to wake up at 3AM to fix a bad hard drive...that's why you have salaried employees. 24/7...same paycheck
  
  --
  
  I do not support "The Man". I also do not support your irrational stupidity
31. Re:A Very Shortsighted Article by TheRaven64 · 2009-09-02 04:23 · Score: 1
  
  That's great. How are you handling synchronisation between nodes? If they get two writes to the same block in a short window, what are you using to ensure that they are committed in the same order on both? How is your choice affecting throughput and reliability?
  
  --
  I am TheRaven on Soylent News
32. Re:A Very Shortsighted Article by Anonymous Coward · 2009-09-02 04:23 · Score: 1, Interesting
  
  I've never spent more than 10 minutes on the phone with the Dell Guy(tm) for a failed drive.
  Always had them in under 4 hours, too.
  Just my MMV.
33. Re:A Very Shortsighted Article by NotBornYesterday · 2009-09-02 04:26 · Score: 1
  
  If you DIY, you have yourself to rely on not just for the break/fix stuff, but for the testing of compatibility issues. What happens if a couple years from now, a controller card goes bad, and the updated firmware on the replacement card they buy from Newegg causes an unexpected problem? Say what you want about cost, but the OEMs typically have a lab full of people and stuff that check this stuff exhaustively and publish compatibility matrices. If you as a DIY guy can vet out 99.9% of potential problems in your homebrewed storage, and the OEM can give a 99.99% assurance, that may be a relatively small difference in terms of absolute percentage, but it is a large amount of business exposure. Since downtime = -$, there is incentive to pay big bucks to avoid it.
  
  Also, there is political risk. If your homebrewed storage node fails, you own the stink. If it has an EMC badge on it, the stink belongs to someone else.
  
  A drive or storage node failure in any enterprise storage environment shouldn't result in "your data is still gone and your business is down the drain", because you're doing backups, right?
  
  Don't get me wrong; I love what they've done. But there's a reason major systems vendors charge (and get) the money they do.
  
  --
  I prefer rogues to imbeciles because they sometimes take a rest.
34. Re:A Very Shortsighted Article by TheRaven64 · 2009-09-02 04:27 · Score: 1
  
  Would you even bother replacing them? If you're using something like ZFS that can rearrange the redundancy across disks in the pool then just overprovision by 10% and leave the failed drives in there. In two years, add a 2PB array for the same price and migrate your data over.
  
  --
  I am TheRaven on Soylent News
35. Re:A Very Shortsighted Article by NotBornYesterday · 2009-09-02 04:37 · Score: 1
  
  It would have been nice if they had designed them more for serviceability. No hot swap power supplies (even relatively small storage arrays have had that for years) or disks. Sure, they save on engineering and hardware costs, but it's going to cost them in downtime.
  
  --
  I prefer rogues to imbeciles because they sometimes take a rest.
36. Re:A Very Shortsighted Article by kjs3 · 2009-09-02 04:50 · Score: 1
  
  We do. All the time. And we don't put critical infrastructure on things we build out in the garage.
37. Re:A Very Shortsighted Article by NotBornYesterday · 2009-09-02 04:51 · Score: 1
  
  Sorry for responding to my own post here, yes, I know, bad form and all that.
  
  Holy crap! I glanced briefly at the architecture diagram, saw two power supplies, and assumed that they were there for redundancy. Nope! One supplies the MB, boot disk, some disks & fans. The second powers more disks & fans.
  
  Folks, there is no way in hell this box is anywhere near being an enterprise-class storage box. I am sure that for their purposes, this works fine, and they are happy with it. Heck, I'd love to have one of these, and I don't even come close to needing one. But the comparisons to NetApp, EMC, and others are apples to oranges. Yes, you can build an off-the shelf system more cheaply than you can buy one retail at the same capacity. Yes, you could stack lots of these to achieve redundancy, but in the end, these things just aren't engineered like a branded box.
  
  --
  I prefer rogues to imbeciles because they sometimes take a rest.
38. Re:A Very Shortsighted Article by codeguy007 · 2009-09-02 04:52 · Score: 1
  
  Does the SATA card they are using support hotswap? When we were designing NAS devices back in 2005, only the actual SATA raid cards supported true hotswap. That's why we used them even when building software raid boxes.
39. Re:A Very Shortsighted Article by HeronBlademaster · 2009-09-02 05:05 · Score: 1
  
  But that one box with the dedicated technician is in no way equivalent to the service that (for example) Amazon S3 provides. Their systems automatically replicate data across geographically distant nodes; your one box can't do that. They keep multiple copies of the data within the same data center, to minimize the risk of the data dying within one data center. Your one box can do that, but not if the box "only" has a petabyte of disk space and you want to store a full petabyte of data.
  That means that if you store one petabyte of data on S3, they're actually physically storing three or four petabytes of data - you'd need three or four of those homebrew boxes just to match the storage space for redundancy, not to mention making sure they stay in sync and keeping them in geographically distant locations.
  To get a petabyte, let's say you buy 1,000 1TB drives. What's the failure rate of those drives? If you end up with four of those 1,000-disk setups, you could very well be replacing a drive every day - at $90 per drive, that adds up pretty quick.
  Not only that, but they're providing high-bandwidth access to that data from pretty much anywhere. Sure, your one guy with the one box might be able to do that if the box is only accessed over the local network, but if he has to provide storage to multiple locations, then he'd better have super-awesome upload bandwidth from the ISP; you wouldn't want to try serving a petabyte of data over a 2Mbps upload link, would you? ISPs love charging metric tons of cash for high-end upload bandwidth, especially if you're not a data center.
  Then add in electricity, cooling, cost of the space used to physically store the machine, etc etc.
  The point is, it very well could be $2.8 million worth of 'extra', when you take all that into account.
40. Re:A Very Shortsighted Article by Anarke_Incarnate · 2009-09-02 05:13 · Score: 1
  
  Yes, and they also should support some RAID features, I believe.
41. Re:A Very Shortsighted Article by MartinSchou · 2009-09-02 05:37 · Score: 1
  
  The part they specify is a Seagate ST31500341AS [pdf spec sheet]. That says 750,000 hours MTBF. 667 drives leaves you with one drive failing every 1,124 hours or once every 46 days. That comes out as 16 failed drives in 2 years.
  That's not entirely bad. The drive has a 5 year limited warranty, so you'd end up with something like 40 failed drives in that time.
  Seagate also claim an annualized failure rate of 0.34%. Going with that, two years down the line you'd expect to have 5 drives fail. And at five years you'd expect to see 12.
  To be honest, I actually expected this system to come out a lot worse than that.
  As for ZFS, you could probably just install on your own. You won't have Sun support behind you, but wouldn't this be completely feasible?
  And there is still a matter of performance and power consumption. I haven't a clue how either the x4550 or this product will fare - but I am curious.
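  The MTBF and AFR estimates in this comment can be worked through as follows. A minimal sketch, using only the figures quoted above (750,000 hours MTBF, 0.34% annualized failure rate, 667 drives) and assuming a constant failure rate and 8,760 hours per year; the function names are illustrative.

```python
MTBF_HOURS = 750_000       # per-drive MTBF quoted from the spec sheet
AFR = 0.0034               # annualized failure rate quoted from Seagate
DRIVES = 667               # drives per petabyte, from earlier in the thread
HOURS_PER_YEAR = 24 * 365  # ignoring leap years (assumption)

# With many drives running in parallel, the fleet sees one failure
# roughly every MTBF / N hours.
hours_between_failures = MTBF_HOURS / DRIVES
days_between_failures = hours_between_failures / 24


def failures_from_mtbf(years: float) -> float:
    """Expected fleet failures over the given span, using the MTBF figure."""
    return years * HOURS_PER_YEAR / hours_between_failures


def failures_from_afr(years: float) -> float:
    """Expected fleet failures over the given span, using the AFR figure."""
    return DRIVES * AFR * years


for years in (2, 5):
    print(
        f"{years} years: MTBF estimate {failures_from_mtbf(years):.1f}, "
        f"AFR estimate {failures_from_afr(years):.1f}"
    )
```

  Both estimates come from the manufacturer's own numbers; they say nothing about whether drives in the field actually behave that way.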
42. Re:A Very Shortsighted Article by PRMan · 2009-09-02 05:45 · Score: 2, Interesting
  
  I used to work at a company that paid a 20% premium on hardware for support from HP that was COMPLETELY WORTHLESS. I told them they would be better off just ordering a 6th computer for every 5 that they bought.
  The guy would show up with no tools, not even a screwdriver, and then he would need to come back the next day (with a screwdriver). Then he didn't have the part (say RAM) that we told them in the first call and the day before. Then he showed up the next day with RAMBUS instead of DDR RAM. After 3 weeks, we got the machine back online.
  Which means, in the meantime, since the person whose machine it was had to have something to work on, we had to cobble together a PC from no spare parts and then try to transfer their stuff off of their drive (because nobody ever heeded the store everything on the U: and S: drive mantra) and we worked like crazy to do it, eating up our whole day.
  If we had had spare machines instead, we could have just replaced her RAM in 1 minute. Or, if it was the motherboard, put her drive in an identical replacement machine in 1 minute.
  
  --
  Peter predicted that you would "deliberately forget" creation 2000 years ago...
43. Re:A Very Shortsighted Article by nxtw · 2009-09-02 05:47 · Score: 1
  
  The hardest part will be identifying the bad drives. That is ANOTHER feature that you pay for on expensive disk systems. The controllers will alert you to where the failed drive is, as well as often times alerting the manufacturer of the failure. There have been times I have been called by a vendor to let me know a part and on site engineer was being dispatched for a failure my users were not even aware of yet due to it being off hours (and ops were asleep at the wheel).
  I identify failed drives on my software RAID setup by labeling each hotswap tray with the drive's serial number. smartmontools identifies the serial number of the failing or missing drive. Using Linux's /dev/disk/by-id to refer to drives instead of /dev/sd[a-z][a-z]? also helps.
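  The by-id approach can be sketched like this. It is an illustration rather than the poster's actual setup: it assumes a Linux system where udev populates /dev/disk/by-id with symlinks whose names embed the drive model and serial number, and `map_by_id` is a hypothetical helper name.

```python
import os


def map_by_id(by_id_dir: str = "/dev/disk/by-id") -> dict:
    """Map persistent by-id names to the device nodes they currently point at.

    Partition entries (names containing "-part") are skipped so that each
    whole drive appears once per identifier.
    """
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        if "-part" in name:
            continue
        mapping[name] = os.path.realpath(os.path.join(by_id_dir, name))
    return mapping
```

  On a machine with that directory, printing `map_by_id().items()` gives the serial-bearing name next to the current /dev/sdX node, which can then be matched against the serial numbers written on the hotswap trays.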
44. Re:A Very Shortsighted Article by Anarke_Incarnate · 2009-09-02 05:53 · Score: 1
  
  Each step adds cost and it is still harder to train a monkey to do any of those things than "look for the red blinking light, pull the handle and slide the drive out. Push in the new drive and walk away."
45. Re:A Very Shortsighted Article by SaDan · 2009-09-02 06:06 · Score: 1
  
  No, the hardest part will be opening the case, and reading the key to determine where the drive that needs to be replaced is located. Decent RAID controllers will tell you which drive is bad in an array, and which physical port it is connected to on the controller. Just have a map of what drive ends up on what port on what controller, and it's fairly easy to locate a drive that needs to be pulled.
46. Re:A Very Shortsighted Article by Anarke_Incarnate · 2009-09-02 06:07 · Score: 1
  
  Didn't say it was a bad choice, however they are also not saving 1 customer's data per node. RAID 10 would be faster, but at a reduced cost effectiveness.
47. Re:A Very Shortsighted Article by Anarke_Incarnate · 2009-09-02 06:12 · Score: 1
  
  So, how many of you will read the article and find out they are doing software RAID.....? As well, when dealing with 45 drives, ensuring you pull the right one from an open case is not trivial for under skilled "Ops" type people, at least with my experience having them remove the wrong cards after telling them to count 3 from the left (open) and pull the 4th card, they pull the 8th and last card....
48. Re:A Very Shortsighted Article by nxtw · 2009-09-02 06:18 · Score: 1
  
  They discussed this was being done in software, and while that is great, you still will have a slowdown with RAID 6. Will it be unbearable? surely not. is it RAID 10? Not a chance, but I think they care more about storage than performance for this.
  Given a sufficiently fast CPU, enough RAM, and adequate tuning, they should be able to saturate the available bus bandwidth for each array. Sequential reads should be very close to the total bus bandwidth (same as RAID10), and any write that can avoid read-modify-write can be faster than RAID10.
  SATA is currently limited to 3.0 Gbit/sec total per port. I used to use the Silicon Image 3132 PCI-e x1 two-port SATA controller; I found that it was limited to 120 MBytes/sec total per controller.
  I upgraded to a PCI-e x8 Silicon Image 3124 with four SATA ports, and now I can get 240 MBytes/sec from each SATA port multiplier. With Linux software RAID6 and five drives on a single port multiplier, sequential reads are close to 240 MBytes/sec and sequential writes reach at least 140 MBytes/sec with ext3. (At that point, ext3's journaling seems to be a limiting factor. If I am measuring correctly, ext3 actually uses *more* CPU time than the RAID6 implementation at these speeds.)
49. Re:A Very Shortsighted Article by parc · 2009-09-02 06:25 · Score: 1
  
  Yeah, I looked up the manufacturer's projected MTBF before I posted, and saw the normal "ridiculously big number". I then went searching for real-world studies of drive MTBF. You can't really get any better than a study by Google of their actual drive failure rate.
50. Re:A Very Shortsighted Article by Znork · 2009-09-02 06:27 · Score: 1
  
  There are various ways to accomplish that, ranging from exporting the whole node-as-a-disk over iSCSI and use RAID on the importing system to overlay the appropriate level of redundancy and synchronization, to using things like DRBD to manage node-node sync (which could even give you site redundancy).
51. Re:A Very Shortsighted Article by Anarke_Incarnate · 2009-09-02 06:35 · Score: 1
  
  The issue is not simply the throughput to the drives but the write hole parity calculation issues, as well as resync issues that can cause an outage if 2 drives happen to fail on an array with a third drive in a pending failure state. RAID 10 would offer them the ability to lose 1/2 the drives for a smaller performance penalty than losing 2 drives in a RAID 6.
  Sure, you can pin the bus, and I have a 3132 in my Dell Poweredge at home that can max out around 119MB sec on 5 WD green drives. I am using RAID 10 in software (technically closer to 1E as I have 5 drives).
52. Re:A Very Shortsighted Article by upside · 2009-09-02 06:38 · Score: 1
  
  As I pointed out above, read the FA and you'll find that software is the bread and butter of BackBlaze, not hardware. That's why they can open source their design. Same as for Google, who also build their own hardware from COTS parts.
  
  --
  I'm sorry if I haven't offended anyone
53. Re:A Very Shortsighted Article by DamnStupidElf · 2009-09-02 06:44 · Score: 1
  
  For bulk data, software RAID6 will be a bit faster than software RAID10 because there's less duplication of data and hence less traffic over the bus for a given chunk of data. 13/15 of the bus traffic will be actual data, versus 1/2 of the bus traffic for RAID10, and the CPU overhead is pretty low. My Linux system claims the ability to calculate RAID6 parity at over 1GB/s on just a mobile Core 2 chip, and always writing full stripes removes the RAID5/6 penalty. Reads may be slightly slower, but with their setup it looks like they can easily sustain over 200 MB/s which would saturate the network anyway.
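  The 13/15 versus 1/2 comparison can be written out directly. A small sketch, assuming full-stripe writes on a 15-drive RAID6 array (two parity blocks per stripe) and plain two-way mirroring for RAID10; the function names are illustrative.

```python
def raid6_data_fraction(drives: int) -> float:
    """Fraction of written blocks that are payload in a full RAID6 stripe."""
    if drives < 4:
        raise ValueError("RAID6 needs at least 4 drives")
    return (drives - 2) / drives


def raid10_data_fraction() -> float:
    """Fraction of written blocks that are payload with two-way mirroring."""
    return 0.5


def bus_traffic_gb(payload_gb: float, data_fraction: float) -> float:
    """Total data pushed to the drives to store payload_gb of payload."""
    return payload_gb / data_fraction


raid6_traffic = bus_traffic_gb(1.0, raid6_data_fraction(15))
raid10_traffic = bus_traffic_gb(1.0, raid10_data_fraction())
print(f"RAID6 (15 drives): {raid6_traffic:.2f} GB written per GB of payload")
print(f"RAID10:            {raid10_traffic:.2f} GB written per GB of payload")
```

  This only counts bytes on the bus for full-stripe writes; partial-stripe updates, which need a read-modify-write cycle, are outside what the sketch models.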
54. Re:A Very Shortsighted Article by SaDan · 2009-09-02 06:47 · Score: 1
  
  Software, hardware, whatever. I read the article, and somehow interpreted the info on the RAID6 stuff incorrectly (why would you do that in software, yack).
  You can still map out drives in software RAID, or monitor the drives using the hardware controllers. I've done both in the past for large arrays I've built for storage or databases.
  My point still stands, even if you have to add a step or two to go between the software RAID and hardware controller level.
55. Re:A Very Shortsighted Article by Zerth · 2009-09-02 06:52 · Score: 1
  
  Is it really that hard? My cheapass Silicon Image sata backplane has LEDs that identify bad drives. It may be there is an even cheaper SI part that they are using that doesn't have those 10 cent LEDs, but I thought I was already scraping the bucket.
56. Re:A Very Shortsighted Article by Anarke_Incarnate · 2009-09-02 07:02 · Score: 1
  
  You incorrectly assume that RAID 6 uses parity drives. They do not, but instead use parity striping. Every write has to include every drive. RAID 10 is faster and more efficient. As well, RAID 10 can write to either and read from either set and sync that way. You write to fewer drives with RAID 10 and read from just as many but without XOR calcs to slow you down.
57. Re:A Very Shortsighted Article by MartinSchou · 2009-09-02 07:13 · Score: 1
  
  Do you know if Sun, Dell, IBM or any of the other large storage server providers have published anything like it?
  One criticism I remember from the Google study was that they were only using the cheapest available drives and not "enterprise" drives, blah blah blah.
58. Re:A Very Shortsighted Article by mitgib · 2009-09-02 07:16 · Score: 1
  
  I agree they probably should not have compared their product to more enterprise focused offerings.
  
  One point you made about serviceability, I read the article several times as this is a huge area of interest to me, and it would seem, and I am just assuming now, these units are on rails and can be slid out and quickly serviced. Maybe not quite as quickly as a hot swap bay, but you are not going to get the density these Pods are trying to achieve either.
  
  Something I am curious about is failure and how it is dealt with. I have many systems with software raid in use in Linux and personally I find it a PITA to fail the drive, shut the system down, replace the physical hardware failure, start the machine back up, partition the drive, then add it back into the array and then the array rebuilds itself. I also have many systems with hardware raid cards, and on those systems, when I do have a drive failure, the raid card emails me, and it sits there and audibly screams until you deal with it, which is a matter of pulling the drive out, and installing a replacement, and the card then has a tall cool glass of STFU and away it goes. Working on a scale such as Backblaze, I might opt for a very similar solution as they have designed. They are marketing to home users at $5/mo for unlimited storage per computer, and have designed a system to accommodate that market.
  
  I see folks clamouring for ZFS over JFS, neither of which I am terribly familiar with. XFS does me fine with my 32TB home-built OpenFiler box for storing and streaming my Blu-ray collection. I used a 3ware SATA/SAS card and Chenbro SAS expanders to assemble the array, set it up as RAID 5 with no spares, and keep 4 drives as spares sitting around in their bags, so I don't have to wait for anyone to ship/deliver a drive at 3am. If the power supply goes, well, my data is still sitting on the array; I just can't watch movies until I fix it. So yes, there is a need for cheap homebrew storage, and also for Enterprise class service/support. This is not a black and white subject; there is plenty of room for all the solutions available, and none of them are best for all.
  
  --
  Being a spelling & grammar Nazi is a sign you do not poses the intelligence to contribute to the conversation
59. Re:A Very Shortsighted Article by good+soldier+svejk · 2009-09-02 07:19 · Score: 1
  
  Not only that, they seem to think there is no difference between block and file storage. This is an HTTP only NAS system backed by JFS. That would not be my first choice for many applications. I couldn't use this for most of my data center even if it did have a support infrastructure.
  
  One of the most important concepts here is that to store or retrieve data with a Backblaze Storage Pod, it is always through HTTPS. There is no iSCSI, no NFS, no SQL, no Fibre Channel. None of those technologies scales as cheaply, reliably, goes as big, nor can be managed as easily as stand-alone pods with their own IP address waiting for requests on HTTPS.
  
  --
  It is cowardly, and a betrayal of whatever it means to be a Jew, to act as a white man
  
  -James Baldwin
60. Re:A Very Shortsighted Article by MiniMike · 2009-09-02 07:45 · Score: 1
  
  IANAHDE, but it seems like they could keep an extra drive in each chassis (on the unused port on the 4-port SATA card) as an automatic replacement for a failed drive, and replace the faulty drive at scheduled downtime (or at next drive failure, whichever came sooner). They could also use the SATA ports on the motherboard, as they don't have to worry about the port-replicator compatibility problem, which would also allow them to keep multiple extra drives. Other than the software management issues, any reason they couldn't do this? I'm assuming that they could keep the extra drives at a low-power state, so energy and wear and tear wouldn't be such huge issues.
61. Re:A Very Shortsighted Article by ndpope · 2009-09-02 07:59 · Score: 1
  
  To get a petabyte, let's say you buy 100 1TB drives. What's the failure rate of those drives? If you end up with four of those 100-disk machines, you could very well be replacing a drive every day - at $90 per drive, that adds up pretty quick.
  If you have 400 1-TB drives and 365 fail in one year, you chose a crappy brand of drives. I expect my failure rate on those to be 3%-8% on a yearly basis, so I would expect as many as 32 drives to fail in a year.
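  To make the arithmetic above concrete, here is a minimal Python sketch. It assumes a very simple model in which expected yearly failures are just drive count times an annualized failure rate (AFR). The function name is made up for illustration, and the 3%-8% range and $90 price are simply the figures quoted in this thread, not measured data.

```python
# Rough model (an assumption): expected failures per year = drives * AFR.

def expected_failures(drive_count: int, afr: float) -> float:
    """Expected number of drive failures per year for a fleet."""
    return drive_count * afr

DRIVES = 400          # four 100-disk machines, as in the quoted example
PRICE_PER_DRIVE = 90  # dollars, figure quoted in the thread

low = expected_failures(DRIVES, 0.03)
high = expected_failures(DRIVES, 0.08)
print(f"{DRIVES} drives: about {low:.0f} to {high:.0f} failures per year")
print(f"Replacement cost at the high end: ${high * PRICE_PER_DRIVE:.0f} per year")

# For comparison: one failure per day would imply this AFR.
implied_afr = 365 / DRIVES
print(f"One failure a day would mean an AFR of {implied_afr:.0%}")
```

  Under this model, replacing a drive every day across 400 drives would correspond to an AFR above 90%, which is the "crappy brand" scenario.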
62. Re:A Very Shortsighted Article by Sandbags · 2009-09-02 08:00 · Score: 1
  
  It should also be noted that the offers they compared it to included storage virtualization and HBA multipathing, and were based on PRESENTED storage, not physical storage.
  For a Tier 1 array, RAID 10 is typically standard, if not a high performance RAID 6, built on FC disks not SAS/SATA, with a LARGE cache (256MB or more for even a small array, GB on larger systems). Then there's disk journaling space for rollback writes and committed write caching. To present 100TB of partitioned disk can take 250TB or more of physical disks underneath, and that's without getting into replication to a secondary SAN (which is billed differently from the primary chassis).
  Comparing a single Tier 1 SAN chassis, fully configured and ready to go from EMC, with full support and onsite configuration, to a simple DASD configuration without even accounting for RAID parity drives, let alone clustering, virtualization, and availability software, is a faulty comparison. Also, NO ONE pays retail for EMC hardware (discounts of 35% are NORMAL, up to 50% or more has been our experience).
  Here's what I'd like to see: BackBlaze's cost for a FULLY REDUNDANT, Tier 1, RAID 10 system, with spare drives, split modularly across more than 1 datacenter in the same building and further replicated in real time (geoclustering) to an offsite location (not including WAN costs). All the software components for this included and fully configured, including a routine support cycle for software upgrades. Licensing for hundreds of connected servers, and all the SAN and 10G connectivity to the network. Then, compare that to a Symmetrix solution with the same capabilities.
  I understand Backblaze, given their internal code base, can do this cheaper than they could have bought a real system, but can they SELL that solution at a price that is half or less than EMC's QUOTED (not MSRP) competitive price? Then I'd be impressed.
  
  --
  There is no contest in life for which the unprepared have the advantage.
63. Re:A Very Shortsighted Article by rijrunner · 2009-09-02 08:12 · Score: 1
  
  Very correct, btw. That was my main sticking point. I'd be looking at putting in quad 10GB ethernet cards or fiber. Then, running tests to see if the ATA over ethernet process will work.
64. Re:A Very Shortsighted Article by greg1104 · 2009-09-02 08:17 · Score: 1
  
  NetApp released a study as well that included enterprise targeted drives, and in some cases the fiber-channel drives had significantly better error rates. The other interesting studies here are from Google and Carnegie Mellon.
65. Re:A Very Shortsighted Article by good+soldier+svejk · 2009-09-02 08:20 · Score: 1
  
  You'd have to write your own software for the thing. They don't even allow ATA over Ethernet. It isn't just IP only, it is HTTPS only. If you want block level access, they don't want you.
  
  --
  It is cowardly, and a betrayal of whatever it means to be a Jew, to act as a white man
  
  -James Baldwin
66. Re:A Very Shortsighted Article by TClevenger · 2009-09-02 08:28 · Score: 1
  
  Not only that, they seem to think there is no difference between block and file storage. This is an HTTP only NAS system backed by JFS. That would not be my first choice for many applications. I couldn't use this for most of my data center even if it did have a support infrastructure.
  Note that this was by choice. They host backups for consumers' data (a la Carbonite.) Therefore, most of the data will be write-once, read-possibly-never. They chose HTTPS for the simplicity at the server end, and they won't have the IOPS requirements or bandwidth requirements that a high-end data center would need. (Carbonite, for instance, uses a fraction of a user's upstream bandwidth to trickle the data up, so that the consumer's Internet connection isn't hammered. There are an awful lot of consumers that have 384k-768k upstream, so massive speed isn't a requirement here.)
67. Re:A Very Shortsighted Article by rijrunner · 2009-09-02 08:28 · Score: 1
  
  I would try vblade http://sourceforge.net/projects/aoetools/files/vblade/10/
  The underlying system they use is Linux. If you try SuSE, you can follow the steps in http://www.linux.com/archive/feature/55773 and use vblade.
68. Re:A Very Shortsighted Article by ToasterMonkey · 2009-09-02 08:40 · Score: 1
  
  There are various ways to accomplish that, ranging from exporting the whole node-as-a-disk over iSCSI and use RAID on the importing system
  Software RAIDed iSCSI, no thanks, I'd rather have a fork in my nuts.
69. Re:A Very Shortsighted Article by ToasterMonkey · 2009-09-02 08:49 · Score: 1
  
  Note that this was by choice. They host backups for consumers' data (a la Carbonite.) Therefore, most of the data will be write-once, read-possibly-never.
  Most are not willing to lower their expectation below read-at-least-once data. Statistically speaking, you may be right, but read-possibly-never is not something to design backup systems around. The possibility of a full recovery over the Internet is what pushes me towards using local storage, though who says you shouldn't do both?
70. Re:A Very Shortsighted Article by TClevenger · 2009-09-02 09:12 · Score: 1
  
  Most are not willing to lower their expectation below read-at-least-once data. Statistically speaking, you may be right, but read-possibly-never is not something to design backup systems around. The possibility of a full recovery over the Internet is what pushes me towards using local storage, though who says you shouldn't do both?
  I agree, it makes you think twice before trusting this type of provider as your only backup, or worse, your primary data store. However, in instances such as this, I could see these units, further abstracted and made redundant by the software layer above, as a viable alternative to the big storage providers when you need tons of cheap storage and don't care about speed, or about "rolling your own" higher-level software.
71. Re:A Very Shortsighted Article by Cramer · 2009-09-02 09:14 · Score: 1
  
  And for a contrasting story... Cisco Support. A few years ago, while working for a telco, we had a Cisco 2920 fail -- it's a fixed configuration 2 slot Cat 5000. Cisco stopped making those things a thousand years ago, but they kept taking our money for hardware support contracts (we had a dozen of them) which means they're legally required to have spares. After several hours working with a tech -- removing cards that aren't supposed to be removable, it was determined the backplane was bad; so no shipping me parts from a 5500 (which I already had.) I could hear the guy shaking his head over the phone :-) "Lemme check around and see if I can find one. I'll call you back." It took a while, but he found one and had it there the next morning -- which, short of buying it a seat on a plane, was as fast as it could get here. The following Monday, they canceled (and refunded) our service contracts on those things. heh.
72. Re:A Very Shortsighted Article by baptiste · 2009-09-02 09:53 · Score: 1
  
  They weren't comparing their box to enterprise class systems as a solution for others. The point was *THEY* needed lots of cheap storage and the quotes they got were insanely expensive. So they rolled their own and shared the design. Would you use one of these for direct storage? Only with high level redundancy on top of a cloud of these. But for their application, these work well IF they're taking the proper precautions at the higher levels (corruption, etc). I think it's great they're trying to pull this off. Don't fault them for trying and offering their design.
  
  --
  Top Most Bizarre/Disturbing Error Messages
73. Re:A Very Shortsighted Article by sloth+jr · 2009-09-02 10:18 · Score: 1
  
  Backblaze understands that. Their redundancy is accomplished via replication to other nodes. A node goes down, for whatever reason - no prob, just use one of the other nodes that has the data. As the article mentions, these are building blocks that can be used to create much larger structures - don't think of it as a stand-alone array, anyone who's engineering their own solution like this already knows what it takes to service their availability or accessibility needs, and will accomplish those needs via other mechanisms.
  
  A server isn't important. A service is. Engineer for availability of the service.
74. Re:A Very Shortsighted Article by sloth+jr · 2009-09-02 10:21 · Score: 1
  
  Yes, perhaps they were wrong to compare themselves directly to enterprise arrays - they service radically different accessibility and availability models. That said, they acknowledge these are components of a system, not the end-all/be-all. They're building a service here: who cares if a node goes down, if I've got two others that are ready to pick up the request?
  
  Think service redundancy, not in-box server redundancy.
75. Re:A Very Shortsighted Article by PAjamian · 2009-09-02 12:32 · Score: 1
  
  Scroll down to the bottom of the article and there is a parts list. The case is custom manufactured and is not for sale but there is a link to the 3d model. Take that file to a local metal shop and get them to make the case for you.
  
  --
  Windows is a bonfire, Linux is the sun. Linux only looks smaller if you lack perspective.
76. Re:A Very Shortsighted Article by turbidostato · 2009-09-02 13:33 · Score: 1
  
  "I sure wouldn't trust these guys to back up so much as one bit of my data after seeing what they're backing it up on. I don't care how good they think their software is, this hardware is unacceptable."
  Good to know you would never use Amazon or Google services, since they both use the same kind of hardware.
77. Re:A Very Shortsighted Article by sholto · 2009-09-02 17:55 · Score: 3, Informative
  
  I'd need some actual uptime data to make a real judgment on their service vs their competitors,
  I did an extensive interview with the Backblaze CEO. No hard data on uptime but he says they lose one drive a week from the whole 1.5petabyte system and have never had a pod fail. They've been running for a year. Here's the link to the story. Also comments about the designing/testing process. http://www.crn.com.au/News/154760,want-a-petabyte-for-under-us120000.aspx
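  For a rough sense of what "one drive a week" means as a failure rate, here is a back-of-the-envelope Python sketch. It assumes the 1.5 PB is raw capacity built entirely from 1.5 TB drives (the drive size mentioned elsewhere in this discussion); neither assumption is confirmed by the interview, and the function name is invented for illustration.

```python
# Back-of-the-envelope annualized failure rate (AFR) from a weekly loss count.
# Assumptions: 1.5 PB is raw capacity and every drive is 1.5 TB.

def annualized_failure_rate(total_tb: float, drive_tb: float,
                            failures_per_week: float) -> float:
    """Fraction of the fleet expected to fail in a year."""
    drive_count = total_tb / drive_tb
    return failures_per_week * 52 / drive_count

afr = annualized_failure_rate(total_tb=1500, drive_tb=1.5, failures_per_week=1)
print(f"Estimated AFR: {afr:.1%}")
```

  If those assumptions hold, the estimate lands at about 5% a year.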
78. Re:A Very Shortsighted Article by NotBornYesterday · 2009-09-02 23:02 · Score: 1
  
  I certainly don't fault them for doing this. I just think it is disingenuous to compare $1 or $2 million enterprise hardware/firmware/software to the box they built, because the two are not directly comparable except in raw TB storage. They essentially built their own Sun 4550 server, without many of the redundancy and serviceability features, for less money.
  
  --
  I prefer rogues to imbeciles because they sometimes take a rest.
79. Re:A Very Shortsighted Article by NotBornYesterday · 2009-09-02 23:33 · Score: 1
  
  Understood. It's sort of like having a farm of cheap 1u servers instead of a large SMP box, which actually makes more sense in a lot of applications.
  
  That being said, they are focusing on the up-front cost of their own box, while offerings they are getting from the major vendors likely (I can't tell for sure; they don't give a breakdown) include some level of software licensing plus some level of service agreement.
  
  I'm guessing that the sheer number of these things and the likelihood of failure (which even they acknowledge), coupled with the lack of easy serviceability features will result in higher maintenance time and cost. Also, massive redundancy of lots of spinning disks will drive their power/cooling costs higher than it would be otherwise. In the end, I don't think the price advantage will be nearly as great as the article implies.
  
  --
  I prefer rogues to imbeciles because they sometimes take a rest.
80. Re:A Very Shortsighted Article by spinkham · 2009-09-03 04:50 · Score: 1
  
  If you're not using hundreds of petabytes or more, you shouldn't even consider this.
  This would be a fool's errand if they needed only one of these. But they probably have many hundreds or thousands, and it's definitely worth it for them.
  Much like Google with their bare-bones servers and custom software, COTS works well at the small scale, but custom is better when you're dealing with huge amounts of hardware.
  
  --
  Blessed are the pessimists, for they have made backups.
81. Re:A Very Shortsighted Article by atamido · 2009-09-03 10:14 · Score: 1
  
  Those are a lot of "what if" needs, so I'm guessing the answer is more along the lines of, "Amazon provides a lot of services that can be well worth the cost if you need them all."
  On the other hand, if you know what you're doing, and your needs are simple, then there could be significant cost savings. Buying a couple of these and setting up redundancy at multiple locations wouldn't be that difficult, but there are a lot of little things that just can't be done. For instance, EMC has all sorts of options for how data is distributed, options that simply aren't available on any open source package.
  I'm really looking forward to the day when there is a Linux distro that you just install and select to join a storage cluster.
82. Re:A Very Shortsighted Article by drsmithy · 2009-09-03 10:36 · Score: 1
  
  If you can deploy cheap 67 terabyte nodes, then you can treat each node like an individual drive, and swap them out accordingly.
  No you can't. Replicating the ~54TB usable space you'd get per node to another - in case of failure - is going to take at least a week (and probably closer to two). Given how poorly those nodes look to be architected for reliability, the node failure rate must be relatively high. I'd have to say the only thing that's saved them from a business-ending, catastrophic data loss incident thus far is either a) luck or b) lack of high-volume, high-profile customers.
  I'd use them because they're cheap, but I certainly wouldn't want them to have my only copy of anything valuable.
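  The replication estimate above can be sanity-checked with a short Python sketch. It only converts between data size, sustained throughput and elapsed time; the function names are hypothetical, and the 125 MB/s figure is an assumption standing in for one saturated GbE link with no protocol overhead.

```python
# Convert between data size, sustained throughput and transfer time.
# Decimal units throughout: 1 TB = 1e12 bytes, 1 MB = 1e6 bytes.

SECONDS_PER_DAY = 86_400

def transfer_days(data_tb: float, rate_mb_per_s: float) -> float:
    """Days needed to copy data_tb terabytes at a sustained rate."""
    seconds = (data_tb * 1e12) / (rate_mb_per_s * 1e6)
    return seconds / SECONDS_PER_DAY

def required_rate_mb_per_s(data_tb: float, days: float) -> float:
    """Sustained MB/s needed to copy data_tb terabytes in the given days."""
    return (data_tb * 1e12) / (days * SECONDS_PER_DAY) / 1e6

USABLE_TB = 54  # per-node usable space quoted in the comment above

print(f"At 125 MB/s: {transfer_days(USABLE_TB, 125):.1f} days")
print(f"One week needs {required_rate_mb_per_s(USABLE_TB, 7):.0f} MB/s sustained")
print(f"Two weeks needs {required_rate_mb_per_s(USABLE_TB, 14):.0f} MB/s sustained")
```

  In other words, "a week to two weeks" corresponds to a node sustaining somewhere around 45 to 90 MB/s for the whole copy, which is plausible for a box that is also serving live traffic.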
83. Re:A Very Shortsighted Article by drsmithy · 2009-09-03 10:38 · Score: 1
  
  I hope you don't mind twiddling your thumbs for days, while transferring your data to your backup drive...
  Days ? I'd be amazed if they could pull all the data off an active node in much under a couple of *weeks*.
84. Re:A Very Shortsighted Article by drsmithy · 2009-09-03 11:01 · Score: 1
  
  I'd be looking at putting in quad 10GB ethernet cards or fiber.
  Pointless. A single node is unlikely to be able to get much over a couple of hundred MB/sec throughput, even in ideal conditions.
  For any remotely reasonable use case, all each node needs is a pair of GbE NICs (although given their attitude towards performance and SPOFs, probably just the one is fine).
85. Re:A Very Shortsighted Article by drsmithy · 2009-09-03 19:28 · Score: 1
  
  I don't think RAID6 is going to be all that slow across 15 SATA drives. Bonus if the controller has acceleration to speed up parity calculations for faster writes.
  The "overhead" of parity calculations is not a bottleneck in any remotely modern (<10 years old) system.
  The CPU in the average budget _laptop_ machine of today calculates parity multiple times faster than the ASICs on even high-end RAID controllers.
86. Re:A Very Shortsighted Article by qubezz · 2009-09-04 01:17 · Score: 1
  
  RAID 10 would offer them the ability to lose 1/2 the drives for a smaller performance penalty than losing 2 drives in a RAID 6.
  No, two dead drives in the same pair and the array is toast on raid 10. Two drives and the RAID 6 they described survives.
  One of these rack units can survive at least two drive failures, but can survive zero power supply failures. I've pitched many more dead power supplies than hard drives. If the power supply dies during a write (RAID + no battery backup + two parity drives), consider the array corrupted.
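  The two-drive failure cases are easy to enumerate. The Python sketch below assumes the textbook definitions: RAID 10 as a stripe across mirrored pairs (data is lost only when both members of one pair fail) and RAID 6 as tolerating any two failed drives. The disk numbering and function names are made up for illustration; real arrays also have to worry about rebuild windows, which this ignores.

```python
from itertools import combinations

def raid10_survives(failed, mirror_pairs):
    """RAID 10 loses data only if both disks of some mirror pair have failed."""
    failed = set(failed)
    return not any(set(pair) <= failed for pair in mirror_pairs)

def raid6_survives(failed):
    """RAID 6 tolerates the loss of any two disks."""
    return len(set(failed)) <= 2

# Four-disk example: disks 0 and 1 mirror each other, as do disks 2 and 3.
disks = range(4)
mirror_pairs = [(0, 1), (2, 3)]

two_disk_failures = list(combinations(disks, 2))
raid10_ok = [f for f in two_disk_failures if raid10_survives(f, mirror_pairs)]
raid6_ok = [f for f in two_disk_failures if raid6_survives(f)]

print(f"RAID 10 survives {len(raid10_ok)} of {len(two_disk_failures)} two-disk failures")
print(f"RAID 6 survives {len(raid6_ok)} of {len(two_disk_failures)} two-disk failures")
```

  With four disks, the two fatal RAID 10 cases are exactly the ones where both failed disks belong to the same mirror pair.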
87. Re:A Very Shortsighted Article by Anarke_Incarnate · 2009-09-04 04:58 · Score: 1
  
  Please learn more about that of which you are speaking. If you have a simple RAID 10, that is 2+2 with 1a, 1b, 2a, 2b, you can lose ANY 2 drives without losing data. If you lose 1a and 1b, then 2a and 2b will have identical data to the 1a and 1b drives. Your issue is what would happen in a RAID 0+1 when a drive from both arrays has malfunctioned.
  Also, in my scenario, there is no parity checksum that needs calculating. The writes are just writes and the reads can occur either on a per request or multiple request fulfillment paradigm.
88. Re:A Very Shortsighted Article by badkarmadayaccount · 2009-09-04 22:46 · Score: 1
  
  Any takers to add GPGPU support to Solaris and/or *BSD to offload the XORs? That ought to bust the hardware RAID market wide open.
  
  --
  I know tobacco is bad for you, so I smoke weed with crack.
89. Re:A Very Shortsighted Article by DamnStupidElf · 2009-09-08 12:44 · Score: 1
  
  RAID10 is also striped and all the disks will be in use during a bulk write. Either way, it doesn't matter (for bulk writes) on RAIDs with more than 4 disks because the ratio between data and parity/redundancy increases with RAID6 and stays 1:1 for RAID10. In a 16 disk array of 100MB/s disks, RAID10 can write a maximum of 800MB/s of data to the disks, but RAID6 can write 1400MB/s of data. Like I said, bulk writes are where RAID6 beats RAID10. RAID10 can read 1600MB/s of data versus 1400MB/s for RAID6, and small reads and writes are always going to be faster on RAID10. If you want equal read and write performance for bulk data or just want maximum storage efficiency, RAID6 is the way to go. Otherwise use RAID10.
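  The figures above fall out of a simple idealized model, sketched here in Python. It assumes every disk streams at the same rate, full-stripe writes, no controller or bus bottleneck, and that RAID 6 reads skip the two parity blocks per stripe; the function names are invented for illustration.

```python
# Idealized bulk-throughput model for an n-disk array (an assumption, not a benchmark).

def raid10_bulk(n_disks: int, per_disk_mb_s: int) -> dict:
    """RAID 10: each block is written twice, but every disk can serve reads."""
    return {
        "write": (n_disks // 2) * per_disk_mb_s,
        "read": n_disks * per_disk_mb_s,
    }

def raid6_bulk(n_disks: int, per_disk_mb_s: int) -> dict:
    """RAID 6: two disks' worth of every stripe holds parity, the rest is data."""
    data_disks = n_disks - 2
    return {
        "write": data_disks * per_disk_mb_s,
        "read": data_disks * per_disk_mb_s,
    }

r10 = raid10_bulk(16, 100)
r6 = raid6_bulk(16, 100)
print(f"RAID 10: write {r10['write']} MB/s, read {r10['read']} MB/s")
print(f"RAID 6:  write {r6['write']} MB/s, read {r6['read']} MB/s")
```

  The model says nothing about small random I/O or degraded-mode behavior, which is where the two layouts differ most in practice.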
90. Re:A Very Shortsighted Article by Anarke_Incarnate · 2009-09-08 16:49 · Score: 1
  
  Yes it is. RAID 10 costs more in terms of raw monetary outlay and available disk, but the benefit is that it can not only keep its performance lead during a failure, but also allow easier migration of data: in a pinch, you can move half the array to another machine and be up with the full data. As well, RAID 5 and RAID 6 both suffer from terrible performance in degraded mode. There is a hit in performance with RAID 10, however not nearly as large.
  
  Also, your example assumes synchronous writes on RAID 10. The great thing about it is that you can write to either disk set, as there is not a truly active/passive nature to it, and depending on the algorithm used that is just what happens. Dual writes can be written and mirrored back to each other from cache, however that is not atomic and can cause issues with coherency during a failure.
  
  You are also only taking into account raw throughput. The overhead of having to take 14 drives and calculate the XORs for each write, to two drives, to the stripe size, will eliminate the chance of pegging the bus to each drive. As well, in their scenario, they are using multiport, which will also eliminate that advantage, though that is only in their implementation and not all should be judged by the limitations they imposed on themselves. You can eliminate some of the performance issues with RAID 6 by adding more drives, but in adding drives you also increase the possibility of a failure. With RAID 10 you are roughly in the same place. As long as you do not lose both data sets you do not lose data, and the rebuild process is also a lot less taxing and impacts performance a lot less.
  
  Is this all splitting hairs? Of course. At that point, I doubt a serious bottleneck is the issue. I would be more inclined to think that the fault tolerance of such a system is wonked out. Sure, they spent the money on fancy cases that do lovely things like brace the drives, but what happens with sagging power to the drives, or losing power to one of the supplies? What sort of ILO do they have for management? What sort of central tool do they use to keep it all straight?
Ripoff by asaul · 2009-09-02 02:14 · Score: 4, Insightful

Looks like a cheap downscale undersized version of a Sun X4500/X4540.
And as others have pointed out, you pay a vendor because in 4 years they will still be stocking the drives you bought today, whereas for this setup you will be praying they are still on ebay.

--
"If everybody is thinking alike, somebody isn't thinking" - Gen. George S. Patton
1. Re:Ripoff by Anonymous Coward · 2009-09-02 02:29 · Score: 3, Insightful
  
  why wouldn't you just build an entirely new pod with current disks and migrate the data? You could certainly afford it.
2. Re:Ripoff by Anonymous Coward · 2009-09-02 02:38 · Score: 1, Interesting
  
  No, it's the google model: when a drive dies it's dead and doesn't matter anymore; when a server dies it's dead and doesn't matter anymore. The infrastructure built on top of the pods takes care of replicating data so a failure only removes one of several copies of the data.
3. Re:Ripoff by pyite · 2009-09-02 02:38 · Score: 1
  
  why wouldn't you just build an entirely new pod with current disks and migrate the data? You could certainly afford it.
  Maybe because there's no need to update and you just want to be able to replace broken drives?
  
  --
  "Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
4. Re:Ripoff by timeOday · 2009-09-02 02:42 · Score: 5, Interesting
  
  Depends on how it works. Hopefully (or ideally) it's more like the google approach - build it to maintain data redundancy, initially with X% overcapacity. As disks fail, what do you do then? Nothing. When it gets down to 80% or so of original capacity (or however much redundancy you designed in), you chuck it and buy a new one. By then the tech is outdated anyways.
5. Re:Ripoff by sarkeizen · 2009-09-02 02:52 · Score: 1
  
  This is the oft repeated rationale. Personally...I don't see it as so cut-and-dried. Four years from now you may throw this thing away but it also realizes its ROI way sooner than the branded hardware + support contract (considering that the cost of support increases over time it's always possible that you will NEVER get a positive ROI on a product). The truth is to get the most out of your money you have to run the numbers in each case. Not only that but you should do so at each renewal period. For example we own a plethora of Nortel equipment much of which is still useful but is also EOL. We pay a premium in support for these products many of which could be had cheaply in the secondary market (this is not limited to Ebay BTW). These devices are part of a much larger system so system replacement is expensive. The correct solution is to budget for replacement (outright or incremental), calculate your failure rate and buy and store replacement units (don't forget to calculate disposal costs). Instead the admins act stupidly: they request hundreds of thousands of dollars to replace the system outright right away. When I ask them to justify this they hem and haw about labour used in maintaining the system, or service interruptions (implicitly falling for the logical fallacy of "newer is better"). However they never seem able to come up with figures for this, i.e. how much time do you spend resetting this hardware when it fails? How much downtime do we incur with it?
  
  So we replace it...incurring significant downtime of course.
  
  Anyway all that said I think their device has merit...I think for smaller shops having redundant power would be useful.
6. Re:Ripoff by afidel · 2009-09-02 02:53 · Score: 1
  
  Uh, for the difference in price you could buy and build 3x the number of nodes needed, keep them powered off, and still come out hundreds of thousands cheaper. In reality you might need, say, 20% extra nodes and about the same in spare HDDs over the 5 year life of the system (any more than 5 years and it's probably not worth the power to keep them going). I have to question why they put the OS on a single HDD; flash would have been cheaper and more reliable. I also have to wonder WTF is up with using non-ES drives; the ES drives only cost a couple percent more and are actually built to run 24x7. Oh, and anyone running a storage business with non-ECC RAM is NOT someone I'm going to trust my data to!
  
  --
  There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
7. Re:Ripoff by Delgul · 2009-09-02 02:54 · Score: 1
  
  Yeah... therefore, what we do at our company is to buy extra drives and put them on the shelf. For the money you save you can easily put a replacement drive (or even two or three, but this is overkill) on the shelf for every drive you put in the array. You will still be saving _massive_ amounts of money...
8. Re:Ripoff by Anonymous Coward · 2009-09-02 02:55 · Score: 1, Interesting
  
  Looks like a cheap downscale undersized version of a Sun X4500/X4540.
  Or, if you also want software in appliance form, along with flash accelerator drives and support, the Sun Storage 7210 which holds 46 TB in its 4U chassis and is expandable to 142 TB.
  Sun has been undercutting NetApp prices with these ZFS-based "Unified Storage" systems, especially since they don't charge for software features (NFS, CIFS, HTTP, replication, etc.) separately like NetApp does.
  By the way, if you want to try the software, there's a VMware/VirtualBox VM image of the storage appliance. You can replace the simulated drives with real ones if you like.
9. Re:Ripoff by zcold · 2009-09-02 03:03 · Score: 1
  
  this is scalable... just upgrade the motherboard and such...
  
  --
  you know you can fry stuff putting things into things that dont like the things you put into it...
10. Re:Ripoff by ciroknight · 2009-09-02 03:17 · Score: 2, Informative
  
  Since most modern commercial-grade HDs come with a 3-5 year or better warranty these days [1], it's easier just to cash those in when the drives go bad and build a new box around the newer-model drives they ship you in return.
  
  This is truly RAID, as Google, etc. have realized and developed. When the drives die, you don't cry over having the exact same drive stocked. You don't cry at all. At $8k a machine, you could actually afford to flat-out replace the entire box every 4 years and not affect your bottom line (since, you know, you're saving better than three times that by not going with one of the 'cloud vendors').
  
  --
  "Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
11. Re:Ripoff by Smelly+Jeffrey · 2009-09-02 03:28 · Score: 1
  
  If it doesn't have redundant power supplies, then it doesn't belong in a server rack!
12. Re:Ripoff by nine-times · 2009-09-02 03:37 · Score: 1
  
  I think this is a good point. It's kind of a brute-force approach to the problem, which isn't the best strategy for everyone, but when you're paying 1/10th the price, you can buy 10 spare parts of everything before you even break even. You can have 4 of these things running at different locations and still be paying half-price. If it does all break down in 5 years, storage will be much cheaper and more dense at that point, and you can replace it really cheaply.
13. Re:Ripoff by PiSkyHi · 2009-09-02 04:04 · Score: 1
  
  They are using software RAID, they can manage data on different sized disks without much of a hit.
14. Re:Ripoff by sarkeizen · 2009-09-02 04:19 · Score: 1
  
  It depends - again it's a question of the numbers. Although there is some good evidence that a PS is the most likely part to fail in a rig and as such you should know what your plan is for a failure. A large storage setup could provide similar mitigation with multiple units assuming your storage pool is large enough or a less critical storage pool could be mitigated with a cold spare.
15. Re:Ripoff by moosesocks · 2009-09-02 05:05 · Score: 1
  
  Since most modern commercial-grade HDs come with a 3-5 year or better warranty these days [1], it's easier just to cash those in when the drives go bad and build a new box around the newer-model drives they ship you in return.
  A word of caution: I had an external Seagate enclosure fail on me earlier this year. Because I've seen plenty of enclosures fail (as opposed to the drives inside of them), I opened it up, and installed the drive internally. Alas, the drive was as dead as a doornail.
  Return the drive to Seagate, wait two months, and finally receive the same exact drive back in the mail, because I'd voided my warranty by attempting to recover my data.
  
  --
  -- If you try to fail and succeed, which have you done? - Uli's moose
16. Re:Ripoff by Compuser · 2009-09-02 05:26 · Score: 1
  
  I am missing something here. Why can't you use newer drives as time goes on? Is there no software RAID solution that allows upgrades on the fly?
17. Re:Ripoff by waveclaw · 2009-09-02 07:00 · Score: 1
  
  you chuck it and buy a new one. By then the tech is outdated anyways.
  Only one problem with this "disposable IT" model:
  'It just works' has been IT job security since before Gates and Moore thought x86 was a good idea.
  Outdated is no reason to not continue to pour millions of (otherwise profit) monies into supporting something.
  Hands up for those of you who didn't start a new job at a place with an ancient white elephant?
  You Novell NetWare people with 'end-of-life a decade ago and still can't turn it off' servers can put your hands down too. Same for you Windows admins trying to hide those desktop towers running Windows 95 for some ugly little app by a company that died before google.com even got registered in DNS.
  However, I'm betting someone corporate could mention this to their EMC or netapp sales rep and get quite a few free nice lunches out of it.
  
  --
  
  "You cannot have a General Will unless you have shared experiences. You cannot be fair to people you don't know."
18. Re:Ripoff by PAjamian · 2009-09-02 12:46 · Score: 2, Interesting
  
  Fine then, replace just the broken drives but as far as I'm aware Linux software raid 6 does not require the drives be the same model, or even the same size. You can get newer drives for the same or less cost as the old drives and just plug them in. Who cares if they have more capacity? Just let it go to waste if you must but it'll work just fine and certainly you won't have to be scrounging drives off of ebay.
  Also consider that five years down the road we may have 10 TB drives or better, but 1.5 TB drives should still be available on the consumer market (and keep in mind these are cheap consumer drives) for dirt cheap, and these guys will probably be quite happy to use their same design with the newer high-capacity drives available at the time.
  
  --
  Windows is a bonfire, Linux is the sun. Linux only looks smaller if you lack perspective.
19. Re:Ripoff by Wolfrider · 2009-09-03 07:03 · Score: 1
  
  --I'll just say this: the day when we can leave spinning physical disks behind and get to something like cheap, massive SSD (but without the limited-write-cycle penalties) - it will be a Good Day(TM).
  
  --
  .
  == WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
That's great but what about all the hidden costs? by Desler · 2009-09-02 02:15 · Score: 1, Insightful

That's all fine and dandy but where is my support going to come from when this server has issues? Are they throwing in for free maintenance and upgrades to this server when it no longer meets requirements? If not, this figure is highly disingenuous.
Cool. by SatanicPuppy · 2009-09-02 02:16 · Score: 1, Interesting

Nominally a Slashvertisement, but the detailed specs for their "pods" (watch out guys, Apple's gonna SUE YOU) are pretty damn cool. 45 drives on two consumer grade power supplies gives me the heebie jeebies though (powering up in stages sounds like it would take a lot of manual cycling, if you were rebooting a whole rack, for instance), and I'd be interested to know why they chose JFS (perfectly valid choice) over some other alternative...There are plenty of petabyte capable filesystems out there.
Very interesting though. I tried to push a much less ambitious version of this for work, and got slapped down because it wasn't made by (insert proprietary vendor here). Of course, we're still having storage issues because we can't afford the proprietary solution, but at least there is no non-branded hardware in our server room.

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
1. Re:Cool. by XorNand · 2009-09-02 02:28 · Score: 1
  
  It's not all that interesting, IMHO. If you read the description, all network I/O is done using HTTPS. The comparison to Amazon's S3 is fair, but it's ridiculous to compare this to NetApp or any of the other SANs they have listed; no iSCSI, no fiber channel.
  
  --
  Entrepreneur : (noun), French for "unemployed"
2. Re:Cool. by SatanicPuppy · 2009-09-02 02:50 · Score: 1
  
  67 terabytes for under 8000 dollars isn't interesting? Ooookay...
  I don't give a damn about iSCSI; this isn't a database server, it's just a flat data file server...Most datacenters are limited by their network bandwidth anyway, not their internal bandwidth, and https isn't any worse than sftp. Paying Amazon a thousand times more, and I'd still be limited by MY bandwidth, not their internal bandwidth.
  If they can deliver more storage for less price, then more power to 'em.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
3. Re:Cool. by TooMuchToDo · 2009-09-02 02:53 · Score: 1
  
  Really? Fiber channel tops out at what? 4Gb/sec? 8Gb/sec? Distribute your data chunks across enough chunk servers, and you can easily compete against that much cheaper.
  Disclaimer: I'm currently doing HPC work at a US accelerator lab as part of one of the LHC experiments. I know how to move data around *fast*.
4. Re:Cool. by denobug · 2009-09-02 03:36 · Score: 1
  
  Very interesting though. I tried to push a much less ambitious version of this for work, and got slapped down because it wasn't made by (insert proprietary vendor here). Of course, we're still having storage issues because we can't afford the proprietary solution, but at least there is no non-branded hardware in our server room.
  The reason brand-name products are purchased is that most support staff are short-handed, short on time, or still in need of additional training. Dealing with experts on final product designs with a high scalability factor will inevitably encourage custom-made products, since they will be cheaper once deployed. Call it the Google model if you like. This is really the basis of industrial engineering on fairly stable statistical modeling. Nothing has really changed except where the theory is being applied.
5. Re:Cool. by bucky0 · 2009-09-02 04:32 · Score: 1
  
  Interesting. I'm doing HPC work at a US accelerator lab too. Looking at your comments, are you one of the guys that's using HDFS at a pretty large Tier-2?
  
  --
  
  -Bucky
6. Re:Cool. by TooMuchToDo · 2009-09-02 04:38 · Score: 1
  
  Couldn't use HDFS due to lack of HSM, so I do HDFS on my own time.
7. Re:Cool. by Wolfrider · 2009-09-03 03:01 · Score: 1
  
  --There was a Slashdot review article on Linux filesystems a while back that made me switch from Reiserfs v3 to JFS for practically everything (except " / " partition - tail packing is great for root - and squid.)
  --JFS is low-cpu and FAST - especially on Firewire/USB external drives. Make sure you enable " noatime " in /etc/fstab and do some speed experiments.
  ' mkfs.jfs /dev/sdXX '
  ' mkdir /mnt/jfs '
  ' mount /dev/sdXX /mnt/jfs -onoatime '
  ' cd /mnt/jfs '
  ' time (dd if=/dev/zero of=tmpfile bs=1M count=500;sync) '
  
  --
  .
  == WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
8. Re:Cool. by Wolfrider · 2009-09-03 03:16 · Score: 1
  
  --Citation:
  http://hardware.slashdot.org/story/04/05/11/134214/Linux-Filesystems-Benchmarked?art_pos=34
  -- ' fsck ' on JFS filesystems is the fastest I've seen, as well -- one more reason to use it. If you look at the JFS tree structure, it's really quite elegant. :-)
  
  --
  .
  == WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
It's all clear now. by grub · 2009-09-02 02:17 · Score: 4, Funny

AHhh, this is why the EMC guy committed suicide. It wasn't because he was dying of cancer.

--
Trolling is a art,
My math is a bit rusty... by Anonymous Coward · 2009-09-02 02:18 · Score: 1, Funny

...but that doesn't add up. $7,867 / 67 petabytes = $117.42/petabyte, not $117,000/petabyte.
Perhaps they were using the 'new' math.
1. Re:My math is a bit rusty... by Desler · 2009-09-02 02:21 · Score: 5, Informative
  
  It's not your math that's rusty it's your reading skills.
  
  Linux-based server using commodity parts that contains 67 terabytes of storage at a material cost of $7,867.
2. Re:My math is a bit rusty... by ShadowRangerRIT · 2009-09-02 02:24 · Score: 2, Informative
  
  You misread. It's $7,867 per 67 terabytes. So at the hard disk standard for a petabyte (base 10, not base 2), 1000 TB == 1 PB:
  (1000 TB / 67 TB) * $7,867 = $117417.91
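  For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. The pod cost and capacity come from the story summary; the function name and the binary-petabyte comparison are just illustrative additions, not anything from BackBlaze.

```python
# Cost per petabyte from the pod figures quoted in the story summary.
POD_COST_USD = 7867      # material cost of one pod (from the summary)
POD_CAPACITY_TB = 67     # capacity of one pod (from the summary)
TB_PER_PB = 1000         # decimal (hard-disk) petabyte, as used above


def cost_per_petabyte(pod_cost_usd, pod_capacity_tb, tb_per_pb=TB_PER_PB):
    """Return the material cost in USD of one petabyte built from such pods."""
    return pod_cost_usd * tb_per_pb / pod_capacity_tb


decimal_cost = cost_per_petabyte(POD_COST_USD, POD_CAPACITY_TB)
# Illustrative assumption: what the figure would be with a base-2 petabyte.
binary_cost = cost_per_petabyte(POD_COST_USD, POD_CAPACITY_TB, tb_per_pb=1024)

print(f"decimal PB: ${decimal_cost:,.2f}")
print(f"binary PB:  ${binary_cost:,.2f}")
```

  The decimal figure matches the $117,417.91 above; the base-2 figure comes out somewhat higher, which is why the choice of unit is worth stating.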
  
  --
  $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
3. Re:My math is a bit rusty... by Hijacked+Public · 2009-09-02 03:51 · Score: 1
  
  We need to get that guy into the accounting department at NetApp so he can price out their storage.
  
  --
  "Sacrifice for the good of The State" - The State
My plan comes to fruition! by elrous0 · 2009-09-02 02:20 · Score: 5, Informative

Soon I shall have a single media server with every episode of "General Hospital" ever made stored at a high bitrate. WHO'S LAUGHING NOW, ALL YOU WHO DOUBTED ME!!!!
And how big is a petabyte you ask? There have been about 12,000 episodes of General Hospital aired since 1963. If you encoded 45 minute episodes at DVD quality mpeg2 bitrate, you could fit over 550,000 episodes of America's finest television show on a 1 petabyte server, enough to archive every episode of this remarkable show from its auspicious debut in 1963 until the year 4078.

--
SJW: Someone who has run out of real oppression, and has to fake it.
1. Re:My plan comes to fruition! by ShadowRangerRIT · 2009-09-02 02:27 · Score: 3, Funny
  
  But what about storing the new episodes in HD? Clearly a masterpiece of TV such as this should not be stored at mere SD quality!
  
  --
  $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
2. Re:My plan comes to fruition! by RMH101 · 2009-09-02 02:29 · Score: 4, Funny
  
  I think we have a new metric unit of storage, to rival the (now deprecated) Library Of Congress SI unit.
3. Re:My plan comes to fruition! by snspdaarf · 2009-09-02 02:31 · Score: 1
  
  I wouldn't watch Genital Hospital with a gun to my head! Give me All My Children, or give me Death!
  
  Well, maybe Tea and Cake instead of Death, but you get the idea.
  
  --
  Why, without your clothes, you're naked, Miss Dudley!
4. Re:My plan comes to fruition! by ari_j · 2009-09-02 02:33 · Score: 5, Funny
  
  Soon I shall have a single media server with every episode of "General Hospital" ever made stored at a high bitrate. WHO'S LAUGHING NOW, ALL YOU WHO DOUBTED ME!!!!
  And how big is a petabyte you ask? There have been about 12,000 episodes of General Hospital aired since 1963. If you encoded 45 minute episodes at DVD quality mpeg2 bitrate, you could fit over 550,000 episodes of America's finest television show on a 1 petabyte server, enough to archive every episode of this remarkable show from its auspicious debut in 1963 until the year 4078.
  Of all the computer systems out there, yours is the one for which becoming self-aware terrifies me the most.
5. Re:My plan comes to fruition! by maxume · 2009-09-02 02:40 · Score: 1
  
  What's intimidating about a self-absorbed, over-acting computer?
  
  --
  Nerd rage is the funniest rage.
6. Re:My plan comes to fruition! by WMD_88 · 2009-09-02 02:42 · Score: 1
  
  General Hospital was only 30 minutes originally; it didn't become 60 until the late 70s. And even then, the number of commercials per hour has surely changed over time. So, your estimate is quite off. I prefer One Life to Live anyway ;D
7. Re:My plan comes to fruition! by Anonymous Coward · 2009-09-02 02:46 · Score: 1, Funny
  
  But we already have William Shatner.
8. Re:My plan comes to fruition! by elrous0 · 2009-09-02 02:46 · Score: 1
  
  I think you need to show more respect for a show that gave both Rick Springfield and John Stamos their acting debuts. These episodes also have incredible historic value. Years from now, when historians are needing footage of Demi Moore before plastic surgery, you'll thank me!
  
  --
  SJW: Someone who has run out of real oppression, and has to fake it.
9. Re:My plan comes to fruition! by maxume · 2009-09-02 02:49 · Score: 2, Interesting
  
  William Shatner has continued to be awesome well into his 70s. He even went on Conan and mocked Sarah Palin (while gently ribbing himself).
  Of the personalities in Hollywood, he is one I like quite a bit.
  
  --
  Nerd rage is the funniest rage.
10. Re:My plan comes to fruition! by Junior+J.+Junior+III · 2009-09-02 03:04 · Score: 2, Funny
  
  I'm holding out for the porn version, Genital Horse Spittle.
  Great donkey scenes.
  
  --
  You see? You see? Your stupid minds! Stupid! Stupid!
11. Re:My plan comes to fruition! by ari_j · 2009-09-02 03:32 · Score: 1
  
  Agreed. Shatner is a perfect parody of himself and he improves everything he is in. Even if he's self-absorbed, he has a very good sense of humor about himself.
12. Re:My plan comes to fruition! by jedidiah · 2009-09-02 03:36 · Score: 1
  
  The HD Action show season and the HD Comedy show season.
  Now you just have to worry about shifting definitions of "season".
  That can vary between 39 to 22 or less episodes.
  Then there is codec details...
  
  --
  A Pirate and a Puritan look the same on a balance sheet.
13. Re:My plan comes to fruition! by ari_j · 2009-09-02 03:37 · Score: 1
  
  Or the death of literature and fall of civilization. You'll have documentation of all these things, and more!
14. Re:My plan comes to fruition! by jedidiah · 2009-09-02 03:40 · Score: 1
  
  The repeated lynchings and bludgeonings by angry Trekkies must have helped...
  
  --
  A Pirate and a Puritan look the same on a balance sheet.
15. Re:My plan comes to fruition! by MartinSchou · 2009-09-02 07:09 · Score: 2, Interesting
  
  You raise an "interesting" train of thought in my mind.
  Encoding in 720p x264 you get something like 45 minutes in 1.1 GB. This gives you 60,900 episodes per 4U unit or 609,000 episodes per 40U rack.
  In 1080p x264 you get something like 45 minutes in about 2.5 GB. This is 27,000 episodes per 4U unit or 270,000 episodes per 40U rack.
  Assuming 22 episodes per season and a five year average run time, you end up with 220 episodes per show (typical science fiction shows).
  Assuming 5 shows per week, 40 weeks a year, 10 year run time, you end up with 2,000 episodes per show (typical soaps).
  So you could easily store 100 full sci-fi shows and 100 full soaps in one rack (that'd be 222,000 episodes), all stored in glorious 1080p.
  IMDb lists the following statistics:
  
  452,982 movies released theatrically.
  792,565 TV episodes.
  75,316 made for TV movies.
  61,440 TV series.
  77,624 direct to video movies.
  Leaving out "TV series" (they average 12.9 episodes/series, which seems reasonable with the amount of cancelled series) I'll make the following assumptions about average run time:
  Theatrical releases: 120 minutes
  TV episodes: 35 minutes
  TV movies: 90 minutes
  Direct to video: 100 minutes
  That's a total of 96,638,455 minutes. Encoding that in 720p would require 2,362,274 GB or 5,315,117 GB for 1080p.
  What's my point? Well, for one thing you couldn't ever watch it, as it's 183 years, so no, that wasn't my point ;)
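  The totals above can be reproduced with a short Python sketch. The counts, average run times and the 1.1 GB per 45 minutes figure are the ones quoted in this comment; the names are made up for illustration, and the 2.25 scaling factor (the pixel ratio of 1080p to 720p) is only my guess at how the 1080p total was derived.

```python
# Rough storage estimate for the IMDb catalogue figures quoted above.
# Each entry is (number of titles, assumed average run time in minutes).
CATALOGUE = {
    "theatrical": (452_982, 120),
    "tv_episodes": (792_565, 35),
    "tv_movies": (75_316, 90),
    "direct_to_video": (77_624, 100),
}
GB_PER_45_MIN_720P = 1.1   # the comment's estimate for 720p x264
PIXEL_RATIO_1080P = 2.25   # assumption: (1920*1080) / (1280*720)


def total_minutes(catalogue):
    """Sum count * average run time over all categories."""
    return sum(count * minutes for count, minutes in catalogue.values())


def storage_gb(minutes, gb_per_45_min):
    """Storage needed for the given run time at a fixed GB-per-45-minutes rate."""
    return minutes / 45 * gb_per_45_min


minutes = total_minutes(CATALOGUE)
gb_720p = storage_gb(minutes, GB_PER_45_MIN_720P)
gb_1080p = gb_720p * PIXEL_RATIO_1080P
years = minutes / (60 * 24 * 365)

print(f"total minutes: {minutes:,}")
print(f"720p storage:  {gb_720p:,.0f} GB")
print(f"1080p storage: {gb_1080p:,.0f} GB")
print(f"viewing time:  {years:.1f} years")
```

  Under these assumptions the minute count and the 720p total agree with the figures above to within rounding, and scaling by 2.25 lands within a couple of GB of the quoted 1080p total; using a flat 2.5 GB per 45 minutes would give a slightly larger number.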
  That it is entirely within the realm of feasibility to offer downloads of every single movie and tv-show on IMDb from a hardware point of view. One of the complaints I've heard from the production companies is that it would be impossible to set up the hardware needed for it. Even at Sun's prices, you'd "only" need to pay 10 million dollars to store everything in both 720p and 1080p quality. Set up redundant servers in 10 different locations, 5 in the US, 5 in Europe, and you're still only out 100 million dollars.
  From a cultural point of view, think of all the things that are lost when the copyright holders let these things rot away on shelves, throw it out or it's lost in some kind of calamity. And this is just movies and tv-shows. Add in music and news and I suspect you could easily get hugely redundant back-ups of it all for 1 billion dollars. Even if you had to replace the storage arrays every 3 years, it's still really really cheap. Figure twice that for maintenance, and we have an annual cost of about a billion dollars - cheap when we're saving all knowledge for our successors. That's roughly the cost of building 125 miles of rural freeway in Michigan. It'd be cheap at 10x the price. And in ten years - we will probably still be using high bit rate encoding (1080p+), but will the cost of storage still be as high? I suspect it'll slowly fall, slightly faster than inflation.
  Having to reencode everything from time to time, would obviously take a huge amount of time, but that is the price we pay for progress. On the other hand, even with 1:1 encoding time, it'd only take 183 computer-years to do it.
  Imagine what it would be like if 25 years from now your kids could, at the touch of a button, gain access to every bit of entertainment and news from the last 25 years. I don't mean going to Wikipedia and looking up The Terminator but actually watching the film, reading all the news about it, as it looked at the time, five years on, seven years on after Terminator 2: Judgement Day had its effect on the new franchise etc.
  Imagine them not having to settle for what history books said happened in the year 2010 or about specific events in that year, but be able to pull up every single news article and tv news report on the subject and make up their own mind, de
16. Re:My plan comes to fruition! by Zak3056 · 2009-09-02 12:24 · Score: 1
  
  Of all the computer systems out there, yours is the one for which becoming self-aware terrifies me the most.
  It wasn't his disk array that became self aware... it was his disk array's evil twin !!!!
  
  --
  What part of "shall not be infringed" is so hard to understand?
Disk replacement? by jonpublic · 2009-09-02 02:20 · Score: 3, Insightful

How do you replace disks in the chassis? We've got 1,000 spinning disks and we've got a few failures a month. With 45 disks in each unit you are going to have to replace a few consumer grade drives.
1. Re:Disk replacement? by markringen · 2009-09-02 02:23 · Score: 2, Informative
  
  slide it out on a rail, and drop in a new one. and there is no such thing as consumer grade anymore, they are often of much higher quality stability wise than server specific drives these days.
2. Re:Disk replacement? by TheGratefulNet · 2009-09-02 02:34 · Score: 1
  
  yeah, the lack of ANY kind of hot swap on those chassis is laughable.
  totally the wrong way to go. this guy is hell bent on density but he let that over-ride common sense!
  
  --
  
  --
  "It is now safe to switch off your computer."
3. Re:Disk replacement? by LordKazan · 2009-09-02 02:43 · Score: 1
  
  be like google - hardware redundancy and software handling the failover.
  take down the node with a bad drive, swap the drive, rebuild that pod's RAID (preferably i would RAID6 them as it has better error recovery than RAID5 at the expense of storage size being [drive size]*[number of drives - 2] instead of [drive size]*[number of drives - 1] of RAID5). when it comes back up it syncs to its other copy.
  i would also get LARGE write cache drives and any databases would be running with LARGE ram buffers for performance.
  for the same price as you'd shell out for "professional grade hardware" you can get 5x the "consumer grade hardware" and that's more than enough to facilitate hot data redundancy and failover.
  your IT guy might even have something to do other than play World of Warcraft.
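  The RAID5 vs RAID6 capacity trade-off mentioned above is easy to sketch in Python. The 15 drives of 1.5 TB per array are taken from other comments in this thread, not from the article's parts list, and the function name is hypothetical.

```python
def usable_capacity_tb(drive_size_tb, drive_count, parity_drives):
    """Usable capacity of a parity RAID set: size * (drives - parity drives).

    parity_drives is 1 for RAID5 and 2 for RAID6.
    """
    if drive_count <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return drive_size_tb * (drive_count - parity_drives)


# Assumed layout: one 15-drive array of 1.5 TB disks.
raid5_tb = usable_capacity_tb(1.5, 15, parity_drives=1)
raid6_tb = usable_capacity_tb(1.5, 15, parity_drives=2)

print(f"RAID5: {raid5_tb} TB usable")
print(f"RAID6: {raid6_tb} TB usable")
print(f"cost of the second parity drive: {raid5_tb - raid6_tb} TB per array")
```

  With these numbers RAID6 gives up one extra drive's worth of space per array in exchange for surviving a second drive failure during a rebuild.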
  
  --
  If you cannot keep politics out of your moderation remove yourself from the Mod Lottery.. NOW!
4. Re:Disk replacement? by maxume · 2009-09-02 02:54 · Score: 2, Informative
  
  It sounds like they just soft-swap a whole chassis once enough of the drives in it have failed.
  If their requirements are a mix of cheap, redundant and huge (with not so much focus on performance), cheap disposable systems may fit the bill.
  
  --
  Nerd rage is the funniest rage.
5. Re:Disk replacement? by TheGratefulNet · 2009-09-02 02:57 · Score: 1
  
  that's a LOT of drives to take offline if just 1 fails.
  really ugly design. very amateurish.
  there are bezels and frames that allow FRONT mount and hot swap.
  and btw, all the drives I see in commercial storage are notebook style (2.5") sas drives. I could not believe it (why not 3.5"??) but it's a fact; small form factor sas drives are taking over. there must be a good reason for it or sun (et al) wouldn't be using those 'small drives'.
  
  --
  
  --
  "It is now safe to switch off your computer."
6. Re:Disk replacement? by N1ck0 · 2009-09-02 03:15 · Score: 1
  
  Depends on the software. If your data is distributed in redundant copies scattered across multiple chassis, off-lining a handful of entire chassis for a few hours would just create a temporary performance decrease. Also this company is in the backup storage business, which usually means infrequent requests for data. So at any given moment having 90% of your clients' total backups online is usually considered acceptable in a disaster situation (as long as the chances of the same client's data being in the 10% from failure to failure are small). And since it's all HTTP based, redirecting and forwarding requests from offline sites to online ones is pretty trivial.
7. Re:Disk replacement? by Anonymous Coward · 2009-09-02 03:25 · Score: 1, Insightful
  
  It's the google model: you don't replace failed components. (This isn't meant for a case where you have 1 'server'; this is meant for when you have hundreds of these pods.) The labor is better served deploying a new pod with 45 new disks than replacing one disk in 45 pods.
8. Re:Disk replacement? by TooMuchToDo · 2009-09-02 03:28 · Score: 2, Interesting
  
  What kind of drives are you using? We've got 4800+ spinning drives, and we only have 1-2 failures a month.
9. Re:Disk replacement? by jonpublic · 2009-09-02 03:36 · Score: 1
  
  Seagate ST3500320NS. One batch of 500GB drives was particularly horrible. Horrible horrible horrible in terms of failures. I think we are approaching 10% over 1 year. Which is ridiculous.
  Our disks are doing read / write operations 24/7. I dunno if that makes a difference.
  We've since switched to WD.
10. Re:Disk replacement? by TooMuchToDo · 2009-09-02 03:41 · Score: 1
  
  We're using Hitachi drives in Nexsans. Have been good to us (even though I'm surprised myself). I'm usually a WD or Seagate fan myself.
  We're about 20-30% write 70-80% reads. That could have something to do with it.
11. Re:Disk replacement? by fuzzyfuzzyfungus · 2009-09-02 03:51 · Score: 1
  
  I strongly suspect that the 2.5inch SAS based storage devices are aimed at a slightly different niche.
  
  For drives with very high rotational speeds, 10k+ RPM, they generally use 2.5-sized platters, even in 3.5 inch drives, because of the difficulty of making larger platters that work at those speeds. If you are using the smaller platters anyway, the only advantage of the larger housing is easier cooling. If you can get power draw low enough, or assume a sufficiently cooled enclosure, that doesn't matter. Smaller drives = more spindles per unit space. More spindles = more IOPs. For applications, like databases, where that is what really counts, 2.5 inch drives are the obvious choice(with an increasing amount of flash thrown in).
  
  For bulk storage, of no special speed, for low intensity file serving or as a replacement for tape backups, low rotational speed 5-7K RPM, 3.5inch drives with full size platters are far, far cheaper per gigabyte and as dense, or denser, per unit volume.
  
  You'd be insane to try to run a high throughput database off of something like this, and you'd overpay horribly to run something like their backup service off of the 2.5inch SAS setups. Each has its niche.
12. Re:Disk replacement? by moosesocks · 2009-09-02 05:16 · Score: 1
  
  A few comments:
  1) They're a consumer-grade backup provider. Moreover, their service is very cheap compared to the competition.
  2) To lower the price of their service, they have to cut costs somewhere.
  3) Presumably, they have enough redundancy built into their software and network infrastructure to take several of these offline at a time. Even if a few customers backups go offline for an hour or so, the odds of anybody noticing seems incredibly slim. When you've got several dozen racks of these units, the prospect of taking one or two offline at a time doesn't seem so bad.
  4) Small drives are very fast, but also offer rather low capacity, and cost a lot. An online backup service doesn't need blazing-fast IO speeds at the disk level.
  5) Front hot-swap isn't really an option, given the number of drives, unless you wanted to completely eschew traditional rack design.
  That all said, these servers would be much more attractive if they could be slid out on their rails (while powered up), and had the ability to pop drives in or out from the top while still powered up. Honestly, I think they're missing a huge business opportunity by not selling these machines. Even with all of their shortcomings, they could make a killing.
  
  --
  -- If you try to fail and succeed, which have you done? - Uli's moose
13. Re:Disk replacement? by Culture20 · 2009-09-02 05:37 · Score: 1
  
  I agree with the ST3500320NS assessment. We're doing about 10% fail in one year too. One died in the first couple weeks. I'm almost absolutely certain it's in the controller boards. They look like they're arcing or something, so I now regularly take drives out for a visual inspection spot-check.
14. Re:Disk replacement? by MBGMorden · 2009-09-02 12:18 · Score: 1
  
  I think it's just a density issue. Some of our servers at work have come with the small drives when we were trying to pack as many of them as possible into a certain number of rack units. The ones I've specced out I've always gone for 3.5" drives on, but I've not really hit a situation where I needed tons and tons of storage. 500GB to 1TB of space has been fine for me, and as of late 2 to 4 drives in a raid setup have delivered that just fine.
  
  --
  "People who think they know everything are very annoying to those of us who do."-Mark Twain
15. Re:Disk replacement? by Jaime2 · 2009-09-02 12:24 · Score: 1
  
  They don't do business the same way you do. If the monitoring tool shows a pod with 2 drives down, they just ignore it. Two RAID arrays out and they might replace it. When all three go, they pull the pod and replace with a new one. Then they scavenge the carcass for parts. The magic is in the software. They don't simply save files on servers, they have software that manages where things are stored and stores everything in multiple places.
  
  To give some perspective, at your stated one or two percent failure rate per month, a 45 drive pod would last over a year before all three RAID 6 arrays were likely to have failed. Just "refresh" the dead drives in each pod every three months and the likelihood of actually losing a whole pod is minuscule.
  
  In their world, a drive failure is a non-event. An array failure is an "indicator of possible problems". A pod failure is a reason to schedule it for maintenance tomorrow. The only thing that might get them to sweat is if they lost 10% of their boxes simultaneously.
  
  There are two ways to run a reliable business. Either buy expensive reliable stuff and watch it like a hawk, or buy cheap stuff and make it redundant and self-healing.
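  The "ignore it until enough drives die" reasoning above can be roughed out with a binomial model. This is only a sketch under loud assumptions: drive failures are independent, the monthly failure rate is constant, dead drives are never replaced within the period, and a pod is three 15-drive RAID 6 arrays that each tolerate two dead drives. All names are illustrative.

```python
from math import comb


def drive_dead_prob(monthly_fail_rate, months):
    """Probability that a single drive has died within the given months."""
    return 1 - (1 - monthly_fail_rate) ** months


def array_failed_prob(dead_prob, drives=15, tolerated=2):
    """Probability that more than `tolerated` of `drives` drives are dead."""
    survive = sum(
        comb(drives, k) * dead_prob**k * (1 - dead_prob) ** (drives - k)
        for k in range(tolerated + 1)
    )
    return 1 - survive


def pod_lost_prob(monthly_fail_rate, months, arrays=3):
    """Probability that every array in the pod has failed (pod fully lost)."""
    per_array = array_failed_prob(drive_dead_prob(monthly_fail_rate, months))
    return per_array**arrays


# Assumed 2% per drive per month, the upper figure used above.
for months in (3, 6, 12):
    print(f"{months:2d} months without replacement: "
          f"P(pod lost) = {pod_lost_prob(0.02, months):.4f}")
```

  Under these assumptions the three-month figure is tiny and the twelve-month figure is still below one half, which is roughly in line with the claim above; correlated failures (a bad batch, a PSU surge) would make the real numbers worse.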
16. Re:Disk replacement? by rcw-home · 2009-09-02 16:30 · Score: 1
  
  With 15 1.5TB drives in a RAID6, I wouldn't bother. It's just too likely that one or more drives will not merely fail but return garbage data, which RAID3/4/5/6 will propagate across the array on rebuild.
  If you do this, don't focus on building systems redundant. Focus on building redundant systems. Make your software tolerate losing any one array or server. If you lose an array, don't rebuild; start again from scratch.
  Also, it's quite clear that these servers are not particularly speed-oriented. They have 15 drives serviced by a single PCI SATA controller (through SATA port multipliers). PCI's maximum bandwidth is 133MB/sec (33MHz, 4 bytes per cycle). Today, two drives could soak that with sequential reads and writes. That's fine for this company's users (who are backing up to these servers over their slow internet link) but if you're thinking of putting one in at your office to back up your 20TB fileserver, you better think long and hard about how much time you'll have to do a complete restore from it before you're filing unemployment claims. You could speed things up quite a bit by putting in two PCIe 24-port RAID controllers, dispensing with the port multipliers, and using RAID10, but that does add a few thousand to the price and cut the capacity quite a bit.
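  To put a number on the restore-time worry above, here is a back-of-the-envelope sketch in Python. It assumes the theoretical peak of a 32-bit, 33 MHz PCI bus with no protocol overhead, so real throughput would be lower; the 15 drives per controller and the 20TB fileserver are the figures from this comment, and the function names are made up.

```python
def pci_bandwidth_mb_s(clock_mhz=33.33, bytes_per_cycle=4):
    """Theoretical peak of classic PCI: clock rate times bus width in bytes."""
    return clock_mhz * bytes_per_cycle


def per_drive_share_mb_s(bus_mb_s, drives=15):
    """Bandwidth each drive gets if all drives on the bus are busy at once."""
    return bus_mb_s / drives


def transfer_hours(size_tb, bus_mb_s):
    """Hours to move size_tb (decimal TB) through a link of bus_mb_s MB/s."""
    size_mb = size_tb * 1_000_000
    return size_mb / bus_mb_s / 3600


bus = pci_bandwidth_mb_s()
print(f"PCI peak:        {bus:.0f} MB/s")
print(f"per-drive share: {per_drive_share_mb_s(bus):.1f} MB/s across 15 drives")
print(f"20 TB restore:   {transfer_hours(20, bus):.1f} hours at bus peak")
```

  Even at the theoretical peak of a single bus, a full 20 TB restore is the better part of two days, which is the point being made above.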
17. Re:Disk replacement? by petermgreen · 2009-09-02 21:52 · Score: 1
  
  and btw, all the drives I see in commercial storage are notebook style (2.5") sas drives.
  IIRC while they are 2.5 inch they are considerably taller than modern notebook drives.
  IIRC 3.5 inch platters don't like being spun at 15k rpm so you either end up with 2.5 inch platters in a case big enough for 3.5 inch ones or you use a case that actually fits the platters. So for the applications that need incredibly high speed 2.5 inch makes sense. OTOH if capacity matters more to you than speed then 3.5 inch 7200 or 10K rpm drives make more sense.
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
wtf? by pak9rabid · 2009-09-02 02:23 · Score: 5, Insightful

FTA...

But when we priced various off-the-shelf solutions, the cost was 10 times as much (or more) than the raw hard drives.
Um..and what do you plan on running these disks with? HD's don't magically store and retrieve data on their own. The HD's are cheap compared to the other parts that create a storage system. That's like saying a Ferrari is a ripoff because you can buy an engine for $3,000.
1. Re:wtf? by ShadowRangerRIT · 2009-09-02 02:33 · Score: 1
  
  RTFA. That $117,000 figure includes the whole rack, not just the raw HDs (which come to $81,000 according to their chart). They priced out everything in what they refer to as a "storage pod" in detail, so you can see for yourself. My primary concern is the fact that the boot disk (priced separately) doesn't appear to have a drop in back up. If one of the 45 storage HDs goes down, you can replace it (presumably it supports hot swapping), but if the boot drive goes you've got downtime.
  
  --
  $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
2. Re:wtf? by corsec67 · 2009-09-02 02:42 · Score: 1
  
  Looking at the case, where they have a vibration reducing layer of foam under the lid screwed down onto the drives, and with the pods stacked in the frame like they are, you have to pull a whole unit out anyways to replace a drive.
  So, no hot-swap of anything anyways. PSUs fail pretty commonly in my experience, and not only do they not have redundant PSUs, they have 2 non-redundant power supplies. (RAID 0 for PSUs..... what happens when the 12V rail gets a huge surge that fries the boards on all of the drives) They might have been better off using a RAID 0 in the pod, and mirroring stuff between pods, so that when they take a pod down for maintenance (or it goes *poof*), it has less of an impact.
  Also the design doesn't have any "Replace THIS DRIVE --->" indicators when they want to replace a drive, so they would have to hope the monkey gets it right in replacing drives/power supplies.
  
  --
  If I have nothing to hide, don't search me
3. Re:wtf? by pak9rabid · 2009-09-02 02:46 · Score: 1
  
  This is from someone who has to maintain these things, my Clariion is slower and harder to maintain than my Linux storage server. FC vs SATA, both over iSCSI. Ingenuity and innovation for the win.
  Inquiring minds want to know...why would y'all spend the money on FC drives only to be run over iSCSI? Why not just use SATA drives in your Clariion? I'm sure they would have been cheaper.
4. Re:wtf? by SatanicPuppy · 2009-09-02 02:56 · Score: 1
  
  It's a little odd they didn't just choose to netboot, or boot off a cd or something. Having a boot drive at all seems like an unnecessary point of failure.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
5. Re:wtf? by Zerth · 2009-09-02 03:14 · Score: 1
  
  See, your problem is that you run drives on the 12v rail. They run theirs on 5v. Common mistake, could happen to anybody.
  And they do have pod level redundancy, they mention it at the end of the article.
6. Re:wtf? by Rich0 · 2009-09-02 07:08 · Score: 2, Interesting
  
  Yup.
  You can do even better than the price quoted in this article. On Newegg I found a 1TB drive for $95 - that is only $95k/PB. What a bargain!
  Except that I don't have a PB of space with my solution. I have 0.001PB of space. If I want 1PB of space then I need hundreds of drives, and some kind of system capable of talking to hundreds of drives and binding them into some kind of a useful array.
  This sounds like criticizing the space shuttle as being wasteful as you can cover the same distance in a truck for 1/10000000 x the cost. Except of course for the minor detail that the truck can't fly in space, and can't do all that distance on a single load of fuel in a few hours.
  Or, I can generate completely green energy at a very low price per gigawatt using a small generator and a hamster wheel. Except that I'm not generating a gigawatt - I'm generating maybe a few mW and scaling it up. Unless I bury China in rats I'm not going to be competing with the Three Gorges Dam.
7. Re:wtf? by phoenix_rizzen · 2009-09-02 16:01 · Score: 1
  
  This is where CompactFlash-to-SATA adapters come in handy. Especially when they are small enough that you can put two into the case, and RAID1 them for the boot drive. 4 GB is plenty for a boot drive and OS install.
  Or, if you're really on a budget, grab a pair of 2 or 4 GB USB sticks, mirror them together, and boot off those.
  Having a non-redundant boot drive is just ridiculous in a storage box like that.
8. Re:wtf? by petermgreen · 2009-09-02 22:06 · Score: 1
  
  See, your problem is that you run drives on the 12v rail. They run theirs on 5v.
  Most drives need both, I don't think these are any exception (in TFA they claim to use "4 pin molex" connectors to connect the backplanes to the power supply)
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
9. Re:wtf? by petermgreen · 2009-09-02 22:13 · Score: 1
  
  Um..and what do you plan on running these disks with?.......The HD's are cheap compared to the other parts that create a storage system.
  That is the whole point of TFA, they have managed to make a machine to hold/power/access a shitload of drives on the cheap such that the cost of their system is only 50% more than the raw drives.
  And what is more it's mostly off the shelf kit, afaict the only custom bits are the case and the PSU wiring harnesses.
  Granted it's not pretty and the higher level system will have to be able to tolerate storage pod downtime (but TBH you have to deal with that anyway, no server is perfectly reliable)
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
10. Re:wtf? by Wolfrider · 2009-09-03 09:09 · Score: 1
  
  --Given the thought they've put into this already, I wouldn't be surprised if they're actually using HDPARM to spindown the boot disk for most of the time. And they probably have a few pre-imaged drives lying around...
  
  --
  .
  == WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
11. Re:wtf? by Wolfrider · 2009-09-03 09:14 · Score: 1
  
  --They might just not have thought of it. Article says they're looking for feedback and possible ways to improve; might be worth a few min to shoot them a note and see what they say... ;-)
  
  --
  .
  == WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
Re:That's great but what about all the hidden cost by CoolCash · 2009-09-02 02:24 · Score: 2, Informative

If you check out what the company does, they are an online backup company. They don't host servers on this array, just backup data from your desktop. They just need massive amounts of space which they make redundant.
Yeah, but with Amazon you get FREE SHIPPING !! by Anonymous Coward · 2009-09-02 02:27 · Score: 2, Insightful

I love free shipping, even if it costs me more !! I like FREE STUFF !!
Re:That's great but what about all the hidden cost by hodagacz · 2009-09-02 02:28 · Score: 2, Insightful

They designed and built it so they should know how to support it. If someone else builds one, just learning how to get that beast up and running is excellent hands on training.
Not that shortsighted for their purposes by Overzeetop · 2009-09-02 02:30 · Score: 5, Insightful

Yeah, this only works if you're the geeks building the hardware to begin with. The real cost is in setup and maintenance. Plus, if the shit hits the fan, the CxO is going to want to find some big butts to kick. 67TB of data is a lot to lose (though it's only about 35 disks at max cap these days).
These guys, however, happen to be the geeks, the maintainers, and the people-whose-butts-get-kicked-anyway. This is not a project for a one or two man IT group that has to build a storage array for their 100-200 person firm. These guys are storage professionals with the hardware and software know how to pull it off. Kudos to them for making it and sharing their project. It's a nice, compact system. It's a little bit of a shame that there isn't OTS software, but at this level you're going to be doing grunt work on it with experts anyway.
FWIW, Lime Technology (lime-technology.com) will sell you a case, drive trays, and software for a quasi-RAID system that will hold 28TB for under $1500 (not including the 15 2TB drives - another $3k on the open market). This is only one-fault tolerant (though failure is more graceful than with a traditional RAID). I don't know if they've implemented hot spares or automatic failover yet (which would put them up to 2 fault tolerant on the drives, like RAID6).
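For a rough comparison, the Lime figures can be turned into a cost per petabyte next to the pod. This is only a sketch: it assumes 1 PB = 1000 TB, treats the "under $1500" case price as a flat $1500, and takes the pod numbers ($7,867 for 67 TB) from the story summary. The function name is made up for illustration, and the two capacities may not be measured the same way (raw vs. usable).

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def cost_per_petabyte(total_cost_usd: float, capacity_tb: float) -> float:
    """Scale a box's price linearly up to one petabyte of capacity."""
    return total_cost_usd / capacity_tb * TB_PER_PB


# Lime box: about $1500 for case/trays/software plus about $3000 for 15 x 2TB drives, 28 TB
lime_per_pb = cost_per_petabyte(1500 + 3000, 28)

# Storage pod from the story summary: $7,867 in parts for 67 TB
pod_per_pb = cost_per_petabyte(7867, 67)

print(f"Lime: ${lime_per_pb:,.0f}/PB")
print(f"Pod:  ${pod_per_pb:,.0f}/PB")
```

Linear scaling ignores rack space, power, and the extra hosts needed to manage many small boxes, so treat the result as a floor rather than a quote.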

--
Is it just my observation, or are there way too many stupid people in the world?
1. Re:Not that shortsighted for their purposes by TheGratefulNet · 2009-09-02 03:00 · Score: 1
  
  These guys are storage professionals with the hardware and software know how to pull it off.
  I'm laughing REALLY hard now.
  this is a home hack. nothing more. drives 'locked into' a chassis' insides. harumph!
  laughable. truly laughable.
  they went for balls-to-the-wall density but they forgot about serviceability!
  
  --
  "It is now safe to switch off your computer."
2. Re:Not that shortsighted for their purposes by SatanicPuppy · 2009-09-02 03:16 · Score: 1
  
  People made the same argument toward Google, when they were using off-the-shelf commodity hardware to run their search operation. Do you think they use that sort of hardware anymore? But deploying it quick got them in the game.
  Adding hotswappable drive trays to a server triples the cost. If you can just triple the number of servers instead, you can come out ahead, at least in the short term.
  Maintenance would be an issue though...I'd hate to be the poor bastard tasked with pulling the node out of the rack and taking it apart to find the bad drive.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
3. Re:Not that shortsighted for their purposes by TheGratefulNet · 2009-09-02 03:23 · Score: 1
  
  Adding hotswappable drive trays to a server triples the cost.
  I don't believe this. can you support this assertion of yours?
  I personally use a bunch of these:
  http://www.newegg.com/Product/Product.aspx?Item=N82E16817332010
  they're not enterprise quality but they do work quite well for what they are. they give decent (but loud) cooling, they allow actual hot swap and they do have temperature sensors (beeping, at least; but no system level alerts).
  
  --
  "It is now safe to switch off your computer."
4. Re:Not that shortsighted for their purposes by mad+flyer · 2009-09-02 03:24 · Score: 1
  
  Agreed... SYBA is one of the shittiest brands around, and for no good reason. They are notorious for design flaws... like basic design flaws. Last time I tried to use their FireWire PCI card for networking, the result was just pop and smoke, on 3 different cards. Seems they got the power feature design of the FireWire bus wrong. Luckily it only fried their cards, not the MB.
  Do they really use stuff like this for pro installation ?
  http://www.area-powers.jp/product/pcie/sata/31322ir.htm
5. Re:Not that shortsighted for their purposes by SatanicPuppy · 2009-09-02 03:32 · Score: 1
  
  Well, I'll qualify that by saying I've never built my own hotswap system, but my experience says that drives usually cost about half of what a stupid removable drive TRAY costs, and the servers with externally accessible drive rails tend to cost more than their counterparts as well.
  There was some discussion below that this design actually includes hotswappable drives, so I'll just back off on that assertion. I could very well be wrong, and it could be that the vendors I'm using are raping me on the costs.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
6. Re:Not that shortsighted for their purposes by TheGratefulNet · 2009-09-02 03:40 · Score: 1
  
  the cost of components on hot-swap is very low.
  $100 for 4 drive spaces (in the pc space of 3 bays) is a good density factor.
  I've been running 2 stacks of these (for 8 drives) in my e6400 dualcore system and an intel x975 motherboard. that mobo has 2 sata controllers, ich7r (r for ahci, hot swap) and a marvell controller that also has 4 sata ports on the mobo (also ahci). I get all 8 drives in my software md0 raid with NO pci cards needed (no cards = more reliability; less mechanical contacts to worry about (pcie)).
  a large tower (antec 900) holds them:
  http://www.flickr.com/photos/linux-works/1523567231/
  for home use, its really great!
  
  --
  "It is now safe to switch off your computer."
7. Re:Not that shortsighted for their purposes by TheGratefulNet · 2009-09-02 03:42 · Score: 1
  
  meant to include this link, as well, which shows more of a build-in-progress of my server:
  http://forums.dpreview.com/forums/readflat.asp?forum=1004&message=25140889
  
  --
  "It is now safe to switch off your computer."
8. Re:Not that shortsighted for their purposes by Overzeetop · 2009-09-02 07:44 · Score: 1
  
  Yes, but can you put 45 of them into a 4U space? For these guys - and most multi-PB operations I suspect - floor space is at a significant premium. I think the purpose behind skipping the hot swap (which is nice if you can afford it) was to maximize density. I've got an unRaid array somewhat similar to yours, and I also used all on-board controller slots. Then again, I've only got 4TB of space, and don't expect to need to expand soon (my whole DVD collection is on it, and now that I've populated my back catalog I only buy 12-15 discs a year).
  Even at $100 for 4 drives - and not enterprise quality at that - that's a 20% cost increase over the basic. Not a tripling as the GP suggested, but a pretty high number. With dual redundancy (R6), good monitoring software, and sliding rails the maintenance won't be that bad. Probably worth $20k/PB in savings given actual HD failure rates.
  
  --
  Is it just my observation, or are there way too many stupid people in the world?
9. Re:Not that shortsighted for their purposes by turbidostato · 2009-09-02 08:43 · Score: 1
  
  "you slide it out, pop off the top cover (I've seen both 2/3 length and full-length covers that come off), and work on the guts of the server. Pop the lid back on, and slide it back into the rack."
  Good luck gaining access to a disk in the middle row or, god forbid, to the backplane under them.
  No; my bet is that as soon as they detect some problem they shut down the whole box and take it directly to the lab without trying to service it in the field. After all, it is not as if they had one expensive chunk of hardware they need five nines on, but just a "brick" among dozens of twins with a software layer on top ready to manage a whole node down here and there. Why work within a fridge about 100dB loud when you can just take it on a cart to your lab and redeploy it as new in a few days?
  Some others have said they should have used hot-swappable drives; what they fail to understand is that here the "hot-swappability" is not at the disk level but at the whole computer level.
10. Re:Not that shortsighted for their purposes by sloth+jr · 2009-09-02 10:38 · Score: 1
  
  Your point is legit - except that Google (and Yahoo, and Facebook, and ...) most certainly continue to use off-the-shelf commodity hardware, because they understand how best to create horizontal scalability for their application needs.
  
  The main point with large systems is that everything, I mean everything, fails, no matter how trivial: passive backplanes, power cables, physical failure, etc. You don't have time when you're dealing with a cluster of 100 or 400 thousand machines to diagnose - you just detect the problem, yank it out of the cluster, fix it whenever (or throw it away if it's more cost efficient) and throw in a new box.
they are missing hardware mgmt by TheGratefulNet · 2009-09-02 02:32 · Score: 5, Interesting

where's the extensive stuff that sun (I work at sun, btw; related to storage) and others have for management? voltages, fan-flow, temperature points at various places inside the chassis, an 'ok to remove' led and button for the drives, redundant power supplies that hot-swap and drives that truly hot-swap (including presence sensors in drive bays). none of that is here. and these days, sas is the preferred drive tech for mission critical apps. very few customers use sata for anything 'real' (it seems, even though I personally like sata).
this is not enterprise quality no matter what this guy says.
there's a reason you pay a lot more for enterprise vendor solutions.
personally, I have a linux box at home running jfs and raid5 with hotswap drive trays. but I don't fool myself into thinking its BETTER than sun, hp, ibm and so on.

--
"It is now safe to switch off your computer."
1. Re:they are missing hardware mgmt by N1ck0 · 2009-09-02 02:58 · Score: 4, Insightful
  
  It's better at what they need it for. Based on the services and software they describe on their site, it looks like they store data in the classic redundant chunks distributed over multiple 'disposable' storage systems. In this situation most of the added redundancy that vendors put in their products doesn't add much value to their storage application. Thus having racks and racks of basic RAIDs on cheap disks and paying a few on-site monkeys to replace parts is more cost effective than going to a more stable/tested enterprise storage vendor.
2. Re:they are missing hardware mgmt by TheSunborn · 2009-09-02 03:03 · Score: 1
  
  But the question is: Is it worth paying $100,000 more for a Sun box with the mentioned features? I mean yes they are nice features but still.
  They don't say much about their software, but I guess they run the boxes in some kind of RAID 1/storage rotation, where all data is stored on more than one box. One thing I do not understand in their setup is the lack of spare drives.
3. Re:they are missing hardware mgmt by SatanicPuppy · 2009-09-02 03:04 · Score: 5, Informative
  
  This sort of attitude is how Sun got its lunch eaten in the market in the first place.
  Yes, your hardware rocks. It's so fucking sexy I need new pants when I come into contact with it.
  It also costs more than a fucking Italian sports car.
  Turns out that if your awesome hardware is 10 times better than commodity hardware, but also 25 times as expensive, people are just going to buy more commodity hardware.
  I've got some Sun data appliances and I've got some Dell data appliances, and the only difference I've seen between them is purely one of cost. The only thing that ever breaks is drives.
  
  --
  ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
4. Re:they are missing hardware mgmt by Anonymous Coward · 2009-09-02 03:09 · Score: 2, Informative
  RTFA - they are not saying one of these is a mission critical enterprise storage system. In fact they said:
  
  No One Sells Cheap Storage, so We Designed It
  When you are talking about multiple petabyte scale paying 5x as much for 5 temperature sensors, SAS drives, LEDs etc becomes pretty stupid.
  Treat the 67TB system as an $8,000 hard drive.
  Deploy a few tens or hundreds of them with redundancy between them.
  In 2-3 years when they start to fail, replace them with larger capacity drives.
  ???
  Take your hundreds of thousands of dollars not paid to SUN, IBM, EMC, NetApp etc and PROFIT!!!
5. Re:they are missing hardware mgmt by swillden · 2009-09-02 03:12 · Score: 4, Insightful
  
  personally, I have a linux box at home running jfs and raid5 with hotswap drive trays. but I don't fool myself into thinking its BETTER than sun, hp, ibm and so on.
  I don't think these folks believe their solution is better -- just cheaper. MUCH cheaper. So much cheaper that you can employ a team of people to maintain the "homebrew" solution and still save money.
  
  --
  Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
6. Re:they are missing hardware mgmt by TheGratefulNet · 2009-09-02 03:14 · Score: 1
  
  if one of the many power supplies goes down, you get an 'event'. standard linux does not have stuff like this.
  on some sun boxes, you even have redundant power cords (2 cords on a single PSU). you can put each cord on its own redundant UPS.
  you can get alerts as things *start* to fail, too. linux does not have any notion of trending hardware failures.
  I can't speak to the money side of things; but I surely see the technical benefits of enterprise designed storage systems. I see it from the inside and all the thought that went into it. I realize not everyone has this visibility but a 'storage box' is WAY more than cpu, wires and drives.
  it IS about management, in a lot of cases.
  and it's not strictly linux's fault; it's the pc architecture that has stayed way behind on physical asset mgmt.
  
  --
  "It is now safe to switch off your computer."
7. Re:they are missing hardware mgmt by dkf · 2009-09-02 07:44 · Score: 1
  
  Thus having racks and racks of basic RAIDs on cheap disks and paying a few on-site monkeys to replace parts is more cost effective than going to a more stable/tested enterprise storage vendor.
  Maybe. But you get to worry about what happens when shit happens and you get multiple simultaneous failures. Yes, this really does happen. I've seen the strain of rebuilding a RAID-6 cause several disks in the array to fail at once. What's more, the silly fools who owned the data in there had decided to not keep backups - "after all, it's ultra-reliable RAID-6!" - so they lost over 5 years of irreplaceable scientific data. We weren't very sympathetic.
  Given this sort of thing does happen, a big fat RAID isn't the solution to all your storage problems. You need backups as well as redundancy. Alas, backups are fairly expensive, especially as data sizes go up. (I'm not sure in general why this is so; I know for one particular enterprise system, but that's got some kind-of embarrassing aspects, so I'm not going to generalize...)
  
  --
  "Little does he know, but there is no 'I' in 'Idiot'!"
8. Re:they are missing hardware mgmt by BobMcD · 2009-09-02 08:50 · Score: 2, Funny
  
  And speaking of sexy, sports cars, and Sun, there is one huge factor that sets apart the purchase decisions -
  Sun has nothing on Ferrari for getting you laid.
9. Re:they are missing hardware mgmt by deanoaz · 2009-09-02 09:16 · Score: 1
  
  >>> very few customers use sata for anything 'real'
  
  You mean the Sun 7310 we just bought isn't a real enterprise storage array?
  
  --
  If 'the people' in Amendment 2 are 'the state' then Amendments 1, 2, 4, 9, and 10 benefit the state, not you.
10. Re:they are missing hardware mgmt by rhizome · 2009-09-02 10:18 · Score: 1
  
  As another real-world counterexample, anybody who uses commodity storage architectures such as the one illustrated here can put the money they save into a backup solution. This would even allow an entire hot-failover array *and* tape/offline to fill in when the main one goes down, all for well under EMC/Sun prices.
  
  --
  When I was a kid, we only had one Darth.
11. Re:they are missing hardware mgmt by Jaime2 · 2009-09-02 12:56 · Score: 1
  
  You need all that stuff because you have a lot of little workloads that must all remain available. Don't misunderstand me, by little I mean less than a terabyte. If you lose one drive in a LUN, it's all hands on deck to get the leak plugged before a second drive fails and the CFO loses his SAP reporting database. These guys have one huge storage farm with thousands of drives that can tolerate a hundred drives failing without missing a beat. Why would they buy Sun gear if they designed the system to tolerate massive failures? More generally, why choose a fragile design supported by great storage when you can go with a great design built on redundant crap storage for one tenth the cost?
  
  What EMC and Sun need to realize is that as cloud computing technology starts to trickle down to the Enterprise, reliable storage is going to become an unnecessary luxury. The new distributed world will be designed to deal with whole systems coming and going from the cloud. The good news for them is that we are at least ten years from this stuff making a dent in the corporate world.
12. Re:they are missing hardware mgmt by evilviper · 2009-09-02 19:24 · Score: 1
  
  So much cheaper that you can employ a team of people to maintain the "homebrew" solution and still save money.
  ...as long as your data is worthless... ...and since it's their customers' data, that's probably true, as far as they're concerned (ToS and all).
  Those of us who have to maintain such hack-job storage systems know just what a nightmare it is. Drives reporting fine, until a power cycle when they come up as a broken and unrecoverable array... GAH!
  There has been a change in-kind in data storage over the past few years. Capacities have grown so significantly that once rare errors are now common, and what was previously a simple job has become a Herculean task.
  ZFS promises the world. Sadly, its licensing is seriously limiting adoption, its lack of data recovery and repair tools is shocking, and it's still a monster on hogging tons of memory, and will eventually get out of control and cause the system to crash. So the dream of just plugging in another drive when you need more space, and not having to worry about anything, remains unfulfilled. Btrfs remains a very long ways off, and there's little reason to believe it will be notably better on any count.
  So, while you can get as many nice cheap drives as you want, the limitation is in the software, and there's no cheap solution out there. Anyone with significant storage needs remains tied to companies like NetApp, where the software works, but it's only available tied to the ridiculously expensive hardware.
  
  --
  Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
13. Re:they are missing hardware mgmt by petermgreen · 2009-09-02 22:30 · Score: 1
  
  As I see it there are two approaches to designing a big deployment.
  One is to try and make all the hardware as reliable as possible by using high quality parts and by demanding features that allow problems to be detected early and redundancy to be restored quickly with no downtime and minimal risk. The trouble with this approach is that even the best nodes will still die from time to time so you still need redundancy at higher levels for anything important.
  The other (the google approach) is to not worry too much about the reliability of individual nodes but instead design your system to tolerate node failure.
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
14. Re:they are missing hardware mgmt by swillden · 2009-09-03 00:27 · Score: 1
  
  So much cheaper that you can employ a team of people to maintain the "homebrew" solution and still save money.
  ...as long as your data is worthless... ...and since it's their customers' data, that's probably true, as far as they're concerned (ToS and all).
  Nah. You just use redundancy to ensure reliability. Lots and lots of redundancy. MD-RAID and LVM offer all that's needed to make this work -- though doing it on a large scale requires lots of elbow grease.
  I agree that "Just plug in another drive" is a far cry from what you have to do with these tools, but you can make it work.
  
  --
  Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
15. Re:they are missing hardware mgmt by evilviper · 2009-09-03 07:33 · Score: 1
  
  You just use redundancy to ensure reliability.
  If that's practical for your environment, fine, but I COMPLETELY fail to see the utility of their use of RAID6, if they are in fact maintaining data redundancy at a much higher level. That's just increasing the cost by about 1/3rd, reducing the speed, and honestly providing no more reliability.
  
  --
  Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
16. Re:they are missing hardware mgmt by atamido · 2009-09-03 10:31 · Score: 1
  
  Nah. You just use redundancy to ensure reliability. Lots and lots of redundancy. MD-RAID and LVM offer all that's needed to make this work -- though doing it on a large scale requires lots of elbow grease.
  I agree that "Just plug in another drive" is a far cry from what you have to do with these tools, but you can make it work.
  Well, good luck to them. In the long term I expect there to be a Linux distro that you can boot and add to a storage cluster (at least I really hope so). But for now organizing and syncing that kind of data set is going to be a real pita to get all of the kinks worked out.
You can get 2TB drives now by cibyr · 2009-09-02 02:34 · Score: 1

Since you can now get 2TB drives you should be able to fit 90TB in one of these boxes :)
And I thought I was doing well with a few terabytes in my home server (but hey, ZFS should save me from silent data corruption when the drives inevitably start to fail).

--
It's not exactly rocket surgery.
1. Re:You can get 2TB drives now by ciroknight · 2009-09-02 03:43 · Score: 1
  
  It'll be a few months before 1x2TB drive is more cost efficient than 2x1TB drives though. But when it does, it's a simple matter of buying the newer higher capacity drives when adding more storage. Ah, the wonders of redundant arrays of inexpensive data servers.
  
  --
  "Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
cheap drives too by pikine · 2009-09-02 02:38 · Score: 2, Informative

Reliant Technology sells you NetApp FAS 6040 for $78,500 with a maximum capacity of 840 drives, without the hard drives (source: Google Shopping). If you buy FAS 6040 with the drives, most vendors will use more expensive, lower-capacity 15k rpm drives instead of the 7200rpm drives the BackBlaze Pod uses, and this makes up a lot of the price difference. The point is, you could buy NetApp and install it yourself with cheap off-the-shelf consumer drives and end up spending about the same magnitude amount of money. I estimate that NetApp would cost just 1.5x the amount.
NetApp FAS 6040 at $78,500 + 840 x 1.5TB drives at $120 each = $179,300 which gives you 1.26PB. Cost per petabyte is about $142,300, only slightly more expensive than BackBlaze's $117,000 from the article. The real story is that BackBlaze is able to show a competitive edge of about $25,000 per petabyte, or being roughly 18% cheaper.
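The arithmetic behind that comparison can be reproduced in a few lines. This is a sketch using only the inputs stated above plus the article's $117,000/PB figure; it assumes 1 PB = 1000 TB and, like the comment, counts only the filer head and bare drives (no shelves, licenses, or RAID overhead). Variable names are invented for illustration.

```python
# Inputs as stated in the comment
head_price_usd = 78_500      # NetApp FAS 6040 head, no drives
max_drives = 840
drive_price_usd = 120        # consumer 1.5TB drive
drive_capacity_tb = 1.5

# Article's headline figure for the pod design
pod_cost_per_pb = 117_000

total_cost = head_price_usd + max_drives * drive_price_usd
capacity_pb = max_drives * drive_capacity_tb / 1000   # assumes 1 PB = 1000 TB
netapp_cost_per_pb = total_cost / capacity_pb

gap_per_pb = netapp_cost_per_pb - pod_cost_per_pb
gap_fraction = gap_per_pb / netapp_cost_per_pb

print(f"total ${total_cost:,} for {capacity_pb:.2f} PB")
print(f"NetApp + consumer drives: ${netapp_cost_per_pb:,.0f}/PB")
print(f"pod is ${gap_per_pb:,.0f}/PB cheaper ({gap_fraction:.1%})")
```

Whether a filer would accept those drives at all is a separate question that the numbers do not address.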

--
I once had a signature.
1. Re:cheap drives too by Anonymous Coward · 2009-09-02 02:58 · Score: 1, Interesting
  
  The point is, you could buy NetApp and install it yourself with cheap off-the-shelf consumer drives and end up spending about the same magnitude amount of money.
  You haven't bought a NetApp (or an EMC, Compellent, or XXX brand SAN) before - it doesn't work that way.
  You get to buy NetApp Shelves of NetApp drives which sit behind your NetApp Controller. The drives, while mechanically identical to those you buy from NewEgg, run a special FW version. If you did manage to get it working, you sure as hell aren't going to get any support from your storage vendor.
  Some of the newer NetApp controllers can sit in front of another SAN, but a bunch of commodity drives does not a SAN make.
  Consumer drives don't work behind a pair of SAN controllers from ANY dominant storage vendor. Period. It sucks - maybe this should be what we're aiming to change.
2. Re:cheap drives too by machine321 · 2009-09-02 03:12 · Score: 1
  
  Does the FAS 6040 allow the use of cheap off-the-shelf consumer drives? I don't think any filers do. You'll also have to buy shelves (unless you have a source for cheap SATA shelves with FC uplinks). A shelf of 14 1T SATA NetApp drives (10T usable after RAID-DP and two hot spares) is what, $35-40k?
3. Re:cheap drives too by Drew+M. · 2009-09-02 04:58 · Score: 1
  
  And after you purchase the filer head for $78,500, where are you going to put the drives? Netapp disk shelves aren't cheap. Just because a filer head can support 840 drives doesn't mean you purchased 840 drives worth of disk shelves.
4. Re:cheap drives too by hakr89 · 2009-09-02 13:57 · Score: 1
  
  This assumes that the firmware in the SAN server will let you do this. I know for a fact that Dell MD3000 and EMC AX-4 hardware will refuse to use anything but vendor branded drives with special firmwares.
Or wait 5 years and buy it at newegg for $280 by dicobalt · 2009-09-02 02:39 · Score: 2, Funny

and save $2,799,720.
1. Re:Or wait 5 years and buy it at newegg for $280 by TheRaven64 · 2009-09-02 04:39 · Score: 1
  
  Probably a bit more than five years. It took around 15 years to go from 1GB being a big consumer drive to 1TB being a big consumer drive. That said, flash capacities have been doubling roughly every 10 months for the past 15 years (you can now buy 32GB for about what I paid for 128KB in 1994, which is just under 18 doublings in 15 years). For 1PB flash drives to reach that price will take about 10-12 years at that rate.
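  The doubling estimate is easy to check. A rough sketch, assuming binary units (1KB = 2^10 bytes, 1PB = 2^50 bytes) and that the same doubling pace simply continues; variable names are made up for illustration. With these exact inputs the extrapolation comes out slightly above the quoted range.

```python
import math

KB, GB, PB = 2**10, 2**30, 2**50  # assumption: binary units

# 128KB in 1994 to 32GB fifteen years later, at roughly the same price
doublings_so_far = math.log2((32 * GB) / (128 * KB))
months_per_doubling = 15 * 12 / doublings_so_far

# Extrapolate from 32GB to 1PB at the same pace
doublings_needed = math.log2(PB / (32 * GB))
years_to_petabyte = doublings_needed * months_per_doubling / 12

print(f"{doublings_so_far:.0f} doublings, one every {months_per_doubling:.0f} months")
print(f"{doublings_needed:.0f} more doublings, about {years_to_petabyte:.1f} years")
```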
  
  --
  I am TheRaven on Soylent News
Liability insurance by scsirob · 2009-09-02 02:43 · Score: 1

If you build a petabyte stack using 1.5TB disks you need about 800 drives including RAID overhead. With an MTBF for consumer drives of 500,000 hours, a drive will fail roughly every 10-15 days, if your design is good and you create no hotspots/vibration issues.
Rebuild times on large RAID sets are such that it is only a matter of time before they run a double drive failure and lose their customers' data. The money they saved by going cheap will be spent on lawyers when they get the liability claims in.
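A quick back-of-the-envelope check of that failure interval, as a sketch only. It assumes drives fail independently at a constant rate, so the fleet-wide interval is just MTBF divided by drive count; the function names are invented for illustration. Under that simple model the quoted MTBF works out to a failure roughly every 26 days, and a failure every 10-15 days would correspond to an effective MTBF nearer 240,000 hours.

```python
HOURS_PER_DAY = 24


def days_between_failures(mtbf_hours: float, n_drives: int) -> float:
    """Expected days between drive failures across the whole fleet,
    assuming independent drives with a constant failure rate."""
    return mtbf_hours / n_drives / HOURS_PER_DAY


def implied_mtbf_hours(days_between: float, n_drives: int) -> float:
    """Per-drive MTBF that would produce the given fleet-wide interval."""
    return days_between * HOURS_PER_DAY * n_drives


# 800 drives at the quoted 500,000 hour MTBF
fleet_interval_days = days_between_failures(500_000, 800)

# MTBF implied by one failure every 12.5 days (midpoint of 10-15)
effective_mtbf = implied_mtbf_hours(12.5, 800)

print(f"one failure every {fleet_interval_days:.1f} days")
print(f"12.5-day interval implies MTBF of {effective_mtbf:,.0f} hours")
```

Either way the conclusion stands: at this scale drive replacement is a routine weekly-to-monthly chore, so rebuild time and redundancy level matter more than the exact MTBF.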

--
To Terminate, or not to Terminate, that's the question - SCSIROB
1. Re:Liability insurance by devjoe · 2009-09-02 03:10 · Score: 2, Insightful
  
  If you build a petabyte stack using 1.5TB disks you need about 800 drives including RAID overhead. With an MTBF for consumer drives of 500,000 hours, a drive will fail roughly every 10-15 days, if your design is good and you create no hotspots/vibration issues.
  Rebuild times on large RAID sets are such that it is only a matter of time before they run into a double drive failure and lose their customers' data. The money they saved by going cheap will be spent on lawyers when they get the liability claims in.
  If you RTFA, you will see that they are using RAID6 with 2 parity drives per raid, so a double drive failure can be handled, and it is only the less likely triple drive failure that will ruin them. It seems weak that they don't have hot-swappable drives in this configuration, but they have software that is managing the data across disk sets, and presumably they have redundant copies of data that keep the data accessible when one of their servers is taken down to replace a drive (if they don't, the downtimes due to replacing drives will make the service useless). This redundancy may also save them in the case that they actually lose a RAID set.
2. Re:Liability insurance by Zerth · 2009-09-02 03:11 · Score: 1
  
  Why are you assuming they keep the data on only one node? With non-redundant power supplies and the fact that these units are a fraction of the cost of other solutions, you should assume that they make up for the lack of power redundancy with redundant nodes.
  Or you could've read the article, but this is /.
3. Re:Liability insurance by jonesy16 · 2009-09-02 03:32 · Score: 1
  
  Why is everyone so caught up on the hot swappable feature? I mean, I get it, it's convenient to hot swap, but that's gotta be a terrible price / performance penalty. First off, convenient hot swap requires real estate, i.e., you have to be able to access the drive from the front of the rack, which means you won't fit 45 drives in a 4U space (not 3.5 inch drives anyway). Assuming they have massive redundancy (which someone using a system like this would), it's not that big of a pain to power off the entire node, slide it out, swap a drive, slide it in. That just took you what, 4 minutes? Hot swapping requires a lot of faith in your OS (RAID and filesystem subsystems) and controller cards to handle that situation gracefully and reliably (which is why you pay a lot for a box from Sun/HP that "guarantees" you have that ability).
4. Re:Liability insurance by kybur · 2009-09-02 04:57 · Score: 1
  
  Well, switch to SAS drives, and replace each 1.5T SATA with three 450G SAS drives with a 1,500,000-hour MTBF, and you will also have drives fail at the same rate (3x longer MTBF but 3x more disks). Seems like you are really criticizing lack of tiered storage, rather than consumer grade hardware here.
  Additionally, if you RTFA, you would see that they were using RAID6 + a hot spare and have split up their arrays so they are not too big and do not end up with ridiculous rebuild times. A RAID6 + HS solution will not lose data with a double drive failure, and can handle three drives out so long as the first parity segment has been rebuilt by the time the third drive fails.
  Clearly this solution would not work for most enterprise needs. It is just disk based backup, and with the hardware they are using, it is just barely "online", but certainly far more "online" than tape would be.
  On top of all that, the company explains what they are doing, so their customers know exactly what they are getting into. You do not pay $5/month for unlimited backup and expect to get a high-end tiered backup system with high availability.
5. Re:Liability insurance by codeguy007 · 2009-09-02 05:37 · Score: 1
  
  The only additional costs would be the upgrade from SATA cards to SATA RAID cards like 3ware and maybe they would need to switch to a server class board with multiple PCI-e slots. Not a significant additional cost.
Cool for home pr0n collection, but business? by filesiteguy · 2009-09-02 02:43 · Score: 1

Though I don't run a datacenter, I do rely heavily on one. My co-manager is in charge of keeping my 80 TB of data online 24/7 using redundant HP StorageWorks 8000 EVA units.
These cost a bit and have drives which fail at a fairly infrequent rate. It doesn't hurt that the data center is kept at 64 degrees by two (redundant) chillers and has 450 kVA redundant power conditioners keeping the electricity on at all times. (We do shut off the power to the building once a month to check these and the diesel generator housed on the premises as well.)
Now - paying $x,xxx per year for maintenance on these units is cheap insurance in my mind. If something goes wrong, HP is available 24/7 to be onsite with replacement parts. This has, in fact, happened during the past few years. A controller on the array went bad, causing disk read failures. We instantly called HP, had a tech onsite, and had the controller replaced within a few hours of the problem being detected.
OTOH - for someone's 4 petabyte home pr0n collection, this might be a good idea! :P

--
The Kai's Semi-Updated Website Thingy
1. Re:Cool for home pr0n collection, but business? by Fross · 2009-09-02 03:36 · Score: 1
  
  Why are these unsuited for business?
  Get 2 pods, 100 TB of storage, $16K.
  Hosting it in a rack somewhere... I don't know, $10K per year, for somewhere really good?
  $10K per year for someone to perform maintenance on it, based on your figure above.
  You're still coming in way under $50K, all in, for more storage.
  I find it hard to believe that the additional $x00,000 really gives worthwhile "added value" on top of that.
2. Re:Cool for home pr0n collection, but business? by TheHawke · 2009-09-02 05:14 · Score: 1
  
  Do you have a proper transfer switch and a control panel that exercises the generator? Usually it has a biweekly cycle that runs the unit for 15 minutes to warm the oil, then shuts it off. The really good control panels have a test cycle option that cuts out curb power and cuts in the generator in a predetermined manner. This saves wear and tear on the main breakers, preserving them so that they will do their job properly when The Event happens.
  If not, I would strongly recommend getting in touch with your generator's vendor and upgrade your control center for it.
  I've heard of a 25KW generator blowing its generator with a roar due to poorly bonded armature wiring.
  
  --
  First rule of holes; When in one, stop digging.
3. Re:Cool for home pr0n collection, but business? by filesiteguy · 2009-09-02 07:45 · Score: 1
  
  For the diesel generator? I have no clue. I'm a PHB over app development. The systems and the infrastructure people are responsible for the generator. I know that - monthly - they shut off the power to the building and ensure the KVM keeps the server room and associated items live during the 15 seconds or so it takes the generator to kick in and power the rest of the building. We have a seven-story building with about 1500 staff members and roughly 900 workstations/peripherals.
  
  --
  The Kai's Semi-Updated Website Thingy
4. Re:Cool for home pr0n collection, but business? by filesiteguy · 2009-09-02 07:47 · Score: 1
  
  Believe it or not, that's our primary backup solution for our first-tier offsite building, one mile away. We have a few 1TB drives containing primary SQL servers and associated applications. Monthly we test these and make sure they work. We do a daily transfer of data to the drives to ensure they have at least that day's worth of information. Keep in mind, too, that you want to point fingers at the vendor if/when something goes horribly wrong.
  
  --
  The Kai's Semi-Updated Website Thingy
5. Re:Cool for home pr0n collection, but business? by Jaime2 · 2009-09-02 13:06 · Score: 1
  
  Here's something to think about... you lost a single controller and it was an event that required a vendor to be on site. BackBlaze's system is redundant enough that ten controllers failing isn't a big deal. So, you pay for the support that your system needs, they designed the support need out of the system.
Re:That's great but what about all the hidden cost by TooMuchToDo · 2009-09-02 02:48 · Score: 1

If you need the support, go pay the premium. Those of us with the appropriate technical background welcome the cheaper implementations.
Lets try to be a bit more supportive here! by fake_name · 2009-09-02 02:50 · Score: 4, Insightful

If an article went up describing how a major vendor released a petabyte array for $2M, the comments would be full of people saying "I could make an array with that much storage far cheaper!"
Now someone has gone and done exactly that (they even used Linux to do it) and suddenly everyone complains that it lacks support from a major vendor.
This may not be perfect for everyone's needs, but it's nice to see this sort of innovation taking place instead of blindly following the same path everyone else takes for storage.
1. Re:Lets try to be a bit more supportive here! by theManInTheYellowHat · 2009-09-02 04:52 · Score: 1
  
  I agree!! I thought that the alternative to norm and pushing the bounds was what this crowd was all about.
  As far as doing the math, if they take the money they save from hardware and provide good jobs to people that can not be outsourced (this setup is going to need on-site hands) how about commending them for that. They should get some small US town (with a fat pipe) to help on the taxes for bringing tech jobs to them.
2. Re:Lets try to be a bit more supportive here! by Rich0 · 2009-09-02 07:15 · Score: 1
  
  Except, this isn't a PB array. This is a 0.067PB array. Is there any evidence that this solution can practically scale to a level where calculating cost in dollars per PB makes any sense?
  Hey, it's a great achievement. However, the kinds of people who need storage by the PB aren't going to roll out hundreds of these smaller arrays and figure out how to organize the data on them.
3. Re:Lets try to be a bit more supportive here! by mistahkurtz · 2009-09-02 15:31 · Score: 1
  
  i don't disagree fully, but the difference here is that it's a business. if someone came out and said here's a $2m storage array for your house(!), we'd all scream and laugh and point fingers.
  
  since it's a business, and (especially in this case) the storage array in question is a basic and required tool for the business to function, it doesn't seem to make sense for them to skimp on it.
  
  what they came up with is cool. but that's not the issue. the issue is yet another (in this case small/startup) company saying "we're going to do X!!!" and then realizing "oh, shit it costs how much to do X right?" and then saying "fuck it here's some hard drives*".
  
  * can be substituted for pirated copies of whatever software, PDF creator when they need Acrobat Pro, or whatever skimpy solution many companies employ because they're not willing to shell out the $$ for the cost of doing business.
  
  --
  not only is time travel possible, it's irrelevant.
What's all the hate? by xrayspx · 2009-09-02 02:53 · Score: 5, Insightful

These guys build their own hardware, think it might be able to be improved on or help the community, and they release the specs, for free, on the Internet. They then get jumped on by people saying "bbbb-but support!". They're not pretending to offer support, if you want support, pay the 2MM for EMC, if you can handle your own support in-house, maybe you can get away with building these out.

It's like looking at KDE and saying "But we pay Apple and Microsoft so we get support" (even though, no you don't). The company is just releasing specs, if it fits in your environment, great, if not, bummer. If you can make improvements and send them back up-stream, everyone wins. Just like software.

I seem to recall similar threads whenever anyone mentions open routers from the Cisco folks.

--
I like music
1. Re:What's all the hate? by langelgjm · 2009-09-02 03:01 · Score: 1
  
  Seriously, I thought it was a pretty cool rundown of how they did it. Nice to know you can purchase SATA port-multiplier backplanes, though I doubt I'll ever have a need to.
  I know they have 6 fans, but I still wonder about temperature issues with so many drives so close together.
  
  --
  "Anyone who [rips a CD] is probably engaging in copyright infringement." - David O. Carson
2. Re:What's all the hate? by sockonafish · 2009-09-02 03:05 · Score: 4, Interesting
  
  Running on the cheapest hardware possible and engineering the software to gracefully deal with hardware failure is exactly how Google runs their datacenters, as well. As long as you've got the talent to pull it off, it's much more cost effective than buying a prefab solution.
3. Re:What's all the hate? by xrayspx · 2009-09-02 04:31 · Score: 1
  
  Thanks for making that point too, I was going to mention that Google plans for failures and found a long time ago that it's cheaper to pay someone to go change drives all day than it is to buy super expensive enclosures.
  
  --
  I like music
4. Re:What's all the hate? by BobMcD · 2009-09-02 09:04 · Score: 1
  
  You're probably speaking rhetorically, but just in case someone is still wondering:
  Simple Dissonance, probably of varying levels.
  People look at coolness like this and are hit by a number of emotions. They wish they had thought of it. They realize what this could have saved them, cost-wise. They imagine rolling one of these out in the short term, and they doubt the decision to 'back' the big guys in that last quote they passed up to the boss.
  The tech is available and they could have one tomorrow if they so wanted. You'd have to be stupid to not see the potential of something like this, and at least have considered it yourself.
  But they're not stupid.
  Ergo, it must be a bad idea, there's obviously something wrong with it, everything is fine, they made the right call and would do so again tomorrow.
  Human beings do this ALL the time. Usually we can parse and recognize it before it becomes a post on slashdot. Usually, but not always.
5. Re:What's all the hate? by xrayspx · 2009-09-02 11:24 · Score: 1
  
  The thing is, people are trained to be open to open source /software/. No one thinks twice until it's a hardware solution.
  
  I'm starting to think that it's all about being able to blame someone else. If your storage dies, blame EMC or Netapp or whoever. If you drop off the Internet, well must be damn Cisco, I'll get 'em on the phone.
  
  But with the OSS routers and homebrew storage and clustering solutions, since there's no one to call, you actually have to be prepared for hardware failure, and able to fix it. Maybe?
  
  Or, people think "well, I could have done that" and discount it as trivial or fragile or "Not Enterprise"? Kind of a weird NIH syndrome.
  
  --
  I like music
6. Re:What's all the hate? by Alioth · 2009-09-02 20:39 · Score: 1
  
  OT: Why do people write MM to mean million instead of the more usual M? I've never had a satisfactory answer to that question.
  
  --
  Oolite: Elite-like game. For Mac, Linux and Windows
7. Re:What's all the hate? by xrayspx · 2009-09-03 03:44 · Score: 1
  
  It's Roman for "Thousand Thousand". I used to work with lots of Europeans and finance people, and I must have gotten into the habit. Evidently when you start talking to French people (or Europeans in general?) there starts being confusion between million and billion and other terms for larger numbers.
  
  --
  I like music
8. Re:What's all the hate? by xrayspx · 2009-09-03 08:57 · Score: 1
  
  I bet they've done the math. Here's the thing, they could easily be using array level redundancy, and writing everything to at least two of these. If they lose one, they have data somewhere else which seamlessly integrates. The concept isn't that different from GoogleFS, have a distributed filesystem that manages the data over several arrays, if you lose an array, take it offline and fix it, then put it back in.
  
  I also don't see anything saying "this whole thing should be one big RAID0 volume" either. This is a hardware spec, whatever you want to use for distributed filesystems or RAID configuration is left up to you.
  
  I still don't get why people don't think this company might just be smart enough to know what they're doing. What leads you to believe that if you use them for backup, your data is backed up to one single RAID0 volume in one of these boxes, and if that array goes away, so does your backup? They seem like a pretty smart bunch of guys, why can't they deploy a distributed FS?
  
  --
  I like music
9. Re:What's all the hate? by drsmithy · 2009-09-03 19:52 · Score: 1
  
  Ergo, it must be a bad idea, there's obviously something wrong with it, everything is fine, they made the right call and would do so again tomorrow.
  When making the comparison they are (their system to Dell, EMC, NetApp, et al) pretty much _everything_ is wrong with it - the design is riddled with catastrophe-multiplying SPOFs and performance would be dismal by every relative measure.
  For their unique purposes, it might work well (although I have my doubts for the long term), but the systems they are comparing themselves to are just so much better at what they do it's not even funny.
Components by HogGeek · 2009-09-02 02:55 · Score: 1

Not too shabby.
I had recently built a "storage pod" for my media @ home (6T using 4 1.5T drives), and had a hell of a time finding "good" components. So, I looked this over, and while it's made up of "consumer components" a couple of the components seem impossible to find for this as well.
Case: Custom Built
HD Backplane: Custom made by Chinese manufacturer.
So good luck building a "one off" for your small business/home, as I'll also bet these prices are for "quantity" (quality notwithstanding)
Re:Sooooo, by Christophotron · 2009-09-02 02:57 · Score: 1

hell, *I* would like to buy one, for my own personal use! $8000 seems very cheap for 67 terabytes of storage in a neat little package. My 4TB raid was quite expensive compared to this (on a $ per TB basis) and it's almost full now. I can definitely see something like this in my future. running ZFS for error detection, of course. And probably 2 redundant PSUs instead of standard consumer-grade ones. Wouldn't want one of those to go out and take half of my drives with it!
Online storage is way too expensive and internet connection speeds here in the USA will suck too badly for too long to even consider it..
Re:Battery Backup? by ajlitt · 2009-09-02 03:01 · Score: 1

These guys have a little more to worry about than redundancy... The two cheap ATX supplies in each box are split between the drives. So if one of the two supplies dies, the whole thing goes down. How's that for MTBF?
missing the point.. by zcold · 2009-09-02 03:06 · Score: 1

I think people are missing the point of this whole thing... instead of trashing and tearing the idea down, think what would make it better and improve the design... I've been researching for a while now for something to store a life's worth of data, and this looks like something that will meet my needs. Scalable, and enough space for a lifetime (I hope)

--
you know you can fry stuff putting things into things that dont like the things you put into it...
1. Re:missing the point.. by EmagGeek · 2009-09-02 23:39 · Score: 1
  
  You really need to look beyond the trees around you and see the forest. This solution is terrible for a critical storage application. You're talking about saving your data for a lifetime. This device will not come anywhere near doing that for you.
  I work for a company that designs high-availability, redundant systems. This thing lacks just about everything we consider bare-minimum for a five nines system, including but not limited to redundant power and communications, fault coverage, and low MTTR, low FIT rate, and so on.
  Others have quite thoroughly explained why this should not be considered a high reliability or high availability solution. The use of commodity hardware makes it cheap, but none of the hardware used is really suitable for a HA/HIREL application, and the architecture is anything but capable of providing 99.999% availability.
  This is an amateur solution to a complex problem. They can pull the whole "domyjobforme" trick on Slashdot, but ultimately they're going to end up in the same place as EMC and the rest of them.
Where's the de-dup? by vrmlguy · 2009-09-02 03:24 · Score: 1

We're a backup service, so our datacenter contains a complete copy of all of our customers' data, plus multiple versions of files that change. In rough terms, every time one of our customers buys a hard drive, Backblaze needs another hard drive.
Data deduplication (see http://en.wikipedia.org/wiki/Data_deduplication) can drastically reduce the storage requirements for backups. While email attachments are the classic example, it's doubtful that every one of their customers is using a unique build of their OS. Ditto for third-party software. A lot of media also gets duplicated between people: vendors' whitepapers, video, even porn gets downloaded by lots of people. Rsync uses de-dup techniques to reduce bandwidth requirements; there's no reason why a clever storage node couldn't use that de-dup metadata to keep its own storage costs down.
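To make the de-dup idea concrete, here is a toy content-addressed store. It is a sketch only: it uses fixed-size chunks keyed by SHA-256, which is simpler than rsync's rolling checksums and is not a description of anything Backblaze actually runs. The class and method names are invented for the example.

```python
import hashlib

class DedupStore:
    """Toy chunk store: identical chunks are kept once, keyed by digest."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # hex digest -> chunk bytes
        self.files = {}   # file name -> ordered list of digests

    def put(self, name, data):
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only the first copy of a chunk consumes storage.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())

store = DedupStore()
os_image = bytes(range(256)) * 16 + b"\x00" * 4096  # two distinct 4KB chunks
store.put("alice/os.img", os_image)
store.put("bob/os.img", os_image)  # second customer, same content
print(store.stored_bytes(), "bytes stored for", 2 * len(os_image), "bytes backed up")
```

Two customers backing up the same 8KB file cost 8KB of chunk storage instead of 16KB. A real system would also have to deal with encryption, hash collisions policy, and reference counting on delete, none of which this sketch attempts.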

--
Nothing for 6-digit uids?
These guys *are* the off-site mirror by vrmlguy · 2009-09-02 03:34 · Score: 1

For your average datacenter, primary storage needs to be on a major vendor's hardware, because you need the extras that the major vendors supply. However, Backblaze is in the business of providing off-site storage for their customers. Their data is the secondary copy, so it can be as cheap as they can make it. No one is going to be running their data center off of this copy, so it can be low performance. And while I'm not saying that they should, they could probably get away with running non-protected storage for everything. Even if they lose a drive every day, it's unlikely to hold the data needed for that day's requested restores. That means they can almost always rebuild a failed drive's contents the next time the affected customers sync up.

--
Nothing for 6-digit uids?
Re:They could have quite better by cowbutt · 2009-09-02 04:00 · Score: 2, Informative

they used incredibly cheap-ass HBAs for no good reason.
In their defence:

A note about SATA chipsets: Each of the port multiplier backplanes has a Silicon Image SiI3726 chip so that five drives can be attached to one SATA port. Each of the SYBA two-port PCIe SATA cards has a Silicon Image SiI3132, and the four-port PCI Addonics card has a Silicon Image SiI3124 chip. We use only three of the four available ports on the Addonics card because we have only nine backplanes. We don't use the SATA ports on the motherboard because, despite Intel's claims of port multiplier support in their ICH10 south bridge, we noticed strange results in our performance tests. Silicon Image pioneered port multiplier technology, and their chips work best together.
Please.... by mpapet · 2009-09-02 04:04 · Score: 2, Interesting

where I work we pay a premium for what happens when the power goes out, what happens with a drive goes bad,
Whoever spec'd your systems should have accommodated obvious failures like this. As in, paying for colo, using servers with dual power supplies that fail over, sensible RAID strategy. Giving money to EMC in this situation is not sensible.
but they also left out the people who manage these massive beasts. I mean, how many hundreds (or thousands) of drives are we talking here?
I have a couple of hundred drives going at any one time and I get an SNMP alert when a drive goes bad. I take one out of the closet and destroy the broken one. The RAID does the rest.
someone to get up in the middle of the night when their pager goes off because something just went wrong and you want 24/7 storage time.
Our storage strategy is N+1 all the way and required to be online 24/7 so failures are part of the plan. They are probably part of the plan at this startup.
We pay premiums so we can relax and concentrate on what we need to concentrate on.
I don't understand this. If your job is 89% software dev, then EMC may be the way to go. Expensive! But, it makes a little business sense. If you aren't spending most of your time writing software that adds value to your service/product, then EMC is doing your job and you are some kind of TPS generator. Do you pay a premium to blame someone else? I've had the opportunity to work in places like this and I've always passed because of the veiled contempt for IT.
Please, explain this to me.

--
http://www.maxineudall.com/2010/02/should-economists-be-sued-for-malpractice.html
are you a project manager by any chance? by leoc · 2009-09-02 04:05 · Score: 4, Insightful

I like how you dismiss a detailed real world design example based simply on a claimed feature without any further substantiation. Very classy. I'm not saying you are wrong, but would it kill you to go into a little more detail about why these folks need "luck" when they are clearly very successful with their existing design?

--
STFU about slashdot bias.
1. Re:are you a project manager by any chance? by pyite · 2009-09-02 04:13 · Score: 5, Informative
  
  are you a project manager by any chance?
  Of course not. A project manager would look at this and go, "wow, we saved a lot of money!" It's pretty simple. ZFS does what most other filesystems do not; it guarantees data integrity at the block level by the use of checksums. When you're dealing with this many spindles and dense, non-enterprise drives, you are virtually guaranteed to get silent corruption. The article never once mentions any of the words corrupt.*, checksum, or integrity. The server doesn't use ECC RAM. The project, while well intentioned, should scare the crap out of anyone thinking about storing data with this company.
  
  --
  "Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
2. Re:are you a project manager by any chance? by teknopurge · 2009-09-02 05:29 · Score: 1
  
  MOD PARENT UP.
  
  --
  Website Hosting
3. Re:are you a project manager by any chance? by profplump · 2009-09-02 06:28 · Score: 3, Insightful
  
  What failure rate are you using to "virtually guarantee" that you'll get data corruption with 45 drives?
  What failure rate in your RAM, CPU, and motherboard are you using to guarantee that the ZFS checksums are not themselves corrupted? Not to mention the high possibility of bugs in a younger file system, and the different performance characteristics among FSes.
  I'm not saying ZFS is a bad plan, at least if you're running enough spindles, but if you're going to "virtually guarantee" silent corruption with less than 100 drives I'd like to see some documentation for the non-detectable failure rates you're expecting.
  It's also worth noting that in a lot of data, a small amount of bit-flips might not be worth protecting against at all. Or they might be better protected at the application level instead of the block level -- for example, if the data will be transmitted to another system before it is consumed, as would be typical for a disk-host like this, a single checksum of the entire file (think md5sum) could be computed at the end-use system, rather than computing a per-block checksum at the disk host and then just assuming the file makes it across the network and through the other system's I/O stack without error.
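The end-to-end check described above (one digest over the whole file, verified where the data is consumed) might look something like this. It is a minimal sketch, not anyone's production code: the helper name is made up, and MD5 is used only because the comment mentions md5sum.

```python
import hashlib
import io

def stream_digest(fileobj, algo="md5", block_size=1 << 20):
    """Hash a file-like object in blocks so large files need not fit in RAM."""
    h = hashlib.new(algo)
    for block in iter(lambda: fileobj.read(block_size), b""):
        h.update(block)
    return h.hexdigest()

# Sender side: compute the digest before the file leaves the client.
payload = b"backup payload " * 1000
sent_digest = stream_digest(io.BytesIO(payload))

# Receiver side: a single flipped bit anywhere along the path is detected.
damaged = bytearray(payload)
damaged[1234] ^= 0x01
intact_ok = stream_digest(io.BytesIO(payload)) == sent_digest
damaged_ok = stream_digest(io.BytesIO(bytes(damaged))) == sent_digest
print(intact_ok, damaged_ok)
```

Note that this only detects damage. Recovering the file still needs a good copy or parity somewhere else, so it complements redundancy rather than replacing it.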
4. Re:are you a project manager by any chance? by Kayden · 2009-09-02 08:26 · Score: 1
  
  MD5 will only tell you the file is corrupt, it won't fix the file. Ideally, you'll want checksums both storing and receiving, however, storing is much more important. If you only check the checksum when you go to use the file, it's already broken.
5. Re:are you a project manager by any chance? by Cramer · 2009-09-02 08:48 · Score: 1
  
  Actually, they did. You fail as a geek :-) They run RAID6 across blocks of 15 drives. As long as they scrub each array regularly, they'll detect and correct any corruption or bad disks before it becomes an issue.
  ECC RAM is a bit of an unnecessary expense. Bit errors in RAM are exceedingly rare. I have many (MANY) servers with ECC memory. Over nearly a century of total CPU time, none of them have ever reported ECC errors. (even after overheating the hell out of one of them.)
6. Re:are you a project manager by any chance? by BikeHelmet · 2009-09-02 08:50 · Score: 1
  
  You're right. In a system with hundreds of HDDs and sticks of RAM, even one going silently bad would be a nightmare.
  Sun sure did put out some amazing stuff.
7. Re:are you a project manager by any chance? by kiwimate · 2009-09-02 11:28 · Score: 1
  
  are you a project manager by any chance?
  Of course not. A project manager would look at this and go, "wow, we saved a lot of money!"
  No, actually a project manager would look at this and ask, "hmm, which of the triple constraints of cost, time, or scope, are we endangering", and know enough to zero in on the so-called fourth constraint of Quality.
  If not, then I suggest said alleged "project manager" is merely a wanna-be hack with no training.
  Which brings us back to on topic. As others have pointed out, there's generally a reason you pay that much money for an EMC or a NetApp solution. If you go with the less costly solution, you are losing SOMETHING. A storage architect should know enough to assess what it is that is lost (performance, reliability, pretty flashing lights) and determine if that reduction in whatever-it-is is appropriate given the situation you are looking to address.
8. Re:are you a project manager by any chance? by bored · 2009-09-02 14:39 · Score: 1
  
  Every block on the disk has error correction as well. The undetected bit error rates for hard drives are exceedingly low. When put into a Reed Solomon raid6 scrubber the chance of undetected bit error rates goes even lower. That doesn't mean you will not have uncorrectable errors, what it means is that it becomes extremely unlikely that you won't detect them.
  Compared with multiple layers of ecc the Fletcher checksums in ZFS are a joke.
9. Re:are you a project manager by any chance? by raddan · 2009-09-02 16:08 · Score: 1
  
  It's too much work for applications to have to worry about errors. That's within the scope of the filesystem, because the filesystem's job is to provide reliable access to data for applications. With error-detection and error-correction, you tend to want to catch the problem as soon as possible; that usually means lower in the stack.
  
  ACM has an article on latent disk error rates here. IIRC, Seagate's figures were about one latent error for every 10^15 bits; this figure was confirmed by Sun when they were doing the heavy lifting on ZFS. The important thing is that the latent error rate is not decreasing as disk capacities increase. ZFS specifically addresses this problem using checksum trees.
  
  I think that bit-flips at all are unacceptable. I know that guaranteeing that they can't happen is not possible, but we're willing to spend extra money to ensure that they happen infrequently.
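To put the one-in-10^15-bits figure in perspective, here is a rough estimate of what it means for reading one full 67TB pod end to end. It is a sketch under stated assumptions: decimal terabytes, independent errors, and a Poisson model, none of which come from the thread.

```python
import math

def expected_latent_errors(bytes_read, bits_per_error=1e15):
    """Expected number of latent errors when reading the given volume."""
    return bytes_read * 8 / bits_per_error

def prob_at_least_one(bytes_read, bits_per_error=1e15):
    """Poisson estimate of hitting one or more latent errors."""
    return 1 - math.exp(-expected_latent_errors(bytes_read, bits_per_error))

POD_BYTES = 67e12  # one pod, assuming decimal terabytes

expected = expected_latent_errors(POD_BYTES)
chance = prob_at_least_one(POD_BYTES)
print(f"expected errors per full read: {expected:.3f}, P(at least one): {chance:.2f}")
```

Under these assumptions a full read of one pod expects about half an error, so across many pods or repeated scrubs a latent error stops being a rare event.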
10. Re:are you a project manager by any chance? by isorox · 2009-09-02 23:51 · Score: 1
  
  Well, speaking from unfortunate real life experience with ZFS ... block checksums are great, but when (not if) the filesystem gets corrupted and you have no (zero, none, SOL, so sorry) tools to repair it, your data is just as gone.
  My team's project hasn't received any errors on our zfs file system, and we run a scrub every week. In testing bit flips on disks, we found that zfs would report the error and fix it by rebuilding from the raid. As for a file system crash, that's why we have a backup server.
  Sun have had to fix one of our projects twice, the second time they had some better tools, still not to the point of fsck though, however zfs seems fine for redux.
  http://www.uknof.org.uk/uknof13/Bird-Redux.pdf
11. Re:are you a project manager by any chance? by blofeld42 · 2009-09-03 08:30 · Score: 1
  
  Look at the CERN data on disk error rates. They found the system error rate to be about 3x10^-7.
12. Re:are you a project manager by any chance? by drsmithy · 2009-09-03 10:05 · Score: 1
  
  They run RAID6 across blocks of 15 drives.
  That, alone, should tell you to stay far, far away from them. The rebuild times on those arrays are going to be measured in days, and the performance while they are rebuilding is going to be even more dismal than it would be normally.
13. Re:are you a project manager by any chance? by Cramer · 2009-09-03 11:20 · Score: 1
  
  Actually, it's not that bad. Half a day, maybe. They didn't necessarily build them to be fast -- they use port multipliers, so there's 80% of the performance gone.
  However, the point still stands... they *do* have multiple layers of data integrity, both within each unit and across the data center.
14. Re:are you a project manager by any chance? by drsmithy · 2009-09-03 11:30 · Score: 1
  
  Actually, it's not that bad. Half a day, maybe.
  One of their 15-spindle arrays is hanging off a single PCI SATA controller. The rebuild speed on that is going to be less than 10MB/sec. That's less than a terabyte a day. For ~22TB, the array would take about 25-30 days to rebuild. Even the ones on PCIe controllers would only be a bit more than twice as fast, in ideal conditions.
15. Re:are you a project manager by any chance? by Cramer · 2009-09-03 12:05 · Score: 1
  
  It would be interesting to see actual performance numbers. On paper, based on the performance of my own sil3124 and 1TB drives, if it's doing nothing else, it should be able to rebuild in just under 6 days. However, given that it's under load, there are port multipliers, and the 3124 shares bandwidth across all of its ports, it might take twice that long.
  (PCI-X vs. PCIe makes little difference. The drives are nowhere near as fast as either bus.)
16. Re:are you a project manager by any chance? by drsmithy · 2009-09-03 18:56 · Score: 1
  
  (PCI-X vs. PCIe makes little difference. The drives are nowhere near as fast as either bus.)
  They are in their case. Each of their PCIe x1 SATA cards has 10 drives hanging off it, which will _easily_ saturate the 250MB/sec they're capable of. Also, as I mentioned above, a full 1/3 of their drives are on a single SATA card connected to a 32-bit, 33MHz, ~120MB/sec PCI bus.
  It must take them a solid month just to commission a node and complete the initial array scrubs (unless they're not even doing that (!) ).
This is Amusing... by Nom+du+Keyboard · 2009-09-02 04:09 · Score: 1

disgusted with the outrageously overpriced offerings from EMC,
That's amusing, since EMC was born out of the outrageously overpriced offerings from IBM and other mainframe companies of the day.

--
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
A Comparison with enterprise class storage by rayzat · 2009-09-02 04:19 · Score: 1

I think this solution is quite interesting and probably fits their needs, but comparing it to the storage solutions of the vendors listed is quite ridiculous. Another thing to note: there are vendors, such as NEXSAN, that sell cheaper storage systems that, while still more expensive than this, would probably have met their needs.
The first issue is high availability. There are many single points of failure in this box. There is only a single controller. The power supplies are not redundant. With this number of drives, a single fan failure might lead to high enough heat to damage components. Single-port backplane. No NVRAM. The only things that aren't a single point of failure are the drives themselves, because they are in a RAID6 config, but I still see a problem with that: their configuration uses no hot spares. A high-end storage system is going to have multiple controllers and redundant power supplies, be able to sustain multiple fan failures, and have multiple backplanes with interposer cards. It's also going to have NVRAM so that, should a power failure occur, acknowledged cached data would not be lost.
The second issue is maintenance. A high-end storage system's parts are highly accessible and often hot swappable. A controller goes out, it's like changing a Nintendo cartridge. With this box, if anything goes except a drive, the box is coming down. If you are replacing a drive, you'll have to slide the box out (hopefully you left enough clearance for the power cords), then pop in a new drive, and hopefully not break the SATA connector on the backplane. Oh man, I forgot to put on a new rubber band, I mean vibration dampener.
What's the performance of this box like? With software RAID and only a single processor with no ASIC acceleration for anything, I would have to imagine the processor is going to get pretty bogged down. With a high-end box everything is pretty much designed, within reason, to make the drives the ultimate performance bottleneck. Can this system fully utilize all the drives, or can the drives deliver more IOPS and throughput than the controller can handle?
Extra features. What does this box offer in terms of volume copying, flash copying, and remote mirroring?
The value of an enterprise solution is that it provides the features that keep it working 99.999% of the time, not just 99%. I see so many possible areas where data could be lost or corrupted. A couple of comments have suggested this just being a block in a bigger solution, treating it just like a drive. In that case you are going to have to add an additional layer of redundancy, probably a mirror. With a straight mirror you are going to see a doubling in the cost of hardware, infrastructure, power and cooling, which is going to start disrupting the cost/benefit of this solution. If you just want a bunch of file space accessible through HTTP with the ability to tolerate the occasional loss of data and downtime, this solution will work fine. If data loss or downtime means the loss of data or jobs, you'll go with one of the major storage vendors.
You need a SAN by mpapet · 2009-09-02 04:27 · Score: 1

A plain-vanilla SAN is worth every penny.
Especially now that you can get them from distressed companies who paid too much for them a couple of years ago, $15,000 will get you a refrigerator-sized solution. Straight retail on a 2U san is still getting cheaper every year. http://h71016.www7.hp.com/dstore/ctoBases.asp?oi=E9CED&BEID=19701&SBLID=&ProductLineId=450&FamilyId=2569&LowBaseId=15222&LowPrice=$1,899.00

--
http://www.maxineudall.com/2010/02/should-economists-be-sued-for-malpractice.html
1. Re:You need a SAN by sloth+jr · 2009-09-02 10:31 · Score: 1
  
  SAN has its own set of baggage, mostly in complexity and increased failure scenarios (HBA failures, transceiver failures, switch failures, cable failures, management/monitoring/presentation software issues, split brain, increased human error - better be sure your technicians know exactly how the SAN behaves before monkeying around with cables - firmware incompatibilities, etc.). One of the biggest problems with SANs is that, though the underlying protocols and link layers have gotten fairly good at blocking and recovering when the underlying problem is corrected, most applications that require a SAN do NOT survive failure in the block layer well, AT REASONABLE COST. Yes yes, multipath HBAs, blah blah blah, etc. etc.
  
  For most needs, a SAN is totally overkill: expensive new acquisition, expensive in expertise required. I would recommend everyone who can afford to do so build one and understand one, and start simulating failure, before issuing blanket decrees of SAN advisability.
a sun thumper rip off? by pjr.cc · 2009-09-02 04:40 · Score: 1

I remember when i first got my hands on a sun thumper... impressive piece of kit but with a sun price tag... this one is way kewler on the price.
A project i worked on tried to deploy quite a number of thumpers and we ran into "issues"... First the racks - the thumper weighs in the vicinity of 150kgs (330lbs i think?), try to put 10 of them in a rack and you're in for a shock; assuming the rack can handle it, most data center floors have "issues" supporting the weight.
The second problem we had was cooling, the temp coming out the back of the rack was quite astronomical, and lastly power. In AU, this can often be a pain in the rear, especially with each thumper taking in about 2kw - 20kw per rack = PAIN.
Still, it's kewl to see them do it all open.
I'd love to see someone do something like that though with computing power. Take the proliferation of mini-itx boards with "real" cpus on them (ok, desktop cpus, but still, not shabby really): you could do a similar setup with a custom case supporting quite a number of those little buggers quite easily. There's a beautiful little zotac AMD board that is almost ideal - supports the quad core, has a gig interface (sadly it has been aimed at htpcs, cause you could replace the wireless, video, 6 usb ports, etc with server-useful componentry - i.e. 2 or more gig ports and ipmi). But mostly it would be cheap and could run something like ovirt or abicloud quite happily. Shame that. There are other options in the space, like intel have a half-width xeon board (and a 1ru case that can support 2 of them side-by-side), but they're hard to get a hold of and quite long. Get rid of local storage on the servers, use serial instead of video and add gpxe for remote boot - brilliant and don't exist!
On a completely side note, one thing that i'd love to see in linux that has yet to exist in a useful format is replication (async) - the only real option is drbd, but it's such a pain to set up and very inflexible. I was always so disappointed that neither zfs, lvm nor btrfs include it (even basic local replication would have sufficed given the existence of so many network level block storage transports (iscsi, etc)).
Rebuild time? by AliasMarlowe · 2009-09-02 04:40 · Score: 1

They went RAID 6, even though it is slow as shit, for the added failsafe mechanisms.
How long does it take to rebuild the array if a disk has to be replaced? Each RAID 6 volume is 15 disks of 1.5TB each (19.5TB data + 3TB parity). So either they'd have to take a real performance hit during a relatively short rebuild period, or a smaller hit over a longer rebuild period. Longer rebuild periods increase the odds of further failures before the rebuild is complete.
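A rough way to frame the question is the sketch below. It uses the 15 x 1.5TB RAID 6 layout from the comment; the sustained rebuild rates passed in are purely assumptions for illustration, not measured figures for this hardware, and the model only counts the data written to the single replacement drive.

```python
# Back-of-the-envelope RAID 6 figures for one 15-drive volume.
# Drive count and size come from the comment above; any rebuild
# rate passed to rebuild_days() is an assumption, not a measurement.

DRIVE_TB = 1.5          # capacity of each drive, in decimal terabytes
DRIVES_PER_ARRAY = 15   # drives in one RAID 6 volume
PARITY_DRIVES = 2       # RAID 6 spends two drives' worth of space on parity


def raid6_capacity_tb(drives=DRIVES_PER_ARRAY, drive_tb=DRIVE_TB,
                      parity=PARITY_DRIVES):
    """Return (data_tb, parity_tb) for one RAID 6 volume."""
    data_tb = (drives - parity) * drive_tb
    parity_tb = parity * drive_tb
    return data_tb, parity_tb


def rebuild_days(drive_tb, rate_mb_s):
    """Days needed to rewrite one replacement drive at a sustained rate.

    rate_mb_s is the assumed sustained write rate to the new drive
    in decimal megabytes per second.
    """
    bytes_to_write = drive_tb * 1e12
    seconds = bytes_to_write / (rate_mb_s * 1e6)
    return seconds / 86400


if __name__ == "__main__":
    print(raid6_capacity_tb())            # (19.5, 3.0)
    for rate in (10, 50, 100):            # hypothetical rebuild rates
        print(rate, "MB/s ->", round(rebuild_days(DRIVE_TB, rate), 2), "days")
```

Throttling the rebuild to protect foreground performance lowers the rate argument, which lengthens the window in which a further failure can occur; that is the trade-off the comment describes.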

--
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
1. Re:Rebuild time? by sloth+jr · 2009-09-02 10:40 · Score: 1
  
  Rebuild time doesn't sound like it matters in a system like online backup. You need to write the data once for a few nodes, and read it rarely.
2. Re:Rebuild time? by drsmithy · 2009-09-03 19:31 · Score: 1
  
  How long does it take to rebuild the array if a disk has to be replaced?
  By my calculations, best case scenario is about ten days for the arrays connected to PCIe controllers and 25 days for the array on the PCI controller.
Cutting too many corners? by codeguy007 · 2009-09-02 04:43 · Score: 1

These are pretty impressive and a good start, but there are a few things I would do differently. I used to build NAS boxes for a living and there are some issues with this design. I think they have cut too many corners.
1) Their choice of RAID cards is somewhat questionable. What 4 cards can you get for $175 total which will support proper hotswap? Even running software RAID, I would still want cards that provide proper monitoring and drive management, like 3ware. Yeah, maybe it would have cost a fair bit more than $175 per box, but it would be worth the difference. You would still be saving a ton. Also, I am not sure I would put more than one drive on a cable with a multiplexer. You can get 16-port 3ware cards that use multiport cables that break out at the backplane. Now you would also have to upgrade to a server-class motherboard with at least 3 PCIe slots.
2) I haven't checked recently, but is software RAID 6 even recommended yet? I know the 2.6 kernel has been supporting it for a while, but it was still listed as experimental last I checked. I might stick with RAID 5 here.
3) While using Zippy power supplies is an excellent choice, I would definitely want redundant power in these boxes.
Re:That's great but what about all the hidden cost by Anonymous Coward · 2009-09-02 04:44 · Score: 1, Informative

You realize they are USING this NOT SELLING it, right? They tell YOU how YOU can build one, nowhere are they offering to sell some schmuck a storage array.
If you don't know how to maintain it, do not try to do it yourself! however if you do, and you can save the kind of money they are saving, then go for it.
Re:Battery Backup? by codeguy007 · 2009-09-02 05:05 · Score: 1

Umm, he was talking about the power redundancy. Also, those are not 2 cheap ATX supplies; they are top-quality server-grade power supplies, just not redundant. Though if you provide redundant power, you really shouldn't need battery backup on the SATA cards, as the datacenter would certainly have a UPS. I guess the motherboard could blow and battery backup could protect against that.
Re:FC / iSCSI / 10GBe / Cache / Snapshot etc by codeguy007 · 2009-09-02 05:10 · Score: 1

Umm, it's kind of obvious but whatever.
This is a NAS box. They aren't adding 10GbE NICs, so the network will be GigE and would be the bottleneck if they weren't using PCI SATA cards.
You would have support for typical NAS stuff like NFS, Samba, and distributed filesystems like AFS. You could also set up iSCSI nodes as well. But definitely no Fibre Channel support. I don't know, maybe you could add a card to the box to add this, but I am pretty sure for the same money you could set up a 10GbE storage network. Of course the 10GbE storage network would be faster.
Not a complete solution BY DESIGN by DLG · 2009-09-02 05:34 · Score: 1

The article talks about how it is not intended as a complete solution. They do not go into, or intend to, describe their redundancy features, their performance issues, or anything else.
From the Article:
A Backblaze Storage Pod is a Building Block
We have been extremely happy with the reliability and excellent performance of the pods, and a Backblaze Storage Pod is a fully contained storage server. But the intelligence of where to store data and how to encrypt it, deduplicate it, and index it is all at a higher level (outside the scope of this blog post). When you run a datacenter with thousands of hard drives, CPUs, motherboards, and power supplies, you are going to have hardware failures - it's irrefutable. Backblaze Storage Pods are building blocks upon which a larger system can be organized that doesn't allow for a single point of failure. Each pod in itself is just a big chunk of raw storage for an inexpensive price; it is not a "solution" in itself.
If you did want to attack this concept, it would be based on the fact that I cannot think of a good general storage use for this besides serving static webpages.
The only access method is through https.
There is only 1 gigabit/s of bandwidth per 67 terabytes. 67 terabytes is, duh, 67,000 gigabytes... That's 536,000 gigabits. A 1 gigabit/s interface needs 6 days to move all that data. Oh and it can only be accessed through https. So it's somewhat questionable that you can actually move nearly that much data. I don't really know what the limitations of the hard drives or SATA are, but no matter how much speed any of that has, the network link and latency are going to be significant if you are really moving large-scale data. I can only assume their applications don't require speed, or that by duplicating it over a large number of systems they are going to get some load balancing. So then one asks... How many of these pods equal a redundant system with reasonable performance? And what is the power usage involved?
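The six-day figure can be checked with a few lines. This is only a sketch of the arithmetic: it takes the 67TB and 1 gigabit/s numbers from the comment, and the `efficiency` parameter is a hypothetical knob for protocol overhead (HTTPS, TCP), not a measured value.

```python
# Time to move a full pod's worth of data over its network link.
# 67 TB and 1 Gbit/s come from the comment; `efficiency` is a
# hypothetical fraction of line rate actually achieved.

def transfer_days(capacity_tb, link_gbit_s, efficiency=1.0):
    """Days to move capacity_tb decimal terabytes over the link."""
    gigabits = capacity_tb * 1000 * 8          # TB -> GB -> gigabits
    seconds = gigabits / (link_gbit_s * efficiency)
    return seconds / 86400                     # seconds per day


if __name__ == "__main__":
    print(round(transfer_days(67, 1), 1))        # 6.2 days at full line rate
    print(round(transfer_days(67, 1, 0.7), 1))   # longer at an assumed 70%
```

At full line rate the result is about 6.2 days, so any real-world overhead only pushes it higher.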
There is Raid6 based on 15 drive sets with 2 parity drives spread across between 1 and 3 controllers but there is no hot swappable drive, fan, or controller.
Essentially a single drive failure requires you to take down the entire system. Now I assume there is a replicated system, so you can just take down any of these boxes with no planning.
--------------------
Honestly I am sure this suits their purpose. I can't imagine what purpose it would suit for me.
A Classic Case by MerlynEmrys67 · 2009-09-02 05:42 · Score: 1

Someone with expertise in one domain - trying to solve problems in another domain using a very simple solution. There are reasons dell is charging 8 times as much (well, that and they need to make a buck in there too). You will have to pay for the parts, pay for manufacturing (so keep someone on staff for 60K a year), pay for failures (how many hard drives will fail in 3 years - especially using commodity components rather than server components).
I give it a 50/50 chance of actually breaking even vs. buying the cheaper Dell solution in a 5 year time frame.
I give it a 10% chance of causing an EPIC FAIL that causes the company to go out of business from a massive loss of customer data.

--
I have mod points and I am not afraid to use them
No, the cat does not actually "got my tongue." by Impy+the+Impiuos+Imp · 2009-09-02 05:59 · Score: 1

After adjustment, storage capacity has increased about 100,000x per dollar in the last 25 years. To get to a petabyte in the desktop price range requires just another 10 years or so.

--
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
Not the whole cost by CustomDesigned · 2009-09-02 06:02 · Score: 1

The $117K is just the computer hardware. You still need UPS, A/C, Power, and floor space. Add up those, and a reasonable profit, and I'll bet Amazon and EMC don't look so bad. But if you already have the infrastructure, and the marginal cost of adding the storage arrays is low, then the design could save money.
most definitely a JBOD... by rivaldufus · 2009-09-02 06:08 · Score: 1

and not something you'd want to store valuable data on. First off, it does not have redundant power. You could probably add redundant power for another $1,000 or so.

Second of all, if you did set up something like RAID 5 or RAID 6 (or RAIDZ/RAIDZ2), the rebuild time on a drive would probably be well over 12 hours with 1.5TB SATA drives.

I'm sure many people would be tempted to put all 45 drives in a large RAID 5 volume, which would be even scarier.
A more practical version would be to go with 41x 500GB SATA, 3x 60GB SSD, dual redundant power supplies, 32GB RAM, and Solaris or OpenSolaris.

You would probably break it down something like this:
2 disks - RAID 1 mirror for the system
2 30GB SSD drives for the slog (definitely helps improve performance)
3 hot spares
and then 6 sets of 6 drives in RAIDZ-2 in a single pool
This leaves out a couple of drives. You could put in a couple of 1.5TB (or even 2TB) drives in a RAID 1 mirror for some supplementary storage, or just leave them out. You're not going to have as much storage, but your data will be safer. Plus, dropping down to 500GB from 1.5TB drives is a large difference in price (as much as $50-$60 per drive), and the price differential means that the added expenses (such as power and the SSD drives) are offset.
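To see what that layout gives up in capacity, here is a small tally. It is a sketch only: the drive counts and the 500GB size are taken from the comment, and the usable figure ignores ZFS metadata and reservation overhead, so treat it as an upper bound.

```python
# Tally of the proposed ZFS layout: drive counts and sizes are from the
# comment; usable capacity ignores filesystem overhead (an assumption).

DRIVE_TB = 0.5          # 500GB SATA drives
VDEVS = 6               # number of RAIDZ-2 sets in the pool
DRIVES_PER_VDEV = 6     # drives in each RAIDZ-2 set
RAIDZ2_PARITY = 2       # RAIDZ-2 tolerates two failures per set


def hdd_count(system_mirror=2, hot_spares=3):
    """Total spinning disks used by the layout."""
    return system_mirror + hot_spares + VDEVS * DRIVES_PER_VDEV


def usable_tb():
    """Raw data capacity of the pool, before filesystem overhead."""
    return VDEVS * (DRIVES_PER_VDEV - RAIDZ2_PARITY) * DRIVE_TB


if __name__ == "__main__":
    print(hdd_count())   # 41, matching the "41x 500GB SATA" parts list
    print(usable_tb())   # 12.0 TB in the pool
```

That is roughly 12TB of pool space against the 67TB raw of the original pod, which is the price of the extra redundancy and spares.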
Don't forget where the real value is by pedantic+bore · 2009-09-02 06:23 · Score: 2, Insightful

Forgive me; I've committed the sin of working for one of those name-brand storage companies.
The real value in a data storage system isn't in the hardware, it's in the data. And the real cost incurred in a data storage system is measured in the inability of the customer to access that data quickly, efficiently and (in the case of a disaster) at all.
If you need to crunch the data quickly, a higher-performing system is going to save you money in the end. Look at all the benchmarks: no home-grown systems are anywhere on the lists. If you want to stream through your data at several gigabytes per second, you need to pay for a fast interconnect. Putting 45 drives behind a single 1GbE just doesn't cut it.
Similarly, if you want to ensure that the data is protected (integrity, immutable storage for folks who need to preserve data and be certain it hasn't been tampered with, etc) and stored efficiently (single instance store, or dedupe, so you don't fill your petabytes of disks with a bajillion copies of the same photos of Anna Kournikova) then you need to pay for the extra goodness in that software and hardware as well.
Finally, if you want extremely high availability, then the cost of the hardware is minuscule compared to the cost of downtime. We had customers that would lose millions of dollars per service interruption. They're willing to pay a million dollars to eliminate or even reduce downtime.
These folks are essentially just building a box that makes a bunch of disks behave like a honking big tape drive. It's a viable business--that's all some folks need. But EMC et al are not going to lose any sleep over this.

--
Am I part of the core demographic for Swedish Fish?
Oh, it's *only* 976TB not a 1PB! by galanom · 2009-09-02 06:27 · Score: 1

:D
*sigh* by upside · 2009-09-02 06:29 · Score: 4, Insightful

How about reading the section "A Backblaze Storage Pod is a Building Block".

<snip> the intelligence of where to store data and how to encrypt it, deduplicate it, and index it is all at a higher level (outside the scope of this blog post). When you run a datacenter with thousands of hard drives, CPUs, motherboards, and power supplies, you are going to have hardware failures — it's irrefutable. Backblaze Storage Pods are building blocks upon which a larger system can be organized that doesn't allow for a single point of failure. Each pod in itself is just a big chunk of raw storage for an inexpensive price; it is not a "solution" in itself.
Emphasis mine. I believe there are quite a few successful and reliable storage vendors not using ZFS. We get the point, you like it. Doesn't mean you can't succeed without it. Be more open minded.

--
I'm sorry if I haven't offended anyone
1. Re:*sigh* by pyite · 2009-09-02 06:38 · Score: 1
  
  How about reading the section "A Backblaze Storage Pod is a Building Block".
  I did read it. Black and white hardware failures are easy to deal with. Corruption is not.
  
  --
  "Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
2. Re:*sigh* by Dahamma · 2009-09-02 07:22 · Score: 1
  
  Wow, I don't even know how the computing world existed before ZFS!
  I'm sure they are doing all kinds of things at the application layer (who knows - checksums, error correction, duplication, etc) to ensure the integrity of their data; likely they have more or less implemented a whole distributed filesystem on top of these arrays. Why should they bother with the overhead of a filesystem in their "black box" (ok, red box) that duplicates what they already have at a higher level?
Practical question by yawhcihw · 2009-09-02 07:14 · Score: 1

I'm somewhat serious about building one of these boxes myself.
I have to buy a lot of little parts from a multitude of vendors, fine. A small premium to pay over their quoted price.
My question falls to: where the heck do I buy a "Chyang Fun Industry (CFI Group) CFI-B53PM 5 Port Backplane (SiI3726)"?
Spend a few minutes and try and find that part for sale.
--frustrated--
But ZFS isn't the solution anyways. by emj · 2009-09-02 07:20 · Score: 1

Checksums, error corrections, self healing shit or what ever is the solution. Such things are easy to put on top of these pods.
Wow... by EmagGeek · 2009-09-02 08:26 · Score: 1

I typed up a lengthy critique.. but decided not to post it...
I'll replace it with... "wow..."
This thing needs a lot more thought, especially with respect to redundancy, fault coverage, and maintenance.
Very poor hardware choices. by Super+Happy+Fun+Chem · 2009-09-02 11:12 · Score: 1

Anyone else notice that they seriously restricted the throughput to/from the drives based on their choice of SATA cards (good old 32-bit PCI 2.3 only has a max theoretical of 266MB/s, and the same goes for PCIe x1 [250MB/sec])? In the worst case scenario, each drive is getting at most 17MB/sec of transactional bandwidth, which is just pathetic (based on some very back-of-the-envelope calculations). For the amount of money they spent on making a custom solution, an extra 100-200 bucks to get a few 4-8 lane PCIe SATA cards is a pittance. Overall, it just demonstrates to me, at least, a poor understanding of what goes into making a good storage solution. And don't get me started on the lack of backup power supplies, error-checking RAM, etc.
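The back-of-the-envelope number can be reproduced as follows. The bus figures are the ones quoted in this comment; the drives-per-card counts (15 on the PCI card, 10 on each PCIe x1 card) are taken from other comments in this thread and should be treated as assumptions here. The model simply divides theoretical bus bandwidth evenly and ignores port-multiplier and protocol overhead.

```python
# Even split of theoretical bus bandwidth across the drives behind one
# SATA card. Bus figures are from the comment; drive counts per card
# are assumptions taken from elsewhere in the thread.

def per_drive_mb_s(bus_mb_s, drives):
    """Theoretical MB/s available to each drive if all are busy at once."""
    return bus_mb_s / drives


if __name__ == "__main__":
    print(round(per_drive_mb_s(266, 15), 1))  # PCI card, 15 drives
    print(round(per_drive_mb_s(250, 15), 1))  # 250 MB/s split 15 ways
    print(round(per_drive_mb_s(250, 10), 1))  # PCIe x1 card, 10 drives
```

With 15 drives behind one card the result lands at roughly 17MB/sec per drive, in line with the estimate above; real throughput would be lower once overhead is counted.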
RAID6? Bzzzzt! Wrong answer! by Anonymous Coward · 2009-09-02 13:59 · Score: 1, Insightful

The real solution here is to design custom ASICs which can tolerate more failures than standard RAID6, and to store redundant data on a completely different controller. That way, if one board on a rack goes titsup (or is merely down for maintenance) chances are better that sufficient data is available to reconstruct the original file. The Reed-Solomon coding tech is well known and lends itself well to custom ASICs. The part that isn't well developed is the network routing/transport mechanism that lets you efficiently shuttle large quantities of data between boards in the rack. The general idea is well known in the literature (read: "Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance" by Michael Rabin) but the hardware to do the many-to-many interconnection isn't available as an off-the-shelf architectural component. Rather than trying to optimise the system from the viewpoint of how cheaply the MAID (massive array of inexpensive disks) can be constructed, anyone interested in this area should be more interested in how to guarantee /availability/ of the data for the least cost. For that you need better strategies than "let's see how many commodity disks we can fit in a rack".
Raw storage will always be cheaper than the effort of designing fault-tolerant, high-availability systems, but it's worth the effort to at least implement "good enough" systems to attempt to achieve these qualities rather than sticking with the dumb "stack-em-high" approach. Scalability matters, or else your "super cluster" will quickly be overtaken by the next dumb implementation when the next 18-month increment rolls around.
so who's taking bets by mistahkurtz · 2009-09-02 15:23 · Score: 1

who's taking bets on how long it takes them to bite the bullet and shell out the cash for a netapp, emc, ibm, hp or other true SAN?

there are reasons that companies pay large sums of money for them. it's not because they *can* or because they *want to*.

one day they'll realize this.

--
not only is time travel possible, it's irrelevant.
FYI: Lake City College is Legit by Farhood · 2009-09-05 05:39 · Score: 1

I visited LCCC several times when in student government a few years back. They're a legitimate college with a good student population and decent teachers. They're right outside Ocala, FL - halfway between Tallahassee and Orlando.
You really need hardware RAID10 for that by DamnStupidElf · 2009-09-09 08:59 · Score: 1

At least I wouldn't trust software RAID10 to write to both disk sets and then fill in the other set with the redundant copy when it had time. That really needs a battery-backed cache to implement safely. The overhead of RAID6 parity calculation should decrease for bulk writes, but at some point the CPU is going to be spending too much time calculating parity and not doing other stuff. 16 100MB/s drives in RAID6 would put quite a load on the system, but if it's only a file server it may be acceptable. I agree that degraded drives suffer a much worse slowdown, especially for partial stripe reads. You could easily start getting only 100MB/s for lots of small reads on that same RAID6 with one or two failed drives, and that's assuming the CPU is fast enough to do error correction at 100MB/s (with two drives missing the fast algorithm for accelerating RAID6 stops working and it has to emulate GF(2^8) multiplication with lookup tables). Most of my personal needs are cheap bulk data storage (movies, isos, etc.), so RAID5/6 makes sense. At work, I use RAID1, 10, and 5 since we don't have hardware support for RAID6 on the SANs. Production data goes on RAID10 because we can afford it, mirroring for system drives, and RAID5 for test/development systems that just need lots of storage.
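For readers wondering what "GF(2^8) multiplication with lookup tables" looks like, here is a minimal sketch. It assumes the field polynomial 0x11D with generator 2, which is the common choice for RAID 6; it is an illustration of the table technique, not the code any particular RAID implementation uses, and the function names are made up for this example.

```python
# Table-driven multiplication in GF(2^8), the arithmetic behind RAID 6's
# second parity. Assumes polynomial 0x11D and generator 2 (an assumption
# for illustration, though a common choice for RAID 6).

POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1


def build_tables():
    """Build exp/log tables so that exp[log[a] + log[b]] == a * b."""
    exp = [0] * 510   # doubled length avoids a modulo in gf_mul
    log = [0] * 256   # log[0] is unused; zero is handled separately
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1               # multiply by the generator (2)
        if x & 0x100:         # reduce when we overflow 8 bits
            x ^= POLY
    for i in range(255, 510):
        exp[i] = exp[i - 255]
    return exp, log


EXP, LOG = build_tables()


def gf_mul(a, b):
    """Multiply two field elements using the lookup tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_mul_slow(a, b):
    """Reference shift-and-XOR multiply, used only to check the tables."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


if __name__ == "__main__":
    print(hex(gf_mul(2, 0x80)))  # 0x1d: 0x80 * 2 overflows and is reduced
```

Each byte recovered in a double-failure rebuild needs lookups like these, which is why the degraded case costs noticeably more CPU than the normal multiply-by-2 parity path.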
1. Re:You really need hardware RAID10 for that by Anarke_Incarnate · 2009-09-09 09:15 · Score: 1
  
  I would not recommend partial stripe-set writes in software either. It can be done but defeats the purpose of RAID 10 in many ways.
  Also, my comment about 100MB/s was in relation to something like each drive putting out that much on a regular basis. I doubt, very much, that it can be sustained.