Domain: backblaze.com
Stories and comments across the archive that link to backblaze.com.
Comments · 162
-
Something like this
Do something like this. Put it in a case / box / cabinet of your own design since you don't need the rackmount capability.
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/
-
MM Drives are cheap today
Although I endorse this approach for people with big storage needs, space and power budgets, I had in mind an application that would RAID 45 of them for an obscenely high IOPS + bandwidth FC node for media content storage for video work. The kind of thing James Cameron would use for shipping his in-progress movies on. I might actually go with something else, like this instead since it supports up to 70 TB in 5U and now is certified to work with normal SAS controllers instead of a proprietary switch.
Naturally at five racks instead of 5U your suggestion lacks a certain performance density for this application - though admittedly you do have the advantage in the $/TB area, that's not always the only consideration.
Over time SSD will become cheaper than spinning disc, and as performant as RAM. That will change many of the market dynamics and may cause some unpleasantness.
-
Re:Exactly what you're doing
Yeah, keeping those drives in a huge online storage array is probably better. Then they can mirror them across multiple sites.
Here's a compelling petabyte online RAID system for cheap:
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/
-
or build a couple of these...
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/
I'm sure the price has come down some since this article was published...
For those too lazy or paranoid to read the link... It describes how Backblaze builds "cheap" 67 TB storage boxes for use in their online backup service. All the hardware specs are open sourced and freely available. They also talk a little bit about the software for managing all of the space they have, but not in any real detail...
-
Shared resources are cheap.
My point was not that the storage really is that cheap - of course it costs four times what I said if you want geographically diverse replication and that's how I would do it. If you want fiber channel with proactive support, go ahead and multiply the cost 40x or more. And yes, there's no such thing as an infinite resource. That's why I said effectively unlimited, as in it doesn't matter that there's not a shoreless sea of pasta salad behind the "all you can eat" claims at the buffet - if there's more than I can or will eat, we're good.
But the many who use little subsidize the few who use much, and your storage needs are very different from those of a shared hosting provider. I don't have much insight into BlueHost's operations, but they claim 1.9 million hosted domains and over 525,000 paying customers. Look closer at those numbers and you'll see that they have less than 1000 servers, and so they're running over 2,000 domains per server. It's a good bet that the vast majority of those customers aren't trying to store terabytes of data or open the new ebay on a $10/month hosting plan, so the many pay for the few because at BlueHost there's only one plan and only one price. They're not going to try and ding me with unexpected charges because my bandwidth or storage went over my limit one month, because there's not any limit to go over. For some people like me and the people in the article who posed the question saying they have little money, that's comforting. Bluehost claims to only have 20Gbps of aggregate bandwidth to the Internet - I have individual servers with more bandwidth to the intranet than that and you probably do too - and if anything that's where their bottleneck is. But my hosted sites come up just fine, so I don't worry about it.
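The domains-per-server figure above is simple division, and it can be checked with a short sketch. The inputs are the numbers quoted in the comment (1.9 million domains, 525,000 paying customers, under 1,000 servers); the function name is only illustrative. Because the server count is an upper bound, the results are lower bounds.

```python
def per_server(total: int, servers: int) -> float:
    """Average number of items hosted on each server."""
    return total / servers

HOSTED_DOMAINS = 1_900_000   # figure quoted in the comment
PAYING_CUSTOMERS = 525_000   # figure quoted in the comment
SERVER_CEILING = 1_000       # "less than 1000 servers", so an upper bound

# With fewer servers than the ceiling, the real averages are higher than these.
min_domains_per_server = per_server(HOSTED_DOMAINS, SERVER_CEILING)
min_customers_per_server = per_server(PAYING_CUSTOMERS, SERVER_CEILING)

print(f"at least {min_domains_per_server:.0f} domains per server")
print(f"at least {min_customers_per_server:.0f} paying customers per server")
```

At exactly 1,000 servers the average is 1,900 domains per server, so the "over 2,000" figure holds if the real count is below about 950.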
When you operate at that scale, you get economies of that scale. You don't buy your storage from NetApp, HP, EMC or Hitachi. You don't pay $3K/TB for bare 450GB FC drives and another $3K for the software licensing and hardware and support to run it. You build it yourself the way BackBlaze does, out of commodity hardware that delivers the storage and IOPS through systems engineering, and redundancy through software. You self-warranty by buying hot and cold spares. You buy 24/7/365 15 second response support by hiring rotating shifts of people whose livelihood depends on showing up at work on time. You step up and be responsible for your own systems engineering when you get that big and if you blow it you're toast, so you take good care. You use open-source technologies like Openfiler (has those snapshots you like) and Lustre. And for God's sake you're not doing anything so retro as trying to spool all that stuff to tape. Really: Tape? Still? Google and Amazon and others do it in analogous ways.
These guys know that commercial SANs are not made from magical parts - they're servers and drives and software, crafted with engineering that can be bested cheaply and reliably if you know what you're doing. If you can't meet the engineering and service requirements, you're better off buying the SAN. Even if you can meet the requirements, for most people the SAN is a better deal because their needs don't support the time and effort and so roll-your-own solutions, though cheaper up front, offer poor net ROI over the equipment lifecycle. I have heard it said that the SAN also gives you a throat to choke when things go horribly wrong, but I know guys who think like that and I don't like them and I don't respect them.
Of course shared hosting and BlueHost isn't for everybody, nor is roll-your-own servers, storage and networking. Some h
-
The SAN argument
The SAN argument is that your storage is so precious it must not be stranded. If you're paying $50K/TB with drives, controllers, FC switches, service, software, support, installation and all that jazz then that's absolutely true. If you're doing something like OpenFiler clusters on BackBlaze 90TB 5U Storage Pods for $90/TB and 720 TB/rack you have a different point of view. As for somebody showing up to replace a drive, I think I could ask Jimmy to put his jacket on and shuffle down to the server room to swap out a few failed drives every couple months - that's what hot and cold spares are for and he's just geeking on MyFace anyway. Low utilization? Use as much or as little as you like - at $90/TB we can afford to buy more. We can afford to overbuy our storage. We can afford to mirror our storage and back it up too. In practice the storage costs less than the meeting where we talk about where to put it or the guy that fills it. If you want to pay for the first tier OEM, it's available but costs 10x as much because first tier OEMs also sell SANs.
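The density and cost figures in the paragraph above can be reproduced with a few lines of arithmetic. This is a sketch only: the 90 TB per 5U pod, $90/TB and $50K/TB numbers come from the comment, while the 42U rack height is an assumption chosen because it yields the quoted 720 TB/rack.

```python
POD_CAPACITY_TB = 90      # per 5U pod, from the comment
POD_HEIGHT_U = 5          # from the comment
DIY_COST_PER_TB = 90      # dollars, from the comment
SAN_COST_PER_TB = 50_000  # dollars, from the comment
RACK_HEIGHT_U = 42        # assumption: a standard full-height rack

def pods_per_rack(rack_u: int = RACK_HEIGHT_U, pod_u: int = POD_HEIGHT_U) -> int:
    """Whole pods that fit in a rack, ignoring switches and PDUs."""
    return rack_u // pod_u

def rack_capacity_tb() -> int:
    """Raw capacity of a rack filled with pods."""
    return pods_per_rack() * POD_CAPACITY_TB

def cost(capacity_tb: int, dollars_per_tb: int) -> int:
    """Total cost of a given capacity at a flat $/TB rate."""
    return capacity_tb * dollars_per_tb

capacity = rack_capacity_tb()
print(f"{pods_per_rack()} pods, {capacity} TB per rack")
print(f"DIY rack: ${cost(capacity, DIY_COST_PER_TB):,}")
print(f"SAN equivalent: ${cost(capacity, SAN_COST_PER_TB):,}")
```

Under those assumptions a rack holds 8 pods and 720 TB for $64,800, against $36,000,000 for the same raw capacity at the quoted SAN price, which is why overbuying and mirroring look affordable in this argument.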
Openfiler does CIFS/NFS and offers iSCSI shared storage for Oracle, Exchange and SAP. If you need support, they offer it. OpenFiler is nowhere near the only option for this. If you want to pay license fees you could also just run Windows Server clustered. There are BSD options and others as well. Solaris and Open Solaris are well spoken of, and ZFS is popular, though there are some tradeoffs there. Nexenta is gaining ground. There's also Lustre, which HP uses in its large capacity filers. Since you're building your own solution you can use as much RAM for cache as you like - modern dual socket servers go up to 192GB per node but 48GB is the sweet spot.
Now that we've moved redundancy into the software and performance into the local storage architecture, moving storage to the edge is exactly what we want to do: put it where you need it and if you need a copy for data mining then mirror it to the mining storage cluster. We still need some good dedicated fiber links to do multisite synchronous replication for HA, but that's true of SAN solutions also. We're about 20 years past when we should have ubiquitous metro fiber connections, and that's annoying. Right now without the metro fiber the best solution is to use application redundancy: putting a database cluster member server in the DR site with local shared storage.
Oh, and if you need a lot of IOPS then you choose the right motherboard and splurge on the 6TB of PCIe attached solid state storage per BackBlaze pod for over a million IOPS over 10Gig E. If you need high IOPS and big storage you can use adaptor brackets and 2.5" SSDs or mix in an array of The Colossus, though you're reaching for a $6K/TB price point there and cutting density in half but then the SSD performance SAN has an equal multiple and some serious capacity problems. If you go with the SSD drives you would want to cut down the SAS expanders to five drives per 4x SAS link because those bad boys can almost saturate a 3Gbps link while normal consumer SATA drives you can multiply 3:1.
If you're more compute focused then a BackBlaze node with fewer drives and a dual-quad motherboard with 4 GPGPUs is a better answer. At the high end you're paying almost as much for the network switches as you are for the media. If you're into the multipath SAS thing then buy 2x the controllers and buy the right backplanes for that - but
-
One solution total backup
VMWare Snapshots
Are you backing up just data, or configurations or what? Backup Solutions are nice and all, but you're still missing something
.... all the crap^H^H^H^H configurations that you've collected over the years of using that particular setup. And once you go to VMware (or other VM product) you'll quickly realize that the abstraction away from specific Hardware is very nice indeed.
However, if one is REALLY concerned about backups, a duplicate Hardware setup in a separate location sitting idle (or cold) is a necessity. And having a VMWare snapshot ready to load on backup hardware is just tits when things REALLY go south. You end up looking like a genius, and get to play Scotty (over engineered everything).
The difference between amateurs and professionals is not when things are going well, it is when the shit hits the fan. A weekend Geek can build the $8000 backup server or whatever of storage, but once the drives start to fail (and they will) that solution starts to REALLY suck because you can't get to the freaking drives easily (and I doubt it will tell you that the drive even failed).
Let me just say it this way, if you can't afford "over engineered" equipment, you can't afford to do it right.
So, VMware, snapshots and spare hardware offsite are the way to go. Anything less these days is simply weekend geek pride.
-
Re:Build a Backblaze Storage Pod.
Try one of these babies on for size. 67TB for about $8,000.
That could only be a good idea for large installations like backblaze. You need to have lots of spares of everything, extra capacity for failures and someone on call to fix the thing when it breaks. There is almost no redundancy and they use consumer grade hardware which means that there will be very regular hardware failures. If you have a ton of the things, this isn't so much of an issue and it probably does end up being cheaper. But using just a couple, much less one of those things would be an exercise in sheer stupidity.
I think that it all depends on what your budget is and what you have access to. You don't need fibre unless it's already there and you have the hardware and knowledge to use it. Those cards ain't cheap and would add much complication. Unless you have huge amounts of data (on the scale of several hundred GB or more) changing on a daily basis or can't afford to lose anything at all in case of catastrophic failure, just use the gigE or 100M that is already in place. If that isn't enough, you should really be looking at systems that are designed for remote replication and that gets really expensive, really fast.
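To judge whether the existing gigE or 100M link is enough, it helps to estimate how long a daily change set takes to move. A rough sketch follows; the 300 GB figure is only a stand-in for "several hundred GB", and the calculation assumes the full line rate with no protocol overhead, so real transfers will be slower.

```python
def transfer_hours(data_gb: float, link_mbps: float) -> float:
    """Hours needed to move data_gb (decimal gigabytes) over a link_mbps link.

    Assumes the link runs at its full nominal rate, which is optimistic.
    """
    bits = data_gb * 8e9                 # 1 GB = 8e9 bits
    seconds = bits / (link_mbps * 1e6)   # 1 Mbps = 1e6 bits per second
    return seconds / 3600

DAILY_CHANGE_GB = 300  # illustrative stand-in for "several hundred GB"

for name, mbps in (("100M", 100), ("gigE", 1000)):
    hours = transfer_hours(DAILY_CHANGE_GB, mbps)
    print(f"{name}: {hours:.1f} hours for {DAILY_CHANGE_GB} GB")
```

At that volume the 100M link needs most of a night (about 6.7 hours) while gigE needs well under an hour, which is roughly where the "several hundred GB" threshold in the comment starts to bite.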
Are you looking for an off-site mirror or backups? Those are not the same thing and you need to make sure that you know which you really need. I sincerely doubt that you need to be worrying about recovery time if the building burns down. Just worry about reducing the risk of data loss in the event of failure.
For backups, KISS is the most important thing. If something in the university is already in place, use that. Backup administration is a PITA and you're going to have to hand it off to someone once you leave (assuming you're a student) who may know almost nothing about computers. The simpler the system is to use, the easier the handoff is going to be and the less the people after you will hate you. Someone else mentioned asking the IT department if they offer any backup programs. That would be the best solution, I think. More expensive on paper, possibly but not likely. More expensive overall, I doubt it.
If you end up building a backup server, take into account hardware failures and how much time people can spend to babysit the thing. ZFS has some awesome features, but I don't think that I would use it with anything other than Solaris or maybe BSD. There is no way that I'm going to trust backups to FUSE for Linux. Then again, I don't think that ext3 or ext4 would be my first choices, either. Personally, I would probably go with JFS with Linux if you have 12TB and growing. Look into external storage arrays. I'm not so familiar with this price range, but HP's MSA 2000 or something comparable might be a good choice if you have the $$$. Just remember to budget for replacement drives if you go the hard drive based route. I'm using a hard drive backed backup solution at work and BackupPC is what I have been using for software. I have been pretty happy with it so far. It's free, it works and has some really good features (like intelligent backup so that it doesn't just blindly store 20 copies of the same file) but has a bit of a learning curve. Nothing like Zmanda's MySQL backup but it needs a little more than a few clicks. I also wish it didn't hit the backup targets so hard but that may not be an issue for you.
I know that there is the temptation to do something really cool and roll your own. I get that temptation a lot, too but you need to ask yourself if you're doing it for fun or to get the job done. If it's the former, good for you and I'm jealous but I suspect it's the latter. Do the minimum to satisfy the requirements with the least amount of required maintenance and the least cost. Let IT worry about backup systems if you can. That way you can worry about making television programs instead of checking up on the backup server whenever something hiccups or WHEN (not if) hardware fails.
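The "doesn't just blindly store 20 copies of the same file" feature is a form of deduplication. The sketch below shows the general idea of a content-addressed pool, where identical file contents are stored once and each backup only records a reference. It is an in-memory illustration with hypothetical names (`DedupPool`, `store`, `restore`), not BackupPC's actual code or on-disk format.

```python
import hashlib

class DedupPool:
    """Toy content-addressed pool: one stored blob per distinct file content."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}               # digest -> content
        self.index: dict[tuple[str, str], str] = {}     # (backup, path) -> digest

    def store(self, backup_id: str, path: str, data: bytes) -> bool:
        """Record a file in a backup; return True if new content was stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.index[(backup_id, path)] = digest
        return is_new

    def restore(self, backup_id: str, path: str) -> bytes:
        """Return the content recorded for a file in a given backup."""
        return self.blobs[self.index[(backup_id, path)]]

pool = DedupPool()
footage = b"raw footage, unchanged between runs"
# Twenty nightly backups of the same unchanged file.
stored = [pool.store(f"night-{n}", "/media/clip.mov", footage) for n in range(20)]
print(f"{len(pool.index)} backup entries, {len(pool.blobs)} stored copy")
```

Only the first backup stores the bytes; the other nineteen add index entries, so disk use grows with changed data rather than with the number of backup runs.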
I hope that my rambling helped a little. Good luck in figuring this out.
-
Re:Amazon S3
Why do anything when you can pay someone else twice as much? 12TB from Amazon will be an order of magnitude more expensive than just running a storage server, and you have to pay for internet bandwidth instead of just running a wire.
-
Build a Backblaze Storage Pod.
Try one of these babies on for size. 67TB for about $8,000.
There's a full parts list and a Solidworks model so you can get your local sheet metal shop to build cases for you.
Talk to a mechanical engineering student on campus, they can probably help with that.
-
Re:A Very Shortsighted Article
Scroll down to the bottom of the article and there is a parts list. The case is custom manufactured and is not for sale but there is a link to the 3d model. Take that file to a local metal shop and get them to make the case for you.
-
Backblaze
I have been using encrypted Backblaze $5/mo backup for my home system for a few months now. I think in the end it will save me way more money on the bottom line than a dedicated system that will periodically need to be updated and maintained. It uses a 2048-bit RSA public/private key system that makes me feel warm and fuzzy. For the first time I feel pretty good about my backup of all my family photos and movies. http://www.backblaze.com/