Does ZFS Obsolete Expensive NAS/SANs?
hoggoth writes "As a common everyman who needs big, fast, reliable storage without a big budget, I have been following a number of emerging technologies and I think they have finally become usable in combination. Specifically, it appears to me that I can put together the little brother of a $50,000 NAS/SAN solution for under $3,000. Storage experts: please tell me why this is or isn't feasible." Read on for the details of this cheap storage solution.
Get a CoolerMaster Stacker enclosure like this one (just the hardware not the software) that can hold up to 12 SATA drives. Install OpenSolaris and create ZFS pools with RAID-Z for redundancy. Export some pools with Samba for use as a NAS. Export some pools with iSCSI for use as a SAN. Run it over Gigabit Ethernet. Fast, secure, reliable, easy to administer, and cheap. Usable from Windows, Mac, and Linux. As a bonus, ZFS lets me create daily or hourly snapshots at almost no cost in disk space or time.
Total cost: 1.4 terabytes for $2,000; 7.7 terabytes for $4,200 (just the cost of the enclosure and the drives). That's an order of magnitude less expensive than other solutions.
Add redundant power supplies, NICs, SATA cards, etc., as your needs require.
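A rough sketch of the pool layout above, in Python. It assumes twelve disks with hypothetical Solaris-style device names (`c1t0d0` through `c1t11d0`), and the helper names are made up for illustration. It only builds the `zpool create ... raidz ...` command line and estimates usable space; it never touches a disk. Splitting twelve disks into two six-disk RAID-Z groups is one possible choice, not the only one.

```python
# Hypothetical helpers: build (but do not run) a `zpool create` command that
# splits a list of disks into equal-width single-parity RAID-Z vdevs.
def raidz_create_cmd(pool, disks, width):
    if width < 2 or len(disks) % width != 0:
        raise ValueError("disk count must be a multiple of a width >= 2")
    cmd = ["zpool", "create", pool]
    for i in range(0, len(disks), width):
        cmd.append("raidz")
        cmd.extend(disks[i:i + width])
    return cmd


def raidz_usable_gb(disk_gb, n_disks, width):
    # Each single-parity RAID-Z vdev gives up roughly one disk's worth of
    # space to parity; metadata and formatting overhead are ignored.
    return (n_disks // width) * (width - 1) * disk_gb


# Hypothetical device names for a 12-bay enclosure.
disks = [f"c1t{i}d0" for i in range(12)]
print(" ".join(raidz_create_cmd("tank", disks, 6)))
print(raidz_usable_gb(750, 12, 6), "GB usable")
```

With five 500GB disks in a single RAID-Z group, `raidz_usable_gb(500, 5, 5)` gives 2000, roughly the 2TB usable that a later comment reports.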
I think you're a bit high. I put together five 500GB SATA II disks with RAID-Z in a 5-disk enclosure for under $1000. I run it off my Sun Fire V20z. That's 2TB usable for under 1k USD!
Google has a great solution that focuses on the “cheap” part without compromising much on the other two. If you have not read up on the Google File System, definitely take the time to. At the very least, it seems to call into question the need to shell out tens of thousands for high-end storage solutions that promise reliability in proportion to the dollar.
Why bother.
I actually have two 48GB databases full of minimal instruction sequences for generating boolean functions. Do I win the obscure use of disk space prize?
Program Intellivision!
I wonder if Coraid's ATA-over-Ethernet would be good enough? It ditches TCP/IP in favor of raw Ethernet frames, so it has much lower overhead than iSCSI; the only major loss is that it can't be routed. http://www.coraid.com/
BTW, I read recently that where 4Gb FC really excels is in large-block sequential transfers, and that small random-access transfers actually do better over gigabit iSCSI. Check it out: http://searchstorage.techtarget.com/columnItem/0,294698,sid5_gci1161824,00.html
Plus you really have to think about other bottlenecks. How many disks need to be striped to consistently saturate the bandwidth of 1Gb Ethernet? 10Gb Ethernet? What about the bus that the host adapter or NIC sits on? Precious few boxes have 4x PCIe. And then what about the CPU overhead of managing all this streaming data? Just food for thought...
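A back-of-the-envelope answer to the first question, as a Python sketch. The per-disk rate (about 60 MB/s sequential for a SATA drive of this era) and the 90% protocol efficiency are assumptions, not measurements; small random I/O would need far more spindles than this.

```python
import math


def disks_to_saturate(link_gbit, per_disk_mb_s, efficiency=0.9):
    # Convert link speed (Gbit/s) to MB/s in decimal units, then discount
    # it by an assumed protocol-efficiency factor.
    link_mb_s = link_gbit * 1000 / 8 * efficiency
    # Round up: a fractional disk still means one more spindle.
    return math.ceil(link_mb_s / per_disk_mb_s)


# Assumed sequential throughput of one SATA disk: ~60 MB/s (illustrative).
for gbit in (1, 10):
    print(f"{gbit}Gb Ethernet: ~{disks_to_saturate(gbit, 60)} disks")
```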
It seems to me that even if the entire setup is prone to failure, all you really need is a gigabit crossover or two running to an identical setup. I don't know whether ZFS does anything like this, but I can think of at least one way to make it work on Linux: DRBD + OCFS2 + heartbeat. If you're smart, you can even do some load balancing, at least until one of them fails. When that happens, the other should be able to take over very quickly, if not instantly: Linux heartbeat would simply take over the other machine's IP and start its services.
So, that's $6k total instead of $3k.
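The takeover logic described above boils down to a timeout on missed heartbeats. The toy Python model below is hypothetical: it is not Linux-HA heartbeat's API or configuration, only the idea that a standby node claims the service IP once its peer has been silent longer than a chosen dead time.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverMonitor:
    """Toy model of a standby node watching its peer's heartbeats.

    Hypothetical class: real heartbeat software has its own configuration
    and timers; this only illustrates the dead-time idea.
    """
    deadtime: float              # seconds of silence before declaring the peer dead
    last_seen: float = 0.0       # time of the last heartbeat received
    active: bool = False         # True once this node has taken over the service IP
    log: list = field(default_factory=list)

    def heartbeat(self, now: float) -> None:
        self.last_seen = now

    def tick(self, now: float) -> None:
        # Take over only once, and only after the peer exceeds the dead time.
        if not self.active and now - self.last_seen > self.deadtime:
            self.active = True
            self.log.append(f"t={now}: peer silent, taking over service IP")


mon = FailoverMonitor(deadtime=10.0)
for t in (1.0, 2.0, 3.0):
    mon.heartbeat(t)
    mon.tick(t)
mon.tick(12.0)   # 9 s of silence: still on standby
mon.tick(13.5)   # 10.5 s of silence: take over
print(mon.active, mon.log)
```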
The one problem I have with OCFS2 is that when it fences a system, it tends to bring the whole thing down (kernel panic), or, in newer versions, gives you the option of forcibly rebooting instead. This killed it for a project I was working on, where one of the machines had other mission-critical systems running that were not on OCFS2, so it seemed absurd to panic and bring down everything else too.
So if that's your problem, you can always build a third, identical system to run the other stuff on. $9k.
Even if you figure another $1k for random stuff, like maybe a LOT of gigabit crossovers, or 10gig fiber, or something, that's still a fifth of the cost of the "business-grade" or whatever else he was considering. Even assuming the worst-case scenario, where the homebrew system costs a lot more to maintain (even electricity and cooling, maybe), how long will it take for it to cost another $40k? And this way, you have an ENTIRELY redundant system -- the only way you lose it is if, say, the whole building blows up.
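The arithmetic in that paragraph, sketched in Python. The $50k commercial price and the three $3k boxes plus $1k of extras come from the thread; the yearly extra running costs are purely hypothetical inputs.

```python
def years_to_close_gap(gap_dollars, extra_cost_per_year):
    # Years of higher running costs before the homebrew system has used up
    # its up-front price advantage.
    if extra_cost_per_year <= 0:
        return float("inf")
    return gap_dollars / extra_cost_per_year


homebrew = 3_000 * 3 + 1_000    # three identical boxes plus $1k of "random stuff"
commercial = 50_000
gap = commercial - homebrew     # 40_000

for extra in (2_000, 5_000, 10_000):   # hypothetical extra yearly costs
    print(f"${extra}/yr extra -> {years_to_close_gap(gap, extra):.0f} years")
```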
I mean, I sort of agree that you get what you pay for. But when the difference in price is that much, the only way it's ever worth it is if there's really great support with the high-end package. And is it $40k worth of support? If not, I imagine this guy could put together a company selling little $3k, $6k, and $10k systems for $20k each (including support), shaving off $30k even for the most paranoid.
And all of that is pretending you're right about the cheap consumer-grade hardware actually being less reliable.
Don't thank God, thank a doctor!
> I can't really see why one would bring up ZFS and OpenSolaris for this purpose
Here's why:
1) Snapshots. ZFS lets me make lots of snapshots to protect myself from user error, viruses, etc. destroying my data. ZFS snapshots are so lightweight that I can make them hourly at nearly no cost in time and disk space.
2) Data integrity. Even RAID-5 can let errors creep into my data undetected (google: bit rot). ZFS checksums every block and verifies the checksum on every read, so it has a much higher level of data integrity protection.
3) Cost/Performance. ZFS RAID-Z appears to be much faster than software RAID-5. It appears to be as fast as or faster than hardware RAID-5, and hardware RAID-5 is much more expensive than software.
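For point 1, a Python sketch of what an hourly snapshot job might generate. The `hourly-YYYYmmdd-HH` naming scheme, the keep-24 retention policy, and the `tank/home` dataset are assumptions; the function only builds `zfs snapshot` and `zfs destroy` command lines and does not run them.

```python
from datetime import datetime, timedelta


def hourly_snapshot_plan(dataset, now, existing, keep=24):
    """Build (not run) the zfs commands for one hourly snapshot cycle.

    `existing` holds snapshot names already on `dataset` in the hypothetical
    format 'hourly-YYYYmmdd-HH'. One new snapshot is created, then the
    oldest snapshots beyond `keep` are destroyed.
    """
    new = now.strftime("hourly-%Y%m%d-%H")
    # This timestamp format sorts lexically in chronological order.
    names = sorted(set(existing) | {new})
    cmds = [["zfs", "snapshot", f"{dataset}@{new}"]]
    excess = max(0, len(names) - keep)
    for old in names[:excess]:
        cmds.append(["zfs", "destroy", f"{dataset}@{old}"])
    return cmds


# Example: 24 hourly snapshots already exist for 1 Jan 2007.
start = datetime(2007, 1, 1, 0)
existing = [(start + timedelta(hours=h)).strftime("hourly-%Y%m%d-%H") for h in range(24)]
for cmd in hourly_snapshot_plan("tank/home", start + timedelta(hours=24), existing):
    print(" ".join(cmd))
```

The oldest snapshot is pruned as each new one is taken, so the dataset always keeps one day of hourly history under this assumed policy.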
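For point 2, a toy Python model of why checksums matter. A conventional RAID-5 read typically returns whatever the data disk hands back without checking parity, so silently flipped bits pass straight through. ZFS instead stores each block's checksum in the block pointer that references it and verifies it on every read; with redundancy it can then rewrite the bad copy from a good one. The class below is illustrative only, not ZFS code, and uses SHA-256 for simplicity.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # ZFS offers several checksum algorithms; SHA-256 is used here for simplicity.
    return hashlib.sha256(data).digest()


class MirroredBlock:
    """Toy two-way mirror whose checksum is stored apart from the data,
    loosely like ZFS keeping a block's checksum in its parent pointer."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)

    def read(self) -> bytes:
        for i, copy in enumerate(self.copies):
            if checksum(bytes(copy)) == self.expected:
                # Self-heal: overwrite any other copy that fails verification.
                for j, other in enumerate(self.copies):
                    if j != i and checksum(bytes(other)) != self.expected:
                        self.copies[j] = bytearray(copy)
                return bytes(copy)
        raise OSError("all copies corrupt: unrecoverable checksum error")


blk = MirroredBlock(b"important data")
blk.copies[0][0] ^= 0x01          # simulate silent bit rot on one disk
print(blk.read())                 # the good copy is returned
print(blk.copies[0] == blk.copies[1])  # and the bad copy has been rewritten
```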
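For point 3, part of the performance argument is I/O count. Single-parity RAID-Z and RAID-5 both use XOR parity, but a small in-place RAID-5 update has to read the old data and old parity first (read-modify-write). ZFS writes changed data to a new location and, according to its designers, writes every RAID-Z block as its own full stripe, so parity can be computed from the new data alone. The Python sketch below shows that both paths yield the same parity; the block contents are arbitrary.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    # Byte-wise XOR of equal-length blocks: the single-parity code shared by
    # RAID-5 and single-parity RAID-Z.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


stripe = [b"\x10\x20", b"\x03\x04", b"\xa0\x0b"]   # three data blocks
parity = xor_blocks(*stripe)

# RAID-5 small write: update one block in place. The new parity needs the
# OLD data and OLD parity, i.e. two extra disk reads before two writes.
new_block = b"\xff\x00"
rmw_parity = xor_blocks(parity, stripe[1], new_block)

# Full-stripe write: parity comes from the new data alone, with no reads.
stripe[1] = new_block
full_parity = xor_blocks(*stripe)

print(rmw_parity == full_parity)   # True: same parity, different I/O cost
# Any single lost block can be rebuilt from the survivors plus parity.
print(xor_blocks(full_parity, stripe[0], stripe[2]) == stripe[1])   # True
```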
- For the complete works of Shakespeare: cat
It's all a question of scale, and your scale is a bit skewed.
The premium paid for higher-end storage is decidedly nonlinear. For marginally more reliable or faster storage, you pay about a factor of ten. One example I'm familiar with is Hitachi. We had a 64TB HDS array a few years ago that was worth roughly $2M. We could have purchased an equivalent amount of commodity storage for probably $200k at the time, but didn't. Why would we spend the extra money? Speed, configurability, expandability, and reliability.
First of all, speed. That thing was loaded with 73GB 15k FCAL drives. RAID was in sets of four disks, with no two disks in a set sharing the same controller, backplane, or cache segment. Speaking of cache, the rule was 1GB of cache per TB of disk, so we had 64GB of fast, low-latency, fully mirrored cache on the thing. It was insanely fast, and (most importantly) didn't slow down under point load. One tool automatically ran on the array itself, looking for hotspots and reallocating data on the fly.
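The placement rule mentioned above (no two disks in a RAID set on the same controller) can be sketched as a round-robin across controllers. This Python version is hypothetical: the controller and disk names are invented, it ignores the backplane and cache-segment constraints, and it assumes every controller holds the same number of disks.

```python
def build_raid_sets(disks_by_controller, set_size=4):
    """Group disks into RAID sets so no two members share a controller.

    `disks_by_controller` maps a (hypothetical) controller name to its disks.
    Each set takes the i-th disk from `set_size` different controllers; zip()
    silently truncates if controllers hold unequal numbers of disks.
    """
    controllers = sorted(disks_by_controller)
    if len(controllers) % set_size:
        raise ValueError("controller count must be a multiple of set_size")
    sets = []
    for g in range(0, len(controllers), set_size):
        group = controllers[g:g + set_size]
        for members in zip(*(disks_by_controller[c] for c in group)):
            sets.append(list(members))
    return sets


layout = {f"ctl{c}": [f"ctl{c}-d{d}" for d in range(3)] for c in range(4)}
for s in build_raid_sets(layout):
    print(s)
```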
Configurability: We could mirror data synchronously or asynchronously to our DR site, by filesystem, file, block, LUN, or byte. We could dynamically (re)allocate storage to multiple systems, and moving databases between machines was a breeze. Disk could be allocated from different pools (i.e. different performing drives could be installed), depending on requirements. Quality-of-Service restrictions could be put in place as well, although we never used them.
Expandability: The beast had 32 pairs of FC connections, could support 96GB of internal mirrored cache, and I can't remember how much actual disk. The key wasn't the amount of disk we could put on it, so much as how well the bandwidth scaled--and it scaled well.
Finally, the real key - Reliability. All connections were dual-pathed, with storage presented to a pair of smart FC switches which were zoned to present storage to various systems. We could lose three of the four power cables to the main unit (auxiliary disk cabinets only had two power connections each), and still run. We could lose any entire rack, and still run. We could lose any switch in our environment, and still run. We could lose two disks from the same RAID set and still run. When we lost a disk, the system would automatically suck up some cache to use for remirroring the data to multiple disks as fast as possible, and then after protecting it, would remirror back to a single logical device. In the event that we lost the entire device, we could run from our DR site synchronous mirror with less than a ten second failover.
This sort of thing is massive overkill for most people and companies, but when someone is doing realtime commodities trading, (or banking, or stock exchanges, etc.) the protection and support are worth the extra money. You just can't build that sort of thing on your own for any less money, at the end of the day.
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
"A 128 bit file system can't ever be filled. (yes "never" do the math)"
:)
I did the math. That would handle 3.4x10^38 Bytes, or 340 trillion YottaBytes (1 YottaByte = 1 billion PetaBytes, 1 PetaByte = 1 million GigaBytes). That's a very large number of Bytes, but I still wouldn't use the word never. I usually even try to avoid the phrase "never in my lifetime", but in this case that's probably a safe bet.
Note: I'm using the hard drive manufacturer's definition of *bytes here.
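The same numbers, checked in Python. Like the post, this assumes 128 bits address individual bytes and uses decimal (drive-maker) units.

```python
capacity = 2 ** 128                  # bytes, if each 128-bit address names one byte
YB, PB, GB = 10**24, 10**15, 10**9   # decimal (drive-maker) units

print(f"{capacity:.2e} bytes")                   # 3.40e+38 bytes
print(capacity // YB // 10**12, "trillion YB")   # 340 trillion YB
print(YB // PB, PB // GB)                        # 1000000000 1000000
```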
Carpe Cerevisi - Seize the Beer
Mass of the earth = 5.9742 × 10^27 grams
To make the drives out of the earth, you'd need a drive density of 57GB/gram.
A drive with a density of 1 bit per carbon atom would weigh 5.4x10^10 metric tons.
Size of said nanotech drive: a cube 2.88 km on a side (at the standard density of carbon).
Never in your lifetime is a really safe bet.
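A quick Python check of the earth-and-carbon figures. The earth's mass comes from the post; Avogadro's number and carbon's molar mass are standard constants, and 2.267 g/cm³ (ideal graphite) is an assumed value for "the standard density of carbon". Diamond's higher density would give a somewhat smaller cube.

```python
BYTES = 2 ** 128                 # capacity from the post above
EARTH_G = 5.9742e27              # mass of the earth in grams (from the post)
AVOGADRO = 6.02214076e23         # atoms per mole
CARBON_G_PER_MOL = 12.011        # molar mass of carbon
GRAPHITE_G_PER_CM3 = 2.267       # assumed "standard density of carbon" (ideal graphite)

gb_per_gram = BYTES / EARTH_G / 1e9
bits = BYTES * 8
grams = bits * CARBON_G_PER_MOL / AVOGADRO   # one carbon atom per bit
tonnes = grams / 1e6                         # metric tons
side_km = (grams / GRAPHITE_G_PER_CM3) ** (1 / 3) / 1e5   # cube edge, cm -> km

print(f"{gb_per_gram:.0f} GB/gram, {tonnes:.2e} t, cube {side_km:.2f} km")
```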
T
Laws are horrible moral guides, moral guides make even worse laws.
Look at the high-end Sun Fire X4500 server ("Thumper") that Sun released a few months ago: 48 drives in 4U with *no* hardware RAID controller. Sun designed this server as the perfect machine to run ZFS.