Distributed Storage Systems for Linux?
elambrecht asks: "We've got a _lot_ of data we'd like to archive and make sure it is accessible via the web 24/7. We've been using a NetApp for this, but that solution is just waaaay to expensive to scale. We want to move to using a cluster of Linux boxes that redundantly store and serve up the data. What are the best packages out there for this? GFS? MogileFS?"
Check out Lustre at http://www.lustre.org/. It's being developed and used by the DOE on a lot of supercomputer cluster systems for multi-terabyte storage.
It would be useful to know about how much data we're talking.
I suppose there's a difference between serving just 500GB or a few terabytes.
Stop making that big face!
Also, what else do you need to do with the data? Back it up? Mirror it? Data mine it? Any type of performance requirements? Interoperability requirements? The list goes on. We need more information.
Panasas http://www.panasas.com/products_overview.html has some products which probably fit your requirement of high speed distributed storage.
- What's "a lot"? 1MB is a lot of data if you think about it. When people start talking about "a lot" of data these days, I assume they're meaning hundreds of terabytes. Is that what you mean?
- What's the budget? What performance do you need? Do you need to back it up? Do you need to replicate it? Your post is sort of like "hi, I have a problem. What is the answer? Thanks!"
Also, it's "too expensive to scale," my friend. You'd think an "Editor" like Cliffy would fix posts, but he's too lazy.
If you can afford NetApp, why not keep with NetApp? A bunch of Linux boxes is not a storage solution. Indeed, what does Linux have to do with anything? We're talking storage here. What are you planning to do - put in 200 of them with internal SATA drives? Yeah, that'll be a lot cheaper to maintain...
I'm not shilling for NetApp, but if you really have "a lot" of data to put "on the web" "24/7" then you need some kind of real storage solution like a NetApp or one of their competitors.
Now go away and please take Cliff with you.
Advice: on VPS providers
About 4 years ago we were forced to get a Maxtor netattach (can't remember the exact name) because at the time journaling file systems were virtually non-existent. That lasted a year before we outgrew it, and then we went with a 600GB Dell NAS server, also running Windows 2000 embedded. It has a SCSI connection for a SuperDLT tape backup drive, and the Windows 2000 backup program works for our needs. It's simple enough for any average Joe to restore files.
I did look into getting a Linux NAS, but the solutions out there didn't support external tape drives like SuperDLT all that well. Backup software on Linux was crazy. I tried over 15 (IIRC) different Linux backup solutions, everything from ARCserve for Linux to free backup scripts. I just use tar to back up our mail server; restoring isn't totally intuitive, but I manage.
So please tell me there has got to be a decent Linux NAS solution out there by now that has a web-based interface for management (adding users, groups, etc.), has a SCSI connection for LTO and DLT tape drives, and comes with a decent backup and restore program/interface.
How about OpenAFS? It is sort of like NFS on steroids, with redundancy, scaling, caching, Kerberos-based security ... I've just started looking at it myself, but it seems pretty slick.
10b||~10b -- aah, what a question!
NetApp is number four in storage revenue terms, after EMC, HP and IBM,
so go ask them about what you want.
Really, you can either admin your own white boxes (which then become a NAS) or you can buy a NAS.
Are you thinking SAN?
Also talk to Apple; they make some nice products, as does Sun.
What is this large data for?
If it's video data, go talk to SGI about their XFS products.
Really, it depends on what you're doing. NetApp is great for a company file system of documents, but bad if you want to get the most out of your storage, mostly do video/music, and don't care about snapshots etc.
regards
John Jones
Get A Centera.
I'm biased, but this is a high-level Linux-based storage system done right. It's not easy to create a coherent storage system out of lots of separate machines; the software that runs on this cluster does a lot of work. This thing is fully redundant with no single point of failure, dynamically expandable without even taking it offline, scales to 100's of terabytes, and manages all that content continuously (scanning for corruption and fixing it, garbage collecting, etc.). The cluster has redundant backend networks and parallel paths everywhere, and it even uses reiserfs to store the data. There's a lot of good engineering in this unit and they sell it at a decent price compared to NAS boxes.
Check it out:
http://www.emc.com/products/systems/centera.jsp
I do work for EMC (like I said.. I'm biased) but I don't speak for them, my opinions are my own.
Storage clustering is simply hard to do while still presenting a low level filesystem interface. Tossing that out and creating file storage as a high level service with a richer interface seems like the right approach to me. Show me a storage clustering solution that doesn't do that and I'll show you something full of bugs, expandability issues, limitations, and pain points.
set softtabstop=4 shiftwidth=4 expandtab nocp worlddomination
SAN solutions via the EMC/Dell package: http://www1.us.dell.com/content/products/compare.aspx/fibre?c=us&cs=555&l=en&s=biz
In my opinion it's the best bang for the buck in ROI terms.
CX300, 15 146G drives, with Snapview, Powerpath, Navisphere, with Gold Support on all software and hardware components - $55,000. Extremely scalable. Database cluster performance improves by well over 50% compared to NAS or DAS. Whatever package you run on top, from middleware hardware for embarrassingly parallel cluster farms to MogileFS to MSCS, will be pretty happy, assuming your database/app is optimized etc... And then you should run VMware ACE.
I'm sure they'd be happy to sell you something along the line of serving data....
I did some research on clustering filesystems for work a while ago. Here's the Cliffs-notes version:
GFS: High-end, a pain in the ass to set up and run. Wants a RHEL server or two to run.
OpenGFS: Started as a fork of GFS when the GFS license changed, it has followed a bit of a different path. Not nearly as stable or fast as GFS, but might be there some day.
Lustre: Lustre should be really nice, but is horrendous to run (at least, that's the word from my friends at Sandia, who know a thing or two about it). General consensus is that you need a full-time staff member just to make it work. If you can afford that, it's a good way to go.
PVFS: Fast, light-weight, not POSIX-compatible. If your apps don't need the stuff it doesn't do, or you're willing to write some glue code for your app to speak PVFS natively instead of using the FS driver, this is a great way to go. Looks simple to set up (as simple as these things get).
^]:wq
A barely-related subject - I've been wondering whether there's some way to collect the unused space on all the Windows workstations around here into a shared space for storage.
This is purely a speculative exercise, but I keep wondering if some combination of:
Yes, I know it's kind of silly, and performance seems like it would be pretty pathetic, but the more I think about it, the more I want to see if I could actually do it (think pretty much the same mindset that the IP-over-carrier-pigeon guys had...)
Heck, it might conceivably actually WORK for a large-but-infrequently-accessed historical repository or something...
Or has someone already started some sort of "Virtual ATA-over-ethernet-from-a-file driver for Windows" project and spoiled my fun?...
Hacker Public Radio is our Friend
We have about 27TB of data from Mars (and are adding another TB per month) that we need to keep online. We have been using NetApps, but at ~$25K/TB, plus maintenance (3 years of maintenance costs about as much as a whole new system), they're just WAY too expensive for data warehousing.
We've moved to using Linux-based OpenAFS servers. A high quality 3U box (qsol.com) loaded with 16x 300GB ATA drives costs about $8.5K and provides us about 3.5TB (2 drives for parity, 2 drives for hot-swap). That works out to $2.5K/TB. (If your risk tolerance is higher than mine, you can bring that up to $8K/5.5TB, for about $1.5K/TB.) We really want 99.999% availability, so just to be safe, we keep a 100% redundant read-only copy on a second machine (AFS supports this beautifully, including automatic fail-over).
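For anyone who wants to redo this arithmetic with their own hardware quotes, here is a rough Python sketch. The function names are made up for illustration, and the only inputs are the figures quoted in this post. Note that the raw drive math gives 3.6TB rather than the ~3.5TB quoted; the gap is presumably formatting overhead, which is an assumption on my part.

```python
def usable_tb(drives, drive_gb, parity, spares):
    """Usable capacity in TB after setting aside parity and hot-spare drives."""
    data_drives = drives - parity - spares
    return data_drives * drive_gb / 1000.0


def cost_per_tb(price_usd, capacity_tb):
    """Purchase price divided by usable capacity."""
    return price_usd / capacity_tb


# Figures quoted in the post: 16x 300GB drives, 2 parity, 2 hot spares, $8.5K per box.
box_tb = usable_tb(drives=16, drive_gb=300, parity=2, spares=2)
box_cost = cost_per_tb(8500, box_tb)

# Keeping a full read-only copy on a second machine doubles the hardware cost per TB.
mirrored_cost = 2 * box_cost

# Quoted NetApp price per TB, before maintenance.
netapp_cost = 25000

print(f"usable: {box_tb:.1f} TB, single copy: ${box_cost:,.0f}/TB, "
      f"mirrored: ${mirrored_cost:,.0f}/TB, NetApp: ${netapp_cost:,}/TB")
```

Even with the mirrored copy counted, the per-TB hardware cost stays well under the quoted NetApp figure, though this ignores admin time, power, and rack space.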
OpenAFS has a couple of features that make it better than NFS (client-side cache, for instance), but it also has a few drawbacks, like no files >2GB.
If your data is partitionable into small-enough discrete units that have low or no inter-unit dependencies, then it should scale almost without limit.
After all, as a collection there is an immense amount of data on "the world wide web" but since it's partitioned, scaling isn't an issue.
Even before the web, the universe of ftp, gopher, news, and other servers held gobs and gobs of data, nicely partitioned.
When answering questions like this, it would help to know the organization of the data and if it can be easily reorganized, or at least viewed from a different "angle" that may offer other solutions you may not have thought of yet.
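To make the partitioning idea concrete, here is a minimal Python sketch, assuming each unit of data has a stable key (a path, an ID) and that units really are independent. The host names and the `server_for` function are hypothetical, not from any particular package.

```python
import hashlib

# Hypothetical storage hosts; in practice this list would come from configuration.
SERVERS = ["store01", "store02", "store03", "store04"]


def server_for(key, servers=SERVERS):
    """Pick the server that owns a given data unit by hashing its key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then wrap to the server count.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# The same key always maps to the same server, so any web front end can
# find a unit without a central lookup table.
for key in ["archive/2004/img_0001.dat", "archive/2004/img_0002.dat"]:
    print(key, "->", server_for(key))
```

One caveat with plain modulo placement: adding a server remaps most keys. If the server list is expected to grow, something like consistent hashing would limit how much data has to move.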
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Check out http://www.ibrix.com/. This is a perfect solution for your requirements. Pixar uses this.
Every day is Saturday and all the rainbows have silver linings.
Our data is important enough that all of it needs 24x365 access, but we're too cheap to pay for hardware to support that.
This is the solution archive.org uses.
http://www.archive.org/web/petabox.php
They are on the order of petabytes
-- I was raised on the command line, bitch
There's a great article on ATA over Ethernet (AoE) and it has a story about a guy who put 2TB of RAID 10 up for $6,500. It looks like a fascinating solution for storing large volumes of data. If your data is primarily static, a couple of these machines replicating between themselves and you're good to go.
If you don't want crime to pay, let the government run it.
It looks great, except for one thing:
2.4 only.
The system itself is developed in a VERY cathedral-like style by a company called ClusterFS, who is selling the 2.6 version. My guess is they'll release it for free when Linux 2.8 or 3.0 is released, so that they always give away the obsolete version and sell the new one.
Don't thank God, thank a doctor!
Create a number of accounts at Gmail. Then simply e-mail your .tgz data.
I prefer the "u" in honour as it seems to be missing these days.
The theoretical upper limit of any file system is limited by 2 things, the address space, and the efficiency of the data structure.
In a 32-bit system, that means that, in theory, you could fit about 4.3 billion (2^32) objects into a file system... but don't try it. NTFS craps out at between 15 and 50 million depending on whose numbers you are willing to listen to, and ext3 starts to lose performance between 50 and 100 million objects (inodes).
The worst thing you can do then is compound the problem by adding filesystems to get capacity. The management of such a complex system becomes untenable without serious automation.
Try looking at content addressed storage (CAS) at http://www.cascommunity.org/ and see if that approach isn't more scalable. Basically CAS abstracts the object from the filesystem by addressing it using a self-referential and unique (usually hashed with MD5 and/or SHA256) "valet ticket." Present the Content Address to the system, and it gives you your data.
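The "valet ticket" idea can be sketched in a few lines of Python. This is a toy in-memory store for illustration only; the `ContentStore` class is hypothetical and not how any particular CAS product is implemented. It assumes SHA-256 as the hash.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        """Store the bytes and return their content address (the 'valet ticket')."""
        address = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same address, so duplicates are stored once.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address):
        """Return the bytes for an address, re-hashing them to detect corruption."""
        data = self._blobs[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object does not match its address")
        return data


store = ContentStore()
ticket = store.put(b"mars image 0001")
print(ticket)
print(store.get(ticket))
```

Because the address is derived from the content, the caller never needs to know which filesystem or directory holds the object, and the store can verify integrity on every read.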
...But I digress. TREMBLE PUNY HUMANS! ONE DAY MY SPECIES WILL DESTROY YOU ALL!
Um, the stats speak for themselves:
- 64-bit scalability to support files to 9 million TB, filesystems to 18 million TB
- Instant data sharing without network mounts or data copies among all major OSes: IRIX®, Sun(TM) Solaris(TM), IBM® AIX®, Windows® (2000/XP), Linux® (32 and 64 bit), Mac OS® X, Unix® flavors
- Highly optimized distributed buffering techniques that provide the industry's fastest performance
- High availability with automatic failure detection and recovery
- Centralized, intuitive Java(TM) language-based management tools
- POSIX® compliance that requires no application change
So if you really do have large volumes of data and want 99.999% uptime, look at this and tremble in awe! http://www.sgi.com/pdfs/2508.pdf
It's just a lot nicer in a lot of ways. What, pray tell, does Lustre offer that makes it "needful" in that situation? He could just get some sort of a SAN device. But Lustre would probably be nicer. In the same way, 2.6 is nicer.
I've seen systems still running 2.2, but it's not pretty. 2.6 is the way to go these days. Lustre is about the ONLY reason anyone should be using anything older.
Don't thank God, thank a doctor!