Red Hat announces GFS
PSUdaemon writes "Over at Kernel Trap they have an announcement that Red Hat has released GFS under the GPL and offers it through RHN. This could potentially be a very substantial offering from Red Hat."
Will it run on distros other than Red Hat? According to the linked page, it looks like it is only for Red Hat Enterprise platforms.
--
11 Gmail invitations available
Would it be too much to ask that the writeup blurb include a ten-word summary of what makes GFS any different from any other Linux-ready filesystem? Many sites get slashdotted, making most links unusable for 12 hours or more.
What does GFS exactly do for you? Allow you to have your hard drive in another computer?
Just because it's open source doesn't mean you can download it for free. Under the GPL, suppliers are only required to make the source code available to people who buy or otherwise legally obtain the product. It's perfectly possible that you still have to pay to get the binary, although of course once you have it you can compile your own version from the code and sell it or give it away.
Still needs to be said - Opensource means free as in speech
I have been a user for about 10 years. This ends Feb 2014. The site's been ruined. I'm off. Dice, FU
Though in this case, you can download GFS and all the related software for free. Just go to the cluster project page.
GFS on the GPL? From RHN? WTF?
Normally I'd ask what's the BFD? but most people would just LOL. Then other people would probably want to know if it comes on DVD or FTP, but the FAQ will explain it JIT. Now what would be really cool would be a PDA that would run it with an RGB display, but it might need extra RAM.
HTH.
Business isn't willing to pay for products, innovation and careers, so we get brands, mortgage commercials and layoffs.
GFS allows multiple redundant storage computers to serve a whole lot of other servers for data availability purposes. It isn't just another FS like EXT* or JFS or .... It's a transparent networkable filesystem with failover and all of the other goodies needed to implement a hardcore enterprise level solution for serving needs like a million hits a minute sites, or filesharing with 50,000 users...
Say you want to create a webserver cluster that can host some big files and dynamic content and survive a slashdotting. No one machine can survive all of us hitting it for video and dynamic content at once, so you build your cluster so that the video is distributed over several machines, the webservers are distributed over some other machines, and the layers in between that decide which request goes to which physical hard drive holding a copy of the video are also made redundant.
Now if, after running for some time, one of the machines gets coffee spilled on it and dies, GFS will automatically route around it. The result is that a slashdotter will not be aware of the failure, and still get the video.
Meanwhile you can fix the problem and bring the downed machine back on-line again.
Red Hat Global File System (GFS) is an open source, POSIX-compliant cluster file system and volume manager that executes on Red Hat Enterprise Linux servers attached to a storage area network (SAN). It works on all major server and storage platforms supported by Red Hat. The leading (and first) cluster file system for Linux, Red Hat GFS has the most complete feature set, widest industry adoption, broadest application support, and best price/performance of any Linux cluster file system today.
Red Hat GFS allows Red Hat Enterprise Linux servers to simultaneously read and write to a single shared file system on the SAN, achieving high performance and reducing the complexity and overhead of managing redundant data copies. Red Hat GFS has no single point of failure, is incrementally scalable from one to hundreds of Red Hat Enterprise Linux servers, and works with all standard Linux applications.
Red Hat GFS is tightly integrated with Red Hat Enterprise Linux and distributed through Red Hat Network. This simplifies software installation, updates, and management. Applications such as Oracle 9i RAC, and workloads in cluster computing, file, web, and email serving can become easier to manage and achieve higher throughput and availability with Red Hat GFS.
Highlights
Performance
Red Hat GFS helps Red Hat Enterprise Linux servers achieve high IO throughput for demanding applications in database, file, and compute serving. Performance can be incrementally scaled for hundreds of Red Hat Enterprise Linux servers using Red Hat GFS and storage area networks constructed with iSCSI or Fibre Channel.
Availability
Red Hat GFS has no single-point-of-failure: any server, network, or storage component can be made redundant to allow continued operations despite failures. In addition, Red Hat GFS has features that allow reconfigurations such as file system and volume resizing to be made while the system remains on-line to increase system availability. Red Hat Cluster Suite can be used with GFS to move applications in the event of server failure or for routine server maintenance.
Ease of Management
Red Hat GFS allows fast, scalable, high throughput access to a single shared file system, reducing management complexity by removing the need for data copying and maintaining multiple versions of data to ensure fast access. Integrated with Red Hat Enterprise Linux (AS, ES, and WS) and Cluster Suite, delivered via Red Hat Network, and supported by Red Hat's award winning support team, Red Hat GFS is the world's leading cluster file system for Linux.
Advanced features
Scalable to hundreds of Red Hat Enterprise Linux servers.
Integrated with Red Hat Enterprise Linux 3 and delivered via Red Hat Network, comprehensive service offerings, up to 24x7 with one-hour response.
Supports Intel X86, Intel Itanium2, AMD AMD64, and Intel EM64T architectures.
Works with Red Hat Cluster Suite to provide high availability for mission-critical applications.
Quota system for cluster-wide storage capacity management.
Direct IO support allows databases to achieve high performance without traditional file system overheads.
Dynamic multi-pathing to route around switch or HBA failures in the storage area network.
Dynamic capacity growth while the file system remains on-line and available.
Can serve as a scalable alternative to NFS.
Product Information
Supported on Red Hat Enterprise Linux AS, ES, and WS.
Red Hat Cluster Suite support available on Red Hat Enterprise Linux 3.
Support for a wide variety of Fibre Channel and iSCSI storage area network products from leading switch, HBA, and storage array vendors.
Mature, industry-leading, field-proven, open source cluster file system.
Agile Artisans
I think the other people have covered the basics pretty well - plug lots of computers into one fibrechannel or possibly firewire disk or disk array.
The second really interesting use is with virtualisation - imagine if you want all your S/390 virtual machines to share the same base file systems for efficiency (given the price IBM charge for mainframe disks ;)) or the same with UML, Xen, etc.
I was reading only the other day about the Google File System. So there are now two acronyms which are both GFS which both refer to a distributed file system. That's not going to get confusing. Nope, not at all.
Are there any distributed filesystems that don't have serious issues?
I mean, NFS has issues with security (relying on numeric user id's sent by the client is a nightmare). Locking is problematic. Different versions have severe compatibility issues.
I forget the issues with AFS, but its successor, Coda, seems not very mature, although it is one of the more promising filesystems out there. InterMezzo is a more complete and robust implementation of the Coda featureset, but is Linux-only.
SFS looks very promising (simple, but effective), but requires NFSv3 clients and servers to interact with the kernel.
None of these filesystems allows regular users to access remote filesystems (superuser privileges are required for mounting) like with FTP.
What's so hard about getting this stuff right? And can we please have kernels that support userspace filesystem drivers (or, better, any drivers)? (Yes, I know about LUFS and FUSE).
Ok, rant over. Thoughtful comments, corrections and pointers appreciated.
Please correct me if I got my facts wrong.
GFS has a number of useful applications. But I think the times where you could design your enterprise around the idea of a globally consistent file storage system are over: enterprises are getting more flexible, more decentralized, and people would prefer not to have to deal with IT staff over issues such as file space and permissions. And they can avoid it--since many of them make the purchasing decisions.
I don't see security in the list of features. Calling this a Global file system is a bit presumptuous, considering the lack of security prevents it from being used outside of a closed LAN segment.
Gustavo J.A.M. Carneiro
Released today, see more at Always Current Lineox Enterprise Linux Gets Global File System (GFS) Support.
What is the difference between GFS, NFS and AFS? (Other than AFS's global file structure, kerberization and encryption)? Do they all do the same thing, or does GFS add something that the others don't have?
They bought this technology when they bought Sistina. Sistina has been working on GFS for a long time.
Oh, but they said it was free, they didn't say it was free.
Don't you know the difference between "free" and "free"?
If so, let me explain:
1) Internet Explorer is free, for instance, as you don't pay for it;
2) Internet Explorer is not free because you cannot have its source to modify and make it more secure;
3) Professional distros like Red Hat and Suse are not free because you have to pay to have it;
4) These same professional distros are free because you can compile the source yourself whenever you can.
Got it? If you don't understand this, you might believe it next time someone says "Linux is not free". Don't be fooled! It is free!
Now, the relevant quote is:
"We're looking for people help us work on this project so we can eventually get it included into the Linux kernel. Comments, suggestions, patches, and testers are more than welcome."
See the part that mentions "get it included into the Linux kernel"? It means it will be free.
Now, these superb guys at RH really should charge for a professional product with support. Soon, very soon, they might discover they must do what Sun does: have a personal low cost (maybe gratis) version, so that people can tweak it, use at home, report bugs etc.
I, for one, thank them for all the fish and get the message that everyone must contribute, no matter how little, and not just wait for them to make things for us.
And don't use English to discuss such things. Or, better yet, change English so that it becomes fit for use. I suggest stop using free to mean gratis. Just use gratis, like in "There's no gratis lunch".
Or you can download the SRPM's here
GFS was well-liked at supercomputing centers I have worked with until Sistina dropped the GPL license in favor of proprietary. They did this very suddenly and without warning. It pissed off a lot of potential users and the open source community. It has since fallen out of favor.
This move by Red Hat gives new life (and resources) to GFS beyond the OpenGFS Project that has also been continuing to work on the code.
Another recent development in this area is HP's decision to productize Lustre. Lustre is perhaps the most prominent and promising HPC filesystem.
SGI also announced a major deal last week involving Lustre:
The new file system is expected to sustain write rates in excess of 8GB/sec and demonstrate single client write rates of more than 600MB/sec. To achieve this performance, the new file system will leverage Lustre, an open source, object-oriented file system with development lead by Cluster File System Inc., with funding from DOE. Lustre currently is used on four of the top five supercomputers, including the PNNL cluster based on 1,900 Intel® Itanium® 2 processors.
Contrary to popular belief the world is not nurb432-centric. Many other people (including myself) care about SANs, and can afford a small licensing fee (2200 USD is small compared to other solutions like XSan, which is 5000 USD, but as other people have said, if you want it for free, you can download the source, just forget any level of support).
I'm sorry you're not exposed to ERP and enterprise-level work, but many of us are. Slashdot's plugs are not exclusively for free-as-in-beer projects.
Not to YOU of course, because you have no need for such things.
Remember, it's Free Software. That means you can pay Red Hat for it and get their support. Don't want that, fine. Now the source is available, so you can download and compile it yourself, or print it out and wipe your ass with it. Or maybe your favorite distro will download it, package it, enhance it, and include it in their next release.
I mod down all the "free iPod"-sig losers.
IBM has a product called GPFS (General Parallel File System) which has sold on AIX for several years and is offered for Linux as well. On Intel based boxes it sells for about $1000 per CPU. I wonder how IBM will react to this Open Source competition ? The IBM product has very similar function - it is also used with Oracle RAC. It originated on the RS/6000 based SP clusters but has been ported out to be used on pretty much any AIX or Linux based cluster.
-The Mad Duke
Re-read what you just posted.
It says the *license* under which you distribute it must make it available to third parties. The GPL does not require you to *distribute* the source code to anyone except those who receive the product in executable form. But because it is licensed to third parties, anyone in possession of the source code *may* distribute it to third parties.
-- TTK
I don't think so.
Red Hat's HA clustering software is also GPL but it doesn't run on other distros (and is not supported by Red Hat on other distros).
The code itself is open source, that is true, but "Red Hat Enterprise Linux subscription [is] required" (http://www.redhat.com/software/rha/gfs/)
You can thank the Microsoft marketing engine for this mentality. I have been involved in a few projects where managers have either understaffed IT, believing the hype that MS products really don't need staff after setup, or have suggested going with open source products expecting the same level of ease of use. (While MS products are easier/quicker to set up, they require, in general, more hands-on time in day-to-day ops, and as you add more third-party products, things can mysteriously break. If you go MS, it is even more important to get someone with deep AND wide knowledge in administration.)
While it is amusing to see a MCSE struggling to configure Postgres or MaxDB (which can be a little tricky) and complaining about the lack of a GUI (I didn't have the heart to find and install the various GUIs for them...heh), it does not sit well with the PHB to see labor costs skyrocket with no discernable work being done (from their perspective).
The moral of this rant is: even though it is free software, that does not automatically mean that you should not have to pay for the expertise to set up, run, and maintain it. Red Hat (and the other commercial distros) have excellent service and tend to service smaller companies at the same level MS only does for much larger companies. PHBs should be looking for gains in long-term licensing costs and flexibility. No lock-in, no artificially driven need to upgrade, no technological sea change forced upon you.
[RIAA] says its concern is artists. That's true, in just the sense that a cattle rancher is concerned about its cattle.
GFS is nothing like NFS, except they're both file systems. GFS is a filesystem specifically created for clusters. It means you have a lot of machines sharing a single logical file system. You can then add or remove machines from the cluster, and the filesystem and all its contents remain accessible to all the nodes in the cluster. This is great for a lot of cluster tasks, such as having multiple load balanced web servers all serving the same content from a GFS system.
NFS on the other hand can be accessed from multiple machines but is ultimately hosted on one specific machine, giving you a single point of failure.
It's like deja vu all over again.
Normally I'd ask what's the BFD?
BFD is a library from the GNU project for manipulating ELF object code files, among other formats.
(OTE: off-topic excursion, PWB)
(PWB: posted without bonus)
what exactly does it give you that OpenAFS does not?
:-(.
Performance. Simplicity. Adopting AFS is kind of an all-or-nothing proposition - we looked into it, but it would mean retraining a _lot_ of physicists, some of whom make computer geeks look like social geniuses.
Is it primarily that it is more useful for highly parallel computing systems so that the actual nodes can share the actual block devices?
Yes, and it does it relatively elegantly. PVFS, the main alternative, is fundamentally a kludge IMHO and compared to GFS is horrendously brittle (one PVFS participant failing takes down the whole virtual filesystem). However, until such a time as GFS supports MPI-IO "ROMIO", PVFS will be the cluster FS used on our cluster.
AFS is for distributed computing, GFS is for fault-tolerant cluster computing, similar to SGI's CXFS. Calling it a "global file system" is a misnomer.
Don't go down that road... Red Hat's contributions to Linux absolutely dwarf SuSE's to date in no uncertain terms.
But let's just focus on the most recent efforts of both companies. Realistically no distro is going to include Yast, but it's still a very good move since it will allow SuSE ISO images to be distributed without the existing restrictions in the future and I'm thankful to Novell for it. On the other hand, Red Hat buying Sistina for $31 million, setting their arguably only asset GFS free, and then working on including it in the Linux kernel proper directly also benefits Novell and other Linux distributors.
"lately has been locking down their Linux offerings"? How about giving some concrete examples. Last time I checked RHEL was 100% open source and available for download, and so is Fedora Core for the home user. SuSE has been cleaning up their act since they got purchased by Novell, but to play them against Red Hat, who has been completely 100% behind open source since day one, as somehow a more free alternative is laughable.
It's like deja vu all over again.
No, that's to get a supported version. If you would actually read the announcement linked to in the story, you'd get a link to where you can get the source code from cvs.
There is also OpenGFS http://opengfs.sourceforge.net/ and Oracle Cluster File System http://oss.oracle.com/projects/ocfs/
These may go away since their major reason for existing was that Sistina had closed up source for GFS.
Thanks RedHat. With LVM2, GFS, my EMC SAN and my cluster of Gentoo boxes (ya, sorry 'bout that part) I'm going to have lots of fun.
Intermezzo and CODA try and solve a different problem (the one AFS does), they replicate data as much as possible without violating coherency and at a file level.
GFS instead gives everyone access to the same disk at the same time rather than replication. Both methods work well for different data sets - so yes GFS and oMFS are similar
I'm also wondering about how GFS compares with OpenAFS. I've looked into OpenAFS, and if you require any type of locking, you shouldn't use OpenAFS (though, I'm not sure about "lock file" semantics with OpenAFS).
o fcntl locks are "always granted" meaning you won't know if anyone else is using any file on AFS.
o It doesn't support hard links between files in different directories because their ACLs are directory based, not volume based.
o Permissions are based on AFS acls, and the standard unix octal permissions will show in file listings, but mean nothing.
o The stable version (v. 1.2) has a 2GB file size limit (though volume size can be much larger).
o Depending on the length of your file names, you are limited to approx. 64k or fewer files per directory.
o Currently it only supports 2.4 kernels stably, and there is some strife between the OpenAFS and Linux Kernel communities on the implementation. Specifically, they don't like that the syscall table is not exported to modules in Linux 2.6 (they say it is for all other Unix-like OSes...)
o OpenAFS isn't GPLed (it uses an IBM open source license that's GPL incompatible). So, unless there's a rewrite, it won't get into the mainstream kernel. There seems to be some progress made on a GPLed rewrite for the 2.6 kernel, but it is very experimental and only provides some functionality compared to the OpenAFS non-GPL module.
AFS as it is now is best for a distributed file sharing network where locking isn't important. With its one read/write server and multiple read-only servers per volume, it would be a great tool to maintain mirrors (read the OpenAFS docs on what exactly a volume is, and the concepts that surround them). It has great caching concepts, and only sends the parts of the files you request (unlike Coda, which must send the entire file before the first byte is available to userspace).
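The "always granted" fcntl point in the list above can be checked on any mounted filesystem. The following is a minimal sketch, not taken from the OpenAFS or GFS documentation: it holds an exclusive lock in one process and asks a second process to take the same lock without blocking. The function name `lock_is_enforced` and the child script are made up for illustration.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Child script (hypothetical helper): try a non-blocking exclusive lock
# on the file given as argv[1] and report whether it was granted.
CHILD = """
import fcntl, sys
with open(sys.argv[1], "r+b") as f:
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        print("granted")
    except OSError:
        print("denied")
"""


def lock_is_enforced(directory):
    """Return True if a second process is refused a lock we already hold."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # POSIX record locks are per process, so the conflict has to be
        # tested from a separate process, not from a second descriptor.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        result = subprocess.run(
            [sys.executable, "-c", CHILD, path],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip() == "denied"
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(target, "enforces fcntl locks:", lock_is_enforced(target))
```

Run it with the mount point as the only argument. A filesystem that grants every lock would report False here. Both processes run on the same machine, so this says nothing about whether locks are coordinated between different nodes.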
GFS looks like it will be a great tool for the LinuxHA project for active/active pairs of nodes.
Given time, I'm sure GFS will be in all distributions that value cluster admins as users (suse, debian, etc).
There: Something at a specific location.
Their: Owned by someone.
Please make sure your english compiles.
A SAN can be a single large block device. The specifics will depend on the SAN, but you should be able to arrange the disks in any RAID configuration (or none), and present 1 or more block devices to 1 or more servers.
When I manipulate a sector on the disk, the SAN is actually manipulating the same sector on multiple identical drives.
Not necessarily the same sectors, depending on whether we're talking physical or logical sectors, but basically that's correct.
So from this standpoint, it sounds similar to RAID, except for the redundant power supplies.
Well, most servers come with 2+ power supplies for fail-over, so even the redundant power supplies aren't different.
From the description, it sounds like SAN has another important difference from RAID. The SAN, redundant power supplies, redundant drives, and all, is a separate system from the computer.
There are disk arrays you can buy that direct attach to computers. These too would be separate units from the computer (benefit: if the computer dies, reattach the pack to a separate computer. A lot simpler than having to remove/insert each disk).
Unlike RAID, which pretends to be a single block device, the SAN can be accessed by multiple CPU's.
Depending on the RAID device, you can configure multiple logical devices across multiple physical devices. Dell's PERC's generally allow this (ok, not across separate disks, but if there are 10 disks, you could have 2 sets of 5 disks in RAID5).
A RAID device can be accessed by multiple CPU's in the case of a 2+ way server. So, you mean multiple servers, not multiple CPU's.
A SAN can be connected to many servers - 64, 128, 1024, etc, depending on the SAN and your budget.
(Therefore, you don't want to put an ordinary filesystem onto it, such as Ext3.) Therefore the design of GFS, which allows multiple cpu's to concurrently manipulate the filesystem.
Yes, the FS will depend on the use. If you can hookup multiple servers to the same disk, then you need an FS that can handle that. If you are planning on dynamically growing the device, then you need an FS that can handle that.
Do I fundamentally misunderstand?
Parts you understand. A SAN also has many other uses, like disk consolidation, functionality, and management, but these issues and uses will really depend on your environment.
For example, if you generally buy a server with a bunch of disk in case you ever need it, then you probably have a big range of % use on your servers. A SAN lets you consolidate that disk space in one place. Perhaps you have one server running at 30% total disk use, another at 99%, and another at 50%. It would be nice to dynamically allocate the disk from the underused servers to the one at 99%, but barring inefficient methods, this is very, very difficult. With a SAN, I can grow those disk devices on the fly and make sure each server always has X amount free (probably around 20% free space). When you're talking about many servers, or lots of unused space, this can add up to a big ROI.
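To put rough numbers on that example, here is a small sketch of the arithmetic. The 100 GB per server figure is an assumption made purely for illustration (the comment gives only percentages), and the function names are hypothetical.

```python
def allocation_for_free_target(used_gb, free_fraction=0.20):
    """Size a volume so that `free_fraction` of it is still unused."""
    return used_gb / (1.0 - free_fraction)


def spare_after_pooling(used_gb_per_server, disk_per_server_gb, free_fraction=0.20):
    """Pool all disks, give each server just enough for the free-space
    target, and return how much of the pool is left unallocated."""
    pool = disk_per_server_gb * len(used_gb_per_server)
    needed = sum(allocation_for_free_target(u, free_fraction)
                 for u in used_gb_per_server)
    return pool - needed


if __name__ == "__main__":
    # ASSUMPTION: three servers with 100 GB each, at 30%, 99% and 50% use.
    used = [30.0, 99.0, 50.0]
    print("unallocated GB after pooling:", spare_after_pooling(used, 100.0))
```

Under those assumed sizes, the server at 99% needs more than its own 100 GB to reach 20% free, yet the pool as a whole still has roughly a quarter of its capacity left over.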
Or, let's say you use a proprietary FS like Veritas for your Enterprise servers. Buying automatic mirroring for those servers may add up to a lot of money, so instead invest once in your SAN's disk mirroring product and use this for those servers (yeah this may be just as or more costly, depending on your SAN).
And there are other functionalities, like server independent snapshots and mirrors - your FS may handle snaps or mirroring, but can a separate server mount that? With a SAN, that can be possible. Imagine your webserver mounts a RO mirror of the data that is only changeable via a more protected server. You could do that via NFS, but it would come at a speed cost. With the SAN, you're not limited to the
The difference is how it tries to solve the problem. NFS works over IP and accesses files at the inode level. This requires the server system or device to be running RPC and the NFS protocol. Most network filesystems work in a similar way. You have servers and clients accessing the servers via some protocol.
Now imagine a filesystem designed for servers that allows them to access the filesystem at a block level directly via the shared bus. Let's say a parallel SCSI bus (or any bus that allows more than one host, e.g. iSCSI, Fibre Channel, Firewire). Imagine how fast it would be to access a shared disk over Fibre Channel! The problem is that if two servers mount the filesystem at the same time it would normally corrupt the filesystem. People with SAN's (Storage Area Networks) solve this problem by making mini virtual hard drives and setting ACL's on them so only one host can access that virtual hard drive at a time. This could lead to a waste of space.
GFS solves the SAN problem by using a Distributed Lock Manager (DLM). No one host is the server of the filesystem, but writes/locks are coordinated via the DLM. Now multiple hosts *can* share a virtual hard drive or real block device and not corrupt the filesystem. If a host dies, no problem, there is no server for the filesystem!
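The idea of coordinating writers through a lock manager can be sketched in a few lines. This is a toy, single-process model and not the GFS DLM or its API: the class `ToyLockManager`, the mode names and the resource strings are all invented for illustration.

```python
from collections import defaultdict


class ToyLockManager:
    """Toy model: nodes take shared or exclusive locks on named resources."""

    MODES = ("shared", "exclusive")

    def __init__(self):
        # resource name -> {node name: mode currently held}
        self._holders = defaultdict(dict)

    def acquire(self, node, resource, mode):
        """Grant the lock if it does not conflict with other nodes."""
        if mode not in self.MODES:
            raise ValueError("unknown lock mode: %r" % (mode,))
        holders = self._holders[resource]
        others = [m for n, m in holders.items() if n != node]
        if mode == "exclusive" and others:
            return False  # somebody else holds it in any mode
        if mode == "shared" and "exclusive" in others:
            return False  # somebody else is writing
        holders[node] = mode
        return True

    def release(self, node, resource):
        self._holders[resource].pop(node, None)

    def node_died(self, node):
        """Drop every lock held by a failed node so others can proceed."""
        for holders in self._holders.values():
            holders.pop(node, None)


if __name__ == "__main__":
    dlm = ToyLockManager()
    print(dlm.acquire("node1", "inode:42", "exclusive"))  # True
    print(dlm.acquire("node2", "inode:42", "exclusive"))  # False
    dlm.node_died("node1")
    print(dlm.acquire("node2", "inode:42", "exclusive"))  # True
```

The sketch keeps all state in one object, which is exactly what a distributed lock manager avoids. A real cluster also has to fence the dead node and replay its journal before its locks are handed out again, and that is left out here.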
Let's give an example. Say you have a firewire enclosure. Now plug that firewire hard drive into two computers. This, by the way, may still require a patch to sbp so that Linux will tell the enclosure to allow both hosts to talk to it at the same time. Now that the hard drive is talking to both computers you could run GFS on it and access the data at the block level by both systems. Now start serving email via IMAP (load balanced), *both hot*, no standby. Now kill a box. IMAP still works. No remounting, no resynchronization.
Pretty amazing if you ask me! This technology is pretty rare. IBM has GPFS. SGI has Clustered XFS. Both are pretty expensive. GFS? RedHat just re-GPL'd it! Microsoft? Ummm. I think they are just now getting logical volume management.
GFS also has nice features like journaling (kinda required for this sorta thing), ACL's, quotas, and online resizing.
Now tell me Linux isn't enterprise!
If you're looking for a more fault-tolerant NFS, try combining it with the network block driver (nb) and raid-1 (md). The basic technique is to mirror a volume between a local disk slice and a remote partition, using md to do the mirroring and the nb driver to access the remote slice. If you want, you could also set up several remote slices on different boxes, and set up raid-5 among them.
I am too lazy to check myself, so I'll ask the collective: does GFS support locking and mmap()? I am asking because this is a sine qua non condition to run my favorite mail server, Cyrus imapd. Redundant high-availability servers are one of the most asked-for scenarios. And no, Cyrus Murder doesn't cut it (it solves a different problem, that is scalability).
:wq
Both GFS and Lustre (since it was mentioned earlier) support mmap. Note that the syscall semantics might be slightly different (I don't know the details). So give it a shot.
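One way to "give it a shot" before installing a mail server is a quick probe of the mount point. The sketch below is an assumption about what such a check could look like, not anything from the Cyrus or GFS documentation. The function name `probe_lock_and_mmap` is made up, and a passing result only shows that the calls succeed on one node, not that the semantics match what Cyrus expects.

```python
import fcntl
import mmap
import os
import sys
import tempfile


def probe_lock_and_mmap(directory):
    """Try an fcntl lock and a shared writable mmap on a scratch file."""
    results = {"fcntl_lock": False, "mmap": False}
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # mmap needs a non-empty file, so write one page of zeros first.
        os.write(fd, b"\0" * mmap.PAGESIZE)

        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            results["fcntl_lock"] = True
        except OSError:
            pass

        try:
            # Default mapping on Unix is shared and read/write.
            with mmap.mmap(fd, mmap.PAGESIZE) as mapped:
                mapped[:5] = b"hello"
                mapped.flush()
            # Read back through the descriptor to confirm the write landed.
            os.lseek(fd, 0, os.SEEK_SET)
            results["mmap"] = os.read(fd, 5) == b"hello"
        except (OSError, ValueError):
            pass
    finally:
        os.close(fd)
        os.unlink(path)
    return results


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(target, probe_lock_and_mmap(target))
```

Pass the GFS (or Lustre) mount point as the argument. Differences in syscall semantics, such as how quickly an mmap write becomes visible on another node, would need a test that spans two machines.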
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON