XFS 1.0 is Released
Isldeur was the first of many to note that SGI's now open source journaling file system "XFS" has announced the release of version 1.0. It, Reiser, and the new ext format continue to be an area of debate, but regardless, journaling file systems are nice for eliminating those slow fsck boot-ups, and for protecting all your pr0n when you lose power and realize that you plugged the UPS into your stereo by mistake (not that I've done that. No sir.)
Yes, XFS is proven on IRIX, but XFS is not proven on Linux. ReiserFS is proven on Linux (shipping with SuSE for almost two years now).
Had they placed it under a BSD license, effort that has been put into producing an open and free filesystem could be closed by a company such as Microsoft. Why should I let them profit if they don't contribute or at least acknowledge my work?
As it is, xfs is under the GNU GPL and is thus protected from being made proprietary. The GPL protects the rights of free software authors. Myself, and thousands of other free software developers worldwide, wouldn't have it any other way.
# hdparm -Tt /dev/hda
/dev/hda:
 Timing buffer-cache reads: 128 MB in 0.89 seconds = 143.82 MB/sec
 Timing buffered disk reads: 64 MB in 2.94 seconds = 21.77 MB/sec
Observe the throughput in the buffered disk read test: 21.77 MB/sec. This is on my year-old Athlon system. AFAIK 100 Mbit networks don't tend to transfer data at speeds faster than 10 MB/sec. Perhaps you meant that you upgraded your home network to gigabit. Or not.
Use hdparm to ensure that your hard drive is set to use DMA.
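For anyone who wants to check the arithmetic, here is a minimal sketch of the comparison. It assumes nominal link rates with no protocol overhead (real networks do worse), and the function and variable names are made up for illustration.

```python
# Back-of-the-envelope comparison of link speed vs. disk throughput.
# Assumption: 8 Mbit per MB and no protocol overhead (real links do worse).

def mbit_to_mbyte_per_sec(mbit_per_sec):
    """Convert a nominal link rate in Mbit/s to MB/sec."""
    return mbit_per_sec / 8.0

disk_mb_per_sec = 21.77                      # buffered disk reads, from the hdparm run
fast_ethernet = mbit_to_mbyte_per_sec(100)   # nominal ceiling for 100 Mbit
gigabit = mbit_to_mbyte_per_sec(1000)        # nominal ceiling for gigabit

for name, rate in (("100 Mbit", fast_ethernet), ("gigabit", gigabit)):
    if disk_mb_per_sec > rate:
        verdict = "disk outruns the network"
    else:
        verdict = "network outruns the disk"
    print(f"{name}: {rate:.1f} MB/sec ceiling -> {verdict}")
```

Even at the theoretical ceiling, a 100 Mbit link cannot keep up with this disk; only gigabit could.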
For those of you looking for comparisons, why not check http://www.softpanorama.org/Internals/filesystems.shtml which appears to have links to information on a variety of filesystems (most of the journalled FSs under Linux) and even NTFS.
SGI is going to put Linux on their Big Systems(tm) when the Itanium-class CPUs start shipping. They've been planning this for a while now. The current generation of Onyx/Origin boxen are designed with multiple CPU architectures in mind -- e.g. you will be able to have a MIPS system or an IA-64 system just by swapping a single brick.
The eventual plan is to have Linux for the Intel servers and IRIX on the MIPS ones, with IRIX being phased out over a long period of time so as to keep the old customers from getting paranoid. There are even rumors internally about servers with *BOTH* Intel and MIPS processors in them running Linux. If you watch SGI's Linux pages, you'll notice that more and more support is made available for running Linux on R10K, R12K and other heavy-duty processors, not to mention SGI's memory architectures (e.g. ccNUMA).
My own theory is that the now-EOLed 320/540 workstations were an experiment to see how SGI's customer base would react to non-MIPS/IRIX workstations and to get everyone warm to the idea of SGI branching out.
SGI is a company to watch over the next few years, and releasing things like open-sourced XFS for Linux are just teasers of what's to come.
- A.P.
--
Forget Napster. Why not really break the law?
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
http://www.kernelnewbies.org/~phillips/
Speaking of journaling filesystems, whatever happened to tux2? Was any code ever released?
tux2 looked really good. Supposed to be faster than traditional journaling, and preserves file data as well as metadata.
Anyone?
(I don't count NTFS, because that is hard-pushed enough to be called a genuine filesystem, never mind a journalling one.)
Feel free to reply to this, adding any that I've missed.
The Logging filesystem does much the same thing as Ext3 - it is an extension to Ext2 - but it looks like it would be a lot more useful than Ext3. IMHO, it'd be much better if neither of them were so FS-specific and could be used as a generic wrapper. SnapFS does exactly this, for example.
Anyway, on with the list of journalling filing systems...
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
At this point, SGI has only provided an unsupported Red Hat system installer for XFS. However, there are a couple of people in the Linux community who have been working on Debian packaging & installers, and also someone working on Slackware. Check the xfs mailing list archives for more info...
FWIW, GRIO and realtime subvolumes (err... partitions in the Linux case) are not yet implemented in Linux.
So one can only patch a stock Linus kernel with XFS?
:) As with any patch, you will have varying success patching source trees that differ from that which was used to generate the patch.
elease-1.0/patches/RHlinux-2.4.2-core-xfs-1.0.patch.gz
You can patch whatever you want, it's a question of how many conflicts you need to resolve.
What about the Red Hat 7.1 kernel? Would be nice if there was a patch for that.
ftp://linux-xfs.sgi.com/projects/xfs/download/R
The kernel & userspace utils are packaged several different ways - cvs, patches, tarballs, rpms etc.
Go to http://oss.sgi.com/projects/xfs for the info.
We have a system installer that works with Red Hat 7.1 to do exactly what you're asking about. Grab our iso, burn a cd, boot from it, and you're on your way. You'll need the Red Hat 7.1 cds as well.
The other option, of course, is to have lots of extra space, install your distro, boot an XFS capable kernel, make some XFS filesystems, and copy everything over.
LILO in the MBR works just fine with XFS. There are no issues; I have 3 machines that boot that way.
A lot of people have been complaining that there is no 2.4.4 patch - but bear in mind that 2.4.4 is only 3 days old. We'd be a bit nuts to release 1.0 on a kernel as untested as that.
On the other hand, the devel cvs tree is usually updated within a few days of a new kernel release. As soon as the kinks get out of XFS+2.4.4, it'll be in the devel cvs tree.
The majority of our 1.0 testing has been done on 2.4.2, so we have the most confidence in XFS there. We also have a 2.4.3 patch which should be fine, although it has not had as much direct testing.
We realize that there are issues with 2.4.2 (loop device, anyone?). If you're concerned about fix-ups, and you run an RPM-based system, you might take a look at the Red Hat kernel RPMs we offer - those include a ton of patches from Red Hat - essentially the same kernel as shipped with 7.1, with XFS added.
If you're concerned about netfilter, just get the patch - I would be very surprised if it conflicted in any way with an XFS-enabled kernel.
Yeah, but they can make their version of a BSDL'd software incompatible with what's in the open. And if they control 90% of the market, guess who's getting screwed?
For a different scenario, imagine a BSD licensed unix. Now imagine several large corporations taking that great technology and using it. Sounds great, right? Now fast forward a couple of years, and you find that every one of these corporations has expanded on the original BSD licensed unix in a proprietary fashion in attempts to maintain and expand their customer base. Even if most of the corporations would have preferred to maintain their software as BSD licensed, their hands are forced when the first of the corporations starts spitting out proprietary, incompatibly feature enhanced versions. Admins find themselves trapped, having to either understand and maintain several incompatible systems, or going with one vendor and getting gouged for prices. Suddenly, WinNT 3.51 pops up, and although much worse technology, it runs on cheap hardware, costs less, and is far easier to administer.
Sure, you can say the customer should have used the original, still BSD licensed software, but in reality, most customers can't code, and are going to go with the commercially supported software, because its added features and/or lower administrative costs make it (at least initially) cheaper than going with the BSD stuff.
Would this be under "currently abandoned" or "abducted by aliens"?
GeoffEG
It seems like one of the nastiest problems when you want to promote a new filesystem is getting LILO, SILO, MILO... to load a kernel image off of the filesystem. What are the issues involved here? Do these loaders really only support ext2fs? If so, this would prevent a user from having a completely journalled system, right? Perhaps there are ways of fixing this (like backups of the /boot partition on a journaled fs) but it would be cool (I think) to have a mini-fsck run on the boot partition before the kernel boots.
There may be issues here; perhaps a MD5 sum or something of the sort might be better to check that the boot partition is uncorrupted. The sum would be checked against... what?
This post is as much an RFC as anything else. Go at it!
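To make the checksum idea concrete, here is a minimal sketch of what such a check might look like. It does not answer the "checked against what?" question: it simply assumes a known-good digest was recorded earlier and kept somewhere trusted (that storage location is an assumption, as are the function names). A real boot loader would have to do this before any filesystem driver is available, which this sketch ignores.

```python
import hashlib

def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def image_is_intact(path, expected_hex):
    """Compare a file's current digest against a previously recorded one.

    expected_hex is assumed to come from trusted storage outside the
    partition being checked; where that lives is left open here.
    """
    return md5_of_file(path) == expected_hex.lower()
```

Usage would be something like recording `md5_of_file("/boot/vmlinuz")` after each kernel install (the path is only an example) and calling `image_is_intact` with that value before trusting the image.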
Sigmentation fault - core dumped
Not quite. The log/journal is structurally different than the main data areas, with different synchronization and performance characteristics. Writing once to the log and once to the main data area is quite different than writing twice to the main data area.
However, an observation very similar to yours is behind log-structured filesystems. In other words, if you're going to write all the data to the log in a highly robust etc. way, why not just make the log the authoritative copy of the data? There's a whole lot of gunk that has to be worked out after that, such as how you find data and how you reclaim log space, but it all flows pretty cleanly from that initial idea. The result is pretty nifty for some kinds of workloads, but in general changing OS structures and their effects on I/O patterns have sort of left log-structured filesystems behind.
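For readers who want the "write once to the log, once to the main data area" idea spelled out, here is a toy sketch. It is an in-memory model with made-up names, not how XFS or any real filesystem lays out its log: updates are appended to a journal, a commit record marks a transaction complete, and recovery replays only committed transactions.

```python
class ToyJournaledStore:
    """Toy model of a journal in front of a main data area."""

    def __init__(self):
        self.journal = []   # append-only records: (txid, kind, payload)
        self.main = {}      # "main data area": block number -> contents
        self.next_tx = 0

    def log_transaction(self, updates, commit=True):
        """Append a transaction's updates to the journal.

        commit=False simulates a crash before the commit record is written.
        """
        txid = self.next_tx
        self.next_tx += 1
        for block, data in updates.items():
            self.journal.append((txid, "update", (block, data)))
        if commit:
            self.journal.append((txid, "commit", None))
        return txid

    def replay(self):
        """Recovery: apply updates from committed transactions, in log order."""
        committed = {tx for tx, kind, _ in self.journal if kind == "commit"}
        for tx, kind, payload in self.journal:
            if kind == "update" and tx in committed:
                block, data = payload
                self.main[block] = data


store = ToyJournaledStore()
store.log_transaction({1: "inode A", 2: "dir entry A"})
store.log_transaction({2: "dir entry B", 3: "inode B"}, commit=False)  # "crash"
store.replay()
print(store.main)  # only the committed transaction's blocks appear
```

A log-structured design, in these terms, would skip `main` altogether and treat the journal itself as the authoritative copy, which is where the indexing and space-reclamation problems mentioned above come from.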
If you're interested in exploring further, the seminal papers in this area are The Design and Implementation of a Log-Structured File System by Rosenblum et al, and (IMO even better) An Implementation of a Log-Structured File System for UNIX by Seltzer et al. Enjoy!
Slashdot - News for Herds. Stuff that Splatters.
Interesting. I dunno about the SGI product, but the EMC Symmetrix takes a different approach. It has enough reserve power so that if it detects loss of external power it will immediately flush its cache to special areas on disk. Then, the first thing it does when it comes back up is slurp all that data back into cache - which not only ensures data stability but preloads the cache for you as well. Cool. I've heard that in a simulated blackout in a big data center everything would get eerily quiet *except* for the Symmetrix, which would actually get extra-loud as it does the flush.
Disclaimer: I work for EMC. I don't speak for them, they don't speak for me, yadda yadda yadda.
Slashdot - News for Herds. Stuff that Splatters.
That's an important issue. I'll try to provide a couple of answers.
Well, there are at least two ways:
SCSI gives you other options as well. For example, if you're using tagged command queuing, you can set FUA only on the last command of a sequence (e.g. a transaction). That way, you can allow the disk or storage subsystem to do appropriate reordering, combining, etc. and you'll still be sure that by the time that last command completes all the commands logically ahead of it (as specified by the tags) have completed as well. It's tres cool, and it's one of SCSI's biggest benefits compared to IDE.
Tagged command queuing also comes in handy if you have to force write caching off - which BTW is common and not particularly difficult on either SCSI or IDE drives. Since you're now forced to deal with full rotational latency, the importance of overlapping unrelated operations (by putting them on different queues) becomes even greater.
Tsk tsk, that's a shame. It's pretty common knowledge among storage types, but still far from universal. Go look on comp.arch.storage and you'll see a recurring pattern of people finding this out for the first time and sparking a brief flurry of posts by asking about it.
The problem with having the drive notify the host that a write has been fully destaged is that target-initiated communication (aside from reconnecting to service an earlier request) is poorly supported even in SCSI. Hell, it's even hard to talk about it without tripping over the "initiator" (host) vs. "target" (disk) terminology. Most devices lack the capability to make requests in that direction, and most host adapters (not to mention drivers) lack support for receiving them. AEN was the least-implemented feature in SCSI.
There's also a performance issue. Certainly you don't want to be generating interrupts by having the disk call back for *every* request, but only for selected requests of particular interest. So now you need to add a flag to the CDB to indicate that a callback is required. You need to go through the whole nasty SCSI standards process to determine where the flag goes, how requests are identified in the callback, etc. Then you need every OS, driver, adapter, controller, etc. to add support for propagating the flag and handling the callback. Ugh.
It's a great idea, really it is. It's The Right Way(tm). But it's just never going to happen in the IDE world, and it's almost as unlikely in the SCSI/FC world. 1394 seems a little more amenable to this, but I have no idea whether it's actually done (I doubt it) because even though I know they exist I've never actually seen a 1394 drive close up.
I hope all this helps shed some light on the subject.
Slashdot - News for Herds. Stuff that Splatters.
So does XFS. From one of SGI's own presentations:
[emphasis added]
This is *normal* for a journaling filesystem. Very very few actually log or otherwise protect file data, because of the cost. Maintaining a metadata-only log is already a significant performance limiter, and journaling data as well would just be prohibitively expensive. Most users wouldn't even want it, if they had to pay the performance cost.
Slashdot - News for Herds. Stuff that Splatters.
I have been using XFS on my home machines since v0.9. The installer has had a couple of glitches in the past (0.9 left me without access to the network and my cdrom drive by default). The recent beta fixed a lot of problems and was based on Red Hat 7.1 (as opposed to 7.1 betas from earlier releases).
;)
I haven't tried the 1.0 release yet. There's only so many hours in the day. On the other hand, the last install I did with the beta, after installing everything I wanted, I fired up a dozen programs such as Mozilla, GIMP, Nautilus, etc. While the drive was churning, I hit the power switch. For those of you who have used ReiserFS, I'm sure this is no big deal.
It should be noted that on my Athlon 800MHz w/ 128MB of RAM and a 27GB hard drive, I almost missed the filesystem check as it scrolled by on bootup. That had me sold forever on journaling filesystems.
I haven't seen any visible performance difference though. There may be, but so much has changed on my system that any subjective comparisons are almost impossible/meaningless. For example, devfs is enabled by default, there's a more up-to-date kernel and the drive has a different partition layout. Who could tell what the FS performance difference may be? I definitely don't need to go back to ext2 just to see if my switchover was justified. Any more info will just be icing.
If someone wants to post "real" benchmarks (lies, damn lies, and all that) I'd love to see them too.
- I don't need to go outside, my CRT tan'll do me just fine.
XFS is still an external patch, it's not included in the official kernel. And it seems that there is a delay between a new kernel release and a new XFS version for that kernel.
XFS 1.0 is against kernel 2.4.2. Or 2.4.3, but SGI says it may be unstable with this version.
But the current kernel is 2.4.4 (or 2.4.4-ac2).
And 2.4.4 fixes important problems that previous kernels had. For instance, it fixes serious security flaws in Netfilter.
So, today, you can either run XFS, or get a fixed kernel. Not both.
This is why I'll stay with ReiserFS, until XFS gets officially included in the kernel.
{{.sig}}
I used it on my company's web servers at my last place. We had millions of tiny files, and ext2 wasn't cutting it. ReiserFS worked great.
And if you want someone better... SourceForge's FTP site is half on ReiserFS. So it works for them.
I used some of the install disks someone made to install Debian to a 100% ReiserFS system. Does anyone know of any disks to do this for XFS?
The Debian disks are on Freshmeat and work GREAT.
Unlike Reiser, it currently works with NFS.
Yes, this was an issue with Reiser, but they have had patches for it since 2.4.2 to work with NFS, and I believe that full NFS support might be in 2.4.4 (not sure).
Minor quibble, I checked the reference, and that is reiserfs3.5 not 3.6 (the difference is that 3.5 is the linux2.2 reiser, and 3.6 is for linux2.4). When looking at 3.6 results, they appear marginally better, but your point still holds.
Create 203.88 / 171.95 = 1.19
Copy 411.67 / 384.59 = 1.07
Slinks 3.23 / 2.81 = 1.15
Read 1165.61 / 1291.76 = 0.90
Stats 1.49 / 1.17 = 1.27
Rename 1.81 / 1.32 = 1.37
Delete 14.46 / 3.95 = 3.66
As an aside, it's pretty hard to get much faster than ext2 for this statistic, reading of bulk files greater than 10k and less than 100k. You need to weigh what reiserfs gives you against what it could slow down. Reiserfs has truly awesome small file speed, and very nice tail packing.
I don't know how it compares to XFS. But go here to see how ReiserFS compares to Ext2, Ext3. (Hint: it kicks its ass). Add in journaling and you have a killer combo. XFS is a little more industrial strength as opposed to general purpose. If you're streaming gigabyte files and processing them on the fly, I imagine XFS is the way to go.
ext3 is a hack to add journaling to ext2. An ext3 partition is backwards-compatible with ext2, so in a worst-case scenario you could just mount it as ext2 and lose nothing but journaling. However, the support right now is 2.2 only, and personally, I don't think it's such a great idea to maintain backwards compatibility when so many underlying things change. This will only lock us into any bad compromises that were made in the design of ext2/3.
Well, the biggest difference is that XFS is proven and Reiser isn't yet. XFS has been the IRIX filesystem for something like 6 years now, and the on-disk filesystem format does not change between revisions, even during the development stage. You can even mount an IRIX disk under Linux and read and write normally. The only things in development in XFS were the userland and kernel-space tools. Compare that to Reiser, where things tend to change a fair bit.
Well, that's not _entirely_ true :) There was one absolutely devastating patch in the IRIX 6.2 time frame where 6.2 boot media couldn't boot machines that had the XFS patch applied.
That sucked
But I agree with you in general: XFS rocks. We were one of the first XFS customers on IRIX 5.3, and it didn't rock quite as much back then, but by the time IRIX 6.2 shipped it was pretty fantastic.
I remember reading a post in comp.sys.sgi.{something} from one of the SGI guys... to the effect of "we have XFS doing sustained write performance of 2gb / second here in the lab"
That rules.
My opinions are my own, and do not necessarily represent those of my employer.
It's meant to be a high-performance filesystem, from the start.
I mentioned this in another post, but SGI claimed internally to have it sustaining 2gb/sec of _write_ performance across a suitably large number of spindles.
Also, one thing I don't see people mentioning is XFS's support for GRIO (guaranteed rate I/O). No Linux filesystem has that, and the Linux kernel plumbing to support it is, I think, SGI contributed (if it's in XFS for Linux yet, I can't recall).
The idea of GRIO is that an app says ahead of time "I need this much disk performance - figure it out", and the OS will say "yes, I can hook you up" or "sorry, throw more money at the problem".
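The yes/no conversation described above is basically admission control. A toy sketch of the idea follows; it is emphatically not the GRIO API (the class and method names are invented), and it assumes the system already knows one fixed sustainable rate, which the real thing has to work out from the hardware.

```python
class ToyRateReservations:
    """Toy admission control for guaranteed-rate requests (not the GRIO API)."""

    def __init__(self, capacity_mb_per_sec):
        # Assumed: a single known sustainable rate for the whole volume.
        self.capacity = capacity_mb_per_sec
        self.reserved = {}  # app name -> reserved MB/sec

    def available(self):
        return self.capacity - sum(self.reserved.values())

    def request(self, app, mb_per_sec):
        """Grant the reservation only if it fits in what is left."""
        if mb_per_sec <= 0:
            raise ValueError("rate must be positive")
        if mb_per_sec > self.available():
            return False            # "sorry, throw more money at the problem"
        self.reserved[app] = self.reserved.get(app, 0) + mb_per_sec
        return True                 # "yes, I can hook you up"

    def release(self, app):
        """Give back everything an app had reserved."""
        self.reserved.pop(app, None)


volume = ToyRateReservations(capacity_mb_per_sec=100)
print(volume.request("video-capture", 60))
print(volume.request("render-farm", 50))
```

The point of refusing up front is that an app which gets a "yes" can count on the rate, instead of finding out halfway through a capture that the disks can't keep up.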
My opinions are my own, and do not necessarily represent those of my employer.
-dB
"It if was easy to do, we'd find someone cheaper than you to do it."
It's probably still a way to go until it's well integrated with the distributions, but I think this FS has potential. Unlike Reiser, it currently works with NFS.
I guess it's a race to see which of these will ultimately become the common denominator FS for Linux. Reiser currently has the lead, due to SuSE and being in the kernel.
Yes, finally! Vince McMahon has come through to give us XFS: the eXtreme File System! Complete with new rules and directory structures, this will appeal to even the most hard-core file system fans. Under the new rules, all crosslinked files WILL be deleted on the spot, multiple programs attempting to write to the same block will be penalized for thirty million clock cycles, and all deletions are FINAL. And just check out the i-nodes on the cheerleaders. I think you'll agree this will be the new pop phenomenon.
Oh. Wait. Journaling file system? oops... never mind.
/* Steve */
"Every jumbled pile of person has a thinking part that wonders what the part that isn't thinking isn't thinking of"-TMBG
What's the big diff (pun intended) between Reiser and XFS? Which is better? (I realize that this may start a holywar, but I want the brief synopsis and analysis since I'm not a sysadmin.)
Thanks
Some basic info and a couple of links for folks:
- File system - basic definition: the general name given to the logical structures and software routines used to control access to the storage on a hard disk system. Operating systems use different ways of organizing and controlling access to data on the hard disk, and this choice is basically independent of the specific hardware being used--the same hard disk can be arranged in many different ways, and even multiple ways in different areas of the same disk.
- Journaled file system - basic definition (as seen here): A file system in which the hard disk maintains data integrity in the event of a system crash or if the system is otherwise halted abnormally. The journaled file system (JFS) maintains a log, or journal, of what activity has taken place in the main data areas of the disk; if a crash occurs, any lost data can be recreated because updates to the metadata in directories and bit maps have been written to a serial log. The JFS not only returns the data to the pre-crash configuration but also recovers unsaved data and stores it in the location it would have been stored in if the system had not been unexpectedly interrupted.
- IBM's JFS webpage on their system, along with links for downloads, online tutorials, etc.
- There is an awful lot of info at the SGI site. Just poke around.
As far as the question about how to choose file systems, that is often a matter of what the OS will let you get away with, and your needs. Using FAT16 is recommended if you need to maintain compatibility with MS-DOS, for example. Usually this comes up in a multi-boot scenario, where it matters which OSen can mount which partitions. MS is notoriously picky in this regard, with a "My way or the highway" approach. For example, if you have a single hard drive hooked up to your computer for configuration purposes, you cannot just create an extended partition unless that drive is a slave with another master. If you try to create just an extended partition it will not permit it, and will tell you that you can only create a primary DOS partition instead.
So you Live and you Learn
Check out the Vinny the Vampire comic strip
"It is a greater offense to steal men's labor, than their clothes"
Be wary of statistics...
For example, picked pretty much at random from the mongo results, Linux-2.4.2 Ext2 vs. ReiserFS-3.6:
parameters:
files=15168
base_size=10000 bytes
dirs=86
Create 203.88 / 187.01 = 1.09
Copy 411.67 / 411.28 = 1.00
Slinks 3.23 / 2.99 = 1.08
Read 1165.61 / 1325.27 = 0.88
Stats 1.49 / 1.48 = 1.01
Rename 1.81 / 1.30 = 1.39
Delete 14.46 / 5.64 = 2.56
So the total time of the test is 1802.15 / 1934.97 = 0.93. (i.e. Reiser is 7% slower performing the whole test.)
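The totals are easy to reproduce from the per-phase figures in the table. A small sketch, using the numbers exactly as quoted (the dictionary names are just for illustration):

```python
# Per-phase timings as quoted from the mongo results (ext2 vs. ReiserFS).
ext2 = {"Create": 203.88, "Copy": 411.67, "Slinks": 3.23, "Read": 1165.61,
        "Stats": 1.49, "Rename": 1.81, "Delete": 14.46}
reiser = {"Create": 187.01, "Copy": 411.28, "Slinks": 2.99, "Read": 1325.27,
          "Stats": 1.48, "Rename": 1.30, "Delete": 5.64}

# Per-phase ratio: above 1.0 means Reiser was faster in that phase.
for phase in ext2:
    print(f"{phase:<7} {ext2[phase] / reiser[phase]:.2f}")

total_ext2 = sum(ext2.values())
total_reiser = sum(reiser.values())
print(f"Total   {total_ext2:.2f} / {total_reiser:.2f} = {total_ext2 / total_reiser:.2f}")

# How much of the ext2 run is spent in the Read phase alone.
print(f"Read share of ext2 total: {ext2['Read'] / total_ext2:.0%}")
```

The big per-phase wins (Rename, Delete) sit on phases that take a few seconds, while Read accounts for most of the total, which is why the overall ratio ends up below 1.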
I don't care if they make the thing that takes a tenth of a millisecond twice as fast, it's the reading of the bulk of the file that takes the most time, and for that part, Reiser is slower.
However, each individual has got to look at what is most important for them, for me it's 99% file read time on medium to large files (30K source code, 200K log files, that kind of thing), and judge accordingly.
FatPhil
--
Also FatPhil on SoylentNews, id 863
So what is one of its strongest strengths over the other journaling fs's?
Time tested reliability.
Finally, 2.4.4 was released, and it is fixed: it's the first "stable" kernel in the new series.
I never read a single bad review of ReiserFS until I actually used it--it worked "flawlessly" for everyone who had tried it. I didn't find out that it had these problems, and that it doesn't work over NFS, until it was too late.
The thing I learned is that when things--especially filesystems--claim stability, the user still has to test things out for himself.
ReiserFS is a good filesystem; don't get me wrong, but it may not be the best for you. (In fact, Red Hat does not plan to use ReiserFS in its distribution, because in the event of filesystem failure, it is near impossible to recover the filesystem with standard tools.)
I have used XFS in the past on Irix machines and have been very happy with it. But be careful before you deploy this filesystem--even on your home machine--without thoroughly testing it. And not simply creating two files and saying, "Hey! They're still there! I guess it's stable." I fell into that trap.
I would highly recommend anyone running the 2.4 kernel to upgrade to at least 2.4.4, especially if he uses IDE or ReiserFS.
If you're going to use XFS, test it first.
By the way, does anyone know what's going on with moderation? I've had mod points three times this week, and there are a huge amount of +5 comments.
Got friends?
My favorite utility of the xfs distribution. Where else could you find so much joy about a program that does nothing?
If you watch SGI's strategy, they seem to be moving in that direction. They are keeping Irix around primarily because Linux isn't ready and there aren't any good 64-bit processors out that fit with their business model other than MIPS, right now.
I mean, think about it. Why bother writing all of the kernels and utilities when you can have the hackers of the world pick up the slack? SGI can't put as many developers on Irix as MS can put on Windoze. So they are developing Irix only for the MIPS machines and keeping Linux for their Intel machines.
And the strategy is pretty evident. They have been very supportive of good OpenGL under Linux. They have XFS, clustering software, etc. All of the Irix advantages are getting ported over.
The problem is that they haven't been able to move over to the Intel platform properly. Their first attempt was a fiasco. The Onyx 3000 series was designed to be a transitional system. It can work with either a MIPS processor or an Itanium. But the Itanium delays are making that hard. And, unlike the desktop workstations, you can't stuff a Pentium 4 in an Onyx because you need 64 bit addressing to make their NUMA architecture work -- each processor gets a piece of the address space. With a 512 processor Onyx 3000 and only 32 bit addressing, that makes 8 megs of RAM per processor. So Intel is holding up SGI's full migration to Linux.
Now, as far as the stability of SGI, I'm not entirely sure. They are still bleeding money, and at a faster rate than last year, too. Given the downturn in the tech economy, they are going to be hit with it, too. It's very shaky.
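The 8 megs figure falls out of dividing a 32-bit address space evenly across 512 processors. A quick sketch of that arithmetic, assuming an even split as described (the function name is made up):

```python
MIB = 2 ** 20

def bytes_per_processor(address_bits, processors):
    """Address space per processor if a flat space is split evenly."""
    return (2 ** address_bits) // processors

# 32-bit addressing shared across 512 processors.
print(bytes_per_processor(32, 512) // MIB, "MB per processor")

# The same split with 64-bit addressing, for comparison.
print(bytes_per_processor(64, 512) // MIB, "MB per processor")
```

With 64 bits the per-processor slice is no longer the limiting factor, which is the whole reason the design wants a 64-bit CPU.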
Gentoo Sucks
I wonder if folks over at SGI plan on dropping Irix in the near future for Linux entirely. As it stands right now the majority of their hardware runs Linux, and the last version of Irix released was mainly to fix bugs.
It's a shame that SGI has done pretty poorly the past few years, when they make such kick-ass machines, and personally I think they should kick the marketing team's asses.
I know previously they've used a customized version of Windows exclusively on their 320/540 servers; I guess they changed 'em all around to avoid fireselling them at crackhead prices. Maybe someday I'll see a BSD running on an Irix machine to see how it would run in comparison to Linux (don't bother to troll this post, this is not an OS war-penis-envious-linux-vs-bsd-post) as far as benchmarking is concerned. As for XFS support, I thought it was supported for reading and not writing? Oh well, I don't use Linux anymore.
360 degrees of Karma
In addition to the overhead, you also have to deal with the risks to your data from the fact that the file system code itself is more complex and that utility programs and administrative tools may do the wrong thing with journalling file systems.
Altogether, I think you are better off with a RAID and a UPS; unless you have some serious failure, that will pretty much avoid the need for running fsck. If you have really critical needs, you will want a hot backup system that you can switch to if your primary system goes down anyway; that takes care of a lot of other problems and also lets you spend however much time you need on fsck.
(As an aside, fast reboots can't have been a driving factor for JFS on AIX: while JFS may have spared people the time for an fsck on reboot, many AIX server machines spent minutes or hours (!) scanning their SCSI buses on each reboot. I think many people who use journalling file systems don't do it because they need it but because it sounds "safer".)