First Journaling FS for Linux

XFS, reiserfs, ext3fs by Laven · 1999-11-06 14:49 · Score: 3

It looks like there are a lot of questions about other journalling filesystems. I'm no expert on these things, but I have spent quite a bit of time following all three projects and I've read through all available documents on the three filesystems. Here's what I understand of the three.

XFS
Originally made by SGI for their IRIX OS, XFS is one awesome filesystem. This white paper (http://www.sgi.com/Technology/xfs-whitepaper.html) describes all of its cool features. Its main features make it a super scalable, very reliable, ultra fast journalling filesystem that uses B-trees and other cool FS technologies.

Unfortunately, it seems that currently there are many problems with the Linux implementation of XFS. I don't know any details of this, but I guess it is safe to say that XFS will some day become available for Linux. This would be great.

ext3fs
I've only read about this on the Linux mailing lists. ext3 appears to be a standard ext2fs implementation with journalling added, allowing backward compatibility with ext2, although one of the authors hinted that they may not keep it backwards compatible in some later version. It is currently in very early alpha testing and definitely not anywhere close to usable, stable and reliable.

In my opinion this project is very new and holds much promise. From their README, they appear to be done with the basic journalling code; what remains is error-handling contingencies, metadata-only journalling, performance tuning and lots of other coding. As a result it may take some time, but it could give us another viable journalling FS for Linux. Choices are always good.

Ext3 Site - ftp://ftp.linux.org.uk/pub/linux/sct/fs/jfs/

Reiserfs - http://devlinux.com/namesys/
I've been following reiserfs for a few months now. It's actually been available for quite some time as a very stable, reliable and quick filesystem for Linux, but journalling was only added to the code recently. Apparently this new addition is supposed to make it faster.

In "releasing" reiserfs, SuSE doesn't mean that it is the first journalling filesystem for Linux. It is the first journalling FS for Linux to be dubbed reliable and suitable for normal use. This is great as journalling has long been a stumbling block for enterprise adoption of Linux. Alan Cox hinted that he may include reiserfs in the standard kernels soon. Excellent =)

Warren Togami
warren@togami.com

How to migrate filesystems by heroine · 1999-11-07 00:03 · Score: 2

Well I'm ready to either change filesystems or increase the block size on my ext2 partition but there's one problem: I've got 15 gigs of data to migrate, a CD recorder, and no money to invest in 15 gigs of tape storage. What strategies are most often used to migrate filesystems?

Filesystem Being DBMS by Christopher+B.+Brown · 1999-11-07 00:24 · Score: 2

Yes, I certainly agree that ReiserFS intends to be a DBMS of sorts; you can use a filesystem to do so where:

A directory can represent a table
Each file in that directory is a record in the table
An "index directory" can contain symbolic links to the records
Things can hierarchicalize as needed
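A minimal sketch of the directory-as-table idea above, assuming plain POSIX files and symlinks; the helper names (`make_table`, `insert_record`, `add_index_entry`, `read_via`, `scan_table`) are hypothetical and not part of ReiserFS or any real library.

```python
import os
import tempfile

# Hypothetical helpers: a directory is a table, each file in it is a record,
# and an "index directory" holds symbolic links back to the records.

def make_table(root: str, name: str) -> str:
    """Create (or reuse) a directory that acts as a table."""
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)
    return path

def insert_record(table_dir: str, key: str, value: str) -> str:
    """Store one record as one small file named after its primary key."""
    path = os.path.join(table_dir, key)
    with open(path, "w") as f:
        f.write(value)
    return path

def add_index_entry(index_dir: str, index_key: str, record_path: str) -> None:
    """Point an index entry at an existing record with a symlink."""
    os.makedirs(index_dir, exist_ok=True)
    os.symlink(record_path, os.path.join(index_dir, index_key))

def read_via(directory: str, key: str) -> str:
    """Read a record either directly or through an index symlink."""
    with open(os.path.join(directory, key)) as f:
        return f.read()

def scan_table(table_dir: str) -> dict[str, str]:
    """Full table scan: read every record file in the directory."""
    return {key: read_via(table_dir, key) for key in sorted(os.listdir(table_dir))}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        customers = make_table(root, "customers")
        rec = insert_record(customers, "1001", "Alice,Toronto")
        by_city = os.path.join(root, "customers_by_city")
        add_index_entry(by_city, "Toronto-1001", rec)
        print(read_via(by_city, "Toronto-1001"))
```

Each record is a tiny file, which is exactly the workload where an FS that stores small files efficiently pays off.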

One goal of ReiserFS is to make this practical even for small records by providing ways of efficiently storing hordes of tiny files.

But that's a separate issue; that requires that someone create something like a ReiserSQL, a database server that maps SQL queries onto file requests on a filesystem.

The journalling issue discussed in the top-level posting implicitly regards journalling as being important for conventional RDBMSes like Oracle, Sybase, DB2 and PostgreSQL, where the model you are suggesting (which probably is not too unlike what I outline above) does not apply.

--
If you're not part of the solution, you're part of the precipitate.

Re:Daaaaamn! I sure hope this works well. by aqua · 1999-11-06 07:07 · Score: 2

FWIW, Alan Cox did mention that he was considering merging reiserfs in with the stable kernel -- it was in the fairly short list of big-thing additions in the "maybe" pile. It would be nice to see it get into the stable tree, assuming that it's sufficiently rock-like to permit it.

sgi's xfs? by sgtron · 1999-11-06 07:07 · Score: 2

so what about sgi's xfs? i thought that was going to be linux's savior for a journaling file system. Are we to expect this bundled in any distros any time soon? will it replace ext2fs?

--
No todo lo que es oro brilla

Re:sgi's xfs? by CelestialScum · 1999-11-06 07:37 · Score: 3

The difference between the two is more of an academic issue than a user-related one, as it basically comes down to the way they are built. As far as journaling goes, they are both up to the task.
I do not know if ReiserFS is a true 64-bit filesystem, handling files as large as XFS does, but a quick look at the two FS's homepages should yield a lot more info on this.
XFS and ReiserFS are not going to replace ext2. Actually, ext3 is, and when released it will also be a journaling FS (from what I've heard).
Maybe someone could provide the right URLs or more info on this than I can. I believe in time they will all be included in the kernel, and you can choose based on your needs. In the meantime, make a small partition, insmod the module, mount the drive and play with it, I guess :)

Only one question? by GreyFauk · 1999-11-06 07:12 · Score: 2

I see that some updates have required a re-compile and a re-format of existing partitions to use the new source... *ouch*

Any ideas on how many more times this may happen before the beta-testing is done??
(Hmm... got first post earlier and didn't even try. :> Nice way to start the evening. *grin*)

--
Friends don't let friends buy Compaqs. (Dell/Gateway... same same) You want a good computer? Build it yourself.

Wow. by Parity · 1999-11-06 07:12 · Score: 2

That came out of left field... all the hype has been about xfs, and now this.

I wonder, though, how GPL purists are going to react, since their business model is to be GPL but sell GPL-exceptions to some companies.

I suspect that the project will quickly fork into Reiser-FS-commercial and Reiser-FS-pureGPL as soon as a contributor refuses to license a GPL-exception.

I wonder if anyone here has heard of this before? Beta-tested it? Maybe I'll try it tomorrow. (I want to keep my machine running tonight, so I can't very well replace the fs. :))

--Parity

--
--Parity
'Card carrying' member of the EFF.

FYI: NameSys FTP archive by MrHat · 1999-11-06 07:14 · Score: 2

The NameSys FTP Archive, which houses the reiserfs files, is located at:

http://devlinux.com/pub/namesys

If you grab the sources from the site, the README.FIRST file says to:

Apply linux-2.2.11-reiserfs-3.4.gz to pure linux 2.2.11 with `zcat linux-2.2.11-reiserfs-3.5.gz | patch -p0`
Do 'cd /usr/src/linux/fs/reiserfs/utils; make dep; make; make install' to make the utilities.

Re:FYI: NameSys FTP archive by clmason · 1999-11-06 09:29 · Score: 2

Sorry for the confusion here, I'll ask them to change the README. These instructions will get you the non-journaled version of the ReiserFS. From the ftp site, the patch you want is:
linux-2.2.11-reiserfs-3.5.5-journaling-beta.gz
This is the most recent code, even though it is not in beta any longer. The journaling portion of the ReiserFS site has links and more information:
http://www.devlinux.com/projects/reiserfs/jrnl
-chris

Re:Hmmm... What about the *BSDs? by Chris+Mikkelson · 1999-11-06 19:16 · Score: 2

You have absolutely no idea what you are talking about.

There is an old, deprecated "Log-Structured FS" in the FreeBSD source tree. Nobody's interested -- log structured FS'es generally have atrocious read performance, because they cannot lay out files for faster read performance like FFS (and I assume ext2fs) can. McKusick has nothing to do with this, and is not very interested in this approach either.

The related journalling filesystems add an extra disk write for every single update operation, making them somewhat slower than the normal filesystem that the journal augments. The journalling technique is, however, conceptually quite simple. Since the extra data structure (the journal) is only read during FS recovery, at least it only costs disk bandwidth during normal operation.

OTOH, soft updates make a different trade-off: they save disk bandwidth but take up CPU time and memory. Since CPUs and memory systems are always going to be much faster than magnetic disks (for the foreseeable future, anyway), I think this is a better trade-off.

And SU *does* leave the filesystem safe to mount after a crash. The *only* inconsistencies that can occur are:
1) unused data blocks not marked free.
2) inodes with too high of a link count.
These can only result in wasted space, nothing more serious. McKusick is working on a background fsck (using NetApp-style FS snapshots) for FFS, so that fsck can basically be run at anytime during system operation (i.e. the FS doesn't have to be unmounted or r/o mounted).
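To make the "only wasted space" point concrete, here is a toy in-memory sketch, not FFS or McKusick's code, of what a background fsck has to do after a soft-updates crash: lower link counts that are too high and return allocated-but-unreferenced blocks to the free pool. The `Inode` class and `background_fsck` function are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    link_count: int                                   # what the inode claims
    blocks: list[int] = field(default_factory=list)   # data blocks it owns

def background_fsck(allocated: set[int], inodes: dict[int, Inode],
                    dir_entries: dict[str, int]) -> tuple[set[int], dict[int, int]]:
    """Repair the two inconsistencies soft updates can leave behind.

    Mutates `allocated` and the inodes in place and returns
    (blocks that were freed, {inode number: corrected link count}).
    """
    # 1) Count how many directory entries really point at each inode.
    actual_links: dict[int, int] = {}
    for ino in dir_entries.values():
        actual_links[ino] = actual_links.get(ino, 0) + 1

    # 2) Lower any link count that is too high (soft updates never leave one too low).
    corrected: dict[int, int] = {}
    for ino, inode in inodes.items():
        real = actual_links.get(ino, 0)
        if inode.link_count > real:
            inode.link_count = real
            corrected[ino] = real

    # 3) Blocks owned by a still-linked inode are in use; anything else that is
    #    marked allocated is leaked space and goes back to the free pool.
    in_use = {b for inode in inodes.values() if inode.link_count > 0 for b in inode.blocks}
    leaked = allocated - in_use
    allocated -= leaked
    return leaked, corrected
```

Note that neither repair can lose data that a directory still references, which is why this work can safely run on a mounted filesystem.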

Oh, well, not that it matters -- this is slashdot, and I fully expect any reply to be "Linux rulez!"
The bias I see running through this thread is that "Linux has it, so it must be great" and "BSD doesn't, therefore it must be necessary," so "let's bash BSD on technical grounds -- we can almost never do that ;-)"

In reality, SU and journalling are radically different approaches to solving the same problem. They both add *extra* complexity to async writes; that is, they are not performance tweaks! They are techniques that try to retain *part* of the performance of async writes while adding crash resistance.

--
-Chris

Re:These stats ARE FISHY by Amphigory · 1999-11-07 10:15 · Score: 2

Disclaimer: I have never looked at the ReiserFS code, nor am I significantly familiar with it or Ext2 internals. The following is rampant speculation of the worst kind and should be ignored.

Having said that, I can think of a couple of reasons why, given the stated design goals of rfs, it would not perform well on those tests. Basically, the performance of an algorithm can be described by how its running time grows with the number of items n, written in "big O" notation such as O(n).

Now, let's suppose that ext2 uses sequential scans to find directory entries (I'm fairly sure it does). A sequential scan is O(n): the time required to scan n elements increases linearly with n.

The time for a B-tree based filesystem increases as O(log n). Because a B-tree carries more fixed overhead per lookup, it can be /slower/ for small values of n, but it is much better as n grows larger. Try plotting y = log(x) in gnuplot to get an idea of what this looks like.

In other words, you may not have had enough items in a single directory to experience the benefits of RFS. I would be interested in results with say 10,000 items in a single directory, or better yet 10,000 directories in a single directory with 10,000 one byte files.
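A small sketch of the scaling argument, counting probes rather than measuring time. It uses binary search over sorted names as a simple stand-in for a tree index (an assumption: a real B-tree holds many keys per node, so it visits even fewer nodes, but each visit costs more). The function names are hypothetical.

```python
def linear_lookup(entries: list[str], target: str) -> tuple[bool, int]:
    """Sequential scan, as in an unindexed directory: O(n) comparisons."""
    probes = 0
    for name in entries:
        probes += 1
        if name == target:
            return True, probes
    return False, probes

def sorted_lookup(entries: list[str], target: str) -> tuple[bool, int]:
    """Binary search over sorted names: O(log n) probes (stand-in for a B-tree)."""
    lo, hi, probes = 0, len(entries), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if entries[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(entries) and entries[lo] == target, probes

if __name__ == "__main__":
    for n in (10, 1_000, 10_000, 70_000):
        names = [f"folder{i:06d}" for i in range(n)]   # zero-padded, so already sorted
        target = names[-1]                              # worst case for the scan
        print(n, linear_lookup(names, target)[1], sorted_lookup(names, target)[1])
```

At n = 10,000 the scan needs 10,000 comparisons for the last entry while the halving search needs at most 14, which is the gap a big-directory benchmark would expose.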

That (as I understand it) is really the kind of grueling stuff that reiserfs is designed for. Nor is this without application. On one of the boxes where I work, we have > 70,000 elm email folders, each stored under "customer_name/email". A simple "ls" takes an hour! Granted this is a boneheaded design (that I didn't do), but the point remains.

--
-- Slashdot sucks.

Re:Can someone explain this by aqua · 1999-11-06 07:16 · Score: 2

Clarification from the Multi-Disk-HOWTO:

These take a radically different approach to file updates by logging modifications for files in a log and later at some time checkpointing the logs.
Reading is roughly as fast as traditional file systems that always update the files directly. Writing is much faster as only updates are appended to a log. All this is transparent to the user. It is in reliability and particularly in checking file system integrity that these file systems really shine. Since the data before last checkpointing is known to be good only the log has to be checked, and this is much faster than for traditional file systems.
Note that while logging filesystems keep track of changes made to both data and inodes, journaling filesystems keep track only of inode changes.

Hmmm... What about the *BSDs? by bgarcia · 1999-11-06 07:19 · Score: 3

I noticed that this code is released under the GPL. That means that the *BSD folks can't just take the code and incorporate it into their OS's.

There is a clause in the license that states that if you contact them, they will let you use it under a different license. But I can't imagine them putting it under the BSD license. It sounds like they want to control who can use it, and they've decided that GNU projects and commercial entities who pay are their target market. If they ever release it under a BSD license, then commercial entities could just grab the BSD-released copy and work from there.

Will the BSD's simply miss out on this nice new filesystem?

99 little bugs in the code, 99 bugs in the code,
fix one bug, compile it again...

--
I'm a leaf on the wind. Watch how I soar.

Re:Hmmm... What about the *BSDs? by Graymalkin · 1999-11-06 08:49 · Score: 2

But the free BSDs (FreeBSD, OpenBSD, NetBSD) all, AFAIK, use their own filesystem that works a lot better than ext2, although I don't know if it is journaled. The FreeBSD people don't need to worry much about "missing" GPL software; FreeBSD and the like will run most if not all Linux binaries, and code is simple to port.

--
I'm a loner Dottie, a Rebel.

Re:Hmmm... What about the *BSDs? by Fizgig · 1999-11-06 09:23 · Score: 2

If this File System is a good thing and can be integrated with *BSD

It can't. It specifically says in the readme that you can't use it with a kernel that's not GPL'd without the authors' permission. That would generally be the case with GPL'd code anyway, though (you don't link gcc with the FreeBSD kernel, so it's ok; but you can't take video4linux, which is GPL'd and in the kernel, and include it in the FreeBSD kernel).

Journaling File System by keytoe · 1999-11-06 07:21 · Score: 4

This would be a huge boon to those of us trying to truly break free of the commercial unices. I've had to put together quotes for enterprise quality database solutions before and there have always been a couple of hurdles to get past when considering an Intel/linux based system.

PostgreSQL works wonderfully with large data sets, but lacks the ability to do hot restores. I'm eagerly awaiting that one... Now that it does a much better job with concurrent locks, that's my only real hesitation at this point.

SMP has come a long way in a short time with linux, but is still a bit lacking. This makes it difficult to settle on Intel hardware - sometimes, you just need Raw Horsepower. I'd like to get there without having to buckle down and buy a Sun or HP box. I'm not worried about this one - things are coming along quite nicely...

Now, my last concern was journaling filesystems - and it looks like it's coming at long last! I was excited when the initial announcement was made, but now that the code is out (and Alan is even considering merging into the stable branch!), I'm all gushy inside! Let's hear it for our team!

I've watched this whole linux thing start out as a 'hobby OS' and develop through adolescence into what is becoming a damned serious contender with the big boys. Sure, they're baby steps at the moment, but at this pace, they add up right quick. God, I love this industry - never know what's gunna happen next. Who knows - maybe the government will sue Microsoft for anti-trust violations next. Oh... right...

--

Culture is more than commerce

Selling GPL Exceptions by crow · 1999-11-06 07:23 · Score: 4

This is not the first time software has been released under this model. My understanding is that this is how RT Linux was released.

[The idea of RT Linux is to put a small real time kernel underneath Linux. This kernel handles the real time tasks, and schedules Linux when a real time task doesn't require it. It also provides a communication mechanism between Linux processes and real time tasks.]

So the RT linux kernel could, in theory, be used without Linux (perhaps with another OS instead) to provide real time services. The author has carefully retained the copyright to his code, so he can sell it under a non-GPL license if someone wishes to incorporate it into a commercial project.

I'm not aware of any non-GPL licenses for RT Linux, but the model is there.

The main thing that helps make this model work is that the copyright holder controls the distribution. That means that in order to get your changes into the official releases, you have to resolve any copyright issues. It only breaks down if there is a significant dispute and someone is willing to go to the effort to start a separate distribution. Of course, if they get the file system into the main Linux distribution, that action will trigger a fork in development.

Deletion times by heroine · 1999-11-06 07:26 · Score: 2

How does this new filesystem compare to ext2fs on deletion times? For starters, here is how long a typical deletion takes on ext2fs:

heroine:/home/mov% l *.mov
-rw-r--r-- 1 root root 1958135327 Nov 6 17:49 xena1.mov
heroine:/home/mov% time rm xena1.mov

real 0m56.536s
user 0m0.000s
sys 0m0.920s

Even a 30 second deletion time would be great.

Re:Deletion times by Jeff+Mahoney · 1999-11-06 07:46 · Score: 4

There is a semi-recent benchmark vs ext2fs at http://namesys.botik.ru/~yura/benchmarks/journal_227/ext2_vs_jour9.html

Chris has the office next to mine and has been showing me these benchmarks just about every day - they improve just about every day.

-Jeff

Re:ACL (Access Control Lists) by jd · 1999-11-07 18:16 · Score: 2

Try ACL-Posix or Trustees. Both implement ACL's for Linux.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Some benchmarks by Magus311X · 1999-11-06 07:29 · Score: 2

Though benchmarks aren't everything, they're always nice to look at.

Here's the linkage:
http://devlinux.org/namesys/bens.html

--

Wasn't ext3 first? by tap · 1999-11-06 07:35 · Score: 3

The ext3 journaling filesystem had its first beta a few months ago. It doesn't require you to reformat your existing ext2 partitions to convert to ext3. And an ext3 filesystem can still be used as an ext2 filesystem; you just need to update the journaling information if you go back to ext3 after using it as ext2. Read more about it at Stephen Tweedie's ext3 site.

Re:Wasn't ext3 first? by mjg · 1999-11-06 08:22 · Score: 2

They mean ReiserFS is the first "stable" journaling FS for Linux. You are quite correct in saying that ext3 was "first", in that it had journaling before ReiserFS (at least, ext3 was publicly available with journaling before ReiserFS was, that I'm aware of), but it's a fair way from being considered stable just yet.
Having just looked at ReiserFS's site, it seems either they haven't updated the site yet, or they consider beta == stable, since I could only find the beta release of the code which has journaling.

Re:Beware the Jabberwock! by jms · 1999-11-08 01:18 · Score: 2

Well, first off, you're probably right not to switch over immediately for anything mission-critical. Every new program has bugs that need to be discovered and fixed, and this will be no exception.

I don't agree that journaling FS's are a buzzword or a fad, though. When they work, they work extremely well, and invisibly. A good example of an operating system built on a solid, robust journalled filesystem is IBM's AIX. AIX uses the journalled filesystem for everything, including the root partition, and based on my many years of experience with these machines, system crashes simply don't break the filesystem.

However, journaling filesystems aren't the end-all. There's still a significant feature set missing from unix filesystems ... and that's the concept of work units with commit/rollback.

It works like this ... you want to make a bunch of changes to a bunch of files, all at once. However, if the system were to crash while you were in the middle of making these changes, your data files would be in an indeterminate state.

If you had a filesystem with work units, you would start by making a system call to open a work unit, then make your changes. When you are finished, you either make a commit system call, or a rollback call. If the commit ends with a success return code, then all of the changes are guaranteed to be made. If an error occurs in the commit, or you make a rollback call, all of the changes in that work unit are backed off. If the system crashes before you make a commit/rollback, all of your changes are backed off when the system reboots. This gives you fine-grain control over how data changes are made to files in your filesystem. Once you've tried it, you'll never want to go back.
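Unix offers no such system calls, so the following is only a hypothetical user-space imitation of a work unit, assuming POSIX semantics where `os.replace` atomically swaps one file. It shows the programming model (stage, then commit or roll back) but not the full guarantee: a crash partway through `commit()` can still leave some files updated and others not, which is exactly the gap filesystem-level work units would close.

```python
import os
import tempfile

class WorkUnit:
    """Hypothetical user-space imitation of a commit/rollback work unit.

    Changes are buffered until commit(); a crash or rollback() before that
    leaves every file untouched. Caveat: commit() replaces files one at a
    time, so it is atomic per file, not across the whole unit.
    """

    def __init__(self) -> None:
        self._pending: dict[str, str] = {}

    def write(self, path: str, data: str) -> None:
        self._pending[path] = data          # nothing reaches the disk yet

    def rollback(self) -> None:
        self._pending.clear()               # back off every staged change

    def commit(self) -> None:
        for path, data in self._pending.items():
            directory = os.path.dirname(os.path.abspath(path))
            # Write to a temp file in the same directory and force it to disk...
            fd, tmp_path = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "w") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            # ...then swap it in; the rename is atomic for this one file.
            os.replace(tmp_path, path)
        self._pending.clear()
```

A usage pattern would be `wu.write(a, ...)`, `wu.write(b, ...)`, then `wu.commit()` on success or `wu.rollback()` on error.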

This is a standard database programming technique, but moving the functionality into the operating system gives you a huge programming capability. It lets you write programs with database-grade data integrity as a matter of course, without requiring that you program against a database API.

I was skeptical as to the value of commit/rollback for ordinary filesystem programming, until IBM included them in its then-new SFS filesystem on VM. Now I consider it one of those great things that will probably take years for the rest of the world to discover and implement.

- John

Selling GPL exceptions by Per+Abrahamsen · 1999-11-06 07:53 · Score: 2

Aladdin used to do that with Ghostscript. RMS had no problems with this business model, and I haven't heard of anyone who wanted to fork the project for this reason. It would also be silly, as you would have to remerge the enhancements from the trunk back into your branch.

With Ghostscript the GPL was not restrictive enough. Proprietary software would simply call the gs executable in a separate process. That is why Aladdin eventually switched to a more restrictive license. Namesys should have no such problems; you can't run a filesystem stand-alone.

Re:What IS a journaling file system? by RelliK · 1999-11-06 08:06 · Score: 2

A non-journalling file system (a la ext2, FAT, etc.) must be properly unmounted on shutdown. If it's not unmounted cleanly, it needs to be checked for errors, since it has no idea what happened just before the crash/power failure/whatever.

A journalling file system keeps track of all the changes as they occur. So even if it's not unmounted before shutdown, it can easily determine what was modified and deal with it appropriately. For example, if you kick a power cord by accident, you no longer need to wait 5 minutes while fsck scans the file system.
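A toy in-memory model of the idea, assuming a simple redo log with one commit record per transaction; real filesystems such as ReiserFS or ext3 journal disk blocks, not Python dicts, and the `JournaledStore` class is hypothetical. Each update is written twice (once to the journal, once in place), and recovery only has to read the journal, not the whole file system.

```python
from typing import Optional

class JournaledStore:
    """Toy model of metadata journalling (write-ahead redo log)."""

    def __init__(self) -> None:
        self.metadata: dict[str, object] = {}   # stands in for on-disk inodes/dirs
        self.journal: list[tuple] = []          # append-only log
        self._next_tx = 0

    def update(self, changes: dict[str, object], crash_at: Optional[str] = None) -> None:
        tx = self._next_tx
        self._next_tx += 1
        for key, value in changes.items():
            self.journal.append(("set", tx, key, value))   # first write: the log
        if crash_at == "before_commit":
            return                                         # power lost, no commit record
        self.journal.append(("commit", tx))
        if crash_at == "before_apply":
            return                                         # committed but not yet applied
        self.metadata.update(changes)                      # second write: in place

    def recover(self) -> None:
        """Replay committed transactions and drop uncommitted ones (no full scan)."""
        committed = {rec[1] for rec in self.journal if rec[0] == "commit"}
        for rec in self.journal:
            if rec[0] == "set" and rec[1] in committed:
                _, _, key, value = rec
                self.metadata[key] = value                 # redo is idempotent
        self.journal.clear()                               # checkpoint: log can be reused
```

For instance, a transaction interrupted after its commit record is written gets replayed on recovery, while one interrupted before the commit record simply disappears.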

High-end data warehouses have file systems measured in terabytes. You *definitely* don't want to wait for fsck there...

--
___
If you think big enough, you'll never have to do it.

Paranoid, aren't we? by Millennium · 1999-11-06 08:12 · Score: 2

Look. I'll admit that the "Russian-made" line seems to have come from absolutely nowhere. But just because you don't immediately understand something is no reason to immediately and irrationally assume the author meant to defame anyone, much less an entire nation. This said, I'd like to know what he meant by it myself.

And Suse is funding this.. cool by Ami+Ganguli · 1999-11-06 09:03 · Score: 3

So Redhat pays for Alan (and Gnome?), Corel supports WINE, and Suse pays for file systems.

Open Source has always been good at producing excellent, relatively small and self-contained components. We haven't been so great (with a few very notable exceptions, the kernel being one) at producing large projects. If it's a lot of effort with no quick return, the coders get tired of it.

Now the commercial companies are funding the big stuff in an attempt to gain mindshare ("we must know what we're doing, we've got Alan"). This really complements the existing strengths of Open Source.

--
It is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail. - Abraham Maslow

Two clarifications by A+nonymous+Coward · 1999-11-06 09:03 · Score: 2

1. This is not about porting ext2 to BSD, but a journaled fs. So comparisons to ext2 are meaningless.

2. Running linux binaries has nothing to do with this either, unless someone wants to make a user space version of RFS, and BSD can support it.

--
Infuriate left and right

These stats ARE FISHY by A+nonymous+Coward · 1999-11-06 09:12 · Score: 2

3:56.82 -- call it four

Divide by 3:07 -- call it three

Both roundings favor reiserfs, yet the ratio is said to be 1.56. I don't think so.

Look at the rm -rf * stats: the ratio is claimed to be 10.1, yet it's a lot closer to 7.
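The first check can be done exactly rather than by rounding. This sketch only reuses the two timings quoted in the comment; the `to_seconds` helper is hypothetical. The exact ratio comes out near 1.27, and even the generous 4/3 rounding stays well below the claimed 1.56.

```python
def to_seconds(stamp: str) -> float:
    """Convert an 'M:SS.ss' timing such as '3:56.82' into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)

longer = to_seconds("3:56.82")    # 236.82 s
shorter = to_seconds("3:07")      # 187 s
exact_ratio = longer / shorter    # about 1.27
rounded_ratio = 4 / 3             # "call it four" over "call it three"

print(f"exact {exact_ratio:.3f}, rounded {rounded_ratio:.3f}, claimed 1.56")
```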

What hope is there for the numbers themselves?

--
Infuriate left and right

Re:These stats ARE FISHY by Fizgig · 1999-11-06 09:43 · Score: 2

Ok, some independent things:

I copied /usr/local to new ext2 and reiserfs partitions on a brand new harddrive. First thing of note, from df:

Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hdb1 7823372 442980 7380392 6% /newdrive
/dev/hdb2 5283091 410343 4599242 8% /ext2

Newdrive is the reiserfs one. They contain the same data, but the reiserfs one uses about 30MB more space.
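The "30MB" figure can be checked from the Used column, assuming df reported 1K blocks (the usual Linux default). It works out to roughly 32 MiB, or about 8% more space than ext2 for the same files.

```python
# "Used" column from the df listing, assumed to be in 1K blocks.
reiserfs_used_kib = 442980   # /dev/hdb1 mounted on /newdrive
ext2_used_kib = 410343       # /dev/hdb2 mounted on /ext2

extra_kib = reiserfs_used_kib - ext2_used_kib     # extra space used by reiserfs
extra_mib = extra_kib / 1024                      # same figure in MiB
overhead_pct = 100 * extra_kib / ext2_used_kib    # relative to ext2

print(f"reiserfs uses {extra_mib:.1f} MiB ({overhead_pct:.1f}%) more for the same files")
```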

Now for some stuff.

Running find . -exec wc {} \; on an installation of StarOffice on the Reiserfs one gives:

9.94user 21.02system 0:53.46elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (177778major+32241minor)pagefaults 0swaps

On ext2:

9.78user 17.41system 0:50.85elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (205151major+32581minor)pagefaults 0swaps

Reiserfs loses. Just one data point, however. And I may have it set up wrong. For reference, the reiserfs has lower cylinder numbers.
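Putting the two GNU time reports side by side as ratios shows how big the loss is for this single run: about 5% more elapsed time, driven mostly by about 21% more system time. The numbers are copied from the output; the `ratio` helper is hypothetical.

```python
# Figures copied from the two time reports (seconds; 0:53.46 elapsed = 53.46 s).
reiserfs = {"user": 9.94, "system": 21.02, "elapsed": 53.46}
ext2 = {"user": 9.78, "system": 17.41, "elapsed": 50.85}

def ratio(metric: str) -> float:
    """How many times longer reiserfs took than ext2 for one metric."""
    return reiserfs[metric] / ext2[metric]

for metric in ("user", "system", "elapsed"):
    print(f"{metric:>7}: reiserfs/ext2 = {ratio(metric):.3f}")
```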

Re:What IS a journaling file system? by QuMa · 1999-11-06 09:24 · Score: 3

This sort of covers it:

http://collective.cpoint.net/lfs/what_lfs_is.html

initrd anyone? Module-ness doesn't prevent XFS / by Mr+Z · 1999-11-06 09:37 · Score: 2

You could boot from an initrd RAM Disk, load the XFS module, and then remount your root partition from an XFS partition on your hard-drive. After all, this is how RedHat kernels allow you to have your root partition on a SCSI drive, yet still have all of the SCSI devices built as modules.

Indeed, just this sort of technique can also be used to handle a ReiserFS root partition that needs to be fsck'd, by having the boot routines in the RAM disk image do the fsck if necessary. Strikes me as a bit more fragile than what I'd care to deploy in a mission critical setting, but....

--Joe
--
Program Intellivision!

Re:Aladdin Ghostscript vs ReiserFS by Per+Abrahamsen · 1999-11-06 11:25 · Score: 2

I think you confuse some issues. Aladdin has had _two_ business models. The old one was to use the GPL and sell exceptions. The new one is to use a more restrictive license for new versions, sell exceptions, and release old versions under the GPL.

That some printer manufacturers didn't want to obey the GPL was not a problem for Aladdin; it was a feature. It meant these manufacturers would want to buy an exception from Aladdin. When they buy an exception, the GPL becomes irrelevant to the customer. The problem was developers of "PostScript-enabled proprietary applications" who _didn't_ have any problem with the GPL, because they didn't link with gs; they just used it as a standalone program. They would not pay Aladdin; instead they would distribute the source to gs. The new Aladdin license was designed to prevent this.

Namesys will not need to change their license, because their potential customers will not be able to use a similar loophole.

Aladdin Ghostscript vs ReiserFS by Mr+Z · 1999-11-06 10:16 · Score: 3

RMS had no problems with this business model, . . .

Actually, I hear that he's not thrilled with it. Indeed, one of the biggest problems that I can see is that there is very little incentive for people to improve the existing GPL version of Ghostscript when they know that Aladdin has (a) already improved Ghostscript in the current commercial version, and (b) will be releasing their changes 'soon' (after one year). This interview with Ghostscript's author Peter Deutsch sheds more light on the situation, including Stallman's thoughts.

One result is that the GPL community is almost guaranteed to always be one year behind the latest in Ghostscript technology, unless someone gets up enough nerve to fork Ghostscript development and try to get ahead of Aladdin.

With Ghostscript the GPL was not restrictive enough. Proprietary software would simply call the gs executable in a separate process.

Part of the problem here is that the Aladdin folks try to license their code to printer manufacturers, etc. The printer folks aren't too keen on having to ship Ghostscript on demand to anyone who buys a printer. Also, if the printer folks make any platform specific changes (which undoubtedly they will, such as specific driver technology for running the print engine), they'd have to distribute those changes, and most aren't willing to do so.

Also, more importantly, Peter Deutsch doesn't seem too keen on having people ship Postscript-enabled printers by using his work for free (as in gratis).

The upshot: Aladdin offers their latest and greatest Ghostscript with a commercial license.

With ReiserFS, I'm sure a similar but not identical set of considerations exist. People building embedded or mission critical systems on an otherwise proprietary base might license ReiserFS for their application without introducing any questions as to the effects of GPL. At the same time, a GPL version is available for everyone.

The difference here is a bit subtle but important. Namesys appears to be releasing the latest and greatest ReiserFS under GPL, rather than imposing an artificial delay. (Whether or not this changes in the future is unclear, but for now it is an important distinction.) In this case, the commercial license seems to be a means for companies to buy an "unencumbered" version of ReiserFS for their own purposes. (By "unencumbered", I mean free of the implications of GPL.) I see this potentially as a way to keep both camps happy. Maybe. (Except, of course, RMS.)

--Joe
--
Program Intellivision!

Important, but likely not for DBMSes by Christopher+B.+Brown · 1999-11-06 12:43 · Score: 5

I hate to take issue with a well-spoken posting, but journalling is not of primary usefulness for helping support High Availability RDBMS systems.

The main effect of journalling, the thing that is really important about it, is that it guarantees that metadata updates are kept consistent. That is, journalling is primarily supportive of making sure that filenames, directory structures, permissions, and such are kept consistent even when moderately catastrophic things happen.

This is a really good thing when supporting file serving activities, as that indeed tends to involve lots of manipulations of files as users shift them around.

I've been on the ReiserFS mailing list since '97; have been running a personal news spool on a small ReiserFS partition for probably 6 months. I can't tell for sure if the journalling now available is metadata-only, or if it also journals normal data updates. It looks rather more like metadata-only, which is useful for file-server work, but not so much for RDBMSes.

Databases behave in quite different ways from file servers in terms of the way they do file access.

If you look at most RDBMSes, they create a few files, and do lots of manipulations on top of them. Informix SE is a counterexample, basically using Informix C-ISAM underneath, but is unusual in that regard. If you look at the database partitions, you get one of two things:

Partitions containing a few very large files.
Note that for these, the metadata is very static which means that journalling of metadata is of relatively little importance.
Partitions containing no filesystem, but rather raw data being managed by the RDBMS.
Don't just take my word for it; I am not the ultimate authority on this. Transaction Processing: Concepts and Techniques (Gray and Reuter) is a rather definitive reference; it discusses methods of managing transactions in database management systems, and goes into considerable detail on transaction logging, which bears a striking (and not merely coincidental) resemblance to journalling.
The critical point here is that it is the database manager that wants to manage the logging/journalling; Oracle and Sybase and IBM and Informix will be loath to pass responsibility for this on to Hans Reiser, wonderful guy though he is.

Conclusions

- Sorry, I have to disagree with you on ReiserFS being of fundamental importance to those doing serious database work.
- What will be of fundamental importance is when Stephen Tweedie's Raw Device Support gets integrated into the "production" kernels. That is what Oracle is looking for. (Consider: Oracle has pumped some funds into RHAT, and RHAT is paying Stephen Tweedie... Could there be some connection?)
- Journalling IS important for the sorts of applications that manipulate lots of files, which include things like dynamic web serving and file serving.
- Even if this isn't such a boon to those doing serious RDBMS work, it can still be a boon to lots of other folks...

--
If you're not part of the solution, you're part of the precipitate.

Not Even Close by Christopher B. Brown · 1999-11-06 12:50 · Score: 3

I've got a filesystem that has been using ReiserFS for probably 6-8 months now, and Hans has been working on it since at least July 1997.

"Who was first" isn't all that important; it should be noted that there is considerable communication between the development groups, and there are conscious efforts ongoing to make sure they build facilities that will be useful across the board:

The ReiserFS folks have been doing B-tree "stuff," and intend to provide some code that should be usable by anyone wanting to do B-trees at the kernel level, whether that be with ReiserFS, ext3, "ext4," or (and this has been explicitly mentioned) SGI's XFS.
Considerable discussion has taken place in trying to coordinate needed modifications to kernel code in terms of:
- VFS
- Buffer management
- Cache management
It often turns out that what one group needs, another group finds it needs too.

It'll be available when it's complete by Christopher B. Brown · 1999-11-06 12:54 · Score: 2

SGI is still working on it.

You haven't seen a release; based on the discussions at ALS involving the developers, it would be surprising to see a "beta" before the end of 1999.

A "beta" is not production code, and doesn't include integration into the "regular" kernel. I would be entirely unsurprised to hear that this hasn't yet occurred by the middle of next year.

will it replace ext2fs?

Not likely any time soon...

Yes, It Works. by Christopher B. Brown · 1999-11-06 13:01 · Score: 2

I've got a partition that has been running ReiserFS for quite some time now.

As for the possibility of forking, that was intended as a way of raising funding to support the free version. Now that SuSE is funding ReiserFS, it is rather less likely that Hans Reiser will be feeling the need to bang on Sun's door looking for money.

The hype may have been about XFS, but note that no code for XFS has been publicly released. And note that ReiserFS has been under active development since at least July 1997, which means that while silly people that watch fads may have been off hyping XFS, ReiserFS is hardly new and hardly surprising.

Note that all of these developments in filesystems move us towards having a choice of filesystems, and the ability to tune systems for one kind of behaviour or another. None are likely to supplant ext2 for our root partitions any time soon, in much the same way that commercial UNIXes' "advanced" filesystems have largely not supplanted "traditional UFS" for root partitions.

Plus ça change, plus c'est la même chose. (The more things change, the more they stay the same.)

No, no effect. by Christopher B. Brown · 1999-11-06 13:06 · Score: 2

The critical bottleneck behind the 2GB file size limit is the VFS layer that sits between the rest of the kernel and the individual filesystems.

That bottleneck is not resolved by changes to filesystem functionality.

This means that neither ReiserFS nor XFS fixes the problem.

At present, you have two choices for resolving the 2GB file size limit:

- Use the LFS API that SAS has promoted, which lets 32-bit UNIXes support 64-bit file sizes for applications that are recoded to use it.
- Run a 64-bit architecture such as Alpha or UltraSPARC.
