Ask ReiserFS Project Leader Hans Reiser

Posted by Roblimo on Monday June 9, 2003 @03:10AM from the not-the-kind-of-journal-you-keep-on-Slashdot dept.

Hans Reiser leads a successful Free Software project that has attracted plenty of attention, many users, and even that Holy Grail of so many who have started their own Free or Open Source projects: Big-time funding from DARPA, SuSE, and others. How did he do it? What's his advice for other project leaders? Ask him! And ask him any other question you have in mind. Please stick to one question per post, and avoid questions that can be answered with a few minutes' worth of research. We'll publish Mr. Reiser's answers as soon as he gets them back to us.

22 of 343 comments

Re:Good business planning by Surak · 2003-06-09 03:25 · Score: 2, Informative

More than just the support revenue angle, Hans has made money off of reiserfs by directly selling the code. From the reiserfs/README file:

Source code files that contain the phrase "licensing governed by reiserfs/README" are "governed files" throughout this file. Governed files are licensed under the GPL. The portions of them owned by Hans Reiser, or authorized to be licensed by him, have been in the past, and likely will be in the future, licensed to other parties under other licenses.

Among his customers have been DARPA and BigStorage, which are noted sponsors right on the front page. I think I remember reading that BigStorage is using ReiserFS for some sort of SAN.

--
My journal has hot /. gossip.
Re:Why ReiserFS? by dasunt · 2003-06-09 03:31 · Score: 2, Informative

Integrity of data? Er, please read up on ext3 - a journaling filesystem, same as reiserfs, that seems to have the same (or slightly better) filesystem integrity as reiserfs.

The correct answer would be along the lines that reiserfs is better at handling some files than ext3 - especially small files. I have a ton of text files on an 80 gig shared drive - all small files. Since I'm using ext3, a lot of space is being wasted.
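
To put rough numbers on that wasted space: when every file occupies whole blocks, the last block of each file is only partly filled. Here is a minimal sketch, assuming a 4 KiB block size and a made-up workload of 100,000 text files of 700 bytes each (both are assumptions for illustration, not figures from the post), and ignoring inodes and other metadata:

```python
def allocated_bytes(file_size, block_size=4096):
    """Bytes consumed when a file must occupy whole blocks."""
    blocks = (file_size + block_size - 1) // block_size  # integer ceiling
    return blocks * block_size


def slack_bytes(file_sizes, block_size=4096):
    """Total space lost to partially filled last blocks."""
    return sum(allocated_bytes(size, block_size) - size for size in file_sizes)


# Hypothetical workload: 100,000 small text files of 700 bytes each.
sizes = [700] * 100_000
payload = sum(sizes)
wasted = slack_bytes(sizes)
print(f"payload: {payload} bytes, slack: {wasted} bytes")
```

ReiserFS's tail packing is meant to cut this slack by storing small files and file tails together; the sketch models only the whole-block case.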
Re:Why ReiserFS? by Deth_Master · 2003-06-09 03:33 · Score: 1, Informative

It was significantly faster for me. I downloaded movies and junk off of newsgroups. When I would open one of KDE's windows to start Parring/Unraring it would sit there for quite a while and the hard drive light was on as it was reading the information. When I switched to ReiserFS just to try it, it took significantly less time to load the information. It's a lot faster than ext3 and it's just as secure, if not more.
I also like the way it's designed: it's written so that you can put modules in it. Say you want to add encryption support to the filesystem. You can write a module and load it into the filesystem and encrypt everything written to the drive transparently. Not saying that I know how to do that. That's far beyond my programming skill at the moment. Mostly I like it for the speed that I gained.

--
find ~your -name '*base* | xargs chown :us
Re:What is the future of ReiserFS... by arth1 · 2003-06-09 03:34 · Score: 5, Informative

ReiserFS's main competitor isn't really EXT3.
EXT3 is a journaling addition to EXT2, and much more interesting for people who want to change their existing file systems instead of creating new file systems. Note that EXT3 is slower than both ReiserFS and EXT2, but it does have journaling, and provides faster reboots :-)

The main competitor for performance is SGI's excellent XFS. The latest implementations are quite solid, and the performance is likewise excellent, even compared to ReiserFS.
Both ReiserFS and XFS suffer from the potential of data loss on system failures, and XFS probably more so than ReiserFS, as tiny files might not be committed at all. However, for RAID users, I cannot see any reason to use ReiserFS instead of XFS, and definitely not EXT3 unless upgrading the file system.

Regards,
--
Arthur Hagen
Re:ReiserFS and laptops by Lukey+Boy · 2003-06-09 03:38 · Score: 5, Informative

I had the same problem. Disable access time in your fstab file and the drive access will not be so frequent - apparently ReiserFS spools and flushes the atime data, keeping everything spun up. Make a line in fstab like this:
/dev/hda5 / reiserfs noatime,errors=remount-ro 0 1
In fact, I disable access time tracking on every box I work with. I haven't found a worthwhile reason to ever enable it. And that's my 2 cents!
When do you expect it to be released? by Drinian · 2003-06-09 03:42 · Score: 2, Informative

From the front page of the website:
"Reiser4 is due June 30, 2003!"
Re:ReiserFS and laptops by arth1 · 2003-06-09 03:46 · Score: 3, Informative

atime can be quite useful for caches, like client and proxy web caches and man page caches. It's also used for other services that expire data based on access time, like usenet leaf servers, and log rotating programs.

Before turning off atime, I advise that an effort is made to identify what data really needs atime, and if possible create separate partitions for those, with atime enabled.

Regards,
--
*Art
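
The access-time expiry described above can be sketched in a few lines. This is a toy illustration, not how any particular cache or news server does it; the function name `expired_by_atime` and the one-hour threshold are made up for the example.

```python
import os
import tempfile
import time


def expired_by_atime(directory, max_idle_seconds, now=None):
    """Return names of regular files not accessed within max_idle_seconds."""
    now = time.time() if now is None else now
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                idle = now - entry.stat(follow_symlinks=False).st_atime
                if idle > max_idle_seconds:
                    stale.append(entry.name)
    return sorted(stale)


# Demo on a throwaway directory; atime is set explicitly so the result
# does not depend on how the underlying filesystem is mounted.
with tempfile.TemporaryDirectory() as cache_dir:
    now = time.time()
    for name, idle in (("old.html", 7200), ("fresh.html", 10)):
        path = os.path.join(cache_dir, name)
        open(path, "w").close()
        os.utime(path, (now - idle, now))
    print(expired_by_atime(cache_dir, max_idle_seconds=3600, now=now))
```

On a partition mounted with noatime, reads no longer refresh st_atime, so a sweep like this can see files as idle even while they are in active use.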
Re:ReiserFS and laptops by Utoxin · 2003-06-09 03:55 · Score: 3, Informative

atime is necessary for one major component of a lot of websites: The PHP Session files.

The default PHP session handler uses the atime of the files to expire them properly. If they don't have atime, they get expired prematurely. (I think... It's been a while since I made the mistake of noatime on the partition that holds the session files.)

My solution to this is to use noatime everywhere except the /tmp partition. I also use notail on the /tmp partition, and anywhere that has frequent file IO.

--
Matthew Walker
http://www.tweeterdiet.com/ - My Diet Tracking Tool
Re:What is the future of ReiserFS... by volkerdi · 2003-06-09 03:58 · Score: 2, Informative

SGI's XFS still occasionally hangs my machine under heavy load. Plus, by the time they have a release out for 2.4.20 (they still don't), I'm sure I'll be running 2.4.21. In addition, it's still not part of the standard kernel sources. XFS would have to be considered the least supported choice of the three.

Even though ext3 is a journaling filesystem, it still does a lengthy (and annoying) filesystem check every 20 mounts or so. To its credit it has never found an error, but still. I thought getting rid of that stuff was why we wanted journaling filesystems.

ReiserFS has been rock solid for me, and has been the default Slackware filesystem for two releases. I don't foresee something else replacing it as default any time soon. It's still a bit of a moving target, though... if you're thinking of running a few different kernel versions you may run into situations where your filesystem has features that are too new to be mounted. (In those kinds of cases ext2 is still the safe choice)

There's also IBM's JFS. The one thing I've noticed about that is that a newly formatted partition won't mount cleanly until you've run fsck.jfs on it. This doesn't inspire great trust, but other than that I've had no problems while testing it.
Re:Hash collisions by gazbo · 2003-06-09 03:59 · Score: 2, Informative

Statistics not your strong point? By your reckoning lightning should never ever strike anywhere because the probability of it striking anywhere is so slim.
The tricky thing about solving the hash problem (in cryptography) is finding a value that when hashed matches a given string. Here, we are saying that given several hundred thousand keys, what is the probability that any two of them hash to the same value.
The probability is far enough from zero to be a significant danger. Just because hashtables and one-way encryption both use the hashing algorithms does not mean that you can use the same figures.
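
This is the birthday problem, and the usual approximation for n keys in a hash space of size N is p ≈ 1 - exp(-n(n-1)/(2N)). A rough sketch, assuming a uniformly distributed hash; the 32-bit width below is an assumption for illustration, not the width ReiserFS actually uses:

```python
import math


def collision_probability(num_keys, hash_bits):
    """Approximate chance that at least two of num_keys share a hash value."""
    space = 2 ** hash_bits
    if num_keys > space:
        return 1.0  # pigeonhole: more keys than possible hash values
    exponent = -num_keys * (num_keys - 1) / (2 * space)
    return -math.expm1(exponent)  # 1 - e^exponent, accurate for tiny values


for n in (1_000, 30_000, 300_000):
    p = collision_probability(n, hash_bits=32)
    print(f"{n:>7} keys, 32-bit hash: p = {p:.4f}")
```

Finding an input that matches one specific hash is the hard cryptographic problem; any two keys out of many colliding is a much more likely event.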
Re:What is the future of ReiserFS... by GrenDel+Fuego · 2003-06-09 04:04 · Score: 3, Informative

Even though ext3 is a journaling filesystem, it still does a lengthy (and annoying) filesystem check every 20 mounts or so. To its credit it has never found an error, but still. I thought getting rid of that stuff was why we wanted journaling filesystems.

I personally think that the occasional check is probably a good idea, but if it annoys you then you can always change the interval, or even disable it.

Just use "tune2fs -c <how many mounts> /dev/PARTITIONNAME"

-c 0 should cause it to not use that functionality.
Re:ReiserFS and laptops by vofka · 2003-06-09 04:07 · Score: 2, Informative

Wait until one of your boxes gets r00ted, and you (or some other poor soul dealing with one of your mangled boxes) need to do some fairly in-depth forensic analysis on the box to work out exactly what was happening, to what file, in what order.

The Access Time attribute can yield some useful clues to what was going on during an attack when you are doing a forensic analysis. Sure, there are plenty of other things to look at before you get that deep into things, but it's still useful to have sometimes!

--
Disclaimer: I meant what I thought, not what I wrote! What? You can't read my Mind? Oh dear!
Re:What is the future of ReiserFS... by opk · 2003-06-09 04:16 · Score: 2, Informative

SGI's XFS still occasionally hangs my machine under heavy load. Plus, by the time they have a release out for 2.4.20 (they still don't), I'm sure I'll be running 2.4.21.

Just go to http://oss.sgi.com/projects/xfs/patchlist.html and pick up the patch against 2.4.20. Works very well for me. All the releases get you is a bunch of release notes and rpms against RedHat kernels. I always get these patches which come out very promptly after the stock kernel release and work very well.
Re:Why ReiserFS? by rossjudson · 2003-06-09 04:53 · Score: 2, Informative

As a Java developer this is what I am interested in...Java produces very large numbers of small files. Any file system that handles this more efficiently is going to make for faster compilation.
Re:Naming by uhoreg · 2003-06-09 04:56 · Score: 2, Informative

The information doesn't seem to be in the current kernel, but in an older patch (search for treefs):
Two other former employees were involved who won't be getting credit here because they tried to kill the project at the end of it, and almost succeeded (they cost me maybe two years). They wanted to force me to sell it to the company they tried to start. They get to keep the money they got from me, and that is it. When their "VC" said that he could get a hundred researchers to swear in Russian Court that I had had nothing to do with the development of treefs, I changed the name to reiserfs and registered the copyright.

--
To get something done, a committee should consist of no more than three persons, two of them absent.
Re:Hash collisions by Q+Who · 2003-06-09 05:40 · Score: 2, Informative

ReiserFS doesn't use cryptographic hash by default.
Get a clue before you post irrelevant (and incorrect) information.
Re:Bad block handling! by markrages · 2003-06-09 05:59 · Score: 2, Informative

Here's something to try:

This month, I had a two-disk failure on a 1.0 TB software raid5 with ReiserFS.
I was able to copy most of the data with dd_rescue and myrescue.
By the time I was finished mucking around, I had done mkraid -f several times, so there were spots of missing data on the disk. The filesystem would not mount. So I used reiserfsck --rebuild-tree, and once it completed five days later, I was able to mount the filesystem, with most of the files intact.
One very very important question. by jd · 2003-06-09 06:31 · Score: 2, Informative
There are now a HUGE number of filesystems out there for Linux. To make the question make sense, I'll need to list a bunch.
Non-Journalled, or Unsure:
- Ext2 (Bronze-age tech. Better than stone-age FAT, neolithic VFAT or iron-age UnixWare)
- v9fs (Plan 9 meets Linux, popular with nuke scientists)
- Befs (Mmmmm.... Be-fy!)
- NTFS (Can't write safely, then can't use much)
- UFS (Same as above)
- ISO9660 (If the image hasn't been burned, then you might as well make it read/write)
Journalled:
- Ext3FS (the XT-architecture of filesystems)
- XFS (Very fast, but incompatible facilities)
- JFS (The development is sloooow)
- LogFS (abandonware, but fully-logging if anyone actually finished it!)
- ReiserFS (Really nice, and an innovative use of B*-trees!)
Network file-systems
- Intermezzo (unreliable)
- CODA (unreliable)
- Novell Netware (reliable but ancient)
- NFS (reliable but stupid)
- CIFS (as reliable as Microsoft specs get)
- Lustre (used by Linux supercomputers)
To summarize: We have a horribly large number of filesystems, most of which are incompatible, many of which do not support the Linux security module extensions, one (e2fs) provides defragging under Linux, and none at all provide support for conversions.
Hey, diversity is good! I -like- diversity! I want MORE diversity! I also want ways to efficiently move data around.
Will future versions of ReiserFS include additional userland tools for defrag, fs conversion, scope of logging (eg: none, meta, full), pluggable hashing algorithm, etc?
Ultimately, all the choice in the world is no choice at all if there's no way to make use of those choices.
--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:What is the future of ReiserFS... by Surak · 2003-06-09 07:16 · Score: 2, Informative

Gentoo -- and no, I'm not, Gentoo gives you the choice of vanilla sources, gentoo sources, xfs sources, etc.

--
My journal has hot /. gossip.
Re:Hash collisions misquote by Anonymous Coward · 2003-06-09 12:52 · Score: 2, Informative

Up to 127 filenames per directory (MAX_GENERATION_NUMBER, defined in include/linux/reiserfs.h) can have the same hash value. After this, creating more filenames with this hash is impossible (the EBUSY error code is returned). ReiserFS does NOT blindly overwrite files because of hash collisions.

You can choose from multiple hash algorithms when you create the filesystem (faster hashes have a greater probability of hash collision). But collisions aren't a reason to avoid ReiserFS - most other filesystems (including ext2/ext3) won't get anywhere near a million files in a directory before suffering huge performance losses.

The following code was taken from linux-2.4.21-rc1/fs/reiserfs/namei.c and demonstrates the handling of hash collisions.

    gen_number = find_first_zero_bit ((unsigned long *)bit_string, MAX_GENERATION_NUMBER + 1);
    if (gen_number > MAX_GENERATION_NUMBER) {
        /* there is no free generation number */
        reiserfs_warning ("reiserfs_add_entry: Congratulations! we have got hash function screwed up\n");
        if (buffer != small_buf)
            reiserfs_kfree (buffer, buflen, dir->i_sb);
        pathrelse (&path);
        /*
         * Trivial changes by Alan Cox to remove EHASHCOLLISION for compatibility
         *
         * Trivial Changes:
         * Rights granted to Hans Reiser to redistribute under other terms providing
         * he accepts all liability including but not limited to patent, fitness
         * for purpose, and direct or indirect claims arising from failure to perform.
         *
         * NO WARRANTY
         * This is one of two lines that this fix consist of.
         */
        return -EBUSY; /* I think it was better to have an error code with a name that says
                          what it means, but I choose not to fight over it. Persons porting to
                          other operating systems should consider keeping it as it was
                          (return -EHASHCOLLISION;). -Hans */
    }
    /* adjust offset of directory enrty */
    put_deh_offset(deh, SET_GENERATION_NUMBER(deh_offset(deh), gen_number));
    set_cpu_key_k_offset (&entry_key, deh_offset(deh));

    /* update max-hash-collisions counter in reiserfs_sb_info */
    PROC_INFO_MAX( th -> t_super, max_hash_collisions, gen_number );

    if (gen_number != 0) {
        /* we need to re-search for the insertion point */
        if (search_by_entry_key (dir->i_sb, &entry_key, &path, &de) != NAME_NOT_FOUND) {
            reiserfs_warning ("vs-7032: reiserfs_add_entry: "
                              "entry with this key (%K) already exists\n", &entry_key);
            if (buffer != small_buf)
                reiserfs_kfree (buffer, buflen, dir->i_sb);
            pathrelse (&path);
            /* Following line is 2nd line touched by Alan Cox' trivial fix */
            return -EBUSY; /* I think it was better to have an error code with a name that says
                              what it means, but I choose not to fight over it. Persons porting to
                              other operating systems should consider keeping it as it was
                              (return -EHASHCOLLISION;). -Hans */
        }
    }
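
For readers who don't want to wade through the kernel code, the idea can be modeled in user space. The sketch below is a toy, not the ReiserFS implementation: the `ToyDirectory` class and its dictionary layout are invented for illustration, and it simply follows the shape of the quoted check (take the first free generation number, fail with EBUSY when none is left). The exact per-directory limit in the real filesystem is as described in the post.

```python
import errno

MAX_GENERATION_NUMBER = 127  # constant name and value as quoted above


class ToyDirectory:
    """Toy model: names that share a hash are told apart by a generation number."""

    def __init__(self, hash_func, max_generation=MAX_GENERATION_NUMBER):
        self.hash_func = hash_func
        self.max_generation = max_generation
        self.by_hash = {}  # hash value -> {generation number: name}

    def add_entry(self, name):
        """Store name and return its (hash, generation) key."""
        h = self.hash_func(name)
        slots = self.by_hash.setdefault(h, {})
        if name in slots.values():
            raise FileExistsError(errno.EEXIST, "entry already exists", name)
        # Equivalent of find_first_zero_bit: lowest unused generation number.
        for gen in range(self.max_generation + 1):
            if gen not in slots:
                slots[gen] = name
                return h, gen
        # No free generation number: refuse, never overwrite an existing entry.
        raise OSError(errno.EBUSY, "no free generation number", name)


# Worst case for the demo: a "hash" that sends every name to the same value.
directory = ToyDirectory(hash_func=lambda name: 0, max_generation=3)
for filename in ("a", "b", "c", "d"):
    print(filename, directory.add_entry(filename))
try:
    directory.add_entry("e")
except OSError as exc:
    print("refused:", errno.errorcode[exc.errno])
```

The point matches the post: once the generation numbers for one hash value run out, creation fails with an error and existing files are left alone.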
Why the difference by dorfsmay · 2003-06-09 15:14 · Score: 2, Informative

Another big reason why a lot of people implement snapshots differently than NetApp is to avoid shooting yourself in the foot. With NetApp, the snapshot data is kept on the same volume as the data itself, which leads to situations where you jump from, say, 50% usage to 99% overnight (the snapshot area is allowed to run over the data area). This is quite a delicate situation, as deleting files makes things worse (you have to get rid of old snapshots to free up space). I have seen big production databases brought to their knees because of this.

On the other hand, the other implementations are a bit slower because the blocks are copied instead of just not being deleted, but snapshots never take space from data. The implementer has to make the choice: space control and easy-to-understand space usage vs. speed of snapshot and recovery.
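
The trade-off can be made concrete with a toy model. This is a sketch under heavy assumptions, not any vendor's implementation: both classes, their names, and the whole-file granularity are invented for illustration, and each volume holds a single snapshot.

```python
import errno


class SharedPoolVolume:
    """Snapshot blocks stay in place and share one pool with live data."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.live = {}       # file name -> blocks in use
        self.shared = set()  # files whose blocks the snapshot also references
        self.pinned = 0      # blocks kept alive only by the snapshot

    def used(self):
        return sum(self.live.values()) + self.pinned

    def snapshot(self):
        self.shared = set(self.live)  # instant: nothing is copied
        self.pinned = 0               # toy: a new snapshot replaces the old one

    def delete(self, name):
        blocks = self.live.pop(name)
        if name in self.shared:
            self.shared.discard(name)
            self.pinned += blocks     # frees nothing while the snapshot exists

    def write(self, name, blocks):
        old = self.live.get(name, 0)
        reclaimable = 0 if name in self.shared else old
        if self.used() - reclaimable + blocks > self.capacity:
            raise OSError(errno.ENOSPC, "volume full")
        if name in self.live:
            self.delete(name)
        self.live[name] = blocks

    def drop_snapshot(self):
        self.shared.clear()
        self.pinned = 0


class SeparateAreaVolume:
    """Old blocks are copied to a dedicated snapshot area before they change."""

    def __init__(self, capacity, snapshot_capacity):
        self.capacity = capacity
        self.snapshot_capacity = snapshot_capacity
        self.live = {}
        self.shared = set()
        self.snapshot_used = 0
        self.blocks_copied = 0  # extra I/O paid for keeping the areas apart

    def used(self):
        return sum(self.live.values())

    def snapshot(self):
        self.shared = set(self.live)
        self.snapshot_used = 0

    def delete(self, name):
        blocks = self.live[name]
        if name in self.shared:
            if self.snapshot_used + blocks > self.snapshot_capacity:
                raise OSError(errno.ENOSPC, "snapshot area full")
            self.snapshot_used += blocks
            self.blocks_copied += blocks
            self.shared.discard(name)
        del self.live[name]

    def write(self, name, blocks):
        old = self.live.get(name, 0)
        if self.used() - old + blocks > self.capacity:
            raise OSError(errno.ENOSPC, "volume full")
        if name in self.live:
            self.delete(name)
        self.live[name] = blocks


for vol in (SharedPoolVolume(100), SeparateAreaVolume(100, 50)):
    vol.write("db", 50)
    vol.snapshot()
    vol.write("db", 50)  # overnight rewrite of the whole file
    print(type(vol).__name__, "data pool used:", vol.used())
```

In the shared-pool model a full rewrite after a snapshot doubles the space in use, and deleting a snapshotted file frees nothing until the snapshot is dropped; in the separate-area model data usage stays flat, at the cost of copying every preserved block.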
Re:Another reiser4 by Wolfrider · 2003-06-09 21:11 · Score: 2, Informative

--Reiser4 was planned from the ground up to surpass v3. One of its features is delayed block allocation until write-to-disk, which is expected to make the whole filesystem much more efficient. They will also be making the size of the journal smaller, which should finally enable me to start using Reiserfs on 100-Meg Zip disks. :)
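
A rough feel for why delaying allocation can help: if blocks are handed out as each write arrives, files written at the same time end up interleaved on disk, whereas waiting until flush lets each file get one contiguous run. The toy below is only an illustration of that general idea, not Reiser4's allocator; all function names are made up.

```python
def allocate_immediately(writes):
    """Hand out the next free blocks as each write arrives."""
    layout = {}
    next_block = 0
    for name, nblocks in writes:
        layout.setdefault(name, []).extend(range(next_block, next_block + nblocks))
        next_block += nblocks
    return layout


def allocate_at_flush(writes):
    """Buffer all writes, then place each file's blocks in one run at flush."""
    pending = {}
    for name, nblocks in writes:
        pending[name] = pending.get(name, 0) + nblocks
    layout = {}
    next_block = 0
    for name, total in pending.items():
        layout[name] = list(range(next_block, next_block + total))
        next_block += total
    return layout


def count_extents(blocks):
    """Number of contiguous runs in a sorted list of block numbers."""
    return sum(1 for i, b in enumerate(blocks) if i == 0 or b != blocks[i - 1] + 1)


# Two files being written at the same time, one block per write call.
writes = [("a", 1), ("b", 1), ("a", 1), ("b", 1)]
for allocator in (allocate_immediately, allocate_at_flush):
    layout = allocator(writes)
    extents = {name: count_extents(blocks) for name, blocks in layout.items()}
    print(allocator.__name__, layout, extents)
```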

--Hans has said in the past that he believes filesystems should be re-written from scratch every few years, so they can take advantage of algorithm improvements and new concepts. He's making good on his word. The V4 whitepaper ( http://www.namesys.com/v4/v4.html ) is an interesting read, especially if you're into/understand database design.

--
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??