Reiser4 Benchmarks

Re:Conversion? by mlg9000 · 2003-07-26 05:32 · Score: 3, Informative

ReiserFS v4 is backward compatible with v3. So if there isn't a conversion tool yet (don't know) you'll still be able to use your v3 filesystems alongside any new v4 filesystems you might add.

Computer's names translation by vadim_t · 2003-07-26 05:33 · Score: 5, Informative

In case anybody cares, "strelka" means "arrow", and "belka" means "squirrel".

Wonder what naming system they're using. I use names from Alice in Wonderland.

Re:Computer's names translation by kliklik · 2003-07-26 05:48 · Score: 5, Informative

Strelka and Belka are the names of the dogs that were sent into space.

Quote from first google search result: "Belka ("Squirrel") and Strelka ("Little Arrow") were launched into space on board Sputnik 5 on August 19, 1960. They were accompanied on their historic flight by 40 mice, 2 rats and a number of plants. Belka and Strelka were safely recovered after spending a day in orbit. Strelka eventually gave birth to a litter of 6 healthy puppies, one of which was given to President Kennedy as a gift."

--
guru in training

Re:ok by silvaran · 2003-07-26 05:48 · Score: 4, Informative

I'm not sure about the exact reasons why they don't support various other filesystems. The default bootup sequence of a RH system uses an initial ramdisk, and actually scans each available partition to find out where it should be mounted (they created nash, NAno SHell, which is just simple support for shell commands as well as fs label scanning). That's why you see the LABEL=/ in your /etc/fstab on a RH system. ReiserFS didn't support filesystem labels until 3.6, so using this setup could mess things up (with 3.5 or older), which justifies your point about having to "jump through hoops" to get reiserfs working. The simplest way I found to move to reiserfs was to change all the LABEL=??? specifications to actual device files, boot from a recovery disk, move everything around while reformatting the partitions as another filesystem, and then finally reboot.
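The LABEL-to-device change described above could be sketched roughly like this in Python. The label-to-device mapping and the sample fstab text are made up for illustration, and the function only transforms text; it does not touch a real /etc/fstab.

```python
def labels_to_devices(fstab_text, label_map):
    """Replace LABEL=<name> in the first field of each fstab line with a device path.

    label_map is a hypothetical {label: device} dict supplied by the caller.
    Comment lines, blank lines, and unknown labels are left untouched.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if fields and not is_comment and fields[0].startswith("LABEL="):
            label = fields[0][len("LABEL="):]
            if label in label_map:
                # Swap only the first field; keep the rest of the line as written.
                line = line.replace(fields[0], label_map[label], 1)
        out.append(line)
    return "\n".join(out)


# Illustrative input only; these labels and devices are assumptions.
sample = (
    "LABEL=/      /      ext3  defaults  1 1\n"
    "LABEL=/boot  /boot  ext3  defaults  1 2"
)
mapping = {"/": "/dev/hda2", "/boot": "/dev/hda1"}
print(labels_to_devices(sample, mapping))
```

Which device each label really points to has to be checked on the machine itself before editing anything.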

Re:Reliability by SaDan · 2003-07-26 05:48 · Score: 5, Informative

ReiserFS has worked pretty well on a 1.2TB RAID-5 array I helped build. We're running RedHat 7.2 on a box with a Promise SX6000 RAID controller.

The drivers are crap, and the box dies about every week or so. Haven't lost a single file yet, and we're at 91% filesystem usage (millions of files).

The / filesystem is ext3. It's about 20gigs, and has had to have files restored several times.

I have a lot of confidence in ReiserFS, after seeing the incredible amount of abuse on this one particular machine. I have run ReiserFS for quite a while now (ever since it was part of the kernel) for all of my home systems, and have never had a single issue with those filesystems.

Looking forward to what ReiserFS4 will bring.

Re:So how much better is this really? by gumpish · 2003-07-26 05:52 · Score: 3, Informative

Since these are times, I assume lower numbers are better? If so, then why are they usually in red in the report?

The columns marked B/A and C/A are the performance ratios of ext3 data journaling and ext3 to reiserV4 respectively. A ratio greater than 1 means more time was needed for the operation than reiserV4 while a ratio less than 1 means faster performance.

Re:I don't understand the statistics by silvaran · 2003-07-26 05:56 · Score: 3, Informative

A. reiser4
B. ext3 data journalling
C. ext3

That's how the filesystems are lettered. In the column headers, you see this:

A B/A C/A

So it looks like the first column is the actual time for reiser4 (A), the second column is the ratio between ext3-j (B) and reiser4 (A), which is B divided by A, and the third is ext3 divided by reiser4. So if the number is > 1 (red), it means reiser4 took less time and might be "better". If the number is < 1 (green), it means reiser4 took MORE time.

I think.
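To make the column arithmetic concrete, here is a small Python sketch of how such ratio columns could be computed. The timings are invented for illustration and are not taken from the benchmark; the function name is hypothetical too.

```python
def ratio_row(times, base):
    """Return each filesystem's time divided by the base filesystem's time."""
    base_time = times[base]
    return {name: t / base_time for name, t in times.items() if name != base}


# Made-up timings in seconds, purely for illustration (not from the benchmark).
times = {"reiser4": 20.0, "ext3-data-journal": 50.0, "ext3": 30.0}

for name, ratio in ratio_row(times, base="reiser4").items():
    verdict = "slower than base" if ratio > 1 else "not slower than base"
    print(f"{name}/reiser4 = {ratio:.2f} ({verdict})")
```

Note that picking a different `base` flips which side of 1 is good news for reiser4, so the colors only mean something once you know which filesystem was the base.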

About reiser4 by Fefe · 2003-07-26 05:58 · Score: 5, Informative

I attended Hans' presentation at Linuxtag.

Basically, reiser4 is optimized for the case where you unpack a large tarball, say the Linux kernel, and have enough memory to hold it all in cache, which is true for most of us these days. reiser4 will then choose the optimal disk layout for these files and flush them to disk.

Hans also has aspects of a log structured file system in reiser4, which means you don't write to the file, you write to a log file which basically encompasses the whole disk. The up side is that you mostly write linearly, the down side is that the files get badly fragmented if they are updated at all. Most files are not updated, just written once at installation of the package. The files that are updated frequently tend to be source code from CVS, which are small enough to fit in memory completely and have reiser4 choose an optimal disk layout again.

The case where this model completely sucks is the case where you update many portions of a large file. For example, running an SQL database with files on a reiser4 file system as backend, or maybe a DNS server with DDNS, or a berkeley db backend for Postfix or qmail to keep the SMTP AUTH users or something. Also, log files will probably be badly fragmented.

Hans proposes to have something like a transparent defragmenter running in the background, which he calls "repacker". This would run in the kernel space, as part of the file system, and defragment badly fragmented files that are accessed frequently. This would solve most of the down sides of his approach, but this repacker is not finished yet.

My personal view of reiser4 is: it looks like it is optimized to perform well in benchmarks. It tries to be fastest for updating databases, but buys the performance by being slower when reading the data afterwards. The critical question is whether the repacker can alleviate these concerns, and as long as it is not finished, reiser4 is basically out of the question except for a little testing here and there. I reckon reiser4 would be a great filesystem for keeping your mozilla and gcc CVS checkout handy. But until the repacker is done, I will not even use it for testing, because the repacker really is the crucial component that makes or breaks this.

By the way: my previous experiences with reiserfs were less than stellar. Some people call it shredderfs instead. The main complaint with reiserfs is and always was that the fsck is not nearly as trustworthy or stable as the one from ext2/ext3. So even if I use reiserfs at all, it's only for data I can afford to lose completely, like my CVS checkouts or the squid cache directories or something like that.

The benchmarks do look good though, and I am glad that at least someone is still trying major innovations in this area. Since most Unix vendors or divisions are no longer profit centers, file system innovations have largely stalled or moved to specialized companies who regard them as proprietary (Veritas) instead of releasing them as free software like IBM and SGI did.

Re:About reiser4 by hansreiser · 2003-07-26 07:42 · Score: 5, Informative

The difference between us and an LFS is that we perform well BEFORE you run the repacker, and we merely perform even better after you run it. LFSs required that you run the repacker to get good read performance; we don't. V4 kicks V3's butt without the repacker by a lot (due to dancing trees, allocate on flush, extents, and ending the use of BLOBs, among other things). With the repacker, it will just kick it harder.

Our approach synthesizes a lot of approaches, rather than considering one technique to be the answer to everything. This makes our performance more robust, as the different approaches each cover for the others' shortcomings. There are some situations in which using a repacker is higher performance than making lots of little changes while constantly maintaining optimal allocation of files.

The repacker will be ready in a few weeks.

Re:ok by Anonymous Coward · 2003-07-26 06:04 · Score: 3, Informative

that may be true for existing systems, but in the case of running an install from cd, instead of taking the default, add either "reiserfs" or "reiser" (don't remember which - think the "fs" one) to the boot options at the prompt when booting from the cd. it then adds reiserfs to the list of filesystems you can choose to format new filesystems with in the partitioning tool. pre-existing reiser filesystems will be handled correctly if you don't format them.

rh supports reiser just fine - they simply hide the option, probably because they feel ext3 is a safer choice.

Re:Conversion? by hansreiser · 2003-07-26 06:50 · Score: 5, Informative

No, V4 is not backward compatible with V3. V3 and V4 are kept as separate codebases so that the new V4 features don't destabilize V3. We are very serious about avoiding adding new features to V3, so that it can become a zero defect product.

However, there is a tool called convertfs (as well as tar) which can convert V3 to V4. It can also convert ext2 to V3 or V4, or V3 or V4 to ext2. It is pretty clever (and written by someone outside our team), in that it creates a loopback-mounted target filesystem inside a file inside the source filesystem, copies everything from the source to the target, and then reshuffles the blocks of the file so that they are at the offsets on the device that they were at within the file.

Re:Which to choose for DBs? by Just+Some+Guy · 2003-07-26 06:51 · Score: 4, Informative

The database is already massively journaled. There's little advantage in passing every byte through two separate journals, and plenty of disadvantages (speed, resources, etc.).

--
Dewey, what part of this looks like authorities should be involved?

Re:I don't understand the statistics by hansreiser · 2003-07-26 06:59 · Score: 5, Informative

The script that creates the comparison tables divides the other filesystems by the base filesystem. The problem is that Reiser4 was used as the base filesystem in one of the benchmarks, but not the other. So in one benchmark, green is good, and the other benchmark, red is good.

I would have fixed this before posting to lkml, but I had to catch a plane, sorry about that.

Re:"but you won't need to fsck" by hansreiser · 2003-07-26 07:15 · Score: 5, Informative

That was V3. V4 is an atomic filesystem, which means that every filesystem operation is performed as a fully atomic transaction. This is more secure than the guarantees of data journaling, as data journaling doesn't necessarily guarantee that the write will complete.

The reason we are able to do this in V4 but not in V3 is that V4 uses what I call wandering logs. With wandering logs, instead of copying data first to the journal, and then after commit copying it from the journal to the rest of the filesystem, (thereby writing the data twice), we just change our definition of where the journal is. I don't think that data journaling is worth going half-speed for most users. With V4, we not only don't go half-speed, we go faster than V3 ever went.

For more details, please take a look at http://www.namesys.com/v4/v4.html
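As a rough illustration of the write-count argument above, here is a toy Python model. It is only a sketch under loud assumptions: the class names are made up, the bookkeeping writes for the location map are ignored, and none of this is Reiser4's actual code. It just contrasts "write to the journal, then copy home" with "write once and redefine where the committed data lives".

```python
class FixedJournalDisk:
    """Toy model of fixed-location data journaling: every block is written twice."""

    def __init__(self):
        self.journal = []
        self.home = {}
        self.block_writes = 0

    def commit(self, updates):
        # First copy: data goes into the journal area.
        for addr, data in updates.items():
            self.journal.append((addr, data))
            self.block_writes += 1
        # Second copy: after commit, data is copied to its home location.
        for addr, data in self.journal:
            self.home[addr] = data
            self.block_writes += 1
        self.journal.clear()


class WanderingLogDisk:
    """Toy model of the wandering-log idea: data is written once, and the
    map of where committed data lives is switched instead of copying data."""

    def __init__(self):
        self.blocks = {}      # physical location -> data
        self.committed = {}   # logical address -> physical location
        self.next_free = 0
        self.block_writes = 0

    def commit(self, updates):
        new_map = dict(self.committed)
        for addr, data in updates.items():
            loc = self.next_free
            self.next_free += 1
            self.blocks[loc] = data
            self.block_writes += 1
            new_map[addr] = loc
        # Assumption of this toy: switching the map is one step and not counted.
        self.committed = new_map

    def read(self, addr):
        return self.blocks[self.committed[addr]]


updates = {i: f"data{i}" for i in range(4)}
fixed, wandering = FixedJournalDisk(), WanderingLogDisk()
fixed.commit(updates)
wandering.commit(updates)
print(fixed.block_writes, wandering.block_writes)
```

In this model the fixed journal performs twice as many block writes for the same commit, which is the "half-speed" cost being described.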

Re:Reliability by Anonymous Coward · 2003-07-26 07:37 · Score: 3, Informative

Troll.

First I don't get what you mean by "huge amount of stuff in the partition"... In no place does he actually say that _anything_ is being stored on the root partition (we can assume it's an OS only partition).

It's probably a server with an internal 20GB drive connected to the 1TB raid. If you're suggesting the 20GB is a partition off the raid, well then you have no clue how these things work.

Re:Reliability by hankaholic · 2003-07-26 08:02 · Score: 4, Informative

You obviously know nothing about ReiserFS.

ReiserFS does have speed as a goal; however, with ReiserFS 4, all filesystem operations are now atomic, which is functionally equivalent to having full data (not just metadata) journalling.

In addition, having the fastest CPU in the world won't make ext[23] better at things for which ReiserFS is fast.

CPU speeds are increasing. Storage space is increasing. RAM is cheap.

However, none of that equates to "disks are fast". Having a fast CPU with a slow filesystem is like having a gigabit LAN connected to the Internet via dialup. Sure, internally you're quite good, but throughput will still suck eggs.

The fact of the matter is, it is easier to have no clue what you're talking about than to read a little bit before posting.

--
Somebody get that guy an ambulance!

Re:Reliability by JLester · 2003-07-26 13:05 · Score: 3, Informative

His point is that if you have a 2TB partition made of 8 250GB drives, you have no data protection at all. That's known as striping. If one drive fails, you lose the entire array with no way to rebuild it without a professional data recovery service (and even that might be iffy with Reiser).

RAID5 would give you a 1.75TB partition with the equivalent space of one drive reserved for fault tolerance. My preference would be RAID5+Spare with that much space which would give you a 1.5TB partition with 250GB reserved for fault tolerance and one hot spare drive in case one of the others fails. With that setup, you're okay until you lose three drives.
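The capacity figures above can be checked with a few lines of Python. This is just the arithmetic, with a hypothetical helper name; it says nothing about any particular controller.

```python
def usable_gb(drives, size_gb, level, hot_spares=0):
    """Usable capacity in GB for a simple striped (raid0) or raid5 array."""
    active = drives - hot_spares  # hot spares sit idle until a drive fails
    if level == "raid0":
        return active * size_gb
    if level == "raid5":
        if active < 3:
            raise ValueError("RAID5 needs at least three active drives")
        # One drive's worth of space goes to parity.
        return (active - 1) * size_gb
    raise ValueError(f"unsupported level: {level}")


print(usable_gb(8, 250, "raid0"))                # striping, no protection
print(usable_gb(8, 250, "raid5"))                # RAID5
print(usable_gb(8, 250, "raid5", hot_spares=1))  # RAID5 + hot spare
```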

Jason

--
"FORMAT C:" - Kills bugs dead!

Re:Reliability by hansreiser · 2003-07-26 19:23 · Score: 4, Informative

If you are using metadata journaling, then a file that you are in the middle of writing to when it crashes can have garbage added to it. Note that Unix filesystems have had this feature since the days of FFS and UFS. Use data-journaling if you find that unacceptable. ReiserFS V3 supports both data-journaling and meta-data journaling now.

Be warned though, that all fixed location journals double the transfer time cost of performing writes because the data must first be written to the journal, and then written somewhere else. This is why we don't make data journaling the default in v3. Trust me, full data journaling would have been far easier to code first than meta data journaling, but it isn't in the interest of the 'average' user.

Now V4 is an atomic filesystem, which is much better than data journaling, because it means that all filesystem operations are performed fully atomically. Your write syscall either fully happens or it does not. Applications can have multiple filesystem operations performed atomically. We do this without writing the data twice through use of a technique called wandering logs, which I describe in a posting below (and on our website).

Re:Features by hankaholic · 2003-07-27 02:59 · Score: 3, Informative

I'm interested.

Are there features other than attributes provided by ext[23] which ReiserFS can't provide?

Regarding the attributes, ReiserFS can't do the following "useful" things:

- Excluding a single file from access-time logging.

- Append-only would be useful in certain cases.

- 'c' (compress file contents) -- Ext[23] doesn't do this yet, and ReiserFS doesn't either, but they both will support it later.

- 'D' (dirsync) attribute -- Other than ensuring that filesystem metadata operations are completed successfully before proceeding, I can't imagine a beneficial use of this attribute. ReiserFS already journals all metadata, and V4 journals everything (and does it faster).

- 'd' (do not back up using dump) -- AFAIK you can't use dump to back up a ReiserFS partition anyways, but tar still exists, and I'm sure there are wrapper scripts for the die-hard dump users.

- 'E' (error in compression) doesn't seem to apply to general (current) use of ext[23], and doesn't have an analogue in ReiserFS.

- 'I' (hash-indexed directory) -- ReiserFS' directory algorithms are much (much!) faster than ext[23]. ReiserFS will eventually support directory plugins, which allow one to set a per-directory hashing scheme. I note that ext[23] doesn't allow modification of this attribute, so I'm not sure if/how it allows you to choose a hashing scheme.

- Immutable ('i') bit -- This might be useful in some circumstances, such as a setuid binary. Not supported by ReiserFS.

- The 'j' option is meant to get around the fact that ext[23] only journals metadata unless mounted with "data=journal". As mentioned elsewhere, ReiserFS' long-term goal is to provide fully atomic file operations, which provides the benefits of data journalling with none of the burdens. Thus, ReiserFS V3 doesn't support this attribute, as the time spent in adding this feature would have been better spent working on V4. Note that with V4 all operations are atomic, which has the effect of turning this flag on for every file.

- 's' (zero file contents upon delete) -- ReiserFS doesn't support this.

- 'S' (write file synchronously) is likely used in cases where preserving file integrity is important. ReiserFS' journalling (especially in V4) provides this "for free".

- 'T' (top of directory hierarchy) is used to get around an ext[23] weakness. I don't recall details, but I believe it's something related to preventing fragmentation and grouping similar files together for enhanced performance. ReiserFS (especially V4) is already quite fast, and with code-level optimizations will likely get faster.

- 't' (don't merge tails) does not apply to current ext3 code, as tails aren't yet merged anyways. This will be used as a flag to write the file to disk in such a way as to be compatible with LILO and other boot loaders. At present I'm not aware of any special issues with ReiserFS' tail packing (which does exist, by the way) with respect to LILO. Note that a ReiserFS partition can be mounted with the "notail" option if you want, which is feature-wise equal to ext[23], which don't seem to provide tail packing at all right now (and if they do, someone needs to update the chattr man page).

- 'u' (back up upon deletion) -- ReiserFS doesn't support this.

- 'X' and 'Z' relate to compression, and do not apply to ext[23] or ReiserFS at the moment. ReiserFS will eventually support compression plugins, as well as encryption plugins.

So of the attributes, the only ones which might actually be of some use in the context of ReiserFS would be exclusion from atime logging, append-only, the immutable bit, zero-contents-upon-delete (although future crypto plugins will destroy the need for this feature), and back up upon deletion.

Anything I've missed?

--
Somebody get that guy an ambulance!
