Slashdot Mirror


Reiser4 Benchmarks

unmadindu writes "Hans Reiser has benchmarked Reiser4 against ext3 and Reiserfs 3. Reiser4 turns out to be way faster than V3, and for ext3, why don't you check out the results yourself? Hans Reiser states, "these benchmarks mean to me that our performance is now good enough to ship V4 to users", and he will probably be sending in a patch within the next couple of weeks to be included in the 2.6/2.5 kernel."

36 of 414 comments

  1. Reliability by prestwich · · Score: 5, Interesting

    My one concern is reliability and recovery from failure; I've had a few cases where my belief in ReiserFS has been questioned; however I can't get Ext3 to build on larger than 500GB arrays.

    At this point I'd happily choose based on reliability/recoverability/stability not raw speed.

    1. Re:Reliability by globalar · · Score: 5, Insightful

      Exactly. RAM, CPU, and storage space are ever increasing. Now we need better ways to organize data, access it, protect it, and back it up.

      The fact of the matter is, it is easier to make a fast system than a stable, reliable one.

    2. Re:Reliability by SaDan · · Score: 5, Informative

      ReiserFS has worked pretty well on a 1.2TB RAID-5 array I helped build. We're running RedHat 7.2 on a box with a Promise SX6000 RAID controller.

      The drivers are crap, and the box dies about every week or so. Haven't lost a single file yet, and we're at 91% filesystem usage (millions of files).

      The / filesystem is ext3. It's about 20gigs, and has had to have files restored several times.

      I have a lot of confidence in ReiserFS, after seeing the incredible amount of abuse on this one particular machine. I have run ReiserFS for quite a while now (ever since it was part of the kernel) for all of my home systems, and have never had a single issue with those filesystems.

      Looking forward to what ReiserFS4 will bring.

    3. Re:Reliability by cvd6262 · · Score: 5, Interesting

      We had a massive failure of our primary database server while I was out of the country. (Trust me, nothing puts a damper on your day more than having one of your techs call you at midnight from 7,000 miles away.) I blame Reiser. Not because it caused the outage (it was hardware), but because it was so good, it made us a bit lax.

      We're just a small grant lab at a university, so it's not like this was a corporate system or anything, and there had been hardware problems before. Given that most of the people are not techies, they did not know how to ssh in and shutdown -r now, so they would just hit the reset button whenever they thought something was wrong and I wasn't around.

      Anyway, because of Reiser's journalling, the system would come right back up after a forced reboot. I think that the guys in the lab cut the power a couple of times too many and the hard drive just gave out.

      By the way, I just had a tech install a new drive, and Debian base with ssh. I knew the password he would use for root, and I was able to rebuild the entire system and restore 250,000 records in half a day.... From North Africa.

      Try that with a non-*nix.

      --

      I'd rather have someone respond than be modded up.

    4. Re:Reliability by FyRE666 · · Score: 5, Funny

      Given that most of the people are not techies, they did not know how to ssh in and shutdown -r now, so they would just hit the reset button whenever they thought something was wrong and I wasn't around.

      I've found users doing that to my servers before now. I find that hitting them on the nose with a rolled up newspaper and shouting "No! Bad monkey!" in a stern voice tends to stop this behaviour...

    5. Re:Reliability by Billings · · Score: 5, Interesting

      Yeah, you won't lose files, but you'll lose data. It's been noted elsewhere in this article's comments in more technical jargon, but it is a known flaw in ReiserFS that blocks of data can be written to flat out wrong areas. As an example, I had an outage while I was working with my config files and running an apt-get update;apt-get dist-upgrade. Reiser then managed to write the middle of a debian package file to whatever config file I was working with.

      Had me confused to hell until I saw a newsgroup discussion that mentioned the exact problem I was having. Does Hans Reiser know about this problem? Oh, yeah. He does. Is he concerned about it? No, he's not. In his own words he's not. And ReiserFS fails silently; you'll never know until you find it.

      When I set up ReiserFS on my machine, I was aware of similar complaints, but I dismissed them as fear of trying something unproven. And I was happy with ReiserFS for quite a while, because I never saw anything wrong (unlike ext2/3). But I really can't support a FS that has these kinds of data integrity issues if the team has that kind of attitude towards them.

    6. Re:Reliability by hankaholic · · Score: 4, Informative

      You obviously know nothing about ReiserFS.

      ReiserFS does have speed as a goal; however, with ReiserFS 4, all filesystem operations are now atomic, which is functionally equivalent to having full data (not just metadata) journalling.

      In addition, having the fastest CPU in the world won't make ext[23] better at things for which ReiserFS is fast.

      CPU speeds are increasing. Storage space is increasing. RAM is cheap.

      However, none of that equates to "disks are fast". Having a fast CPU with a slow filesystem is like having a gigabit LAN connected to the Internet via dialup. Sure, internally you're quite good, but throughput will still suck eggs.

      The fact of the matter is, it is easier to have no clue what you're talking about than to read a little bit before posting.

      --
      Somebody get that guy an ambulance!
    7. Re:Reliability by hansreiser · · Score: 4, Informative

      If you are using metadata journaling, then a file that you are in the middle of writing to when it crashes can have garbage added to it. Note that Unix filesystems have had this feature since the days of FFS and UFS. Use data-journaling if you find that unacceptable. ReiserFS V3 supports both data-journaling and meta-data journaling now.

      Be warned though, that all fixed location journals double the transfer time cost of performing writes because the data must first be written to the journal, and then written somewhere else. This is why we don't make data journaling the default in v3. Trust me, full data journaling would have been far easier to code first than meta data journaling, but it isn't in the interest of the 'average' user.

      Now V4 is an atomic filesystem, which is much better than data journaling, because it means that all filesystem operations are performed fully atomically. Your write syscall either fully happens or it does not. Applications can have multiple filesystem operations performed atomically. We do this without writing the data twice through use of a technique called wandering logs, which I describe in a posting below (and on our website).

  2. Conversion? by avalys · · Score: 4, Interesting

    Does anyone know if there will be a conversion utility available - i.e, to convert ReiserFS v3 partitions to v4?

    --
    This space intentionally left blank.
    1. Re:Conversion? by hansreiser · · Score: 5, Informative

      No, V4 is not backward compatible with V3. V3 and V4 are kept as separate codebases so that the new V4 features don't destabilize V3. We are very serious about avoiding adding new features to V3, so that it can become a zero defect product.

      However, there is a tool called convertfs (as well as tar) which can convert V3 to V4. It can also convert ext2 to V3 or V4, or V3 or V4 to ext2. It is pretty clever (and written by someone outside our team), in that it creates a loopback-mounted target filesystem inside a file inside the source filesystem, copies everything from the source to the target, and then reshuffles the blocks of the file so that they are at the offsets on the device that they were at within the file.

    2. Re:Conversion? by Anonymous Coward · · Score: 5, Funny

      C'mon folks, can't we find someone more authoritative than this guy. I mean, what would someone whose nick is "hansreiser" know about this stuff.

  3. wait! by BigBadDude · · Score: 5, Insightful


    hey, I can live with an unstable gnome or Kicq, but a beta filesystem?? no thanks dude!

  4. Reiser4? Competition? by GreyWolf3000 · · Score: 4, Insightful

    After reiser4, what filesystems are actually decent competition for it? It'd be nice for OSS to claim not only the best web server (apache), best kernel, and best filesystem.

    --
    Slashdot: Where people pretend to be twice as smart as they really are by behaving like children.
    1. Re:Reiser4? Competition? by 1s44c · · Score: 5, Funny

      After reiser4, what filesystems are actually decent competition for it? It'd be nice for OSS to claim not only the best web server (apache), best kernel, and best filesystem.
      --

      Sick of gentoo zealots throwing plugs in completely unrelated topics? Me too!


      Not to mention the best software installation system ( portage ).

    2. Re:Reiser4? Competition? by jd · · Score: 5, Insightful
      Actually, OSS claims several of the best filing systems! :)


      XFS is probably one of the fastest journalling filesystems out there, all-round, and probably offers the best competition to Reiser4. I'm actually surprised not to see some benchmarks against it, as XFS has gathered quite a following in places.


      The port of the Plan9 filing system is said to be one of the fastest filing systems out there - enough so that it's a part of a Government research program called "Pink", run by some mad scientists at Los Alamos. Yes, that Los Alamos. Again, this would be an excellent FS to have some benchmarks against.


      Last, but not least, Reiser4 didn't do spectacularly well against Ext3 in the benchmarks. I saw plenty of results both ways. Reading vs. Deleting, for example, shows a definite penalty whichever FS you choose, depending on the operations you're performing.


      In the end, if you truly want the fastest system, you should format partitions according to the type of workload they'll be doing. You want fast deletes on a /tmp partition, for example, but you will likely care much more about reading times on your application binaries, and modification times on your data files.


      (Unless you're using the suspend patch a lot, you probably won't want journalling on the /tmp partition, either.)


      A truly optimized system, therefore, isn't about picking your "one true love" of the filesystems. It's about deciding what criteria apply, and then looking to see what filesystem best meets those criteria.


      A mixed-fs machine should be capable of out-performing ANY homogeneous-fs machine, no matter what fs the homogeneous-fs machine has picked, because a homogeneous system will always be a compromise. A mixed-fs system need compromise nothing. (Other than your sanity. Which, being a geek, is just a hindrance anyway.)

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  5. Honest Portability Question by jstockdale · · Score: 5, Interesting

    I am curious as to whether there are any projects to port Reiser4 to *BSD, particularly FreeBSD 5.x. Does anyone have any thoughts on how difficult a port might be? Can someone more versed in filesystems on *nix enlighten me as to the implementation differences?

    --
    **AA: a bunch of mindless jerks who'll be the first against the wall when the revolution comes
  6. Computer's names translation by vadim_t · · Score: 5, Informative

    In case anybody cares, "strelka" means "arrow", and "belka" means "squirrel".

    Wonder what naming system they're using. I use names from Alice in Wonderland.

    1. Re:Computer's names translation by kliklik · · Score: 5, Informative

      Strelka and Belka are the names of the dogs that were sent into space.

      Quote from first google search result: "Belka("Squirrel")and Strelka("Little Arrow") were launched into space on board Sputnik 5 on August 19, 1960. They were accompanied on their historic flight by 40 mice, 2 rats and a number of plants. Belka and Strelka were safely recovered after spending a day in orbit. Strelka eventually gave birth to a litter of 6 healthy puppies, one of which was given to President Kennedy as a gift."

      --
      guru in training
    2. Re:Computer's names translation by hansreiser · · Score: 4, Interesting

      These are the names of the two dogs that were sent into outerspace by Russia.

  7. I don't understand the statistics by 0x0d0a · · Score: 4, Insightful

    The statistics on that page are measured in seconds, no? So larger numbers are worse.

    The comparisons are done with [foreign filesystem] divided by reiser4.

    One would think that numbers greater than one, where the foreign filesystem has a long running time and reiser4 a short one, would be the ones that benefit reiser4.

    Yet the numbers *less* than one are green, where Hans says reiser4 is considered better.

    What's going on?

    (Incidentally, after having a friend lose a filesystem to buggy reiser code, I'm a bit inclined to wait until people have *seriously* hammered on this.)

    1. Re:I don't understand the statistics by hansreiser · · Score: 5, Informative

      The script that creates the comparison tables divides the other filesystems by the base filesystem. The problem is that Reiser4 was used as the base filesystem in one of the benchmarks, but not the other. So in one benchmark, green is good, and the other benchmark, red is good.

      I would have fixed this before posting to lkml, but I had to catch a plane, sorry about that.
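
      A minimal sketch of why the colors flip, assuming (as described above) that each table cell is the other filesystem's time divided by the base filesystem's time, with times in seconds. The function names and the timings are made up for illustration and are not taken from the benchmark script:

```python
def ratio(other_seconds, base_seconds):
    """Elapsed time of the other filesystem divided by the base filesystem's."""
    return other_seconds / base_seconds


def reiser4_is_faster(r, reiser4_is_base):
    # Times are in seconds, so smaller is better.
    # If reiser4 is the base (the denominator), a ratio above 1 means the
    # other filesystem took longer, so reiser4 wins.
    # If reiser4 is the numerator, a ratio below 1 means reiser4 wins.
    return r > 1 if reiser4_is_base else r < 1


# Hypothetical timings, not taken from the benchmark page.
ext3_s, reiser4_s = 30.0, 10.0
print(reiser4_is_faster(ratio(ext3_s, reiser4_s), reiser4_is_base=True))
print(reiser4_is_faster(ratio(reiser4_s, ext3_s), reiser4_is_base=False))
```

      Both calls describe the same pair of timings; only the choice of base changes which side of 1 counts as a win.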

  8. Bugs by Kaladis+Nefarian · · Score: 4, Insightful

    While great, this announcement/benchmark/statement does not mean that ReiserFS V4 is ready for production use, just that it is fast. It needs a lot more bug testing before then, so don't rush out and mass-convert to V4 just yet! See here for the full thread, rather than just the first post...

    --
    * Several monkeys are here, playing banjos and wearing small hats.
  9. Which to choose for DBs? by Openadvocate · · Score: 4, Interesting

    I realise that it is a bit early to adopt V4, but stability issues aside, which filesystem would YOU choose for database volumes for, e.g., Oracle or MySQL?

    --
    my sig
    1. Re:Which to choose for DBs? by Just+Some+Guy · · Score: 4, Informative

      The database is already massively journaled. There's little advantage in passing every byte through two separate journals, and plenty of disadvantages (speed, resources, etc.).

      --
      Dewey, what part of this looks like authorities should be involved?
  10. XFS? by leoboiko · · Score: 4, Interesting

    How does it compare against everyone's favorite, XFS?

    --
    Prescriptive grammar:linguistics :: alchemy:chemistry. Stop being a nazi and learn some science.
  11. but will it make it by DemiKnute · · Score: 4, Interesting

    So he's submitting it to 2.6, but what are the chances it'll get accepted? Isn't this what caused all of Reiser's bitching a couple of years ago? He waited too long to get RFS into the kernel and ran into the feature freeze, and then pitched a hissy fit.

    --
    .
  12. Re:ok by silvaran · · Score: 4, Informative

    I'm not sure about the exact reasons why they don't support various other filesystems. The default bootup sequence of a RH system uses an initial ramdisk, and actually scans each partition available to find out where they should be mounted (they created nash, NAno SHell, which is just simple support for shell commands as well as fs label scanning). That's why you see the LABEL=/ in your /etc/fstab on a RH system. ReiserFS didn't support filesystem labels until 3.6, so using this setup could mess things up (with 3.5 or older), and justifies your point about having to "jump through hoops" to get reiserfs working. The simplest way I found to move to reiserfs was to change all the LABEL=??? specifications to actual device files, boot from a recovery disk, move everything around while reformatting the partitions as another filesystem, then finally reboot.
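
    A rough sketch of the first step (swapping LABEL= entries for device files), assuming you already know which device each label refers to. The helper name, the sample fstab lines, and the /dev/hda* names are hypothetical; run something like this on a copy, not on the live /etc/fstab:

```python
def replace_labels(fstab_text, label_to_device):
    """Return fstab text with known LABEL=... specs replaced by device paths."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        # Leave comments, blank lines, and non-LABEL entries untouched.
        if fields and not is_comment and fields[0].startswith("LABEL="):
            label = fields[0][len("LABEL="):]
            if label in label_to_device:
                # Only the first field (the device spec) is rewritten.
                line = line.replace(fields[0], label_to_device[label], 1)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical fstab contents and label-to-device mapping.
sample = (
    "LABEL=/ / ext3 defaults 1 1\n"
    "LABEL=/boot /boot ext3 defaults 1 2\n"
    "/dev/hda3 swap swap defaults 0 0\n"
)
mapping = {"/": "/dev/hda2", "/boot": "/dev/hda1"}
print(replace_labels(sample, mapping), end="")
```

    The filesystem type column still has to be changed by hand once the partition has actually been reformatted.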

  13. Filesystems for the laptop user? by niko9 · · Score: 4, Interesting

    Anybody know what, if any, features are being added for the laptop user? Last time I checked, journaled filesystems, like ext3, were generally a no-no if you wanted your battery to last.

    Maybe a filesystem just for laptop/tablet pc users?

  14. About reiser4 by Fefe · · Score: 5, Informative

    I attended Hans' presentation at Linuxtag.

    Basically, reiser4 is optimized for the case where you unpack a large tarball, say the Linux kernel, and have enough memory to hold it all in cache, which is true for most of us these days. reiser4 will then choose the optimal disk layout for these files and flush them to disk.

    Hans also has aspects of a log structured file system in reiser4, which means you don't write to the file, you write to a log file which basically encompasses the whole disk. The up side is that you mostly write linearly, the down side is that the files get badly fragmented if they are updated at all. Most files are not updated, just written once at installation of the package. The files that are updated frequently tend to be source code from CVS, which are small enough to fit in memory completely and have reiser4 choose an optimal disk layout again.

    The case where this model completely sucks is the case where you update many portions of a large file. For example, running an SQL database with files on a reiser4 file system as backend, or maybe a DNS server with DDNS, or a berkeley db backend for Postfix or qmail to keep the SMTP AUTH users or something. Also, log files will probably be badly fragmented.
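
    To make the fragmentation argument concrete, here is a toy model, assuming a purely log-structured layout where every updated block is rewritten at the tail of the log. It is only an illustration of the general idea and says nothing about reiser4's real allocator; all names and sizes are made up:

```python
import random


def count_extents(locations):
    # An extent is a run of logically consecutive blocks that are also
    # physically consecutive on disk.
    return 1 + sum(1 for a, b in zip(locations, locations[1:]) if b != a + 1)


def simulate(file_blocks, updates, seed=0):
    rng = random.Random(seed)
    # Initial write: the file is laid out sequentially at the head of the log.
    locations = list(range(file_blocks))
    log_tail = file_blocks
    for _ in range(updates):
        # Log-structured update: the new copy of the block goes to the log
        # tail instead of overwriting the old location.
        block = rng.randrange(file_blocks)
        locations[block] = log_tail
        log_tail += 1
    return count_extents(locations)


print(simulate(1000, 0))    # written once, never updated: a single extent
print(simulate(1000, 200))  # scattered updates split the file into many extents
```

    A file that is written once stays in one piece, while the database-style workload of scattered small updates is what breaks it up.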

    Hans proposes to have something like a transparent defragmenter running in the background, which he calls "repacker". This would run in the kernel space, as part of the file system, and defragment badly fragmented files that are accessed frequently. This would solve most of the down sides of his approach, but this repacker is not finished yet.

    My personal view of reiser4 is: it looks like it is optimized to perform well in benchmarks. It tries to be fastest for updating databases, but buys the performance by being slower when reading the data afterwards. The critical question is whether the repacker can alleviate these concerns, and as long as it is not finished, reiser4 is basically out of the question except for a little testing here and there. I reckon reiser4 would be a great filesystem for keeping your mozilla and gcc CVS checkout handy. But until the repacker is done, I will not even use it for testing, because the repacker really is the crucial component that makes or breaks this.

    By the way: my previous experiences with reiserfs were less than stellar. Some people call it shredderfs instead. The main complaint with reiserfs is and always was that the fsck is not nearly as trustworthy or stable as the one from ext2/ext3. So even if I use reiserfs at all, it's only for data I can afford to lose completely, like my CVS checkouts or the squid cache directories or something like that.

    The benchmarks do look good though, and I am glad that at least someone is still trying major innovations in this area. Since most Unix vendors or divisions are no longer profit centers, file system innovations have largely stalled or moved to specialized companies who regard them as proprietary (Veritas) instead of releasing them as free software like IBM and SGI did.

    1. Re:About reiser4 by hansreiser · · Score: 5, Informative


      The difference between us and an LFS is that we perform well BEFORE you run the repacker, and we merely perform even better after you run it. LFSs required that you run the repacker to get good read performance; we don't. V4 kicks V3's butt without the repacker by a lot (due to dancing trees, allocate on flush, extents, and ending the use of BLOBS, among other things). With the repacker, it will just kick it harder.




      Our approach synthesizes a lot of approaches, rather than considering one technique to be the answer to everything. This makes our performance more robust, as the different approaches each cover over each other's lackings. There are some situations in which using a repacker is higher performance than making lots of little changes while constantly maintaining optimal allocation of files.



      The repacker will be ready in a few weeks.

  15. If I'm reading these right.... by Anonymous Coward · · Score: 4, Insightful

    It's still not time to swap it for ext3 for general use.

    The first table with the mixed file sizes is the most compelling. The fact that reiser4's Create and Copy times are less than a third of ext3's in real_time is impressive.

    But the fact that CPU consumption on Read is twice as high for R4 as it is for ext3 is a serious problem. On a 1.3 GHz machine saturating a generic UDMA 100 60G bus on RH 9.0 it's about 10% of the CPU, so the home user might not care. For a system capable of delivering serious data (like a 4 drive, 15k rpm SCSI RAID array @~3 times the read throughput) going from 30% CPU to 60% CPU usage is a definite problem. Even with a 2.6 GHz CPU it would still move from chewing up 15% to 30%. I know these numbers don't scale exactly, but they could in fact scale ugly depending on how much CPU is dedicated to communicating with the hardware and how much is in fiddling with the filesystem. My production boxes spend > 80% of their disk activity reading, so I'm not inspired to go out and spend the time running benchmarks on high-performance systems just yet.
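
    The back-of-the-envelope numbers above can be reproduced with a short sketch, assuming CPU cost scales linearly with throughput and inversely with clock speed (which, as noted, real systems won't do exactly). The function and parameter names are made up for illustration:

```python
def cpu_percent(base_percent, base_ghz, throughput_factor, cost_factor, ghz):
    # Assumes CPU cost grows linearly with throughput and with the
    # filesystem's relative per-byte cost, and shrinks linearly with clock
    # speed. This is a simplification, not a measurement.
    return base_percent * throughput_factor * cost_factor * base_ghz / ghz


for ghz in (1.3, 2.6):
    # Baseline from the post: about 10% CPU at 1.3 GHz on a single disk.
    ext3 = cpu_percent(10, 1.3, throughput_factor=3, cost_factor=1, ghz=ghz)
    reiser4 = cpu_percent(10, 1.3, throughput_factor=3, cost_factor=2, ghz=ghz)
    print(f"{ghz} GHz: ext3 {ext3:.0f}%, reiser4 {reiser4:.0f}%")
```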

    Nevertheless, I always admire it when a new version of software comes out and it's noticeably faster than the old

  16. In other news... by wirelessbuzzers · · Score: 4, Interesting

    Apple benchmarked their new G6 processor against the latest 10 GHz Pentium V. They say that despite its lower clock speed, it runs their suite of PhotoShop 8 filters almost four times faster than the Pentium.

    Seriously, Hans Reiser is benchmarking his own file system, and he's using benchmarks that make his system look good. Like the SpriteLFS, his filesystem has a log structure for sequential writing, which makes it look really good in tests like he performed where you write the files once.

    Compare a database load, where you write small chunks of big files all the time. Without the repacker (like the cleaner in LFS), the disk becomes horribly fragmented. With the repacker, you have to include the slowdown of this background process defragging your hard disk. Ick.

    I'll trust his benchmarks when he presents a final, stable release, with the repacker on, and tests it under workloads such as would be encountered on a server. I might use it on my homebox even if it sucks on a server, but it would be nice to know that he considers his structure's impact on other workloads.

    --
    I hereby place the above post in the public domain.
  17. Re:"but you won't need to fsck" by hansreiser · · Score: 5, Informative

    That was V3. V4 is an atomic filesystem, which means that every filesystem operation is performed as a fully atomic transaction. This is more secure than the guarantees of data journaling, as data journaling doesn't necessarily guarantee that the write will complete.



    The reason we are able to do this in V4 but not in V3 is that V4 uses what I call wandering logs. With wandering logs, instead of copying data first to the journal, and then after commit copying it from the journal to the rest of the filesystem, (thereby writing the data twice), we just change our definition of where the journal is. I don't think that data journaling is worth going half-speed for most users. With V4, we not only don't go half-speed, we go faster than V3 ever went.
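
    As a toy illustration of the write-cost difference only (not of how V4 is actually implemented), the sketch below counts block writes for a fixed-location data journal versus a wandering log. The single-block commit record and the function names are assumptions made for the example:

```python
def fixed_journal_writes(data_blocks):
    # Each block is written to the journal first, then copied to its final
    # location after commit, so the data is written twice.
    journal_copy = data_blocks
    final_copy = data_blocks
    commit_record = 1  # assumed: one block to mark the transaction committed
    return journal_copy + final_copy + commit_record


def wandering_log_writes(data_blocks):
    # Each block is written once to a free location; committing only records
    # that those locations are now the live copy.
    new_locations = data_blocks
    commit_record = 1  # assumed: one block to redefine where the log is
    return new_locations + commit_record


n = 1000
print(fixed_journal_writes(n))
print(wandering_log_writes(n))
```

    For large transactions the ratio approaches two to one, which is the "half-speed" cost referred to above.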



    For more details, please take a look at http://www.namesys.com/v4/v4.html

  18. Re:Warning by hansreiser · · Score: 4, Insightful


    It is important that you use a distro that bases their kernel on 2.4.18, or later, when using ReiserFS. There exists a distro that bases their enterprise server kernel on 2.4.9, and intentionally declines to add any reiserfs bugfixes since then. (This is the same distro that once shipped their kernel with the ReiserFS debugging code turned on so that we would go slow.) Do NOT use their kernel with ReiserFS. Generally when someone reports that they are having really bad experiences with ReiserFS, it turns out they are using that kernel.



    I generally recommend using the latest official kernel from Marcelo, and not any distro kernels, but the SuSE kernels tend to have effective ReiserFS support also, and not everyone out there shares my non-technical preference for a common community developed kernel.

  19. Re:Gee Re:Reliability by Daniel+Phillips · · Score: 4, Insightful

    Root is a prime candidate for a small (100MB should do it) ext2 system mounted "sync". You don't need decent write performance on root; in fact, many sysadmins make it read-only. Journalling is pointless if you're only writing to the filesystem once every 6 months to add a new user.

    On the contrary, that's exactly the case where you should always journal, and with full data journalling. You don't care about write performance, since you hardly ever do it, but you do care a lot about keeping your root filesystem consistent.

    --
    Have you got your LWN subscription yet?
  20. Re:some questions abou ReiserFS by hansreiser · · Score: 4, Interesting

    I get reports (not verified by me) that ReiserFS V3 is an order of magnitude faster when used as a backend for an XML database than relational databases that were tried. So, if your data happens to have a hierarchical structure, or, you can put it into one, then you are likely to get a performance gain. If your data does not have a hierarchical structure, then you need to wait for V6 where we plan to expand the semantics.

    If you want to be able to "cat filenameX/..owner" to see who owns "filenameX", you need to use V4.