Slashdot Mirror


How To Use a Terabyte of RAM

Spuddly writes with links to Daniel Philips and his work on the Ramback patch, and an analysis of it by Jonathan Corbet up on LWN. The experimental new design for Linux's virtual memory system would turn a large amount of system RAM into a fast RAM disk with automatic sync to magnetic media. We haven't yet reached a point where systems, even high-end boxes, come with a terabyte of installed memory, but perhaps it's not too soon to start thinking about how to handle that much memory.

3 of 424 comments (clear)

  1. Re:Memory usage by wizardforce · · Score: 5, Interesting

    since we aren't even close to having boxes with more memory than we actively use
    640k should be enough for anyone. you do realize that the fact that computer manufacturers are happy bundling over 2 gigs of RAM in a default install so it runs Vista all prettily gives the linux users of us a fantastic advantage when we don't use anywhere near that on a regular basis. there are already linux distros that are small enough as to be sitting entirely in RAM, some even small enough to run on the L2+3 cache if you like. being able to do things like this is going to be a major advantage.
    --
    Sigs are too short to say anything truly profound so read the above post instead.
  2. Re:You only need 16GB of RAM for this to be useful by Anonymous Coward · · Score: 5, Interesting

    Did I hear a summer of code application?

  3. Speed vs tmpfs? by SanityInAnarchy · · Score: 5, Interesting

    How it seems to work:

    Actual "ramdisk" -- that is, like /dev/rd -- that is, appears as a block device. You can run whatever filesystem you want on it, but it's still serializing and writing out to... well, RAM, in this case. No sane way for the kernel to free space on that "disk" that's not actually used.

    How I wish it worked:

    No Linux that I know of has used an actual ramdisk in forever. Instead, we use tmpfs -- a filesystem which actually grows or shrinks to our needs, up to an optional configurable maximum size. It'll use swap if available/needed. It's basically a RAM filesystem, instead of a RAM disk.

    Even initrds are dead now -- we use initramfs. Basically, instead of the kernel booting and reading a ramdisk image directly to /dev/rd0, it instead boots and unpacks a cpio archive (like a tarball, but different/better/worse) into a tmpfs filesystem, and uses that.

    So, how I would like this to work is, use a tmpfs filesystem -- as I suspect it will be faster, and in any case simpler, than a ramdisk -- and back it to a real filesystem on-disk. The only challenge here is that it's not as deterministic -- it would be more like a cp than a dd.

    An even better (crazier) idea:

    Use a filesystem like XFS or Reiser4 -- something which delays allocation until a flush. In either case, it would take a bit of tweaking -- you want to make sure no writes, or fsyncs, block while writing to disk, so long as the power is on -- but you'll hopefully already be caching an obscene amount anyway, so reads will be fast.

    In this case, forcing everything out to disk could be as simple as "mount / -o remount,sync" -- or something similar -- forcing an immediate sync, and all future writes to be synchronous.

    Conclusion:

    Either of the two ideas I suggested should work, and could perform better than a traditional ramdisk. If it is, in fact, a simple disk-backed ramdisk (not ram filesystem), then it's both not as flexible (what if your app suddenly wants 50 gigs of RAM in application space?) and a bit of a hack -- probably a hack around traditional disk-backed filesystems not being able to take advantage of so much RAM by themselves.

    In fact, glancing back at TFA, it seems there are some inherent reliability concerns, too:

    If UPS power runs out while ramback still holds unflushed dirty data then things get ugly. Hopefully a fsck -f will be able to pull something useful out of the mess. (This is where you might want to be running Ext3.)

    Now, true, this should never happen, but in the event it does, the inherent problem here is that the ramdisk doesn't know anything about the filesystem, and so it doesn't know in what order it should be writing stuff to disk. Ext3 journaling makes NO sense for a ramdisk when the ramdisk itself knows nothing about the journal -- the journal is just going to slow down the RAM-based operation. Compare this to a sync call to XFS -- individual files might be corrupted, but all the writes will be journaled in some way, so at least the filesystem structure will be intact.

    This gets even better with something like Reiser4's (vaporware) transaction API. If the application can define a transaction at the filesystem level, then this consistent-dump-to-disk will happen at the application level, too. Which means that while it would certainly suck to have a UPS fail, it wouldn't be much worse than the same happening to a non-ramdisk device, at least as far as consistency goes. (Some data will be lost, no way around that, but at least this way, some data will be fine.)

    --
    Don't thank God, thank a doctor!