Slashdot Mirror


RAMdisk RAID?

drew_92123 asks: "I've got a friend who does a LOT of video editing but is on a limited budget. He is currently using a RAID array and, while quite large, it's not as fast as he had hoped. I had an idea and wanted to know if there is a way to make it a reality, so of course I thought of all the brilliant minds here. I have about a dozen Pentium II computers with 1GB of RAM. I would like to upgrade them to 2GB and throw in some gigabit NICs and create a 1.9GB RAMdisk on each one. Then I want to use one of the computers to RAID the RAMdisks together to be shared via Samba most likely. They are all 1U systems, with no HDDs, just a 64MB IDE flash disk. Any ideas out there?" Has anyone successfully put together such a system? How well did it work for you and are there any caveats that you would like to share with others who would do the same?

4 of 114 comments

  1. Trading disk latency for network latency by pjcreath · Score: 3, Interesting

    Um, aren't the network latency and bandwidth constraints going to obliterate any benefit you get from using RAM disks?

  2. Not the most inexpensive option by doozer · Score: 3, Interesting

    Though what you're suggesting would work, and I've done similar things before,
    think of the additional costs:

    12GB of RAM
    12 gigabit NICs
    1 gigabit Ethernet switch with more than 12 ports
    Setup time

    For what you're looking at spending, it may be the same cost as buying some U320
    SCSI disks and some sort of SCSI RAID card.

  3. Network wins over disk... by BDW · Score: 2, Interesting

    ...but only if you can deal with the OS latency. My very rough understanding says any networking based on the OSI model is going to pay a sufficiently large penalty in OS latencies that remote memory probably won't be any faster than a good local disk subsystem. However, if you can get rid of that latency, you can win BIG.

    Since the questioner is looking at using commodity hardware with a commodity OS using a commodity networking protocol, my gut feeling is that (s)he doesn't have a prayer. It is a cool idea, but latencies are likely to be too high.

    The /. dreamers don't need to give up all hope, however. :) There is relevant work in the academic literature, using specialized hardware and software of course. The work I'm familiar with is from Hank Levy's group at UW. To sum up, based on what I remember from a class I took back in '98 from Mike Feeley (first author on said paper; also did his PhD thesis on the topic):

    The motivating example came from Boeing. They had a bunch of CAD workstations all with lots of RAM (by the standards of the day). However, looking at any nontrivial part of the design required more memory than any single workstation. Paging to disk was S-L-O-W. So why not use the frequently idle memory on the other workstations? The result of the UW work was a sort of global memory management, with paging to remote workstations in the cluster as well as to disk. Using memory on the remote workstations was significantly faster than using the local disk.

    So what about latency from the network stack? IIRC (and it has been five years since I talked to Mike about this...) they used Myrinet. In some sense Myrinet is basically DMA to remote workstations. One Myrinet node issues a write request in software, which includes the source address in memory for the data to be copied, a target node in the cluster, and the target memory address on the target node. The Myrinet hardware on the local workstation does DMA from the source memory location and fires the data over fibre to the remote workstation, which dutifully does DMA from the Myrinet card to the memory locations specified by the sender. This is very fast, but not the stuff traditional general-purpose computing has been made of.

    Brian

  4. Re:Eh?? by mkldev · Score: 2, Interesting

    Wouldn't the bottleneck of the NICs be an issue?

    Not as big as the bottleneck of the CPU. I don't know what type of video editing this person is doing, but odds are about 10:1 it's DV. I actually worked through the math on a video editing mailing list once and showed the fallacy of assuming that faster disk performance made any real difference. I'm actually rather surprised nobody else has already mentioned the logical flaw, given that this is slashdot, after all. :-)

    Amdahl's law tells us that speeding up one portion of an operation yields an overall speedup limited by the fraction of the total time that portion takes. The most you can ever save is the share of the total time that the portion used originally.

    Say you have an operation that takes ten seconds. It is divided into two parts, one of which takes 1 second, one of which takes 9. If you speed up the part that takes 1 second by a factor of two, you have only knocked a meager 5% off the overall time of the operation.
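
    As a quick check of that arithmetic, here is a minimal Python sketch. It assumes the part being sped up is the one-second part; the function name and the list layout are illustrative choices, not anything from the original mailing-list post.

```python
def time_after_speedup(part_times, index, factor):
    """Return the total time after speeding up one part by `factor`."""
    sped = list(part_times)
    sped[index] = sped[index] / factor
    return sum(sped)

parts = [1.0, 9.0]  # seconds: the part being sped up, then everything else
before = sum(parts)
after = time_after_speedup(parts, 0, 2.0)
saved_fraction = (before - after) / before

print(f"before={before:.1f}s after={after:.1f}s saved={saved_fraction:.0%}")
# prints: before=10.0s after=9.5s saved=5%
```

    Making the factor arbitrarily large only pushes the total toward 9 seconds, so the saving can never exceed the 10% that the one-second part represents.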

    In video editing, the time taken to read a block off a disk is so insignificant compared to the time needed to process the frame that the disk access time is generally lost in the rounding error of most people's calculations.

    Basically, the time to seek to a given block was something like one sixth of a percent of the total time needed to decompress a single DV frame with a 1GHz CPU. Even if you cut your seek overhead by a factor of a thousand, you couldn't cut your time by even one percent.
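
    The same bound can be put into numbers. This is only a sketch under a stated assumption: it reads "one sixth of a percent" as the seek time being 1/600 of the decode time, and it normalizes the decode time to 1. The names are illustrative.

```python
seek_vs_decode = 1 / 600  # assumed: seek time is one sixth of a percent of decode time

def fraction_saved(seek_speedup):
    """Fraction of (seek + decode) time saved when only the seek gets faster."""
    decode = 1.0
    seek = seek_vs_decode * decode
    before = seek + decode
    after = seek / seek_speedup + decode
    return (before - after) / before

saving_1000x = fraction_saved(1000)
# Ceiling on the saving: what you would get if the seek time dropped to zero.
upper_bound = seek_vs_decode / (1 + seek_vs_decode)

print(f"1000x faster seeks save {saving_1000x:.3%}; the ceiling is {upper_bound:.3%}")
```

    Both figures come out well under one percent, which is the point being made: the seek time was never a large enough share of the total to matter.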

    Worse, if the code is written correctly, it is possible -- indeed trivial, given the well-defined access patterns involved in video editing -- to prefetch the blocks of data before the processor needs them, so that disk reads overlap with processing and a faster disk buys essentially nothing.
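
    One way such prefetching might look, as a hedged Python sketch rather than how any particular editing package does it: a background thread reads ahead into a small bounded buffer while the consumer processes frames in order. `read_block`, `fake_read`, and the `depth` parameter are hypothetical names introduced for illustration.

```python
import queue
import threading

def prefetching_reader(read_block, block_ids, depth=4):
    """Yield blocks in order while a background thread reads ahead.

    read_block is a hypothetical callable that fetches one block;
    depth is how many blocks may be buffered ahead of the consumer.
    """
    buf = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def worker():
        try:
            for block_id in block_ids:
                buf.put(read_block(block_id))  # waits while the buffer is full
        finally:
            buf.put(done)  # always signal the end so the consumer cannot hang

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()
        if item is done:
            return
        yield item

# Usage with a stand-in for real disk reads
def fake_read(block_id):
    return f"frame-{block_id}"

frames = list(prefetching_reader(fake_read, range(5)))
```

    Because the access pattern is sequential and known in advance, the reader can stay a few blocks ahead, and the processor only waits on the disk if decoding ever becomes faster than reading. Error handling here is deliberately minimal: a failed read simply ends the stream.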

    Long story short, while it's a cool idea, it won't have any noticeable benefit. Now if your friend were doing -audio- editing with multiple channels, where it is actually possible (even easy) to exceed the speed of a single 5400 RPM hard drive, that would be a different story. But at least for DV editing, there is no benefit even in moving from a single 5400 RPM drive to a 7200 RPM drive, apart from having a greater safety margin to avoid the risk of dropouts when capturing. The benefit of moving to a RAID is smaller still, and that of moving to a huge ramdisk smaller again.

    --
    120 character sigs suck. Make it 250.