Slashdot Mirror


RAMdisk RAID?

drew_92123 asks: "I've got a friend who does a LOT of video editing but is on a limited budget. He is currently using a raid array and while quite large, it's not as fast as he had hoped. I had an idea and wanted to know if there is a way to make it a reality, so of course I thought of all the brilliant minds here. I have about a dozen Pentium II computers with 1GB of RAM. I would like to upgrade them to 2GB and throw in some gigabit NICs and create a 1.9GB RAMdisk on each one. Then I want to use one of the computers to RAID the RAMdisks together to be shared via Samba most likely. They are all 1U systems, with no HDD's, just a 64MB IDE flash disk. Any ideas out there?" Has anyone successfully put together such a system? How well did it work for you and are there any caveats that you would like to share with others who would do the same?

26 of 114 comments

  1. Eh?? by foooo · · Score: 5, Informative

    Wouldn't the bottleneck of the NICs be an issue?

    You might just try reconfiguring the raid to be RAID 0+1 (Striped and Mirrored). That would give you the redundancy and speedy access. If the RAM is already maxed out on the video workstation it might be more cost effective to get a better motherboard that supports more RAM.

    ~george

    1. Re:Eh?? by mkldev · · Score: 2, Interesting

      Wouldn't the bottleneck of the NICs be an issue?

      Not as big as the bottleneck of the CPU. I don't know what type of video editing this person is doing, but odds are about 10:1 it's DV. I actually worked through the math on a video editing mailing list once and showed the fallacy of assuming that faster disk performance made any real difference. I'm actually rather surprised nobody else has already mentioned the logical flaw, given that this is slashdot, after all. :-)

      Amdahl's law tells us that speeding up one portion of an operation yields an overall speedup that depends on the fraction of the total time that portion accounted for, and that the maximum time you can save overall can't be more than that fraction of the original total.

      Say you have an operation that takes ten seconds. It is divided into two parts, one of which takes 1 second, one of which takes 9. If you speed up the part that takes 1 second by a factor of two, you have only knocked a meager 5% off the overall time of the operation.
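
      To put numbers on that, here is a minimal Python sketch of Amdahl's law applied to the ten-second example. The function name and variables are made up for illustration.

```python
def amdahl_speedup(fraction_improved: float, factor: float) -> float:
    """Overall speedup when `fraction_improved` of the run time gets `factor` times faster."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / factor)


total = 10.0      # seconds for the whole operation
fast_part = 1.0   # seconds spent in the part we speed up
speedup = amdahl_speedup(fast_part / total, 2.0)
new_total = total / speedup
saved_pct = 100.0 * (total - new_total) / total
print(f"new total: {new_total:.2f} s, saved: {saved_pct:.1f}%")
```

      Even if the 1-second part became infinitely fast, the total could only drop to 9 seconds.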

      In video editing, the time taken to read a block off a disk is so insignificant compared to the time needed to process the frame that the disk access time is generally lost entirely in the rounding error of most people's calculations.

      Basically, the time to seek to a given block was something like one sixth of a percent of the total time needed to decompress a single DV frame with a 1GHz CPU. Even if you cut your seek overhead by a factor of a thousand, you couldn't cut your time by even one percent.

      Worse, if the code is written correctly, it is possible -- indeed trivial, given the well-defined access patterns involved in video editing -- to prefetch the blocks of data before the processor needs them, making the speedup absolutely zero, regardless of the speed of the disk.
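
      A rough Python sketch of that prefetching idea, assuming a reader thread that stays a few blocks ahead of the decoder. `read_block` and `process` are hypothetical stand-ins for the real disk read and frame decode, not anything from an actual editing package.

```python
import queue
import threading


def prefetching_reader(read_block, num_blocks, depth=4):
    """Yield blocks in order while a background thread reads ahead up to `depth` blocks."""
    q = queue.Queue(maxsize=depth)
    sentinel = object()

    def worker():
        for i in range(num_blocks):
            q.put(read_block(i))   # blocks here when the read-ahead queue is full
        q.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item


# Hypothetical stand-ins: a fake "disk" and a fake per-frame workload.
def read_block(i):
    return bytes([i % 256]) * 4096


def process(block):
    return sum(block)


results = [process(b) for b in prefetching_reader(read_block, 8)]
```

      As long as `process` takes longer than `read_block`, the consumer never waits on the queue, which is the whole point.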

      Long story short, while it's a cool idea, it won't have any noticeable benefit. Now if your friend were doing -audio- editing with multiple channels, where it is actually possible (even easy) to exceed the speed of a single 5400 RPM hard drive, that would be a different story. But at least for DV editing, there is no benefit of even moving from a single 5400 RPM drive to a 7200 RPM drive apart from having a greater safety margin to avoid the risk of dropouts when capturing. The benefits of moving to a RAID are even less, and to a huge ramdisk, thus, less still.

      --
      120 character sigs suck. Make it 250.
  2. Do some tests first by HotNeedleOfInquiry · · Score: 5, Informative

    Before you buy a bunch of hardware, set up one ramdisk with a network link and find out what your real-life transfer bandwidth is. I'll bet that the gain, if any, would not be worth the effort.
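
    One way to run that kind of test, sketched in Python: push a known number of bytes through a TCP socket and time it. This version uses the loopback interface so it is self-contained; to measure the real link you would run the receiving half on the ramdisk box and point the sender at its address. The function name and sizes are placeholders.

```python
import socket
import threading
import time


def measure_throughput(total_bytes=8 * 1024 * 1024, chunk=64 * 1024):
    """Send `total_bytes` over a local TCP connection; return (bytes received, MB/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def sender():
        with socket.create_connection(("127.0.0.1", port)) as s:
            payload = b"\0" * chunk
            sent = 0
            while sent < total_bytes:
                s.sendall(payload)
                sent += chunk

    t = threading.Thread(target=sender)
    t.start()
    conn, _ = server.accept()
    received = 0
    start = time.perf_counter()
    with conn:
        while True:
            data = conn.recv(chunk)
            if not data:                # sender closed the connection
                break
            received += len(data)
    elapsed = time.perf_counter() - start
    t.join()
    server.close()
    return received, received / elapsed / 1e6


received, mb_per_s = measure_throughput()
print(f"{received} bytes at {mb_per_s:.1f} MB/s")
```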

    --
    "Eve of Destruction", it's not just for old hippies anymore...
    1. Re:Do some tests first by DetrimentalFiend · · Score: 3, Insightful

      I completely second this. The costs of the hardware will not be worth it. I suspect that you would do a lot better adding disks to the array, creating another array, or upgrading the ram on the main machine. Although it sounds like a cool idea in concept, this is not a good idea if you're doing anything but playing around with the tech. It does sound like a cool project, though.

  3. what? by Anonymous Coward · · Score: 2, Insightful

    you want to fragment all filesystem access into super-tiny 1500 byte ethernet packet access to each of several hosts for a mere 20gb ram "filesystem"?

    gig E does not perform that well on a single host. You -might- get lower latency than a cheap raid array but the thruput won't be any better.

    why waste time with "RAID" anyways? this ram is all ECC ram and is non-persistent so there's no point in tossing that extra computation into the mix to make it worse.

    1. Re:what? by otuz · · Score: 3, Funny

      he needs the RAID part when one of these machines crashes.

  4. Power outage by CounterZer0 · · Score: 2, Informative

    Just remember, he's SCREWED if the power goes out and he hasn't flushed that /huge/ RAMdisk to a real disk.

    1. Re:Power outage by coryboehne · · Score: 3, Insightful

      Ahh, come on... Anyone who is crazy enough to want to set up a lan-based-ram-raid-array (Whew...) is certainly going to be sane enough to set up some sort of serious power backup.

      Of course just to repeat and be redundant, the bottleneck will simply come down to the NIC. A possible solution would be to install several NICs and spread the load out... But this person would be much better off simply buying a system that will either (a) handle 24 GB of RAM, or (b) has fast enough disks to please him...

      Of course a better idea might be to buy a system that would support both (a) and (b) and then set the system up to use the ram as a primary holding area for data, then flush to the (now faster) disk array...

      Either way this person is still affected by "The Sickness" that we all seem to suffer from (More, bigger, better, faster....)

    2. Re:Power outage by Radical+Rad · · Score: 2, Funny

      Maybe that's the point. If he runs a porn site he could wipe all the evidence by hitting a kill switch but with RAID and UPS he is protected from accidental data loss. He could also rig a software deadman switch which shuts down the systems unless it receives a signal every so often.

  5. Also... by foooo · · Score: 5, Insightful

    Ever thought of just upgrading your SCSI controller? You can get RAID controllers that have insane amounts of RAM in them. That might patch up any access issues.

    If your concern is extended duration throughput the multiple rack computers with ram *might* be an option, but most normal users wouldn't consider it due to the latency involved with going through the southbridge then the nic then the nic then the southbridge (of the other computer) then the north bridge. And that's just a one way trip.

    Just don't shell out a bunch of money before you do a proof of concept.

    ~george

  6. First reactions. by FreeLinux · · Score: 4, Informative

    At first glance, this sounds like an incredible waste of time. RAID RAMDisk? Why? Are you crazy? What's the point?

    But, if you give it some thought it is an interesting idea. Basically you are trying to build a clustered RAM disk..

    There is, however, a major drawback to this idea. The whole advantage of a RAM disk is speed/performance. Locally, the RAM disk is MUCH faster than a normal disk drive. But, the problem arises when you connect your "RAID RAM disk". You must network the machines in order for them to communicate with each other and suddenly, your performance has dropped to nothing. In fact it is below the performance of a normal disk drive.

    In order for your RAID RAM disk to perform equally with a good disk drive you would require a switched gigabit network between your nodes. This will cost more than the "normal" disk. Additionally, even with a switched gigabit network the performance is highly unlikely to exceed the performance of high-end disk drives.

    So, when you get right down to it, the RAID RAM disk is an interesting idea, just to see if you can do it. But, there isn't really any advantage to it.

    1. Re:First reactions. by CounterZer0 · · Score: 2, Insightful

      Sorry, but I don't think so.
      RAM -> RAM across a network (assuming at LEAST 100mbit ethernet) will be FASTER than accessing a RAID of local disks. It's all memory to memory transfer at that point - no spin up, no seek time. The disks may get close for a very long sequential write/read, where the multiple drives can actually come close to using the bandwidth available via the RAID controller.
      But for random access...no way. RAM 'seek time' is measured in NANOSECONDS, while even the fastest drive is in the milliseconds! RAM is over 1000 times faster!

    2. Re:First reactions. by Harik · · Score: 5, Informative
      Sayeth CounterZer0:
      Sorry, but I don't think so. RAM -> RAM across a network (assuming at LEAST 100mbit ethernet) will be FASTER than accessing a RAID of local disks. It's all memory to memory transfer at that point - no spin up, no seek time. The disks may get close for a very long sequential write/read, where the multiple drives can actually come close to using the bandwidth available via the RAID controller.

      I, however, beg to differ.

      harik@taz:~$ ping -s 1492 192.168.100.99
      PING 192.168.100.99 (192.168.100.99) 1492(1520) bytes of data.
      1500 bytes from 192.168.100.99: icmp_seq=1 ttl=64 time=2.80 ms
      1500 bytes from 192.168.100.99: icmp_seq=2 ttl=64 time=2.77 ms
      1500 bytes from 192.168.100.99: icmp_seq=3 ttl=64 time=2.77 ms
      1500 bytes from 192.168.100.99: icmp_seq=4 ttl=64 time=2.77 ms
      1500 bytes from 192.168.100.99: icmp_seq=5 ttl=64 time=2.77 ms

      This is two machines sitting side by side on a separate, completely unloaded switch. Don't just go by the 500ns ping time, you actually have to transfer data. You're talking at LEAST 3ms PER BLOCK... and that's with some insanely optimized code.

      Now, for video editing 99% of the effort is linear (unless you are horribly fragmented) so you're talking ONE 6ms seek ONCE then thousands upon thousands of linear reads.
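
      A back-of-the-envelope version of that argument in Python. The roughly 3 ms per block and the 6 ms seek are the figures from this comment; the 40 MB/s sustained disk rate and the 100 MB clip size are assumptions for illustration, and the network case assumes one 1500-byte block per synchronous round trip with nothing pipelined.

```python
BLOCK_BYTES = 1500            # one Ethernet-sized block per request
NET_ROUND_TRIP_S = 0.003      # ~3 ms per block, from the ping test
DISK_SEEK_S = 0.006           # one initial seek
DISK_RATE_BPS = 40e6          # assumed sustained sequential rate, 40 MB/s


def network_time(total_bytes):
    """Time to fetch the data one block per round trip (no pipelining)."""
    blocks = -(-total_bytes // BLOCK_BYTES)   # ceiling division
    return blocks * NET_ROUND_TRIP_S


def disk_time(total_bytes):
    """Time for one seek followed by a purely linear read."""
    return DISK_SEEK_S + total_bytes / DISK_RATE_BPS


clip = 100_000_000   # assumed 100 MB of video
print(f"network: {network_time(clip):.1f} s, disk: {disk_time(clip):.2f} s")
```

      A real protocol would keep many blocks in flight, so treat the network figure as the worst case for strictly synchronous block-at-a-time access.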

      Secondly, his "raid array" sucks if the performance is bad. I buy low end LSI Express 500s (Ultra 160 LVD) and they have stellar performance. For doing AV, this is my recommendation:

      Buy a multi-channel Ultra160 or Ultra320 SCSI RAID controller (160s are pretty cheap now that 320s are on the market). Load it up with 5 large drives. Set the stripe size to the maximum. Buy a cheaper IDE RAID and set it in mode 15 (mirror two RAID5 arrays together; it's harder to lose data that way).

      Use the SCSI for your working set, and reformat it frequently (or at least delete all files) to defrag. Use RAID0, it's faster. Save your finished projects to the IDE raid, burn to DVD, DLT, whatever.

      It will _STILL_ be cheaper than putting 2gig of RAM onto a pile of boxes, AND faster. Single-channel ultra-320 can hit you with up to 40 megaBYTES per second, all on a measly 5ms initial seek. (Remember, ALL the drives seek in parallel.) Putting drives on the second channel can wallop you with 80MB/second. You're talking around $1500 for the card, of course. But have you priced out a 1U server with 2gig ram lately?

  7. Not really cost-effective by baka_boy · · Score: 2, Informative

    Assuming that you'd have to buy at least some of the Gb networking hardware (switches, cables, etc.), you're really not going to be saving much. Assuming at least $100 per 'RAMdisk server', you'll be spending $1200+ for a ~20GB RAID array that will lose everything the minute the power blinks, not to mention drawing several kilowatts of AC.

    On the other hand, if you just throw four 100GB ATA-100 drives in a standard tower case with a decent IDE RAID controller, you get five times as much storage for probably about half the money.

    Also, remember that most low-to-mid-range PCs can't actually fully take advantage of a gigabit network link, since the PCI bus and CPU get saturated long before the network does.
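
    The bus arithmetic behind that point, as a quick Python sketch. The figures are theoretical peaks (bus width times clock), not measured rates; real throughput is lower, and the bus is shared with everything else plugged into it.

```python
def pci_peak_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak PCI bandwidth in MB/s: bytes per transfer times clock rate."""
    return width_bits / 8 * clock_mhz


GIGABIT_MB_PER_S = 1000 / 8                # 1 Gbit/s line rate = 125 MB/s

pci_32_33 = pci_peak_mb_per_s(32, 33.33)   # the common desktop PCI slot
pci_64_66 = pci_peak_mb_per_s(64, 66.66)   # server-class 64-bit slot

print(f"32-bit/33 MHz PCI: {pci_32_33:.0f} MB/s peak")
print(f"64-bit/66 MHz PCI: {pci_64_66:.0f} MB/s peak")
print(f"gigabit Ethernet:  {GIGABIT_MB_PER_S:.0f} MB/s line rate")
```

    A single gigabit NIC running at line rate would nearly fill a plain 32-bit/33 MHz bus on its own.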

  8. Trading disk latency for network latency by pjcreath · · Score: 3, Interesting

    Um, aren't the network latency and bandwidth constraints going to obliterate any benefit you get from using RAM disks?

    1. Re:Trading disk latency for network latency by GoRK · · Score: 4, Informative

      I'd bet money that he doesn't even have a 64-bit PCI slot in his beefy video editing client. Even marginal IDE RAID cards with 4+ 7200RPM drives can saturate that with no problem. Get a nice 64bit SATA RAID controller and pick up about 8 of those new WD SATA 10KRPM drives due out this month and you'll have a local solution that will easily max out a 64bit PCI bus.

  9. This is probably the problem. by FreeLinux · · Score: 2, Insightful

    [FlameSuit On] I'm sure you all will flame me for this but, I can take it.[/FlameSuit Off]

    It is very likely that he is already using IDE or ATA disks, and that is part of his problem. When large amounts of data need to be transferred quickly, SCSI is what you need. There is nothing faster than 15,000 RPM SCSI drives connected to good RAID controllers that have large amounts of cache RAM. Nothing.

    If you want high performance then you must use high performance gear. Yes, it does cost 5 to 10 times more than the IDE RAID solution but, there is a VERY good reason for that.

    Ok, now come the flames from the know-it-all masses whose experience is limited to home PCs and no-traffic webservers.

  10. And for the obligatory... by KurdtX · · Score: 3, Insightful

    How about a be- *bang!*

    *smack* *thump*

    *mass cheering*

    Btw, it does seem to be a (disturbing) recent trend at Slashdot to try to troll whole stories, instead of just trolling comments. C'mon, anyone who's taken even one networking or hardware class knows the speed hierarchy:

    cache > memory > disk > network

    And, with the number of physical RAM drives out there (very few), you'd quickly realize that even a local RAM drive doesn't offer enough of a speed benefit to offset its cost. C'mon editors, I know it sounds cool, but do you really have to post it?

    --

    Kurdt
    I'm not anti-social. Just pro-technology.
  11. Not the most inexpensive option by doozer · · Score: 3, Interesting

    Though what you're suggesting would work, and I've done similar things before, think of the additional costs:

    12GB of RAM
    12 Gig NICs
    1 >12 port Gig ethernet switch.
    Setup time

    For what you're looking at spending, it may be the same cost as buying some U320 SCSI disks and some sort of SCSI RAID card.

  12. Network Block Device by Halvard · · Score: 2, Informative

    Sounds like nbd may be your ticket if you are using Linux. nbd is designed to take a block device, like a hard drive, and make it available over a network on a different host. Combined with Linux software RAID, it will also do RAID 0, 1, 5. Perhaps it will work with a ramdisk. I can't swear that this will work but it sure might, since after all, a ramdisk is implemented as a block device.

    RAM is cheap. If you are unconcerned about high electricity costs and need a large *F*A*S*T* device for storage, striping a number of ramdisks could be the thing to do. PC133 1GB DIMMs are currently about US$200 and are on their way down. Sure, it's expensive compared to RAID 5, but I'm sure it's a lot faster. Just make sure you write out anything you need prior to downing the whole array.
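
    For a feel of what striping across the ramdisks means in practice, here is a minimal Python sketch of RAID 0 address mapping. It illustrates the general technique only, not how nbd or the Linux md driver is actually implemented; the chunk size, host count, and function name are made up.

```python
def stripe_location(logical_block: int, num_devices: int, chunk_blocks: int = 16):
    """Map a logical block number to (device index, block offset on that device) for RAID 0."""
    chunk = logical_block // chunk_blocks          # which chunk overall
    within = logical_block % chunk_blocks          # position inside the chunk
    device = chunk % num_devices                   # chunks rotate across devices
    stripe = chunk // num_devices                  # full stripes before this one
    return device, stripe * chunk_blocks + within


# Twelve ramdisk hosts, 16-block chunks (both assumed for illustration).
for lb in (0, 15, 16, 191, 192):
    print(lb, stripe_location(lb, 12))
```

    Consecutive chunks land on different hosts, so a long sequential read touches every ramdisk (and every network link) in turn.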

  13. Closer is better by Anonymous Coward · · Score: 4, Informative

    Buy the RAM and use it with a few of these solid state disks. 4GB per PCI slot. But don't be disappointed if it still isn't as fast as you want it to be: The disks are probably not the bottleneck. I'd be surprised if a properly configured RAID array couldn't deliver adequate performance for video editing. Even single disks are fast enough to work with uncompressed video these days.

  14. You need to make block devices by bill_mcgonigle · · Score: 2, Informative

    This isn't a complete answer, for sure, but for linux RAID you need to present the RAM on computers A, B, and C to computer M (the mux) as block devices. You'll probably need to write a device driver for machine M that presents a block interface and speaks a UDP protocol to machines A, B, and C, where your server stores blocks on the local RAM disk according to whatever scheme works for you. Then 'just' edit the raidtab and build your md0 from the block devices. The reason 'just' is in quotes is because who knows if it'll work with a non-disk block device.

    Don't forget to deal with lost UDP packets, but you don't even want to go near TCP's latency on this. If you put them all on a switch your packet loss should be negligible anyhow.
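
    A toy version of such a UDP block protocol, sketched in user-space Python rather than as a kernel driver. The packet layout (1-byte opcode, 4-byte block number, then data), the 1 KB block size, and every name here are invented for illustration; a device that md could actually use would still need a real block driver on machine M.

```python
import socket
import struct
import threading

BLOCK_SIZE = 1024
HEADER = struct.Struct("!BI")      # opcode (0 = read, 1 = write), block number


def serve_blocks(sock, num_blocks):
    """Answer read/write requests against an in-memory 'ramdisk' until the socket closes."""
    store = bytearray(num_blocks * BLOCK_SIZE)
    while True:
        try:
            packet, addr = sock.recvfrom(HEADER.size + BLOCK_SIZE)
        except OSError:
            return                                  # socket was closed
        if len(packet) < HEADER.size:
            continue                                # malformed request: ignore
        op, block = HEADER.unpack_from(packet)
        payload = packet[HEADER.size:]
        if block >= num_blocks:
            continue                                # out of range: ignore
        start = block * BLOCK_SIZE
        if op == 1 and len(payload) == BLOCK_SIZE:
            store[start:start + BLOCK_SIZE] = payload
            sock.sendto(packet[:HEADER.size], addr)            # header-only ack
        elif op == 0:
            sock.sendto(packet[:HEADER.size] + store[start:start + BLOCK_SIZE], addr)


def request(sock, server, op, block, data=b"", retries=5):
    """Send one request and resend it if no matching reply arrives before the timeout."""
    packet = HEADER.pack(op, block) + data
    for _ in range(retries):
        sock.sendto(packet, server)
        try:
            reply, _ = sock.recvfrom(HEADER.size + BLOCK_SIZE)
        except socket.timeout:
            continue                                # lost packet: try again
        if HEADER.unpack_from(reply) == (op, block):
            return reply[HEADER.size:]
    raise TimeoutError(f"no reply for block {block}")


# Loopback demo: one "ramdisk server" thread and one client.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server_sock.bind(("127.0.0.1", 0))
threading.Thread(target=serve_blocks, args=(server_sock, 64), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(0.5)
addr = server_sock.getsockname()
request(client, addr, 1, 3, b"x" * BLOCK_SIZE)     # write block 3
readback = request(client, addr, 0, 3)             # read it back
```

    Because reads and whole-block writes are idempotent, simply resending on timeout is enough to cover a lost packet in this sketch.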

    I don't think it's practical for your application but it would be a very cool hack. Good luck!

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  15. Waste of time by MrResistor · · Score: 5, Informative

    I do customer service repair and testing for high end video servers, particularly the RAIDs attached to them. Based on my experience, what you're proposing seems like a waste of time and money. I think your friend would be much better served with a more traditional RAID setup. For a single-user editing station a 4 drive IDE RAID-0 should be able to handle the load, and a similar SCSI array should be more than capable. I recommend SCSI.

    In typical storage situations you have 2 issues you need to consider: bandwidth and space. With video you add a 3rd issue which can easily eclipse the other 2 in importance: latency. Latency can cause hiccups in your data stream which would be unnoticeable in any other application, but become painfully obvious with video, and any networked solution is going to add latency. The more network is involved, the more latency will be added, which is why I would absolutely advise against a distributed solution. For that reason even if your network has the same theoretical bandwidth as your RAID, it will be slower, and that will kill the video stream.

    Anyway, your friend's needs should be taken care of with an older (cheaper) SCSI RAID controller and some older SCSI drives, say 9-18GB 7200RPM. In a RAID-0 configuration they should be able to handle simultaneous record and playback of 12Mbps per drive, with a 3 hour capacity at that maximum bandwidth. For example: my test fixture has space for 5 drives, so it can handle 3 hours worth of 60Mbps video, which is decent for HD and ludicrous for SD. You should be able to pull together something like that for less than what you're planning to spend on RAM and Gig-E network gear, and it will be more reliable (minimized data loss in case of power outage) and a hell of a lot cheaper to operate (unless your friend lives in the magical land of free electricity).
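
    Those figures are easy to sanity-check. A small Python sketch using the 12Mbps per drive and 3 hour numbers above, assuming decimal gigabytes; the function name is made up.

```python
def storage_needed_gb(mbps: float, hours: float) -> float:
    """Gigabytes (decimal) needed to hold `hours` of video at `mbps` megabits per second."""
    bytes_total = mbps * 1e6 / 8 * hours * 3600
    return bytes_total / 1e9


per_drive = storage_needed_gb(12, 3)        # one drive's share
five_drives = storage_needed_gb(60, 3)      # 5 drives striped, 60 Mbps aggregate
print(f"per drive: {per_drive:.1f} GB, array: {five_drives:.1f} GB")
```

    So 3 hours at 12Mbps fits on an 18GB drive but not on a 9GB one.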

    Bear in mind that what I've described is the test I use to verify the fitness of our drives, and we use it because we've found that it is more strenuous than any commercially available SCSI test setup. Most new drives are able to handle it, but used drives can be a different story even though they might be perfectly good for any other application. With used drives you may have to drop your expectations to 10 or even 8Mbps per drive, so plan accordingly.

    That said, you also want to take a close look at your encoding/decoding hardware, as that can be a source of problems. Don't just look at the hardware specs, either, as all too often the driver capabilities fall far short of what the hardware can theoretically do.

    --
    Under capitalism man exploits man. Under communism it's the other way around.
  16. Network wins over disk... by BDW · · Score: 2, Interesting

    ...but only if you can deal with the OS latency. My very rough understanding says any networking based on the OSI model is going to pay a sufficiently large penalty in OS latencies that remote memory probably won't be any faster than a good local disk subsystem. However, if you can get rid of that latency, you can win BIG.

    Since the questioner is looking at using commodity hardware with a commodity OS using a commodity networking protocol, my gut feeling is that (s)he doesn't have a prayer. It is a cool idea, but latencies are likely to be too high.

    The /. dreamers don't need to give up all hope, however. :) There is relevant work in the academic literature, using specialized hardware and software of course. The work I'm familiar with is from Hank Levy's group at UW. To sum up, based on what I remember from a class I took back in '98 from Mike Feeley (first author on said paper; also did his PhD thesis on the topic):

    The motivating example came from Boeing. They had a bunch of CAD workstations all with lots of RAM (by the standards of the day). However, looking at any nontrivial part of the design required more memory than any single workstation. Paging to disk was S-L-O-W. So why not use the frequently idle memory on the other workstations? The result of the UW work was a sort of global memory management, with paging to remote workstations in the cluster as well as to disk. Using memory on the remote workstations was significantly faster than using the local disk.

    So what about latency from the network stack? IIRC (and it has been five years since I talked to Mike about this...) they used Myrinet. In some sense Myrinet is basically DMA to remote workstations. One Myrinet node issues a write request in software, which includes the source address in memory for the data to be copied, a target node in the cluster, and the target memory address on the target node. The Myrinet hardware on the local workstation does DMA from the source memory location, fires it over fibre to the remote workstation, which dutifully does DMA from the Myrinet card to the memory locations specified by the sender. This is very fast, but not the stuff traditional general-purpose computing has been made of.

    Brian

  17. Is this a troll? Or are you on crack? by Wakko+Warner · · Score: 2, Funny

    I hate to be frank, but your solution is equal parts ambitious, elaborate, expensive, unreliable, slow, kludgey, and stupid, with an extra helping of stupid.

    Buy a single SCSI RAID card with three channels, three 36GB U160 drives (10 or 15K, doesn't really matter), and set up a hardware RAID 0 stripe. You'll save money and be able to edit any amount of video you want. Hell, buy a SINGLE 72 gig 10K drive and a high-quality single-channel SCSI controller. You'll save even more money.

    This is the best way to do this. You've at least proven there's at least one other way to do it.

    - A.P.

    --
    "Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
  18. Space, Power, Noise, Setup Time? How about... by Qbertino · · Score: 2, Informative

    ..one or two of these
    Briefly said: Kicks any RAID (SCSI or not) and your RAMdisk solution up and down the street.
    It could be a tad pricey though, as you might wanna suspect. :-)

    --
    We suffer more in our imagination than in reality. - Seneca