Slashdot Mirror


Dumping Lots of Data to Disk in Realtime?

AmiChris asks: "At work I need something that can dump sequential entries for several hundred thousand instruments in realtime. It also needs to be able to retrieve data for a single instrument relatively quickly. A standard relational database won't cut it. It has to keep up with 2000+ updates per second, mostly on a subset of a few hundred instruments active at a given time. I've got some ideas of how I would build such a beast, based on flat files and a system of caching entries in memory. I would like to know if: someone has already built something like this; and if not, would someone want to use it if I build it? I'm not sure what other applications there might be. I could see recording massive amounts of network traffic or scientific data with such a library. I'm guessing someone out there has done something like this before. I'm currently working with C++ on Windows."

6 of 127 comments

  1. 2-stage approach by eagl · · Score: 5, Informative

    Have you considered a 2-stage approach? Stuff it to disk, and process/index it separately? A fast stream of data would let it all get recorded without loss, and then you could use whatever resources are necessary to index and search without impacting the data dump.

    Cost... Are you going to go for local storage or NAS? Need SCSI and RAID or a less expensive hardware setup? Do you think gigabit ethernet will be sufficient for the transfer from the data dump hardware to the processing/indexing/search machines?

    Sounds like you might want to run a test case using commodity hardware first.
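
    A minimal sketch of the two-stage idea above, in C++ since that is what the asker is using. Everything here is an assumption for illustration: the `Tick` record layout, the `LogWriter` class, and the `build_index` / `read_instrument` helpers are hypothetical names, not part of any existing library. Stage 1 only appends fixed-size records to a flat file; stage 2 scans that file separately and builds a per-instrument list of offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical fixed-size record (24 bytes, no implicit padding).
struct Tick {
    std::uint64_t timestamp_us;
    double        value;
    std::uint32_t instrument_id;
    std::uint32_t reserved;  // explicit filler so no uninitialized bytes hit disk
};

// Stage 1: append-only writer. No seeking and no indexing on the hot path.
class LogWriter {
public:
    explicit LogWriter(const std::string& path)
        : out_(path, std::ios::binary | std::ios::app) {}

    bool append(const Tick& t) {
        out_.write(reinterpret_cast<const char*>(&t), sizeof t);
        return static_cast<bool>(out_);
    }

    void flush() { out_.flush(); }

private:
    std::ofstream out_;
};

// Stage 2: built separately from the dump, maps instrument id -> byte offsets.
using Index = std::unordered_map<std::uint32_t, std::vector<std::uint64_t>>;

Index build_index(const std::string& path) {
    Index index;
    std::ifstream in(path, std::ios::binary);
    Tick t;
    std::uint64_t offset = 0;
    while (in.read(reinterpret_cast<char*>(&t), sizeof t)) {
        index[t.instrument_id].push_back(offset);
        offset += sizeof t;
    }
    return index;
}

// Fetch every record for one instrument, in the order it was written.
std::vector<Tick> read_instrument(const std::string& path, const Index& index,
                                  std::uint32_t id) {
    std::vector<Tick> result;
    const auto it = index.find(id);
    if (it == index.end()) return result;

    std::ifstream in(path, std::ios::binary);
    for (const std::uint64_t off : it->second) {
        assert(off % sizeof(Tick) == 0);  // offsets always fall on record boundaries
        Tick t;
        in.seekg(static_cast<std::streamoff>(off));
        if (!in.read(reinterpret_cast<char*>(&t), sizeof t)) break;
        result.push_back(t);
    }
    return result;
}
```

    In a real setup the index pass would presumably run on another thread or another machine, which is the point of splitting the stages; this sketch keeps both in one process just to show the shape.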

  2. Suuuure. by Seumas · · Score: 4, Funny

    Yeah, like it isn't obvious that this guy works for the government's TIA program and is looking for ways to maintain all of the data culled from the thousands of audio and video sensors they have planted around.

    Suuuure.

  3. Wonderware InSQL by Dios · · Score: 4, Informative


    Check out Wonderware InSQL. We update roughly 50k points every 30 seconds without loading the server much at all. Pretty nice product, also has some custom extensions to SQL built in for querying the data (e.g. cyclic, resolution, delta storage, etc.).

    http://www.wonderware.com/

    Of course, you'll need your data to come from an OPC/Suitelink/other supported protocol, but should work nicely for you.

    - Joshua

  4. A commercial RDBMS can cut it by jbplou · · Score: 4, Informative

    You can definitely use Oracle to write out 2000 updates per second if your hardware is up to it and your db skills are good.

    1. Re:A commercial RDBMS can cut it by gvc · · Score: 4, Interesting

      "Can [the storage backend] handle 2000 random seeks per second?"

      The short answer is "no."

      A 10,000 RPM disk has a rotation period of 6 ms. That's 3 ms of rotational latency on average for random access (not counting seek time, or the fact that a read-modify-write will take at least 3 times this long: read, wait one full rotation, write).

      So one disk can do, as a generous upper bound, 333 random accesses per second. I'll spare you the details of the Poisson distribution, but if you managed to spread these updates randomly over a disk farm, you'd need about 2000/333 × e ≈ 16 independent spindles.
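
      The arithmetic above can be checked with a few lines of C++. The function names are made up for illustration, and the factor of e is simply taken from the comment as stated rather than derived here.

```cpp
#include <cassert>
#include <cmath>

// Time for one full platter rotation, in milliseconds.
double rotation_period_ms(double rpm) { return 60000.0 / rpm; }

// On average the target sector is half a rotation away.
double avg_rotational_latency_ms(double rpm) {
    return rotation_period_ms(rpm) / 2.0;
}

// Generous upper bound: ignores seek time and read-modify-write cost.
double max_random_accesses_per_sec(double rpm) {
    return 1000.0 / avg_rotational_latency_ms(rpm);
}

// Spindle estimate using the comment's rule of thumb (load ratio times e).
double spindles_estimate(double updates_per_sec, double rpm) {
    assert(rpm > 0.0);
    const double e = std::exp(1.0);
    return updates_per_sec / max_random_accesses_per_sec(rpm) * e;
}
```

      For 10,000 RPM this gives a 6 ms period, 3 ms average latency, roughly 333 accesses per second, and an estimate a little over 16 spindles for 2000 updates per second.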

      The trick to high throughput is harnessing, and creating, non-randomness. You can do a much better job of this with a purpose-built solution.
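
      One way to "create non-randomness", sketched under assumptions: hold incoming updates in memory, then group them by instrument and write the whole batch in one sequential pass instead of seeking per update. The `Update` struct, the `Batcher` class, and the text line format are all hypothetical; a real system would likely write binary records to a file rather than text to a stream.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical update record.
struct Update {
    std::uint32_t instrument_id;
    double        value;
};

class Batcher {
public:
    explicit Batcher(std::size_t capacity) : capacity_(capacity) {}

    // Buffers one update; returns true once the batch is full and should be flushed.
    bool add(const Update& u) {
        pending_.push_back(u);
        return pending_.size() >= capacity_;
    }

    // Writes all pending updates as "id value" lines, grouped by instrument.
    // stable_sort keeps arrival order within each instrument.
    // Returns the number of updates written.
    std::size_t flush(std::ostream& out) {
        std::stable_sort(pending_.begin(), pending_.end(),
                         [](const Update& a, const Update& b) {
                             return a.instrument_id < b.instrument_id;
                         });
        for (const Update& u : pending_) {
            out << u.instrument_id << ' ' << u.value << '\n';
        }
        const std::size_t n = pending_.size();
        pending_.clear();
        return n;
    }

private:
    std::size_t         capacity_;
    std::vector<Update> pending_;
};
```

      The trade-off is that anything still sitting in the buffer is lost on a crash, so the batch size is a balance between throughput and how much data you can afford to lose.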

  5. Ramdisk database by Glonoinha · · Score: 4, Informative

    Here's a thought - just use a hard-RAM based database.
    Either make a big ramdisk and put your database out there (see my Journal from a few months back, ramdisk throughput is pretty damn fast from the local machine, given certain constraints, and random access writing is hella fast), or use a database that runs entirely in memory (think Derby, aka Cloudscape that comes with WebSphere Application Developer.)

    Once you have your data, save it out to the hard drive.

    Granted it helps to have a box with a ton of memory in it, but they are out there now, almost affordable. If you are collecting more than 4G of data in one session, well YMMV - but 4G is a LOT of data, perhaps reconsider your approach.

    --
    Glonoinha the MebiByte Slayer