Dumping Lots of Data to Disk in Realtime?

Posted by Cliff on Saturday May 14, 2005 @01:04AM from the too-much-for-an-RDBMS? dept.

AmiChris asks: "At work I need something that can dump sequential entries for several hundred thousand instruments in realtime. It also needs to be able to retrieve data for a single instrument relatively quickly. A standard relational database won't cut it. It has to keep up with 2000+ updates per second, mostly on a subset of a few hundred instruments active at a given time. I've got some ideas of how I would build such a beast, based on flat files and a system of caching entries in memory. I would like to know whether someone has already built something like this and, if not, whether anyone would want to use it if I build it. I'm not sure what other applications there might be. I could see recording massive amounts of network traffic or scientific data with such a library. I'm guessing someone out there has done something like this before. I'm currently working with C++ on Windows."

Re:A commercial RDMS can cut it by gvc · 2005-05-14 03:56 · Score: 4, Interesting

"Can [the storage backend] handle 2000 random seeks per second?"

The short answer is "no."

A 10,000 RPM disk completes one rotation every 6 ms. That's 3 ms of rotational latency on average for a random access (not counting seek time, or the fact that a read-modify-write will take at least 3 times this long: read, wait one full rotation, write).

So one disk can do, as a generous upper bound, 333 random accesses per second. I'll spare you the details of the Poisson distribution, but if you managed to spread these updates randomly over a disk farm, you'd need about (2000/333) × e ≈ 16 independent spindles.
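For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate above. The function names are made up for illustration, and the factor of e is taken from the comment as given rather than derived.

```python
import math

def rotation_period_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000 / rpm

def max_random_accesses_per_sec(rpm):
    """Upper bound: only average rotational latency (half a rotation), no seek time."""
    avg_latency_ms = rotation_period_ms(rpm) / 2
    return 1000 / avg_latency_ms

def read_modify_write_per_sec(rpm):
    """Read (half a rotation on average), wait one full rotation, then write."""
    period = rotation_period_ms(rpm)
    return 1000 / (period / 2 + period)

def spindles_needed(updates_per_sec, rpm):
    """Spindle estimate using the factor of e quoted in the comment (assumed, not derived)."""
    return updates_per_sec / max_random_accesses_per_sec(rpm) * math.e

print(rotation_period_ms(10_000))            # 6.0 ms per rotation
print(max_random_accesses_per_sec(10_000))   # about 333 per second
print(read_modify_write_per_sec(10_000))     # about 111 per second
print(spindles_needed(2000, 10_000))         # about 16.3 spindles
```

Note that the read-modify-write case cuts the per-disk figure to roughly a third, so the 16-spindle number is optimistic for in-place updates.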

The trick to high throughput is harnessing, and creating, non-randomness. You can do a much better job of this with a purpose-built solution.
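One way to read "creating non-randomness" is to buffer updates in memory and write them out as a single sequential append, keeping an in-memory index so one instrument can still be found quickly. The sketch below is only an illustration of that idea, not the design the comment has in mind; the class name, record layout, and method names are all hypothetical, and the index covers only records written in the current session.

```python
import os
import struct
from collections import defaultdict

class BatchedAppendLog:
    """Hypothetical sketch: buffer updates, then flush them as one sequential append."""

    HEADER = struct.Struct("<II")  # assumed record header: instrument id, payload length

    def __init__(self, path):
        self._file = open(path, "a+b")      # append mode: writes always go to the end
        self._pending = defaultdict(list)   # instrument id -> buffered payloads
        self._index = defaultdict(list)     # instrument id -> [(payload offset, length)]

    def update(self, instrument_id, payload):
        """Record an update in memory only; no disk access happens here."""
        self._pending[instrument_id].append(payload)

    def flush(self):
        """Write all buffered updates in one sequential write, grouped by instrument."""
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        chunks = []
        for instrument_id in sorted(self._pending):
            for payload in self._pending[instrument_id]:
                chunks.append(self.HEADER.pack(instrument_id, len(payload)) + payload)
                self._index[instrument_id].append((offset + self.HEADER.size, len(payload)))
                offset += self.HEADER.size + len(payload)
        self._file.write(b"".join(chunks))
        self._file.flush()
        self._pending.clear()

    def read(self, instrument_id):
        """Return flushed payloads for one instrument, in the order they were recorded."""
        payloads = []
        for offset, length in self._index.get(instrument_id, []):
            self._file.seek(offset)
            payloads.append(self._file.read(length))
        return payloads

    def close(self):
        self.flush()
        self._file.close()
```

Usage would look like `log.update(7, b"tick")` on every incoming update, with `log.flush()` called on a timer or when the buffer reaches some size. The disk then sees a few large appends per second instead of 2000 scattered writes. A real version would also need to rebuild the index from an existing file and decide how much unflushed data it can afford to lose on a crash.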