Dumping Lots of Data to Disk in Realtime?
AmiChris asks: "At work I need something that can dump sequential entries for several hundred thousand instruments in realtime. It also needs to be able to retrieve data for a single instrument relatively quickly. A standard relational database won't cut it. It has to keep up with 2000+ updates per second, mostly on a subset of a few hundred instruments active at a given time. I've got some ideas of how I would build such a beast, based on flat files and a system of caching entries in memory. I would like to know if: someone has already built something like this; and if not, would someone want to use it if I build it? I'm not sure what other applications there might be. I could see recording massive amounts of network traffic or scientific data with such a library. I'm guessing someone out there has done something like this before. I'm currently working with C++ on Windows. "
fourth ask slashdot in a row. slow day eh?
Have you considered a 2-stage approach? Stuff it to disk, and process/index it separately? A fast stream of data would let it all get recorded without loss, and then you could use whatever resources are necessary to index and search without impacting the data dump.
Cost... Are you going to go for local storage or NAS? Need SCSI and RAID or a less expensive hardware setup? Do you think gigabit ethernet will be sufficient for the transfer from the data dump hardware to the processing/indexing/search machines?
Sounds like you might want to run a test case using commodity hardware first.
Yeah, like it isn't obvious that this guy works for the government's TIA program and is looking for ways to maintain all of the data culled from the thousands of audio and video sensors they have planted around.
Suuuure.
Check out wonderware InSQL. We update roughly 50k points every 30 seconds without loading the server much at all. Pretty nice product, also has some custom extensions to SQL built in for querying the data (eg cyclic, resolution, delta storage, etc etc).
http://www.wonderware.com/
Of course, you'll need your data to come from an OPC/Suitelink/other supported protocol, but should work nicely for you.
- Joshua
Unless you really want to do a LOT of work. This sounds very much like a SCADA system. There are vendors of such systems. Most of the realtime databases are designed to stay in a large, proprietary, RAM database which is occasionally dumped to disk for backup purposes.
In order to process so many points realtime, it usually will have to be in RAM for performance reasons.
Zed's dead baby. Zed's dead.
I know you're working with Windows, but when I read this I said yes.
I'm guessing someone out there has done something like this before.
Google has a cluster of machines far larger than you need, but their approach was a Linux cluster. Plus, for the amount of writes going on, you're going to want not to have any burdens on the system that are not needed.
You may want to look at how video streams are composed, but the basic idea is very simple: just dump it all in arrival order and keep track of what you wrote at which offset in some table of contents. Dump tables of contents at some regular offsets so you can find them easily. That's it. Just one thing: use offsets relative to the TOC, so each one takes fewer bits, and align the data, which also saves several bits at the other end.
And remember: Keep It Simple, Stupid. Be sure you can read it in a hex editor when trouble comes.
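For what it's worth, here is a minimal Python sketch of that layout, not anyone's production format. The record layout (instrument id, timestamp, value), the `TOC1` marker, the `TickLog` and `read_instrument` names, and the tiny segment size are all assumptions made up for illustration. Records go out in arrival order, a TOC follows every `PER_SEGMENT` records, and each TOC entry stores how many bytes back from the TOC its record sits.

```python
import struct

RECORD = struct.Struct("<Idd")     # instrument id, timestamp, value
TOC_HEAD = struct.Struct("<4sI")   # magic marker, number of entries
TOC_ENTRY = struct.Struct("<II")   # instrument id, bytes back from the TOC
PER_SEGMENT = 4                    # records between TOCs; tiny for the demo
FULL_SEGMENT = PER_SEGMENT * (RECORD.size + TOC_ENTRY.size) + TOC_HEAD.size


class TickLog:
    def __init__(self, path):
        self.f = open(path, "wb")
        self.pending = []          # instrument ids written since the last TOC

    def append(self, instrument, ts, value):
        self.f.write(RECORD.pack(instrument, ts, value))
        self.pending.append(instrument)
        if len(self.pending) == PER_SEGMENT:
            self._write_toc()

    def _write_toc(self):
        n = len(self.pending)
        self.f.write(TOC_HEAD.pack(b"TOC1", n))
        for i, instrument in enumerate(self.pending):
            # Offset is relative to the TOC itself, so it stays small.
            self.f.write(TOC_ENTRY.pack(instrument, (n - i) * RECORD.size))
        self.pending.clear()

    def close(self):
        if self.pending:
            self._write_toc()      # short final segment
        self.f.close()


def read_instrument(path, instrument):
    """Return all records for one instrument by reading only the TOCs first."""
    out = []
    with open(path, "rb") as f:
        size = f.seek(0, 2)
        start = 0
        while start < size:
            left = size - start
            if left >= FULL_SEGMENT:
                n = PER_SEGMENT
            else:                  # only the last segment can be short
                n = (left - TOC_HEAD.size) // (RECORD.size + TOC_ENTRY.size)
            toc_at = start + n * RECORD.size
            f.seek(toc_at)
            magic, count = TOC_HEAD.unpack(f.read(TOC_HEAD.size))
            if magic != b"TOC1" or count != n:
                raise ValueError("TOC not where expected")
            entries = f.read(count * TOC_ENTRY.size)
            for inst, back in TOC_ENTRY.iter_unpack(entries):
                if inst == instrument:
                    f.seek(toc_at - back)
                    out.append(RECORD.unpack(f.read(RECORD.size)))
            start = toc_at + TOC_HEAD.size + count * TOC_ENTRY.size
    return out
```

Because records are fixed size, every full segment has the same length, which is what makes the TOCs land at regular offsets. A real version would want a much larger segment and some thought about recovering from a crash mid-segment.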
Keep a file per device. The OS will cache appropriately. The files will eventually get horribly fragmented, depending on which file system you choose. This should not be too much of a problem, depending on the read access pattern -- and if it is a problem, just be careful about which file system you pick. Reiser4 with automatic repacking would be the perfect candidate, but I haven't followed the development closely or tried the repacking myself.
Finally! A year of moderation! Ready for 2019?
You can definitely use Oracle to write out 2000 updates per second if your hardware is up to it and your db skills are good.
so. why are you not thinking about a real big enterprise level database? if NASDAQ can do it you can too.
going with the flat file/caching solution: if you're handling that many transactions is a windows os/file system truly a viable solution? i'm not bashing MS here i'm just curious what others think about so "many" disk and cache transactions in say 2003 or longhorn.
With your specs, chances are you will either need a very beefy machine, or a distributed approach spreading the load across many machines, regardless of the software approach. But I wouldn't be surprised if a good RDBMS would outperform a flatfile approach. It is what they're designed for after all.
11*43+456^2
I have a system that can record 32 streams of data 44,100 times per second. It's called a recording studio, and I make music with it.
If your data streams are continuous, and can be represented as audio data, then you are pretty much dealing with a solved problem, and your other problem of selecting from large number of possible 'instruments' is solved by an audio patchbay.
If this isn't feasible, then a number of solutions might be appropriate (spreading the load over a number of machines/huge ram caches/buffering/looking at the problem and thinking of a less intensive sampling strategy/etc.) but without more information on the sort of data you are collecting, and exactly how quickly you need to access it, it's very hard to be specific.
A pizza of radius z and thickness a has a volume of pi z z a
Where I work, they handle like 300 million users and have data associated with each user. Unlike AOL which used sybase to store users (and crawled) these guys use a filesystem based repository. It's a fast replicated database indexed by only one key - the username. It scales great and works on FreeBSD.
this patent and related patent should answer a few questions.... (Google fs is not as good for search scans)
Sure, optimize single node performance first, but keep in mind that horizontal scaling is something to look for. Put N machines behind a load balancer, ingest gets scattered among 'n' machines, queries go to all simultaneously. Redundant Array of Inexpensive Databases :-)
Linux Virtual Server in front of several instances of your windows box will do, with some proxying stuff for queries. Probably cheaper than spending months trying to tweak single node to get to your scaling target, and will scale trivially much farther out.
You will likely need to run this baby all in RAM, with optional persistent storage if needed. If you don't have enough memory, go for a distributed solution: data from devices a,b,c go to machine1, from devices d,e,f to machine2, etc. The per device distribution algorithm should consider the amount of data from each device.
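One simple way to weigh devices by data volume is a greedy assignment: take the busiest device first and hand it to whichever machine is currently least loaded. The sketch below assumes you already have a measured update rate per device; the `assign_devices` name and the rates are made up for illustration, and this heuristic is a stand-in, not something the comment above prescribes.

```python
import heapq


def assign_devices(rates, n_machines):
    """Map each device to a machine index, balancing total update rate.

    rates: dict of device name -> updates per second (assumed measured).
    """
    heap = [(0.0, m) for m in range(n_machines)]   # (current load, machine)
    assignment = {}
    # Busiest devices first, each onto the least loaded machine so far.
    for device, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        load, machine = heapq.heappop(heap)
        assignment[device] = machine
        heapq.heappush(heap, (load + rate, machine))
    return assignment
```

For example, `assign_devices({"a": 100, "b": 60, "c": 50, "d": 10}, 2)` puts `a` and `d` on one machine and `b` and `c` on the other, 110 updates per second each. Since activity shifts between instruments during the day, the rates would need to be re-measured now and then.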
Sig ?
the solution to your problem comes in the form of a little known software application from a vender called Microsoft.
:P
The program is called Microsoft Access 97
Check out Kx or VhaYu
You could use the XFS file system to get faster read/write speeds. In addition I'd recommend a special RAID setup. You would want SCSI320 RAID striping over 4 drives, in addition you'd want it mirroring over a further 4 drives. You'd need to set up a RAID array to achieve this, but it's well worth it for the performance gains. Make sure your RAID is 8xAGP or PCI-X. PCI is far too slow.
Open-Source > *
You didn't specify some key parameters. How big are these updates, and how do they get multiplexed? What kind of retrieval do you want to do in the data?
If your data are already arriving on a single socket, just mark up the data and write it out. Then you can retrieve anything you like with linear search. And you can be reasonably certain that you have captured all the data and will never lose it due to having trusted it to some mysterious DB software.
If linear search isn't good enough, you have to specify the sorts of queries you want. All information from a particular sensor? Information from all sensors at a particular time? Does this information have to be available on-line, or can you answer your queries in batch. Sort/merge is really efficient if you don't need real-time queries. You can build indexes in real-time almost as efficiently, if you know what you want to index. The basic technique is the same, but more complicated to set up - batch up the information to be indexed, and do a series of sort-merges to accumulate the indexes.
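As a rough illustration of the batch-then-merge idea, here is a small Python sketch. It assumes index entries are `(instrument, file offset)` pairs with integer instrument ids; the function names are made up. Each batch is sorted in memory into a run, and the runs are then merged into one sorted index that can be searched by bisection.

```python
import bisect
import heapq


def make_run(batch):
    """Sort one in-memory batch of (instrument, offset) entries."""
    return sorted(batch)


def merge_runs(runs):
    """Merge already-sorted runs into one sorted index."""
    return list(heapq.merge(*runs))


def lookup(index, instrument):
    """Return the offsets of every entry for one instrument."""
    lo = bisect.bisect_left(index, (instrument,))
    hi = bisect.bisect_left(index, (instrument + 1,))
    return [offset for _, offset in index[lo:hi]]
```

In a real system the runs would live on disk and be merged a few at a time, but the shape of the work is the same.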
block it.
interleave it.
write a new timestamp periodically
as for what instruments you are recording and their parameters, use a simple hash table.
the time stamp that corresponds to the introduction or deletion of instruments or change in recording parameters is hashed with the corresponding configuration. This allows 100% utilization of existing file system speed and space for recording. be careful with parameter record so you don't lose sync to data that looks like a time stamp. probability might be once in a million years but Murphy will have it happen 40 times in your most important hour of recording.
it's sort of rocket science...but
more like geophysical data recording in oil exploration industry, where you might look for examples.
If you need some help, i'm available for systems and algorithm design. I WILL NOT code. $2K/day plus first class travel and expenses.
I've some 35 years experience in instrumentation and telemetry.
would like to know if: someone has already built something like this; and if not, would someone want to use it if I build it? I'm not sure what other applications there might be.
I'd find yourself someone with data warehousing experience (not the same thing as standard DBAs). I've worked with such people and 2000 updates a second isn't a big deal. We have no problem doing hourly bursts of millions of records with Oracle on some relatively modest hardware. It will cost you though...
iSeries
Here's a thought - just use a hard-RAM based database.
Either make a big ramdisk and put your database out there (see my Journal from a few months back, ramdisk throughput is pretty damn fast from the local machine, given certain constraints, and random access writing is hella fast), or use a database that runs entirely in memory (think Derby, aka Cloudscape that comes with WebSphere Application Developer.)
When you got your data, save it out to the hard drive.
Granted it helps to have a box with a ton of memory in it, but they are out there now, almost affordable. If you are collecting more than 4G of data in one session, well YMMV - but 4G is a LOT of data, perhaps consider your approach.
Glonoinha the MebiByte Slayer
2,000 items/sec means that you must do bulk updates. You cannot flush to disk 2,000 times per second. So your program will have to store the items temporarily in a buffer, which gets flushed by a secondary thread when a timer expires or when the buffer gets full. Use a two-buffer approach so you can still receive while committing to the database.
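A minimal Python sketch of that two-buffer idea, under some assumptions: `sink` is a hypothetical callable that commits one batch (a database insert, a file write, whatever), and the class name, sizes, and timer interval are made up. The receiving thread only ever appends to the active list; the flusher swaps in a fresh list under the lock and commits the old one outside it.

```python
import threading


class DoubleBuffer:
    def __init__(self, sink, max_items=1000, interval=0.5):
        self._sink = sink              # called with one list of items per flush
        self._max = max_items
        self._interval = interval
        self._active = []
        self._lock = threading.Lock()
        self._wake = threading.Event()
        self._stopping = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def put(self, item):
        with self._lock:
            self._active.append(item)
            full = len(self._active) >= self._max
        if full:
            self._wake.set()           # ask for an early flush

    def _flush(self):
        with self._lock:               # swap buffers; the lock is held briefly
            batch, self._active = self._active, []
        if batch:
            self._sink(batch)          # slow commit happens outside the lock

    def _run(self):
        while not self._stopping:
            self._wake.wait(self._interval)   # timer expiry or "buffer full"
            self._wake.clear()
            self._flush()
        self._flush()                  # drain whatever is left on shutdown

    def close(self):
        self._stopping = True
        self._wake.set()
        self._thread.join()
```

Note there is no backpressure here: if the sink falls behind, the active buffer just keeps growing, which matches the "as long as they'll fit in memory" constraint but is something to watch.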
Depending on your application it may be beneficial to keep a cache of the most recent items for all instruments.
You also have to consider the disk setup. If you have to store all the items then any multi-disk setup will do. If you actually only store a few items per instrument and update them, then raid-5 will kill you because it performs poorly with tiny scattered updates.
Do you have to back up the items? How will you handle backups while your program is running? This affects your choice of flat-file or database implementation.
The design of a data acquisition system will of course differ, depending on how much data it records per sensor, how many sensors there are, how often to record the data, and if the data is to be available for online or offline processing.
In most of the "hard" cases, you will use a pipelined architecture, where data is received on one or more realtime boxes, and buffered for an appropriate (short) period. A second stage occurs when data is collected from these buffers, and buffered/reordered/processed to make writing the desired format to a file or DBMS easier. The last stage is, of course, to write it. You might use zero or more computers at each stage, with a fast dedicated network in-between. You might even decide to split up some of the stages even further. Depending on how much you care about your data, you may also add redundancy. And make sure it's fault-tolerant; it's generally better to lose some data, as long as it's tagged as missing, than to lose it all. To check this in real-time you can also add data-monitoring anywhere it makes sense for your system.
In the simpler cases, you simply remove things not needed, such as a soundcard instead of dedicated realtime-boxes, redundancy, monitoring, dedicated network, etc...
Some commercial off-the-shelf systems will surely do this. But the more advanced systems, you still build yourself, either from scratch, or by reusing code you find in other similar projects (I'm sure there are some scientific code available from people interested in medical science, biology, astrophysics, geophysics, meteorology, etc...).
Most of the "heavy" systems will not run on Windows, or even Intel, due to limitations of that platform for fast I/O. This has obviously changed a lot recently, so it's no longer the stupid choice it was, but don't expect too many projects of this kind to have noticed, as they probably have existed much longer.
I did some work on a DVD-Video authoring system that had some incredible file system requirements (obviously, when involving video data and the typical 4 GB data load for a single DVD disc).
The standard file API architecture just didn't hold up, so we (the development team I was working with) had to rewrite some of the file management routines ourselves and work with the memory-mapped architecture directly. This does give you some other advantages beyond speed as well, as once you establish the file link and set it in a memory address range you can treat the data in the file as if it were RAM within your program, having fun with pointers and everything else you can imagine. Copying data to the file is simply a matter of a memory move operation, or copying from one pointer to another.
The thing to remember is that Windows (this is undocumented) won't allow you to open a memory-mapped file that is larger than 1 GB, and under FAT32 file systems (Windows 95/98/ME/and some low-end XP systems) the total of all memory mapped files on the entire operating system must be below 1 GB (this requirement really sucks the breath out of some applications).
Remember that if you are putting pointers into the file directly, that it works better if the pointers are relative offsets rather than direct memory pointers, even though direct memory pointers are in theory possible during a single session run.
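To make the relative-offset point concrete, here is a small Python sketch using the standard `mmap` module. The node layout (a value plus the offset of the next node, measured from the start of the file) and the function names are assumptions for illustration. Because the links are offsets and not addresses, the chain still works when the file is mapped somewhere else in a later run.

```python
import mmap
import struct

NODE = struct.Struct("<dI")   # value, offset of next node from file start (0 = end)


def build_chain(path, values):
    """Write values as a linked chain inside a memory-mapped file."""
    size = NODE.size * (len(values) + 1)   # slot 0 is unused so 0 can mean "end"
    with open(path, "wb") as f:
        f.truncate(size)                   # the file must be sized before mapping
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), size) as m:
        for i, value in enumerate(values, start=1):
            nxt = (i + 1) * NODE.size if i < len(values) else 0
            NODE.pack_into(m, i * NODE.size, value, nxt)   # plain memory write
        m.flush()


def walk_chain(path):
    """Follow the relative offsets through a fresh, read-only mapping."""
    out = []
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        offset = NODE.size if len(m) > NODE.size else 0
        while offset:
            value, offset = NODE.unpack_from(m, offset)
            out.append(value)
    return out
```

The same idea carries over to C++ with `CreateFileMapping`/`MapViewOfFile`: store offsets from the base of the view and add the base address when you need a real pointer.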
They have their own algorithm that claims to be 200x faster than normal RDBMs, using 'tick tables'. modulus RDM server
This may be gross overkill, but there's specialized hardware specifically designed for sustained high-throughput disk storage. A company called Conduant makes specialized disk controllers that use on board microcontrollers to drive arrays of disks. When I last saw them demoed, they could sustain writes of 100MB/sec using direct card to card transfers across the PCI bus. They can configure a data acquisition card to directly store information into a shared buffer on the disk controller across the PCI bus. The disk controller then picks the data up and drives it across ten IDE channels. That was a few years ago, these days it looks like they can sustain 200MB/sec with a controller, and up to 600MB/sec and 6TB of capacity with custom box mounted in a rack.
I'm not so sure what their story is regarding reading or querying. My guess is you lose a lot of bandwidth, but not all. Anyway, it might be worth checking out.
http://www.conduant.com/products/overview.html
Another thing is that modern computers can have lots of innate capacity themselves. My hunch is that you could do a lot with a couple of modern disks on separate SATA channels and several GB of RAM. Maybe this is only a software problem...
as others have said, just stream the data to disk with some kind of big RAM buffer in between. each instrument can go to a separate directory, each minute or hour of data goes to a separate file. A separate thread indexes or processes the data as needed.
And don't forget the magic words: striping. you should interleave your data across many disks, and the index files should be on separate disks as well.
Do striping+mirroring for data protection. do the striping at the app level for maximum throughput, do the mirroring at the hardware level.
When you aren't going through layers of crap like an SQL database, you should *fly* like this on modern hardware.
IMO, it is done by both O/S and SCSI hardware
Kdb+ by KX Systems (http://www.kx.com/) is by far and away the best thing for this. Its main use is to store tick data from financial markets, and it is excellent at this (if expensive).
From how you described your needs, this would probably fit the bill.
The moral of the story is to determine up-front how much of that data you really need.
If you don't want crime to pay, let the government run it.
You don't mention the type of instruments or data. Perhaps you could store it via syslog on a remote syslog server.
NetCDF and HDF5 are optimized binary file formats for storing incredibly large amounts of data and quickly retrieving it.
I'm more familiar with NetCDF (because I use it) so let me tell you some of the things it can do. (HDF5 can also do these things, I'm sure).
With NetCDF, you can store +2 gigabyte files on a 32 bit machine (it supports Large File support). I've saved 12 gigabyte files with no problems. It supports both sequential and direct access, meaning you can read and write either starting from the beginning of the file or at any point in the middle of the file.
The format is array-based. You define dimensions of arrays and variables consisting of zero, one, or more dimensions. You can also define attributes that are used as metadata, information describing the data inside your variables.
You can read or write slices of your data, including strides and hyperslabs. This allows you to read/write only the data you're interested in and makes disk access much faster.
It's also easy to use with good APIs. They have APIs for C, Fortran95, C++, MATLAB, Python, Perl, Java, and Ruby.
Take a look at it. It might be what you're looking for.
-Howard Salis
Man, that must be some cool sounding music if it has thousands of instruments playing at the same time. Care to share the name of this supergroup?
Is that a real poncho? I mean, is that a Mexican poncho or is that a Sears poncho?
I seem to remember the SQLite homepage saying it could handle a few million inserts in a few seconds. So assuming you mean 2000+ updates a second in total and not 2000+ per instrument, that's quite a safety margin.
You need to do 2000+ updates a second?
*Many* RDBMS systems can do this without breaking a sweat.
Do some googling on Interbase for example - one of the success stories for IB is a system that does 150,000 inserts per second - sustained. It's a data capture system that may well be similar to yours.
Oracle can definitely do it - but you'll probably need a good Oracle DBA to tune it up properly.
Informix can definitely do it as well - don't know about the latest version, never used it, but whatever was current circa 1999 (v5?) could handle your needs as well.
Hello, this is your bank calling ...
"MasterCard member banks added an EMV chip to 40% of the 200 million MasterCard"
"In Asia Pacific, Visa has a greater market share than all other payment card brands combined with 59 percent of all card purchases at the point of sale being made using Visa cards. There are currently more than 365 million Visa cards in the region." (2003)
If Visa had 365 million cards just in Asia Pacific in 2003, I wonder how many they have worldwide nowadays...
It takes 40+ muscles to frown, but only four to extend your arm and bitchslap the motherfucker
On the coding end, there are numerous (hell, hundreds of) commercial and F/OSS ISAM libraries, plus books on the subject, for you to use for the actual storage and retrieval. It may even be included in your existing libraries given how old the technique is now. I was doing this back in the '80s for the US Navy using a 24 bit, very slow, mini-computer, so any normal box should be able to handle it today!
We use these techniques in electronic instrument monitoring, logistical systems, systems engineering, you get the idea. You may want to mosey over to the HP developer web site to see if there is a drop in solution, as I imagine there is (sorry, haven't looked).
I hope this helps.
"[I]t is a wise man who admits the limits of his knowledge or skill, and that pretending either causes harm." --Terry Go
If you want speed, I'd look into either of these.
If you plan on inserting into a database at some point, whether directly or buffered, pay attention to insert performance. There are two lessons I learned. One, the typical ODBC interface creates an implied transaction for each separate insert statement. So, group many thousands of inserts into one transaction. The second point is using bulk inserts. ODBC has a mechanism for sending arrays of parameters for an insert statement. So, you could create arrays of 2000 parameters and send one bulk instruction to the server, rather than 2000 individual inserts. This makes a huge difference in performance. The problem being not all ODBC drivers are up to it. I am able to insert many thousands of records in a few seconds using no special hardware. Good luck.
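Both lessons can be sketched with Python's built-in `sqlite3` module, which stands in here for an ODBC connection (it is not ODBC; the pattern is what matters). The `ticks` table, its columns, and the function names are made up for illustration. `executemany` plays the role of the parameter-array insert, and the `with conn:` block wraps the whole batch in a single transaction.

```python
import sqlite3


def make_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ticks (instrument INTEGER, ts REAL, price REAL)"
    )


def bulk_insert(conn, rows):
    """Insert a whole batch with one statement inside one transaction."""
    with conn:   # commits once at the end, rolls back if anything raises
        conn.executemany(
            "INSERT INTO ticks (instrument, ts, price) VALUES (?, ?, ?)",
            rows,
        )
```

Usage would look like `bulk_insert(conn, batch)` once per buffered batch, with `batch` being a list of `(instrument, ts, price)` tuples, instead of one `execute` and commit per tick.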
This family of databases is the heart of sendmail, and some SQL engines are built on top (MySQL if memory serves).
The interface is a model of simplicity: pointers to arbitrary length buffers for keys and data. All you need is key scheme that provides the post acquisition access that you require.
Berkeley offers hash and BTree style organization of the keys.
It may use memory mapped FileIO under the hood and handles all transfer of multiple buffers.
It provides multiple files or multiple tables in one file and you can control the cache size.
It can run 2,000 inserts per second on hardware from the mid 90s. (UltraSparc II 450)
Berkeley DB (www.sleepycat.com)
As far as I know it runs on just about everything including several embedded OS's, Windows and every variant of Unix.
---- Smokin' another sig.
Based on my (relatively basic) knowledge of how databases work these days, using large in-memory caches and fast commits, I wouldn't be surprised if a good enterprise database could handle this rate of commits.
You should remember that 2000 commits != 2000 random disk accesses!
Maybe what you're looking for has already been solved by high energy physicists: the ROOT toolkit is at least supposed to handle very large datasets (I never tried that, though).
Faircom CTree-Plus might.
Advantages:
- it's fast and it's not constrained by column length. If you want a table with 16,000 columns, go right ahead.
- it's very portable. Runs on just about every operating system that has more than 100 users.
The disadvantages:
- last time I looked (admittedly about 4 years ago), their SQL integration could have been better.
- it's not a high-level database. To work most effectively with it, you need to know about the way that your data is stored.
I'm sure it's improved a great deal since then.
here. Effects of filesystem/RAM/CPU/SCSI on the results are discussed.
bundaegi is good for you
I tried writing a prototype a while back to see if it could be done like that. The performance was OK with 1 file per instrument, as far as the program goes. I had two threads. The writer would append all the entries for one instrument at a time to the end of its file.
When you click near the directory (parent directory did it) in the "file explorer" the whole thing locks up for a few minutes, desktop, task bar, etc. I haven't tried making a tree of subdirectories to avoid this problem. I'm not too sure it would help.
I'm thinking it'd probably be better to have just a few files and allocate blocks out of them.
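A rough Python sketch of what "a few files with blocks allocated out of them" might look like, under stated assumptions: records are fixed size (here a hypothetical 16-byte timestamp/price pair), each instrument owns a list of fixed-size blocks in one shared file, and the block table lives only in memory. The `BlockStore` name and the block size are made up, and a real version would have to persist the block table and cache hot blocks.

```python
import struct

RECORD = struct.Struct("<dd")            # timestamp, price (hypothetical layout)
RECS_PER_BLOCK = 64
BLOCK_SIZE = RECORD.size * RECS_PER_BLOCK


class BlockStore:
    def __init__(self, path):
        self.f = open(path, "w+b")
        self.blocks = {}                 # instrument -> list of block numbers
        self.counts = {}                 # instrument -> records written so far
        self.next_block = 0

    def append(self, instrument, ts, price):
        n = self.counts.get(instrument, 0)
        slot = n % RECS_PER_BLOCK
        if slot == 0:                    # current block is full, or first write
            self.blocks.setdefault(instrument, []).append(self.next_block)
            self.next_block += 1
        block = self.blocks[instrument][-1]
        self.f.seek(block * BLOCK_SIZE + slot * RECORD.size)
        self.f.write(RECORD.pack(ts, price))
        self.counts[instrument] = n + 1

    def read(self, instrument):
        """Return every record for one instrument, oldest first."""
        out = []
        n = self.counts.get(instrument, 0)
        for i, block in enumerate(self.blocks.get(instrument, [])):
            used = min(RECS_PER_BLOCK, n - i * RECS_PER_BLOCK)
            self.f.seek(block * BLOCK_SIZE)
            out.extend(RECORD.iter_unpack(self.f.read(used * RECORD.size)))
        return out
```

This keeps the directory down to one file no matter how many instruments there are, and reading one instrument touches only its own blocks. The trade-off is that a mostly idle instrument still costs a whole block.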
Cool, but these are file format specifications. Are there any engines that work with these which are really fast? Do they cache a bunch of stuff in memory or will that still be my job?
Searching freshmeat I actually found some projects with ISAM in their names. I'm not sure they look too promising though.
Thanks. I now know the name for it. It still looks like I might be better off writing something from scratch. Maybe I can slap it up on sourceforge afterwards.
Boy I didn't expect this thread to explode like this while I was gone. Some people asked for more info so I'll just make some points:
* database is 5GB right now, after improving the thing's performance, it could be 10 times bigger. 50GB
* Yes, some people guessed it. It's financial data. I'm trying to dump all the trades of all stocks and futures in the US and EU. Right now we do a subset, but there's always something missing.
* Hardware. Yes I can get one or two monster machines for our server farms. Some of our customers run the current software locally, so I can't demand anything too fancy, like a better OS, Oracle, or 20 disk RAID from them.
* The data needs to be accessed in real time. If you've got a chart open you want to see the ticks coming in real time, and you want to be able to scroll back a few weeks.
* As far as clustering/load balancing. Yeah I can do this in our server farm, but I want each unit to work better first.
* The individual entries are small(~50B) and fixed sized per instrument.
* "How much processing has to be done per item?" - almost nil
* "How long can you delay committing them to a database?" as long as they'll fit in memory
* I'd say clients ask for a chart a few times a second. A chart doesn't require all the data points of an instrument.
The people who develop these formats are used to dealing with large data sets that need to be read and written fast. I've seen terabyte files used as inputs/outputs for scientific computing applications. They've certainly thought about the fastest ways of doing I/O. You can even substitute your own FFIO routines (people using Crays do this).
You can set the buffer to whatever you want and it really depends on your computing architecture on how buffering is handled. Normally, the data is kept in memory until flushed or until the file is closed.
If you have more questions, check out their websites and send emails to their mailing lists.
data acquisition systems for large experiments. Such things like the hardware and software used at particle physics labs on their detectors: lots of individual sensors in a huge array that has to be sampled a hell of a lot in one second.
:D
Another thing to look into is testing for dynamic loads on cars or aircraft. At least for aircraft, they'll put thousands of accelerometers all over the frame to measure the various accelerations.
Both of those are prime examples of a similar system to yours. And such users would have insight into the problems you are looking at, as well as be potential customers in the future.
The application is vibration monitoring in industrial machinery. A reasonable "session" in such an environment would be a machine startup or stop - depending on the machine this could take several minutes to hours. For the "hours" scenarios, you are typically looking at heat soaking - run the machine up part way, soak, run it up some more, soak, etc. You want to avoid differential thermal expansion. That's where the rotor of the machine expands faster ('cause it's nearer the heat source) and grows right into the case of the machine. That's bad... :-(
Anyhow, I have no doubt there are even higher speed collection systems out there... likely for very specialized applications.
Anyway, the only alternative to using the OS file system is to implement your own file system in user space, or use a database as a file system. If you choose to roll your own, remember to implement fsck and possibly journalling, otherwise you risk losing everything if the power goes.
Finally! A year of moderation! Ready for 2019?
I haven't looked in the C++ libs in quite a while but I would be rather surprised if the functionality were not in an existing library. I would, however, put serious thought into rolling your own. I'd offer to help but it's been far too long since I mucked with either C++ or rolling my own db code (25 years). Sadly, these days it's all SQL, XML, and web services, and that is about as interesting as watching paint dry, or grass grow {sigh}.
"[I]t is a wise man who admits the limits of his knowledge or skill, and that pretending either causes harm." --Terry Go