Slashdot Mirror


User: AmiChris

AmiChris's activity in the archive.

Stories
0
Comments
9

Comments · 9

  1. Re:Have you considered memory-mapped files? on Dumping Lots of Data to Disk in Realtime? · · Score: 1
    Actually, having started, I do see a major advantage of memory mapping: I don't have to worry about multiple threads reading/writing the same file. Right now I'd have to put a mutex around the file, or do some kind of file sharing between threads. Why? Because you have to seek and then read/write.

    I'm curious about the 5GB files though... I was under the impression that NTFS in general won't allow you to create or open a file > 1GB.
    Nahh, I've had files of ~4GB. FAT32 has a limit around there, at 2^32 bytes. The old DB had multiple files. NTFS will theoretically hold ~2^64 bytes per file and volume:
    Size Limitations in NTFS and FAT File Systems

    Oh this is interesting:
    If you use large numbers of files in an NTFS folder (300,000 or more), disable short-file name generation, especially if the first six characters of the long file names are similar. For more information, see "Optimizing NTFS Performance" later in this chapter.

    Maybe that's why things sucked so bad when I had a bunch of files in one directory. It also seems to imply I am allowed to have that many files.
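
A minimal Python sketch of the point about seeking, assuming fixed-size records and threads that write to disjoint slots (`RECORD_SIZE`, `NUM_RECORDS`, and the helper names are hypothetical). With a mapping, each write addresses its bytes directly, so there is no shared file position to protect with a mutex. Writes that overlap would still need coordination.

```python
import mmap
import os
import tempfile
import threading

RECORD_SIZE = 8    # hypothetical fixed record size, in bytes
NUM_RECORDS = 64   # hypothetical capacity of the mapped file, in records


def write_slot(mm, index, payload):
    # Slice assignment addresses the bytes directly: no seek, no shared cursor.
    start = index * RECORD_SIZE
    mm[start:start + RECORD_SIZE] = payload


def fill(mm, indices, tag):
    # Each thread writes only to its own set of slots.
    for i in indices:
        write_slot(mm, i, tag * RECORD_SIZE)


def demo():
    size = RECORD_SIZE * NUM_RECORDS
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)  # the file must be as large as the mapping
        with mmap.mmap(fd, size) as mm:
            threads = [
                threading.Thread(target=fill, args=(mm, range(0, NUM_RECORDS, 2), b"A")),
                threading.Thread(target=fill, args=(mm, range(1, NUM_RECORDS, 2), b"B")),
            ]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            mm.flush()
            result = mm[:]  # copy the mapped bytes out before closing
    finally:
        os.close(fd)
        os.remove(path)
    return result
```

Calling `demo()` returns the file contents with even slots filled by one thread and odd slots by the other, without either thread ever calling `seek`.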

  2. Wrap up on Dumping Lots of Data to Disk in Realtime? · · Score: 1

    Boy I didn't expect this thread to explode like this while I was gone. Some people asked for more info so I'll just make some points:

    * The database is 5GB right now. After improving the thing's performance, it could be 10 times bigger: 50GB.

    * Yes, some people guessed it. It's financial data. I'm trying to dump all the trades of all stocks and futures in the US and EU. Right now we do a subset, but there's always something missing.

    * Hardware. Yes I can get one or two monster machines for our server farms. Some of our customers run the current software locally, so I can't demand anything too fancy, like a better OS, Oracle, or 20 disk RAID from them.

    * The data needs to be accessed in real time. If you've got a chart open you want to see the ticks coming in real time, and you want to be able to scroll back a few weeks.

    * As far as clustering/load balancing. Yeah I can do this in our server farm, but I want each unit to work better first.

    * The individual entries are small (~50B) and fixed-size per instrument.

    * "How much processing has to be done per item?" - almost nil
    * "How long can you delay committing them to a database?" As long as they'll fit in memory.
    * I'd say clients ask for a chart a few times a second. A chart doesn't require all the data points of an instrument.
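
Small fixed-size entries make random access cheap, because a record's position is just its index times the record size. The sketch below assumes a hypothetical 20-byte layout (timestamp in milliseconds, price in cents, volume); the real entries are described only as about 50 bytes. `sample_ticks` illustrates that a chart can read every n-th record rather than all of them.

```python
import io
import struct

# Hypothetical layout: int64 timestamp (ms), int64 price (cents), uint32 volume.
TICK = struct.Struct("<qqI")


def append_tick(f, timestamp_ms, price_cents, volume):
    # Append one fixed-size record to the end of the file.
    f.seek(0, io.SEEK_END)
    f.write(TICK.pack(timestamp_ms, price_cents, volume))


def read_tick(f, index):
    # Fixed-size records: the byte offset is index * record size.
    f.seek(index * TICK.size)
    return TICK.unpack(f.read(TICK.size))


def sample_ticks(f, count, step):
    # Read every step-th record among the first `count` records.
    return [read_tick(f, i) for i in range(0, count, step)]
```

Any seekable binary file object works here, including an `io.BytesIO` for experiments.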

  3. Re:sqlite @ 120,000 inserts per second on Dumping Lots of Data to Disk in Realtime? · · Score: 1

    Umm, with your example: is that 120K/s for the first 10s, or will it keep that up for a few months? Is it all in memory, or can I have several GB of data?

  4. Re:in-memory on Dumping Lots of Data to Disk in Realtime? · · Score: 1

    An entire hour of updates might well fit in RAM! My proposed solution with flat files would take advantage of this. I'm thinking of using linked lists to store the entries for each instrument and having a background thread come round and write out all the entries of an instrument at once.

    This thing is going to run for months at a time. Eventually the stuff has to go to disk.
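
The buffering idea above could be sketched roughly as follows. This is an illustration, not the author's design: it uses plain Python lists where the comment mentions linked lists, and `TickBuffer`, `run_flusher`, and the `write` callback are hypothetical names. The flusher swaps the pending table out under the lock, then writes each instrument's entries in one call.

```python
import threading
from collections import defaultdict


class TickBuffer:
    def __init__(self):
        self._lock = threading.Lock()
        self._pending = defaultdict(list)  # instrument -> list of record bytes

    def add(self, instrument, record):
        with self._lock:
            self._pending[instrument].append(record)

    def flush(self, write):
        # Swap the table out under the lock so producers are blocked only briefly.
        with self._lock:
            pending, self._pending = self._pending, defaultdict(list)
        # Write all entries of each instrument at once, outside the lock.
        for instrument, records in pending.items():
            write(instrument, b"".join(records))
        return len(pending)


def run_flusher(buffer, write, stop, interval=1.0):
    # Background loop: flush every `interval` seconds until `stop` is set.
    while not stop.wait(interval):
        buffer.flush(write)
    buffer.flush(write)  # final drain on shutdown
```

`run_flusher` is meant to be the target of a `threading.Thread`, with `stop` a `threading.Event` and `write(instrument, data)` appending `data` to wherever that instrument is stored.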

  5. Re:DBM Family: esp GDBM and Berkeley DB on Dumping Lots of Data to Disk in Realtime? · · Score: 1

    Ok, looking at BDB at sleepycat. It looks like you've got one table per file with this thing. There's also the problem that I can't open up our source :-(

    GDBM looks really simple. It also seems to have just one table per file. So all my instruments would have to go in that one. I'm wondering if something that looks so simple really has the performance I need.

  6. Re:HP-IB and ISAM - Ahh now I know a name for it. on Dumping Lots of Data to Disk in Realtime? · · Score: 1

    Searching freshmeat I actually found some projects with ISAM in their names. I'm not sure they look too promising though.

    Thanks. I now know the name for it. It still looks like I might be better off writing something from scratch. Maybe I can slap it up on sourceforge afterwards.

  7. Re:NetCDF or HDF5 - interesting on Dumping Lots of Data to Disk in Realtime? · · Score: 1

    Cool, but these are file format specifications. Are there any engines that work with these which are really fast? Do they cache a bunch of stuff in memory or will that still be my job?

  8. Re:Have you considered memory-mapped files? on Dumping Lots of Data to Disk in Realtime? · · Score: 1

    Won't fly. The current beast I'm trying to improve has files larger than 5GB, which is larger than any 32-bit address space can be.

    I'm also not sure I see an advantage. I've considered trying Windows "overlapped IO", but I'd like to steer clear of everything platform-specific even if it means using lots of threads.
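
A 32-bit address space tops out at 2^32 = 4,294,967,296 bytes, so a 5GB file indeed cannot be mapped whole. One common workaround, not something the comment proposes, is to map only a window of the file at a time. The sketch below assumes read-only access; `read_window` is a hypothetical helper. The mapping offset has to be a multiple of `mmap.ALLOCATIONGRANULARITY`, so the requested offset is rounded down and the difference is skipped inside the window.

```python
import mmap
import os
import tempfile


def read_window(path, offset, length):
    # The mapping must start on an allocation-granularity boundary.
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // gran) * gran
    delta = offset - aligned
    with open(path, "rb") as f:
        # Map only the bytes needed, not the whole file.
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=aligned) as mm:
            return mm[delta:delta + length]


def demo():
    # Build a small file with a marker past the second granularity boundary.
    gran = mmap.ALLOCATIONGRANULARITY
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * (2 * gran + 10))
            f.write(b"hello")
            f.write(b"\0" * 100)
        return read_window(path, 2 * gran + 10, 5)
    finally:
        os.remove(path)
```

The requested range must lie within the file; mapping past the end of the file raises an error.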

  9. Re:Just use the file system - windows blows on Dumping Lots of Data to Disk in Realtime? · · Score: 1

    I tried writing a prototype a while back to see if it could be done like that. The performance was ok with 1 file per instrument, as far as the program goes. I had two threads. The writer would append all the entries for one instrument at a time to the end of its file.

    When you click near the directory (parent directory did it) in the "file explorer" the whole thing locks up for a few minutes, desktop, task bar, etc. I haven't tried making a tree of subdirectories to avoid this problem. I'm not too sure it would help.

    I'm thinking it's probably better to have just a few files and allocate blocks out of them.
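
A rough sketch of what allocating blocks out of a single file might look like. Everything here is an assumption for illustration: `BlockStore`, the block size, and the in-memory index are hypothetical, records are assumed never to exceed one block, and persisting the index is left out. Each instrument gets a chain of fixed-size blocks, and a new block is taken from the end of the file when the current one is full.

```python
import io

BLOCK_SIZE = 64  # hypothetical block size, in bytes


class BlockStore:
    def __init__(self, f, block_size=BLOCK_SIZE):
        self.f = f                    # any seekable binary file object
        self.block_size = block_size
        self.next_block = 0           # next unallocated block number
        self.chains = {}              # instrument -> list of [block_no, bytes_used]

    def append(self, instrument, record):
        if len(record) > self.block_size:
            raise ValueError("record larger than a block")
        chain = self.chains.setdefault(instrument, [])
        # Allocate a fresh block if there is none yet or the record will not fit.
        if not chain or chain[-1][1] + len(record) > self.block_size:
            chain.append([self.next_block, 0])
            self.next_block += 1
        block = chain[-1]
        self.f.seek(block[0] * self.block_size + block[1])
        self.f.write(record)
        block[1] += len(record)

    def read(self, instrument):
        # Concatenate the used part of every block in the instrument's chain.
        parts = []
        for block_no, used in self.chains.get(instrument, []):
            self.f.seek(block_no * self.block_size)
            parts.append(self.f.read(used))
        return b"".join(parts)
```

With this layout the number of files stays small no matter how many instruments there are, at the cost of maintaining the block index.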