Slashdot Mirror


Linux Databases with Huge Tables?

Eugene writes "I am working on a project to create an expansive network monitoring and all encompassing database solution. With archival and ageing of all data we are looking at a database that grows by 40Gb a year, with some tables being well over the 2Gb file limit in Linux/x86. We have narrowed ourselves down to Oracle 8i (for it's stalwart referencial integrity checking, and PL/SQL for tracing across aged route information) and PostgreSQL (for it's object polymorphism and CIDR data types for IP addresses). We are concerned with the robustness of Postgres compared to "grown up" Oracle, as well as file size limitations. Can anyone give us any advice on these issues? "

4 of 220 comments (clear)

  1. Postgresql is good by Grim · · Score: 4
    We have been using postgresql for all our database needs for the last 18 months where I work. It has gone through a number of changes and is now a LOT more stable than it was only a year ago.

    If your data is critical and 100% availability is essential, I would probably still go for oracle, as it has a far more mature background and, whilst closed source, is very stable. Otherwise, postgresql is a very good choice. It handles tables of effectively infinite size (it just breaks them into 1GB chunks to avoid the filesystem limit). And it has the bonus that is is open source. If you need a feature, you can add it. I am sure that if you are part of an organisatin generating 40GB of data a year, you probably have a few programmers floating around to add enhancements.

    As an addition, I have found the postgres development team to be VERY responsive and helpful. Personally I would say they are probably the most active and helpful open source dev team I have seen.

    If you want something out of the box, with 100% availability, go oracle. If you want something that you can control, that will grow with you, and will eventually get 100% reliability (it is very close right now), go postgres.

  2. PostgreSQL and Large Tables on Linux by EricTheRed · · Score: 5

    PostgreSQL on Linux handles large tables pretty well. It breaks each table into 1Gb segments to get round the 2Gb file size limit. We chose 1Gb as it was a nice round figure, which makes it easy to organise the tables when they get really big.

    When they do become huge, you can then move the segments onto different filesystems to ease the load (although currently this is not a simple task). This also applies to indices.

    I know of a few large databases out there using PostgreSQL, but one I have access to is the TASS project (http://www.tass-survey.org), which currently has about 28Gb of astronomical data.

    The largest database I have is currently 5Gb, but that's primarily limited by disk space and not by postgresql.

    --
    Java gaming nut - http://www.retep.org/ or for the rail http://uktra.in/
  3. Be aware of the complexity of Oracle by ZamZ · · Score: 5
    If you are looking for a 'black box' to put your data in be aware that Oracle, even at only 40G of data, requires some heavy administration at times. You may want to consider MySql for an easier ride on admin.



    As well as knowledge on tuning the database you'll have very specific Oracle SQL tuning to do. I've been doing this type of thing for a number of years now and I still don't know 100% of what is needed - there are always knew features to get aquainted with.


    In saying that there are some good resources to get you started. Have a look at http://www.orapub.com for white papers and before you even start to lay the database down see the Oracle white paper (from their website) 'How to stop defragmenting and start living'.


    You should also take into account that 8i is a new-ish release of Oracle. Keep an eye on the bug lists for it and make sure you won't be impacted by anything.


    Apart from the warnings, I've run around 60G Oracle db's on Linux and as long as your server is beefy enough you should have no problem.


    One big thing, if you do go down the Oracle route have t-shirts made up with 'I've looked at v$system_event today, have you?' written on them and wear them. If there is one way of finding what your performance problems are/will be its this table. The only real exception is bad SQL which you need to trap in other ways

  4. Interystems' Cach� by dublin · · Score: 4

    I'm not a database expert, but I've done a fair amount of healthcare IT consulting over the years, and run across Intersystems' Caché database quite often.

    Most people aren't familiar with Caché from Intersystems, but if you're building a very large, sparsely populated transactional database (like an electronic medical record, for instance), it's at least an order of magnitude faster than Oracle, even after doing unnatural things with Oracle. (This info came directly from a major EMR vendor that benchmarked Cache and all the usual suspects and found nothing in the same league for this application. According to his tests, Caché was 30x faster than Oracle out of the box - Intersystems' website claims Caché is 20x faster than RDBMS competitors.)

    The technology is quite interesting in some regards: It is very mature, having its roots in the old MUMPS or "M" system (about as old as Unix), but has been updated quite nicely as time has passed to provide quite capable SQL and object-oriented interfaces, even though it is neither an RDBMS or an OODB internally.

    Another nice feature is its distributed caching protocol, which allows you to build a logically huge database server piecewise from a number of distributed servers. A number of large installations, including Boston Children's Hospital (IIRC), use this for their EMR - last I heard, BCH's database was spread over 100-150 servers. This would seem to fit nicely with the Linux way of doing things.

    It runs on Linux and a free download is even available on thier website, or they'll send you a CD for the asking if you're one of the unfortunates stuck on the end of a phone line.

    You can find Intersystems at http://www.intersys.com, not exactly the world's most intuitive URL, so I thought I should include it here...

    --
    "The future's good and the present is nothing to sneeze at." - Roblimo's last ./ post