Slashdot Mirror


Can You Compress Berkeley DB Databases?

Paul Gear asks: "I want to create a database using Berkeley DB that holds a lot of textual information. Because of the bulk of the data and its obvious compressibility, I was wondering whether it is possible to have the DB libraries compress it automatically at the file level, rather than compressing each (rather small) data item myself before storing it, which would yield much less compression. Section 4.1 of the paper 'Challenges in Embedded Database System Administration' talks about automatic compression, but that is the only place in the documentation where it is mentioned. Can anyone point me in the right direction?"
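One way to see the gap the question anticipates is a quick experiment with Python's standard zlib module (Python is only an illustrative choice; the question names no language). The records below are hypothetical sample text, not data from the question, and real ratios depend entirely on the data. The sketch compresses each short record separately, as a per-value scheme would, and then compresses all records as one stream, which roughly approximates file-level compression.

```python
import zlib

# Hypothetical sample records: short, similar text items like the ones
# described in the question.
records = [
    f"Entry {i}: the quick brown fox jumps over the lazy dog near the river bank".encode()
    for i in range(200)
]

# Compress each record on its own, as if storing it as a single DB value.
per_item_total = sum(len(zlib.compress(r, 9)) for r in records)

# Compress all records together, approximating file-level compression.
whole = zlib.compress(b"".join(records), 9)

raw_total = sum(len(r) for r in records)
print("raw bytes:             ", raw_total)
print("compressed per item:   ", per_item_total)
print("compressed as a whole: ", len(whole))
```

On repetitive text like this, the single stream compresses far better because deflate can refer back to matches in earlier records, which is exactly what per-item compression gives up.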

2 of 10 comments

  1. ZlibC by Bazman · Score: 3
    From the zlibc web site:

    Zlibc is a read-only compressed file-system emulation. It allows executables to uncompress their data files on the fly. No kernel patch and no recompilation of the executables or the libraries is needed. Using gzip -9, a compression ratio of 1:3 can easily be achieved (the web site has examples). This program has (almost) the same effect as a (read-only) compressed file system.

    See the web page for more.

    Baz

  2. Re:Tradeoffs by Deven · Score: 3

    > Reading through a compressed random-access file may not be as big a win as you think, since the DB will need to decompress things to determine where your data is. On the other hand, if you have a fast processor, slower media (CD-ROM), and plenty of RAM, the drawbacks will disappear after a few spins through, thanks to caching.

    The caching won't save you from uncompressing the blocks repeatedly. If you really want to compress the database metadata, you basically need a block-oriented compressed filesystem that allows random access within compressed files. I don't know whether such a thing already exists, but it's effectively what you'd end up writing to do this...

    I'd just use zlib to compress the individual entries and not try to compress the entire database as a whole. I've done this before, and it actually works better than you'd think. Even with data entries as small as 50-100 bytes, you get reasonable compression. Yes, you'd get much better compression across the entire database, but you can't hope to access a fully compressed database without uncompressing it or doing a lot of work to make random access possible. (And like I said, at that point you might as well be making a compressed filesystem.)

    --

    Deven

    "Simple things should be simple, and complex things should be possible." - Alan Kay
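Deven's per-entry approach can be sketched in Python as a thin wrapper that compresses each value on write and decompresses it on read. This is a hypothetical illustration, not Berkeley DB's API: CompressedStore is an invented name, and a plain dict stands in for the database handle. Any mapping that accepts bytes keys and values could in principle be substituted, but wiring it to a real Berkeley DB binding is left as an assumption.

```python
import zlib
from collections.abc import MutableMapping


class CompressedStore(MutableMapping):
    """Hypothetical wrapper: stores zlib-compressed values in any
    bytes-keyed, bytes-valued mapping (a plain dict in this sketch)."""

    def __init__(self, backend, level: int = 6):
        self._backend = backend  # stand-in for a real database handle
        self._level = level      # zlib level: 0 (none) to 9 (smallest)

    def __setitem__(self, key: bytes, value: bytes) -> None:
        # Each entry is compressed on its own before it reaches the store.
        self._backend[key] = zlib.compress(value, self._level)

    def __getitem__(self, key: bytes) -> bytes:
        # Only the requested entry is decompressed.
        return zlib.decompress(self._backend[key])

    def __delitem__(self, key: bytes) -> None:
        del self._backend[key]

    def __iter__(self):
        return iter(self._backend)

    def __len__(self) -> int:
        return len(self._backend)


raw = {}
store = CompressedStore(raw)
entry = b"Short text entry, short text entry, short text entry, short text."
store[b"k1"] = entry
print(len(entry), "bytes in,", len(raw[b"k1"]), "bytes stored")
```

Because every value is compressed independently, any single record can be fetched without touching the others, at the cost of the cross-record redundancy a whole-file scheme would exploit.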
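The block-oriented alternative Deven mentions can also be sketched: split the data into fixed-size blocks, compress each block independently, and serve a byte range by decompressing only the blocks that cover it. This is a simplified in-memory illustration with hypothetical function names and an arbitrary 4 KiB block size. A real on-disk version would also need a stored table of each compressed block's file offset, which the Python list provides implicitly here.

```python
import zlib

# Uncompressed bytes per block: an arbitrary illustrative choice.
BLOCK_SIZE = 4096


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks and compress each one independently."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def read_range(blocks: list[bytes], offset: int, length: int,
               block_size: int = BLOCK_SIZE) -> bytes:
    """Return the same bytes as data[offset:offset + length], decompressing
    only the blocks that the requested range overlaps."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if length <= 0:
        return b""
    first = offset // block_size
    # Clamp to the last block so reads past the end return short, like slicing.
    last = min((offset + length - 1) // block_size, len(blocks) - 1)
    chunk = b"".join(zlib.decompress(blocks[b]) for b in range(first, last + 1))
    start = offset - first * block_size
    return chunk[start:start + length]


data = b"".join(f"record {i:05d}\n".encode() for i in range(10000))
blocks = compress_blocks(data)
print(read_range(blocks, 130, 13))  # touches only block 0 -> b'record 00010\n'
```

Smaller blocks make random reads cheaper but compress worse, since each block starts with an empty history; that is the same tradeoff between per-item and whole-file compression raised in the question, at a different granularity.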