Python-LMDB In a High-Performance Environment

← Back to Stories (view on slashdot.org)

Python-LMDB In a High-Performance Environment

Posted by Soulskill on Friday October 17, 2014 @05:01AM from the fast-enough-to-cause-drama dept.

lkcl writes: In an open letter to the core developers behind OpenLDAP (Howard Chu) and Python-LMDB (David Wilson) is a story of a successful creation of a high-performance task scheduling engine written (perplexingly) in Python. With only partial optimization allowing tasks to be executed in parallel at a phenomenal rate of 240,000 per second, the choice to use Python-LMDB for the per-task database store based on its benchmarks, as well as its well-researched design criteria, turned out to be the right decision. Part of the success was also due to earlier architectural advice gratefully received here on Slashdot. What is puzzling, though, is that LMDB on Wikipedia is being constantly deleted, despite its "notability" by way of being used in a seriously-long list of prominent software libre projects, which has been, in part, motivated by the Oracle-driven BerkeleyDB license change. It would appear that the original complaint about notability came from an Oracle employee as well.

1 of 98 comments (clear)

Min score:

Reason:

Sort:

Re:Oh my... by lkcl · 2014-10-17 07:14 · Score: 5, Interesting

The use cases for LMDB are pretty limited.
weeelll.... the article _did_ say "high performance", so there are some sacrifices that can be made especially when those features provided by SQL databases are clearly not even needed.
basically what was needed then was to actually *re-implement* some of the missing features (indexes for example) and that took quite some research. it turns out that (after finding an article written by someone who has implemented a SQL database using the very same key-value stores that everyone uses) you can implement secondary indexes *using* a key-value store with range capabilities by concatenating the value that you wish to have range-search on with the primary key of the record that you wish to access, and then storing that as the key with a zero-length value in the secondary-index key-value store.
this was what i had to implement - directly - in python, to provide secondary indexing using timestamps so that records could be deleted for example once they were no longer needed. it was actually incredibly efficient, *because of the performance of LMDB*.
so... yeah. didn't need SQL queries. added some basic secondary-indexing manually. got the transactional guarantees directly from the implementation of LMDB. got many other cool features....
please remember that i am keenly aware that SQLite, MySQL and i think even PostgreSQL can now be compiled to use LMDB as its back-end data store... but that the application was _so demanding_ that even if that had been done it still would not have been enough.
but, apart from that: i don't believe you are correct in saying that there are a limited number of use cases for LMDB *itself* - the statement "there are a limited number of use cases for range-based key-value stores" *might* be a bit more accurate, but there are clearly quite a _lot_ of use cases for range-based key-value stores [including as the back-end of more complex data management systems such as SQL and NOSQL servers].
this high-performance task scheduler application happens to be one of them... and the main point of the article is that, amongst the available key-value stores currently in existence, my research tells me that i picked the absolute best of them all.