Python-LMDB In a High-Performance Environment

Posted by Soulskill on Friday October 17, 2014 @05:01AM from the fast-enough-to-cause-drama dept.

lkcl writes: In an open letter to the core developers behind OpenLDAP (Howard Chu) and Python-LMDB (David Wilson) is a story of a successful creation of a high-performance task scheduling engine written (perplexingly) in Python. With only partial optimization allowing tasks to be executed in parallel at a phenomenal rate of 240,000 per second, the choice to use Python-LMDB for the per-task database store based on its benchmarks, as well as its well-researched design criteria, turned out to be the right decision. Part of the success was also due to earlier architectural advice gratefully received here on Slashdot. What is puzzling, though, is that LMDB on Wikipedia is being constantly deleted, despite its "notability" by way of being used in a seriously-long list of prominent software libre projects, which has been, in part, motivated by the Oracle-driven BerkeleyDB license change. It would appear that the original complaint about notability came from an Oracle employee as well.


Re:Oh my... by lkcl · 2014-10-17 07:14 · Score: 5, Interesting

The use cases for LMDB are pretty limited.
weeelll.... the article _did_ say "high performance", so there are some sacrifices that can be made especially when those features provided by SQL databases are clearly not even needed.
basically what was needed then was to actually *re-implement* some of the missing features (indexes for example) and that took quite some research. it turns out that (after finding an article written by someone who has implemented a SQL database using the very same key-value stores that everyone uses) you can implement secondary indexes *using* a key-value store with range capabilities by concatenating the value that you wish to have range-search on with the primary key of the record that you wish to access, and then storing that as the key with a zero-length value in the secondary-index key-value store.
this was what i had to implement - directly - in python, to provide secondary indexing using timestamps so that records could be deleted for example once they were no longer needed. it was actually incredibly efficient, *because of the performance of LMDB*.
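A minimal sketch of the secondary-index trick described above, not the actual scheduler code. To stay self-contained it uses a toy in-memory ordered store (`OrderedKV`, a hypothetical stand-in built on `bisect`) rather than the LMDB API. The helper names `index_key`, `add_task` and `expire_before` are likewise made up for illustration. The one detail that matters is packing the timestamp as a fixed-width big-endian integer, so that byte-wise key order matches numeric order.

```python
import bisect
import struct


class OrderedKV:
    """Toy in-memory ordered key-value store (hypothetical, NOT the LMDB API)."""

    def __init__(self):
        self._keys = []   # sorted list of byte-string keys
        self._vals = {}   # key -> value

    def put(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key):
        return self._vals.get(key)

    def delete(self, key):
        if key in self._vals:
            del self._vals[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def range(self, start, stop):
        """Return a copy of all keys k with start <= k < stop, in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return self._keys[lo:hi]


def index_key(timestamp, primary_key):
    # Fixed-width big-endian timestamp followed by the primary key:
    # byte-wise ordering of these keys follows the numeric timestamp order.
    return struct.pack(">Q", timestamp) + primary_key


def add_task(records, by_time, primary_key, timestamp, payload):
    # Main store holds the record; the index store holds only a key
    # (timestamp + primary key) with a zero-length value.
    records.put(primary_key, payload)
    by_time.put(index_key(timestamp, primary_key), b"")


def expire_before(records, by_time, cutoff):
    """Delete every record whose timestamp is < cutoff; return their primary keys."""
    deleted = []
    start, stop = struct.pack(">Q", 0), struct.pack(">Q", cutoff)
    for ikey in by_time.range(start, stop):
        primary_key = ikey[8:]          # strip the 8-byte timestamp prefix
        records.delete(primary_key)
        by_time.delete(ikey)
        deleted.append(primary_key)
    return deleted
```

With a real store, both writes in `add_task` would presumably go into a single transaction so the index cannot drift from the records. The sketch also assumes a record's timestamp never changes; if it did, the old index entry would have to be deleted first.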
so... yeah. didn't need SQL queries. added some basic secondary-indexing manually. got the transactional guarantees directly from the implementation of LMDB. got many other cool features....
please remember that i am keenly aware that SQLite, MySQL and i think even PostgreSQL can now be compiled to use LMDB as their back-end data store... but the application was _so demanding_ that even if that had been done it still would not have been enough.
but, apart from that: i don't believe you are correct in saying that there are a limited number of use cases for LMDB *itself* - the statement "there are a limited number of use cases for range-based key-value stores" *might* be a bit more accurate, but there are clearly quite a _lot_ of use cases for range-based key-value stores [including as the back-end of more complex data management systems such as SQL and NOSQL servers].
this high-performance task scheduler application happens to be one of them... and the main point of the article is that, amongst the available key-value stores currently in existence, my research tells me that i picked the absolute best of them all.
Re:I can't wait for it by Alsee · 2014-10-17 08:48 · Score: 3, Interesting

I was involved in an example of this recently. TheFederalist.com is a one-year-old rightwing website. They ran an attack piece on Neil deGrasse Tyson. It was picked up by the rightwing blogosphere, but was totally non-newsworthy (as established by the lack of news coverage). Someone tried to insert it into Wikipedia's biographical page on Neil deGrasse Tyson. That edit was promptly reverted because Wikipedia has a policy of being extremely cautious about adding negative material to the Biography of Living Persons. A blogosphere rant against someone doesn't qualify. So then the TheFederalist.com writer started screaming CENSORSHIP and equating Wikipedia editors to religious fundamentalist terrorists for not writing his hit-job into Tyson's biography. *THIS* picked up some minor coverage for the story from other sources.
At this point someone noticed that we had a tiny article page on TheFederalist.com, and the only sourcing for that article was TheFederalist itself and a blog page from MediaMatters. The TheFederalist page was nominated for deletion. A massive effort was made by many people trying to find any sources talking about TheFederalist.com, searching for anything we could use to fix the article. The search turned up squat. Then TheFederalist.com wrote about Wikipedia nominating their article for deletion, and *THAT* got picked up by a few sources. And *THOSE* stories gave us enough information about TheFederalist.com to write an article on it.
So yeah..... it was painfully circular. ~~~~
--
You can't take something off the Internet! That's like trying to take pee out of a swimming pool.