Python-LMDB In a High-Performance Environment

← Back to Stories (view on slashdot.org)

Python-LMDB In a High-Performance Environment

Posted by Soulskill on Friday October 17, 2014 @05:01AM from the fast-enough-to-cause-drama dept.

lkcl writes: In an open letter to the core developers behind OpenLDAP (Howard Chu) and Python-LMDB (David Wilson) is a story of a successful creation of a high-performance task scheduling engine written (perplexingly) in Python. With only partial optimization allowing tasks to be executed in parallel at a phenomenal rate of 240,000 per second, the choice to use Python-LMDB for the per-task database store based on its benchmarks, as well as its well-researched design criteria, turned out to be the right decision. Part of the success was also due to earlier architectural advice gratefully received here on Slashdot. What is puzzling, though, is that LMDB on Wikipedia is being constantly deleted, despite its "notability" by way of being used in a seriously-long list of prominent software libre projects, which has been, in part, motivated by the Oracle-driven BerkeleyDB license change. It would appear that the original complaint about notability came from an Oracle employee as well.

5 of 98 comments (clear)

Min score:

Reason:

Sort:

Re:Submitter doesn't understand Wikipedia notabili by Fwipp · 2014-10-17 05:22 · Score: 4, Insightful

Seriously, googling "python lmdb" shows up only documentation and lmdb blog updates on the first few pages, and also a link to this slashdot article. I don't see anybody interested in it or singing its praises.
It's okay that your pet project isn't wikipedia-noteworthy yet. Concentrate on your evangelism (like, get at least one person who isn't you to write about it), and then try submitting again once you have some sources to cite.
Would it hurt ... by lkcl · 2014-10-17 05:32 · Score: 5, Informative

OpenLDAP was originally using Berkeley DB, until recently. they'd worked with it for years, and got fed up with it. in order to minimise the amount of disruption to the code-base, LMDB was written as a near-drop-in replacement.
LMDB is - according to the web site and also the deleted wikipedia page - a key-value store. however its performance absolutely pisses over everything else around it, on pretty much every metric that can be measured, with very few exceptions.
basically howard's extensive experience combined with the intelligence to do thorough research (even to computing papers dating back to the 1960s) led him to make some absolutely critical but perfectly rational design choices, the ultimate combination of which is that LMDB outshines pretty much every key-value store ever written.
i mean, if you are running benchmark programs in *python* and getting sequential read access to records at a rate of 2,500,000 (2.5 MILLION) records per second... in a *scripted* programming language for goodness sake... then they have to be doing something right.
the random write speed of the python-based benchmarks showed 250,000 records written per second. the _sequential_ ones managed just over 900,000 per second!
there are several key differences between Berkeley DB's API and LMDB's API. the first is that LMDB can be put into "append" mode (as mentioned above). basically what you do is you *guarantee* that the key of new records is lexicographically greater than all other records. with this guarantee LMDB baiscally lets you put the new record _right_ at the end of its B+ Tree. this results in something like an astonishing 5x performance increase in writes.
the second key difference is that LMDB allows you to add duplicate values per key. in fact i think there's also a special mode (never used it) where if you do guaranteed fixed (identical) record sizes LMDB will let you store the values in a more space-efficient manner.
so it's pretty sophisticated.
from a technical perspective, there are two key differences between LMDB and *all* other key-value stores.
the first is: it uses "append-only" when adding new records. basically this has some guarantees that there can never be any corruption of existing data just because a new record is added.
the second is: it uses shared memory "copy-on-write" semantics. what that means is that the (one allowed) writer NEVER - and i mean never - blocks readers, whilst importantly being able to guarantee data integrity and transaction atomicity as well.
the way this is achieved is that because Copy-on-write is enabled, the "writer" may make as many writes it wants, knowing full well that all the readers will NOT be interfered with (because any write creates a COPY of the memory page being written to). then, finally, once everything is done, and the new top level parent B+ Tree is finished, the VERY last thing is a single simple LOCK, update-pointer-to-top-level, UNLOCK.
so as long as Reads do the exact same LOCK, get-pointer-to-top-level-of-B-Tree, UNLOCK, there is NO FURTHER NEED for any kind of locking AT ALL.
i am just simply amazed at the simplicity, and how this technique has just... never been deployed in any database engine before, until now. the reasons as howard makes clear are that the original research back in the 1960s was restricted to 32-bit memory spaces. now we have 64-bit so shared memory may refer to absolutely enormous files, so there is no problem deploying this technique, now.
all incredibly cool.
Oh my... by lkcl · 2014-10-17 05:43 · Score: 5, Informative

"a high-performance task scheduling engine written (perplexingly) in Python"
guys, there is this thing, it's called "algorithm"....
yeah.... except that algorithm took a staggering 3 months to develop. and it wasn't one algorithm, it was several, along with creating a networking IPC stack and having to create several unusual client-server design decisions. i can't go into the details because i was working in a secure environment, but basically even though i was the one that wrote the code i was taken aback that *python* - a scripted programming language - was capable of such extreme processing rates.
normally those kinds of speed rates would be associated with c for example.
but the key point of the article - leaving that speed aside - is that if something like PostgreSQL had been used as the back-end store, that rate would be somewhere around 30,000 tasks per second or possibly even less than that, over the long term, because of the overwhelming overhead associated with SQL (and NoSQL) databases maintaining transaction logs and making other guarantees in ways that are clearly *significantly* less efficient than the ways that LMDB do it, by way of those guarantees being integrated at a fundamental design level into LMDB.
Re:Oh my... by lkcl · 2014-10-17 07:14 · Score: 5, Interesting

The use cases for LMDB are pretty limited.
weeelll.... the article _did_ say "high performance", so there are some sacrifices that can be made especially when those features provided by SQL databases are clearly not even needed.
basically what was needed then was to actually *re-implement* some of the missing features (indexes for example) and that took quite some research. it turns out that (after finding an article written by someone who has implemented a SQL database using the very same key-value stores that everyone uses) you can implement secondary indexes *using* a key-value store with range capabilities by concatenating the value that you wish to have range-search on with the primary key of the record that you wish to access, and then storing that as the key with a zero-length value in the secondary-index key-value store.
this was what i had to implement - directly - in python, to provide secondary indexing using timestamps so that records could be deleted for example once they were no longer needed. it was actually incredibly efficient, *because of the performance of LMDB*.
so... yeah. didn't need SQL queries. added some basic secondary-indexing manually. got the transactional guarantees directly from the implementation of LMDB. got many other cool features....
please remember that i am keenly aware that SQLite, MySQL and i think even PostgreSQL can now be compiled to use LMDB as its back-end data store... but that the application was _so demanding_ that even if that had been done it still would not have been enough.
but, apart from that: i don't believe you are correct in saying that there are a limited number of use cases for LMDB *itself* - the statement "there are a limited number of use cases for range-based key-value stores" *might* be a bit more accurate, but there are clearly quite a _lot_ of use cases for range-based key-value stores [including as the back-end of more complex data management systems such as SQL and NOSQL servers].
this high-performance task scheduler application happens to be one of them... and the main point of the article is that, amongst the available key-value stores currently in existence, my research tells me that i picked the absolute best of them all.
Re:(not)perplexingly by Alsee · 2014-10-17 07:59 · Score: 4, Informative

Wikipedia editors aren't allowed to have opinions about a topic. The Neutral Point of View policy mandates that edits be deleted or re-written to present a reasonably neutral description of a topic. (And if needed, a neutral description of the sides in a controversial topic.)
Wikipedia editors aren't allowed to "know stuff" about a topic. The No Original Research policy mandates that facts and information must be Verifiable in published Reliable Sources. The sources need to exist, even if they aren't cited. Any information which is challenged, or is likely to be challenged, can be removed or tagged with {{citation needed}}.
Wikipedia editors aren't allowed to decide how "important" a topic is. This one causes the most confusion. Wikipedia's has a specific and somewhat unusual definition of Notability. Wikipedia Notability means that multiple independent Reliable Sources have published significant discussion of the subject. A musician who barely shows up at the #100 slot on a Billboard-top-100 list is Notable because The Wold has created the Billboard top-100 list to Take Note of musicians, and because a few paragraphs about the musician here and there in magazines give us Verifiable information from which to build an article. A Youtuber with more fans than the musician isn't Notable because (generally) books and magazines and the news don't publish any discussion of popular Youtubers. That means we have no independent sources from which to build an article.
So.... the reason this article was deleted rather than tagged "needs more verifiable sources" was that the number of independent usable sources was ZERO when it was nominated for deletion, and because everyone who participated in the deletion discussion did a search for more sources and came up with ZERO.
You can't built a valid Wikipedia article without verifiable sources, and you can't fix a broken article by adding sources to when the sources don't exist.
People can't write Wikipedia articles about themselves saying how awesome they are, or their company, or their pet project. (Well, they can write the article, but it will be deleted if it doesn't cite multiple independent published Reliable Sources discussing the subject).
It doesn't matter how awesome someone thinks their Python-LMDB project is. It doesn't matter how important someone thinks their Python-LMDB project is. If there's no magazines or books or news talking about it, then it's a dead-duck under Wikipedia Notability policy. We can't build an article based on just their own promotional materials, and editors can't just claim "personal knowledge" to make up stuff to write an article.
And no, this lame Slashdot story won't change that. ~~~~
-

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.