Python-LMDB In a High-Performance Environment

Re:perplexingly by Z00L00K · 2014-10-17 05:15 · Score: 1

I question the reason for deleting the article instead of tagging it as needing more verifiable sources.

--
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.

I can't wait for it by i+kan+reed · 2014-10-17 05:15 · Score: 3, Funny

At some point there will be an article on Wikipedia, that only meets Wikipedia's notability requirements due to media spillover complaining about the notability requirements.

Re:I can't wait for it by Alsee · 2014-10-17 08:48 · Score: 3, Interesting

I was involved in an example of this recently. TheFederalist.com is a one-year-old rightwing website. They ran an attack piece on Neil deGrasse Tyson. It was picked up by the rightwing blogosphere, but was totally non-newsworthy (as established by the lack of news coverage). Someone tried to insert it into Wikipedia's biographical page on Neil deGrasse Tyson. That edit was promptly reverted because Wikipedia has a policy of being extremely cautious about adding negative material to Biographies of Living Persons. A blogosphere rant against someone doesn't qualify. So then the TheFederalist.com writer started screaming CENSORSHIP and equating Wikipedia editors to religious fundamentalist terrorists for not writing his hit-job into Tyson's biography. *THIS* picked up some minor coverage for the story from other sources.
At this point someone noticed that we had a tiny article page on TheFederalist.com, and the only sourcing for that article was TheFederalist itself and a blog page from MediaMatters. The TheFederalist page was nominated for deletion. A massive effort was made by many people trying to find any sources talking about TheFederalist.com that we could use to fix the article. The search turned up squat. Then TheFederalist.com wrote about Wikipedia nominating their article for deletion, and *THAT* got picked up by a few sources. And *THOSE* stories gave us enough information about TheFederalist.com to write an article on it.
So yeah..... it was painfully circular. ~~~~
-

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.

notability by nimbius · 2014-10-17 05:15 · Score: 3, Funny

I believe the definition here is that the software hasn't become so notable as to pose an immediate threat to a certain large database corporation which would require it to bury it in cash, rebrand it, lock it down, and pedal what little innovative or remarkable features the application had into the ground while pretending that somehow forking projects of the original software aren't making them look like a complete failure.

--
Good people go to bed earlier.

Re:notability by wiredlogic · 2014-10-17 07:10 · Score: 1

Yeah. Wikipedia has more important things to concern itself with like a comprehensive list of all Pokemon.

--
I am becoming gerund, destroyer of verbs.

Wikipedia article deleted by fnj · 2014-10-17 05:17 · Score: 1, Flamebait

If Wikipedia was a person I would smack it upside the head for shit like this. There is absolutely no reason not to have an article on LMDB, and deleting a perfectly good article for no reason is evidence of a mental disorder. It's not like they have to spend an extra penny for a piece of paper to hold the article, possibly making the book too thick. Wake up.

Yeah, I'm FAR from a Wikipedia hater, but when it pulls shit like this it reveals its stupidity.

Re:Wikipedia article deleted by i+kan+reed · 2014-10-17 05:21 · Score: 2

Wikipedia has rules. While those rules exist for good reasons, by nature of being rules they are most easily navigated by people with a bureaucratic, officious mindset.
People have this false idea that Wikipedia, by virtue of its "anyone can edit" policy, is an infinite bastion of free expression. When really, it's just a whole lot of people disagreeing and squabbling and working and editing to make and upkeep an encyclopedia.
Re:Wikipedia article deleted by Anonymous Coward · 2014-10-17 05:33 · Score: 2, Funny

[citation needed]
Re:Wikipedia article deleted by Wdomburg · 2014-10-17 05:37 · Score: 1

If the rules legitimately preclude a page on LMDB, they certainly should preclude individual pages for MySQL backends like Falcon, Aria, and Toku, shouldn't they? And yet there they are.
Re:Wikipedia article deleted by jeffmeden · 2014-10-17 05:42 · Score: 2

If Wikipedia was a person I would smack it upside the head for shit like this. There is absolutely no reason not to have an article on LMDB, and deleting a perfectly good article for no reason is evidence of a mental disorder. It's not like they have to spend an extra penny for a piece of paper to hold the article, possibly making the book too thick. Wake up.
Yeah, I'm FAR from a Wikipedia hater, but when it pulls shit like this it reveals its stupidity.
Wikipedia has a pretty standard bar for articles it should curate (which is decidedly not free) and that is, does the subject have any sort of peer-reviewed literature available (and source code comments, howtos, etc don't count)? This goes directly to the "no original research" policy, which basically asserts that Wikipedia editors (including the one that created the page) should not be writing the article based on their original work, since Wikipedia is not the place for peer review to happen. Long story short, the author/editor should do his peer review somewhere else (preferably not some other wiki site) and then submit that work as a source. They do this to keep the amount of edit wars/debates/flame-fests to a minimum (and there are still plenty, even when sources are available). Wikipedia is trying quite frantically to focus on its core competency as editors walk away due to political/moral/religious squabbles, and this is one way to do that.
Re:Wikipedia article deleted by Hognoxious · 2014-10-17 05:51 · Score: 1

Wikipedia has WP:rules. While those WP:rules exist for good WP:reasons, by nature of being WP:rules they are most easily[opinion] navigated by WP:bureaucratically minded, officious mindset.

FTFY. And [citation needed]

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:Wikipedia article deleted by UnknownSoldier · 2014-10-17 06:00 · Score: 1

Some of Wikipedia's rules are ass-backwards asinine. Such as Avoid Trivia
One man's trivia is another man's noise.
Oh I see, so only if it is _popular_ does the "truthiness" count.
Fuck that. I want an _inclusive_ dictionary / encyclopedia / reference, not an _exclusive_ based on some "arbitrary" rules simply because something is not popular. I am there in the first place to _learn_ about things I don't know about ! Not because some asshat decided "not enough people care about this topic."
It is not like an extra web page takes up THAT much storage in the first place.
Re:Wikipedia article deleted by MillionthMonkey · 2014-10-17 06:26 · Score: 2

If Wikipedia was a person I would smack it upside the head for shit like this.
If Wikipedia were a person, you could just edit his face.
Re:Wikipedia article deleted by WaffleMonster · 2014-10-17 06:40 · Score: 1

If Wikipedia was a person I would smack it upside the head for shit like this. There is absolutely no reason not to have an article on LMDB, and deleting a perfectly good article for no reason is evidence of a mental disorder. It's not like they have to spend an extra penny for a piece of paper to hold the article, possibly making the book too thick. Wake up.
Speaking only from personal experience
there seems to be a disconnect between what people actually derive value from and rules + perhaps original intent of Wikipedia.
We seem to be stuck in a situation where lack of enforcement itself is supporting quite a bit of value and interest in the site... A situation ripe for leverage by personal whims and selfish persuasion.
I don't think there are any easy answers yet the rampant deletions are particularly annoying and unhelpful to me as a user of Wikipedia.
Re:Wikipedia article deleted by Alsee · 2014-10-17 09:05 · Score: 1

<ref>https://en.wikipedia.org/wiki/Wikipedia:List_of_policies_and_guidelines</ref>

-

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.

Re:Submitter doesn't understand Wikipedia notabili by Fwipp · 2014-10-17 05:22 · Score: 4, Insightful

Seriously, googling "python lmdb" shows up only documentation and lmdb blog updates on the first few pages, and also a link to this slashdot article. I don't see anybody interested in it or singing its praises.

It's okay that your pet project isn't wikipedia-noteworthy yet. Concentrate on your evangelism (like, get at least one person who isn't you to write about it), and then try submitting again once you have some sources to cite.

Re:Submitter doesn't understand Wikipedia notabili by Wdomburg · 2014-10-17 05:28 · Score: 1

LMDB was deleted from Wikipedia, not Python-LMDB.

The bindings are not especially notable. The embedded database is.

Re:Submitter doesn't understand Wikipedia notabili by i+kan+reed · 2014-10-17 05:31 · Score: 2

Yep, and there are like 2 dozen python wikis where no one would mind a write-up, and links. Sometimes things just aren't really encyclopedia topics. And that's fine.

Wikipedia, in spite of all odds, somehow manages to hold onto a tiny reputation for informational quality. Part of that is not having more information than the users can reliably fact-check. And I've never held it against wikipedia if I look something up and it just doesn't have an article. I fall back to the whole rest of the internet right away.

Would it hurt ... by lkcl · 2014-10-17 05:32 · Score: 5, Informative

OpenLDAP was originally using Berkeley DB, until recently. they'd worked with it for years, and got fed up with it. in order to minimise the amount of disruption to the code-base, LMDB was written as a near-drop-in replacement.

LMDB is - according to the web site and also the deleted wikipedia page - a key-value store. however its performance absolutely pisses over everything else around it, on pretty much every metric that can be measured, with very few exceptions.

basically howard's extensive experience combined with the intelligence to do thorough research (even to computing papers dating back to the 1960s) led him to make some absolutely critical but perfectly rational design choices, the ultimate combination of which is that LMDB outshines pretty much every key-value store ever written.

i mean, if you are running benchmark programs in *python* and getting sequential read access to records at a rate of 2,500,000 (2.5 MILLION) records per second... in a *scripted* programming language for goodness sake... then they have to be doing something right.

the random write speed of the python-based benchmarks showed 250,000 records written per second. the _sequential_ ones managed just over 900,000 per second!

there are several key differences between Berkeley DB's API and LMDB's API. the first is that LMDB can be put into "append" mode (as mentioned above). basically what you do is you *guarantee* that the key of new records is lexicographically greater than all other records. with this guarantee LMDB basically lets you put the new record _right_ at the end of its B+ Tree. this results in something like an astonishing 5x performance increase in writes.
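To make the "append" idea concrete, here is a minimal sketch in plain Python. It is not LMDB: `SortedStore` is a hypothetical toy that keeps keys in a sorted list as a stand-in for a B+ tree, and its `append` flag only mimics the behaviour described above (as far as I know, py-lmdb exposes a similarly named `append` option on `put()`, but check its docs rather than trusting this).

```python
import bisect


class SortedStore:
    """Toy ordered key-value store; a stand-in for a B+ tree, not LMDB itself."""

    def __init__(self):
        self.keys = []    # kept in sorted order
        self.values = []  # values[i] belongs to keys[i]

    def put(self, key, value, append=False):
        if append:
            # Caller promises key > every existing key, so skip the search.
            if self.keys and key <= self.keys[-1]:
                return False  # promise broken: refuse rather than break ordering
            self.keys.append(key)
            self.values.append(value)
            return True
        # General path: binary search for the slot, then insert or overwrite.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)
        return True


store = SortedStore()
for n in range(5):
    # Zero-padded keys sort lexicographically in the same order as the numbers.
    store.put(b"%08d" % n, b"payload", append=True)
```

The zero-padding is the part that matters in practice: the guarantee is about byte-wise key order, so a counter or timestamp has to be encoded so that byte order matches numeric order.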

the second key difference is that LMDB allows you to add duplicate values per key. in fact i think there's also a special mode (never used it) where if you do guaranteed fixed (identical) record sizes LMDB will let you store the values in a more space-efficient manner.

so it's pretty sophisticated.

from a technical perspective, there are two key differences between LMDB and *all* other key-value stores.

the first is: it uses "append-only" when adding new records. basically this has some guarantees that there can never be any corruption of existing data just because a new record is added.

the second is: it uses shared memory "copy-on-write" semantics. what that means is that the (one allowed) writer NEVER - and i mean never - blocks readers, whilst importantly being able to guarantee data integrity and transaction atomicity as well.

the way this is achieved is that because Copy-on-write is enabled, the "writer" may make as many writes as it wants, knowing full well that all the readers will NOT be interfered with (because any write creates a COPY of the memory page being written to). then, finally, once everything is done, and the new top level parent B+ Tree is finished, the VERY last thing is a single simple LOCK, update-pointer-to-top-level, UNLOCK.

so as long as Reads do the exact same LOCK, get-pointer-to-top-level-of-B-Tree, UNLOCK, there is NO FURTHER NEED for any kind of locking AT ALL.
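The LOCK / swap-root / UNLOCK pattern described above can be sketched in plain Python. This is only an illustration under loud assumptions: it uses a hypothetical copy-on-write binary search tree instead of LMDB's memory-mapped B+ tree pages, and the names `Node`, `CowStore`, `snapshot` and `write` are made up for the example.

```python
import threading


class Node:
    """Tree node that is never modified after creation (stand-in for a page)."""
    __slots__ = ("key", "value", "left", "right")

    def __init__(self, key, value, left=None, right=None):
        self.key, self.value, self.left, self.right = key, value, left, right


def insert(node, key, value):
    """Return a new root; only nodes on the path to `key` are copied."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    return Node(key, value, node.left, node.right)


def lookup(node, key):
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


class CowStore:
    def __init__(self):
        self._root = None
        self._root_lock = threading.Lock()    # guards only the root pointer
        self._writer_lock = threading.Lock()  # enforces the single writer

    def snapshot(self):
        # Reader: LOCK, get pointer to top level, UNLOCK.
        with self._root_lock:
            return self._root

    def write(self, items):
        with self._writer_lock:
            new_root = self.snapshot()
            for key, value in items:
                new_root = insert(new_root, key, value)  # invisible to readers
            # Writer: LOCK, update pointer to top level, UNLOCK.
            with self._root_lock:
                self._root = new_root


store = CowStore()
store.write([(b"a", 1), (b"b", 2)])
old = store.snapshot()  # a reader's view, taken before the next write
store.write([(b"b", 20), (b"c", 3)])
new = store.snapshot()
```

A reader holding `old` keeps seeing the tree exactly as it was when the snapshot was taken, with no further locking, while `new` reflects the committed write. What the sketch leaves out is everything that makes the real thing hard: page reuse, durability, and tracking which old snapshots are still in use.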

i am just simply amazed at the simplicity, and how this technique has just... never been deployed in any database engine before, until now. the reasons as howard makes clear are that the original research back in the 1960s was restricted to 32-bit memory spaces. now we have 64-bit so shared memory may refer to absolutely enormous files, so there is no problem deploying this technique, now.

all incredibly cool.

I can't wait for it by lkcl · 2014-10-17 05:34 · Score: 1

At some point there will be an article on Wikipedia, that only meets Wikipedia's notability requirements due to media spillover complaining about the notability requirements.

yaaay! :) works for me. wasn't there a journalist who published a blog and used that as the only notable reference to create a fake article? :)

Re:perplexingly by i+kan+reed · 2014-10-17 05:41 · Score: 1

Because, in the end, misinforming is often worse than not informing. If there's no discernible way for the people reviewing the article to check if it's valid, there's serious concern about PR and marketing injecting false information into your supposedly neutral encyclopedia, misleading everyone using your site.

The line is going to be somewhere. They have verbal debates about all but the most obvious deletions (which officially still require four eyes: one pair to propose speedy deletion, one to delete).

Oh my... by lkcl · 2014-10-17 05:43 · Score: 5, Informative

"a high-performance task scheduling engine written (perplexingly) in Python"

guys, there is this thing, it's called "algorithm"....

yeah.... except that algorithm took a staggering 3 months to develop. and it wasn't one algorithm, it was several, along with creating a networking IPC stack and having to create several unusual client-server design decisions. i can't go into the details because i was working in a secure environment, but basically even though i was the one that wrote the code i was taken aback that *python* - a scripted programming language - was capable of such extreme processing rates.

normally those kinds of speed rates would be associated with c for example.

but the key point of the article - leaving that speed aside - is that if something like PostgreSQL had been used as the back-end store, that rate would be somewhere around 30,000 tasks per second or possibly even less than that, over the long term, because of the overwhelming overhead associated with SQL (and NoSQL) databases maintaining transaction logs and making other guarantees in ways that are clearly *significantly* less efficient than the way that LMDB does it, by way of those guarantees being integrated at a fundamental design level into LMDB.

Deletionists by HeckRuler · 2014-10-17 05:49 · Score: 3, Insightful

I never understood the deletionist mentality on Wikipedia. But there's a whole group of people that want to remove information from the public view.

I semi-understand the idea that this "very important" encyclopedia is "too important" for such things as a page for each character from a game I never played. And somehow by culling these frivolous things they make wikipedia higher quality on the whole? Maybe? Kinda? I don't think these people understand how search works.

There are the obvious shills and PR people that want to sweep things under the rug. These are nefarious and to be found and fought.

There are fools who think it's expensive to store this information. As if an edit-war to remove it was cheaper.

I understand people don't want articles that are just free advertising. But I doubt anyone is going to delete the page for Monsanto.

But fundamentally, I just don't get their worldview.

Re:Deletionists by Alsee · 2014-10-17 10:00 · Score: 1

The "worldview" is that Wikipedia is supposed to be an Encyclopedia. Wikipedia is the Encyclopedia That Anyone Can Edit, not a public blog-space. The only thing that prevents Wikipedia from becoming a scribble-board are the Wikipedia Policies, and editor dedication to those policies. If you throw out Wikipedia content-verifiability policies then it would start looking a lot less like an Encyclopedia.

I don't think these people understand how search works.
How search works: If you type a search term into Google you'll get random writings about the topic, no matter how trivial. If you type a search term into Wikipedia you'll get an encyclopedia-style article with Verifiable information cited to independent Reliable Sources, if we have one. ~~~~
-

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
Re:Deletionists by HeckRuler · 2014-10-20 06:06 · Score: 1

Sure sure, verifiable is important. But even with something to verify the information on the page, you still get those deletionists that will claim notability, and fast-track the page for deletion.
I don't give a rats fucking ass if you don't think that rat-asses are notable or not. If there are citable facts on the page, LEAVE IT BE. And let me make this clear. In your VERY NEXT BREATH you went from "it'll be a scribble-board without verifiability" to "no matter how trivial".
Who the fuck cares how trivial it is? You do. Because you have a stick up your ass about how important Wikipedia is.
And here's how search works: you type a search term into Google and the first hit you'll get is always wikipedia because it's the best and highest quality source for the topic at hand. The phrase "the sum of all human knowledge" comes to mind. But no, you're high and mighty and you just don't give a fuck about how many pokemon there are.
Hey man, you want to trim down Wikipedia of random meaningless shit nobody cares about? Try taking on football. Seriously, go hit that random wiki button a few dozen times and tell me how many minuscule sports trivia tidbits you get.
Re:Deletionists by Alsee · 2014-10-20 17:05 · Score: 1

Sure sure, verifiable is important. But even with something to verify the information on the page, you still get those deletionists that will claim notability, and fast-track the page for deletion.
If you were paying attention, I explained exactly how to prevent an article from being deleted. Include a couple of independent Reliable Sources talking about the topic, saying things that can be used to build an article. Once you have that then primary sources can help expand the article if used properly, but we have rules against articles built solely with primary sources because primary-source-only articles raise a shitton of problems.

But no, you're high and mighty and you just don't give a fuck about how many pokemon there are.
What the hell are you ranting about? Not only does Wikipedia have an article on Pokemon, we've got literally hundreds of Pokemon articles. That includes a list of SEVEN HUNDRED AND NINETEEN pokemon running up to Number 719: Diancie.

Hey man, you want to trim down Wikipedia of random meaningless shit nobody cares about? Try taking on football.
I would personally be delighted if the world got over its nutty fascination with football. However the fact is that the world does treat football as important, and there does exist a crazy amount of Published sources Taking Note of every minute facet of football. As a Wikipedia Editor I accept it's not my place to delete other people's football contributions based on my opinion of football's level of "importance". If someone complies with Wikipedia policies, if their article satisfies sourcing requirements etc., then I'll either leave the article alone or I'll work to improve it. Hell, some of my most recent edits were fixes to professional Wrestling articles, which I consider about 42 levels lower than football in stupidity. Football is a genuine idiotic violent sport, Wrestling is a fake idiotic violent sport. ~~~~
-

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.

Re:Did you make any effort to get this undeleted? by lkcl · 2014-10-17 05:49 · Score: 1

there isn't a python-lmdb wikipedia article, and one has never been created. the discussion involves the LMDB page (not the python bindings), which was deleted despite LMDB having significant notable uses.

Submitter doesn't understand Wikipedia notability by lkcl · 2014-10-17 05:52 · Score: 1

Never mind what projects use it; what have independent reliable sources written about LMDB?

i've written something and i'm pretty wubwubwubreliawibble oh look pretty coloured lights...

Re:Submitter doesn't understand Wikipedia notabili by Wdomburg · 2014-10-17 05:58 · Score: 1

Don't know, don't care. The case against notability was stronger when it was first submitted, likely, but it is certainly hard to defend now. There are mature bindings for most languages, it underpins a number of higher level data stores (including OpenLDAP, of course, but also FineDB, Hustle), and is a supported back-end for a large number of projects (sometimes as a default component, as with CFengine and Zimbra).

Algorithm? by AqD · 2014-10-17 05:59 · Score: 1

Why don't they publish the algorithm on wikipedia instead? Putting a non-popular library on wikipedia seems a bit extreme. It may well include all github projects including mine...

bitching by MSG · 2014-10-17 06:12 · Score: 1

I'll start with: LMDB is awesome, and I am super SUPER impressed with OpenLDAP's benchmarks over the last several years. I do not question LMDB's worth.

I'm just not really sure that this letter is evidence thereof. The author got poor performance from a SQL database with no indexing, which degraded as the number of records grew? You don't say! A database that has to do a full scan for reads performs poorly?

Surprise about load average seems equally naive. If you fork a bunch of processes that are doing IO, of COURSE the load increases. Load is a measure of the number of processes not sleeping. That's all it is. I don't understand his surprise that a system steadily doing a great deal of IO would show a lot of time spent in IO calls in profiling.
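For anyone who wants to watch the number while repeating this kind of test, here is a small sketch using only the Python standard library. One caveat that is mine rather than the poster's: on Linux the load average also counts tasks in uninterruptible sleep (typically waiting on disk I/O), not just runnable ones, which is part of why I/O-heavy workers push it up.

```python
import os

# 1-, 5- and 15-minute load averages, as reported by the OS (Unix only).
one, five, fifteen = os.getloadavg()

# Dividing by the CPU count gives a rough "per core" figure to compare runs.
cpus = os.cpu_count() or 1
print(f"load: {one:.2f} {five:.2f} {fifteen:.2f} ({one / cpus:.2f} per CPU over 1 min)")
```

Sampling this before and after a change (for example, opening files before versus after forking) gives a relative comparison, but it says nothing about where the time is actually going.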

Reading that letter made me cringe. It didn't help that it sounds like another NSA project.

Re:bitching by kefalonia · 2014-10-17 13:54 · Score: 1

no NSA project here, unless Digital Pine is a subsidiary? small world, isn't it?

database performance by lkcl · 2014-10-17 06:29 · Score: 2

The author got poor performance from a SQL database with no indexing, which degraded as the number of records grew? You don't say! A database that has to do a full scan for reads performs poorly?

yes. it was that i had to do that analysis in a formal repeatable independent way, which i had never done before, and i was very surprised at the poor results. i was at least expecting a *consistent* and reliable rate of... well, i don't know: i was kinda expecting PostgreSQL to be top of the list and i was kinda expecting it to reach 100,000 or 200,000 records per second... and it just... couldn't. i was *completely* caught off-guard by the need to switch off all the safety checks, and by how dramatic the effect on performance of adding indexes really was.

so it was then by complete contrast that, for example, the py-lmdb benchmarks got an ORDER OF MAGNITUDE better sequential-read-speeds (2.5 million per second) than i was expecting that made me really sit up and take notice.

Surprise about load average seems equally naive. If you fork a bunch of processes that are doing IO, of COURSE the load increases. Load is a measure of the number of processes not sleeping. That's all it is. I don't understand his surprise that a system steadily doing a great deal of IO would show a lot of time spent in IO calls in profiling.

you've missed the point. it was that the exact same design using 20 (or so) shm file handles instead of 200 file handles opening to the exact same data (effectively) resulted in a reasonable loadavg, whereas having the 200 file handles open had a loadavg that ground the system completely to a halt.

so it's not the *actual* loadavg that is relevant but that the *relative* loadavg before and after that one simple change was so dramatically shifted from "completely unusable and in no way deployable in a live production environment" to a "this might actually fly, jim" level.

Re:database performance by MSG · 2014-10-17 08:13 · Score: 1

so it's not the *actual* loadavg that is relevant but that the *relative* loadavg before and after that one simple change was so dramatically shifted from "completely unusable and in no way deployable in a live production environment" to a "this might actually fly, jim" level.
That's not loadavg, that's IO latency. You should probably be using iostat to get useful numbers.
loadavg is completely useless when discussing system performance, it is in no way related.
Re:database performance by lkcl · 2014-10-17 08:31 · Score: 1

That's not loadavg, that's IO latency. You should probably be using iostat to get useful numbers.
oo, thank you very much for that tip, i'll try to pass it on and will definitely remember it for the next projects i work on. thank you.
Re:database performance by MSG · 2014-10-17 10:39 · Score: 1

If you haven't used iostat before: Run "iostat -x 2" to get a report of block device utilization every two seconds. Ignore the first report; it details utilization since system boot. All subsequent reports will be for the period after the previous report.
If you can repeat your earlier tests, and want to see if there's actually a Linux bug, compare numbers when the program opens DBs before forking, and when it opens them after. If you're seeing bad latency in the former case, but similar B/s, that might indicate a bug. If you're seeing much higher %system (CPU), that might be a bug. Maybe. Otherwise, it's probably an indication that the program behaves differently in those cases, which is not a Linux bug.

Re:Did you make any effort to get this undeleted? by lkcl · 2014-10-17 06:35 · Score: 1

I apologize for that, I was wrong and spoke too quickly. If you can find notable sources for P-LMDB, then it's worth a shot bringing it to that user's attention.

hey not a problem. you're right about py-lmdb - my main concern is to get LMDB the recognition that its peer stores (such as BerkeleyDB) already have: http://en.wikipedia.org/wiki/B... - someone else mentioned that there are other such key-value stores (some of them at the same development period as LMDB) which already have articles. and it's that an *oracle* employee marked the page for deletion that's the main issue of contention here.

Re:Did you make any effort to get this undeleted? by Fwipp · 2014-10-17 06:53 · Score: 1

Then it shouldn't be hard to find somebody who's noted them.

I can't wait for it by Medievalist · 2014-10-17 07:06 · Score: 2

>wasn't there a journalist who published a blog and used that as the only notable reference to create a fake article? :)

I can recommend you a fascinating pair of books: The Secret History of the War on Cancer by Devra Davis and Merchants of Doubt by Naomi Oreskes and Erik Conway. There is a very long history of circular self-reference among dishonest journalists and scientists; for example Fred Singer would write a letter to the Wall Street Journal, then write an Op-Ed piece for a smaller outfit using the Wall Street Journal as a reference, then write an article for WSJ referencing the op-ed. In each case the claimed accuracy of the sources would be boosted - first in the letter it might make a bald claim like "tobacco is proven not harmful" or "global warming is beneficial" and then the op-ed would go on to state that "the wall street journal says tobacco is proven not harmful" then in the final piece "prominent scientists have repeatedly proven that tobacco is not harmful" (Singer really is a physicist or something like that). Eventually the final WSJ article would be cited in thousands of journals and papers funded by Singer's paymasters - this is still going on, the articles are still cited today. Read the books to find out more.

Re:Oh my... by lkcl · 2014-10-17 07:14 · Score: 5, Interesting

The use cases for LMDB are pretty limited.

weeelll.... the article _did_ say "high performance", so there are some sacrifices that can be made especially when those features provided by SQL databases are clearly not even needed.

basically what was needed then was to actually *re-implement* some of the missing features (indexes for example) and that took quite some research. it turns out that (after finding an article written by someone who has implemented a SQL database using the very same key-value stores that everyone uses) you can implement secondary indexes *using* a key-value store with range capabilities by concatenating the value that you wish to have range-search on with the primary key of the record that you wish to access, and then storing that as the key with a zero-length value in the secondary-index key-value store.

this was what i had to implement - directly - in python, to provide secondary indexing using timestamps so that records could be deleted for example once they were no longer needed. it was actually incredibly efficient, *because of the performance of LMDB*.
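The concatenated-key trick is easier to see in code. What follows is a hedged sketch, not the poster's implementation: `RangeKV` is a hypothetical in-memory stand-in for a range-capable key-value store such as LMDB, and the 8-byte big-endian timestamp prefix is an assumption chosen so that byte order matches time order.

```python
import bisect
import struct


class RangeKV:
    """Toy ordered key-value store with range scans (stand-in for LMDB)."""

    def __init__(self):
        self._keys, self._vals = [], []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def delete(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i], self._vals[i]

    def keys_below(self, upper):
        """Return a copy of all keys strictly less than `upper`, in order."""
        return self._keys[:bisect.bisect_left(self._keys, upper)]


records = RangeKV()  # primary store: task id -> payload
by_time = RangeKV()  # secondary index: timestamp + task id -> b""


def index_key(timestamp, task_id):
    # Big-endian fixed-width timestamp, so byte order equals numeric order.
    return struct.pack(">Q", timestamp) + task_id


def add_task(task_id, timestamp, payload):
    records.put(task_id, payload)
    by_time.put(index_key(timestamp, task_id), b"")  # zero-length value


def expire_before(cutoff):
    """Delete every task whose timestamp is older than `cutoff`."""
    expired = by_time.keys_below(struct.pack(">Q", cutoff))
    for key in expired:
        records.delete(key[8:])  # bytes after the 8-byte timestamp = primary key
        by_time.delete(key)
    return len(expired)


add_task(b"task-1", 100, b"old")
add_task(b"task-2", 200, b"newer")
add_task(b"task-3", 300, b"newest")
removed = expire_before(250)
```

Appending the primary key to the timestamp keeps index keys unique even when two records share a timestamp, and the empty value means the index costs little more than the keys themselves. In a real store both writes would go in one transaction so the index cannot drift from the records.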

so... yeah. didn't need SQL queries. added some basic secondary-indexing manually. got the transactional guarantees directly from the implementation of LMDB. got many other cool features....

please remember that i am keenly aware that SQLite, MySQL and i think even PostgreSQL can now be compiled to use LMDB as their back-end data store... but that the application was _so demanding_ that even if that had been done it still would not have been enough.

but, apart from that: i don't believe you are correct in saying that there are a limited number of use cases for LMDB *itself* - the statement "there are a limited number of use cases for range-based key-value stores" *might* be a bit more accurate, but there are clearly quite a _lot_ of use cases for range-based key-value stores [including as the back-end of more complex data management systems such as SQL and NOSQL servers].

this high-performance task scheduler application happens to be one of them... and the main point of the article is that, amongst the available key-value stores currently in existence, my research tells me that i picked the absolute best of them all.

Re:Would it hurt ... by lkcl · 2014-10-17 07:25 · Score: 1

A lot of the locking semantics you mentioned sound pretty similar to RCU which is used extensively in the Linux kernel, and allows for lockless reading on certain architectures.

http://en.wikipedia.org/wiki/R... .... yes, i think so. now imagine that all the copying is done by the OS using the OS's virtual memory page-table granularity (so does not have any very very very significant overhead). and also imagine that the library is intelligent enough to move the older page into its record of free pages during a cleanup phase that doesn't cost very much either. and also remember that on accessing B+ trees to find a record you only need to know the "top" (root) node... so you can update (or create) using those COW semantics as many B+ tree nodes as you like, knowing that it's *only* the root node that you need (after the fact) to tell (new) readers about... ... and now it's no longer expensive to do those RCU style operations, and the performance is streets ahead of any other key-value store.

but i am not an expert on these things. i'm sure that if howard chipped in here (and he _is_ an expert on the linux kernel and on high-performance efficient algorithm implementation) he'd be able to tell you more and probably a lot more accurately than i can.
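The copy-on-write idea described above can be sketched in a few lines. This is a deliberately simplified illustration: it uses an in-memory binary tree rather than a B+ tree of memory-mapped pages, and the names `Node`, `insert`, `lookup` and `Store` are invented for the example. It is not how LMDB itself is implemented. What it does show is that a writer copies only the nodes on the path it changes, then publishes the result by swapping a single root reference, while readers holding the old root keep a consistent view.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, key, value):
    """Return a new root, copying only the nodes on the path to `key`."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return node._replace(left=insert(node.left, key, value))
    if key > node.key:
        return node._replace(right=insert(node.right, key, value))
    return node._replace(value=value)


def lookup(node, key):
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


class Store:
    def __init__(self):
        self.root = None  # the only thing a new reader needs to find

    def snapshot(self):
        return self.root

    def put(self, key, value):
        # Build the new version off to the side, then publish it with
        # a single reference assignment.
        self.root = insert(self.root, key, value)


store = Store()
store.put(2, "b")
store.put(1, "a")
store.put(3, "c")

old_view = store.snapshot()   # a reader that started before the update
store.put(1, "A")             # the writer replaces the value for key 1
new_view = store.snapshot()
```

Here `lookup(old_view, 1)` still sees the old value, `lookup(new_view, 1)` sees the new one, and the untouched right-hand subtree is shared between both versions rather than copied.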

Do not use joins by Hognoxious · 2014-10-17 07:26 · Score: 2

if something like PostgreSQL had been used as the back-end store, that rate would be somewhere around 30,000 tasks per second or possibly even less than that

You should pipe it to /dev/null. That's webscale.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

Do not use joins by lkcl · 2014-10-17 07:29 · Score: 2

if something like PostgreSQL had been used as the back-end store, that rate would be somewhere around 30,000 tasks per second or possibly even less than that

You should pipe it to /dev/null. That's webscale.

don't jest... please :) jokes about "you should just have a big LED on the box with a switch and a battery" _not_ appreciated :)

but, seriously: the complete lack of need in this application for joins (as well as any other features of SQL or NOSQL databases) was what led me to research key-value stores in the first place.

Over-emphasizing "scripted" or "scripting" by Jizzbug · 2014-10-17 07:53 · Score: 1

CPython is a compiler. It compiles Python source code to Python bytecode, and the Python runtime executes the compiled bytecode.

CPython has one major weakness, the GIL (global interpreter lock). I've seen the GIL harm high-throughput, multi-threaded event processing systems not dissimilar from the one you describe.

If you must insist on Python and want to avoid multi-threaded I/O bound weaknesses of the GIL, then use Jython.

--

-=/\- Jizzbug -/\=-

Re:Over-emphasizing "scripted" or "scripting" by styrotech · 2014-10-17 11:59 · Score: 1

If you must insist on Python and want to avoid multi-threaded I/O bound weaknesses of the GIL, then use Jython.
I was under the impression the GIL was a problem for CPU-bound multithreaded CPython code, but that the GIL is released when waiting on I/O (or native libraries), i.e. I/O-bound workloads are the ones that still make some sense for multithreading in CPython.
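A small sketch of that point, assuming `time.sleep` as a stand-in for a blocking I/O call (it releases the GIL while waiting, as blocking socket and file reads do). The helper names `fake_io`, `run_serial` and `run_threaded` are made up for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)
    return delay


def run_serial(n, delay):
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(n, delay):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        list(pool.map(fake_io, [delay] * n))
    return time.perf_counter() - start


serial_seconds = run_serial(4, 0.1)      # waits happen one after another
threaded_seconds = run_threaded(4, 0.1)  # waits overlap across threads
```

The threaded run should finish in roughly the time of a single wait, because the threads spend their time blocked rather than holding the GIL. Replacing the sleep with a pure-Python CPU loop would be expected to remove most of that benefit.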

Re:(not)perplexingly by Alsee · 2014-10-17 07:59 · Score: 4, Informative

Wikipedia editors aren't allowed to have opinions about a topic. The Neutral Point of View policy mandates that edits be deleted or re-written to present a reasonably neutral description of a topic. (And if needed, a neutral description of the sides in a controversial topic.)

Wikipedia editors aren't allowed to "know stuff" about a topic. The No Original Research policy mandates that facts and information must be Verifiable in published Reliable Sources. The sources need to exist, even if they aren't cited. Any information which is challenged, or is likely to be challenged, can be removed or tagged with {{citation needed}}.

Wikipedia editors aren't allowed to decide how "important" a topic is. This one causes the most confusion. Wikipedia has a specific and somewhat unusual definition of Notability. Wikipedia Notability means that multiple independent Reliable Sources have published significant discussion of the subject. A musician who barely shows up at the #100 slot on a Billboard-top-100 list is Notable because The World has created the Billboard top-100 list to Take Note of musicians, and because a few paragraphs about the musician here and there in magazines give us Verifiable information from which to build an article. A Youtuber with more fans than the musician isn't Notable because (generally) books and magazines and the news don't publish any discussion of popular Youtubers. That means we have no independent sources from which to build an article.

So.... the reason this article was deleted rather than tagged "needs more verifiable sources" was that the number of independent usable sources was ZERO when it was nominated for deletion, and because everyone who participated in the deletion discussion did a search for more sources and came up with ZERO.

You can't build a valid Wikipedia article without verifiable sources, and you can't fix a broken article by adding sources when the sources don't exist.

People can't write Wikipedia articles about themselves saying how awesome they are, or their company, or their pet project. (Well, they can write the article, but it will be deleted if it doesn't cite multiple independent published Reliable Sources discussing the subject).

It doesn't matter how awesome someone thinks their Python-LMDB project is. It doesn't matter how important someone thinks their Python-LMDB project is. If there are no magazines or books or news talking about it, then it's a dead duck under Wikipedia Notability policy. We can't build an article based on just their own promotional materials, and editors can't just claim "personal knowledge" to make up stuff to write an article.

And no, this lame Slashdot story won't change that. ~~~~

-

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.

Over-emphasizing by lkcl · 2014-10-17 08:24 · Score: 1

CPython is a compiler.

it's an interpreter which was [originally] based on a FORTH engine.

It compiles Python source code to Python bytecode,

there is a compiler which does that, yes.

and the Python runtime executes the compiled bytecode.

it interprets it.
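Both halves of this exchange can be seen from inside CPython using only the standard library: `compile` produces a code object containing bytecode, and `exec` hands that code object to the interpreter loop. The snippet below is a minimal sketch; the exact opcode names listed vary between CPython versions.

```python
import dis

source = "result = 6 * 7"

# Step 1: the compiler turns source text into a code object holding bytecode.
code = compile(source, "<example>", "exec")

# The bytecode is a plain bytes object; dis decodes it into named instructions.
instructions = [ins.opname for ins in dis.get_instructions(code)]

# Step 2: the interpreter loop executes that bytecode.
namespace = {}
exec(code, namespace)
```

Calling `dis.dis(code)` prints the same instructions in a readable listing.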

CPython has one major weakness, the GIL (global interpreter lock).

*sigh* it does. the effect that this has on threading is to reduce threads to the role of a mutually-exclusive task-switching mechanism.

I've seen the GIL harm high-throughput, multi-threaded event processing systems not dissimilar from the one you describe.

yes. you are one of the people who will appreciate, given that the codebase could not be written in (or converted to) any other language, due to time-constraints, that using processes and custom-written IPC because threads (which you'd think would be perfect to get high-performance on event processing because there would be no overhead on passing data between threads) couldn't be used, means that the end-result is going to be... complicated.

If you must insist on Python and want to avoid multi-threaded I/O bound weaknesses of the GIL, then use Jython.

not a snowball in hell's chance of that happening :) not in a million years. not on this project, and not on any project i will actively and happily be involved in. and *especially* i cannot ever endorse the use of java for high performance reliable applications. i'm familiar with python's advantages and disadvantages, the way that the garbage collector works, and am familiar with the size of the actual python interpreter and am happy that it is implemented in c.

java on the other hand i just... i don't even want to begin describing why i don't want to be involved in its deployment - i'm sure there are many here on slashdot happy to explain why java is unsuitable.

there are many other ways in which the limitation of threads in python imposed by the GIL may be avoided. i chose to work around the problem by using processes and custom-writing an IPC infrastructure using edge-triggered epoll. it was... hard. others may choose to use stackless python. others may agree with the idea to use jython, but honestly if the application was required to be reasonably reliable as well as high-performance there would be absolutely no way that i could ever endorse such an idea. sorry :)
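For anyone who has not used edge-triggered epoll, here is a minimal single-process sketch of its semantics using Python's `select.epoll` (Linux only). It is not the proprietary IPC layer mentioned above, whose design is not described here; the pipe, the `drain` and `demo` helpers and the `task-1;task-2;` payload are all invented for illustration. The key point is that an edge-triggered registration reports readiness once per change, so the reader has to drain the descriptor until it would block.

```python
import os
import select


def drain(fd):
    """Read a non-blocking fd until it would block; required with EPOLLET."""
    chunks = []
    while True:
        try:
            chunk = os.read(fd, 4)  # tiny buffer to force several reads
        except BlockingIOError:
            break                   # nothing left for now
        if not chunk:
            break                   # writer closed the pipe
        chunks.append(chunk)
    return b"".join(chunks)


def demo():
    if not hasattr(select, "epoll"):
        return None                 # epoll is only available on Linux
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    ep = select.epoll()
    try:
        ep.register(read_fd, select.EPOLLIN | select.EPOLLET)
        os.write(write_fd, b"task-1;task-2;")
        received = b""
        for fd, mask in ep.poll(1.0):
            if fd == read_fd and mask & select.EPOLLIN:
                received += drain(fd)
        # No new data has arrived, so an edge-triggered poll reports nothing.
        second_poll = ep.poll(0)
        return received, second_poll
    finally:
        ep.close()
        os.close(read_fd)
        os.close(write_fd)


result = demo()
```

In a real multi-process design the two ends of the pipe (or socket pair) would live in different processes, and a partially drained descriptor would simply stall until the next write, which is part of what makes this style hard to get right.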

Re:Over-emphasizing by Jizzbug · 2014-10-17 09:03 · Score: 1

Thanks for correcting some of my semantics. :)
My point was that Python is a VM-backed language, similar to the JVM (more correctly, similar but lacking JIT), and unless you hit the GIL it performs quite well. Same as with Perl.
Here is Python's VM: https://hg.python.org/cpython/file/3.3/Python/ceval.c#l790
I'll attempt to agree with your Java sentiment by saying that Java only became worth a damn in 1.5.
Today it is respectable for a number of use-cases. My favorite use-case is JAX-RS 2.0. Anyone who writes REST interfaces in Node.js or almost anything else just likes to type a lot of unnecessary lines of code, manually injecting request parameters into business logic and manually creating encoded responses, etc. (Webmachine in Erlang, Ruby or Python is almost as respectable as JAX-RS 2.0 & Jersey.) In JAX-RS 2.0, my same web-annotated business objects and structure-annotated value objects can serve application/xml, application/json and application/x-www-form-urlencoded inputs and outputs without me having to write a single line of plumbing or conversion code, letting me focus on the business logic and domain object model alone.
In the event processing system I referred to, we rewrote it in Java and achieved many improvements in speed over Python (due to our I/O bound multithreading and, of course, avoiding IPC-in-Python along the way, which would have helped, as you say).
FWIW, Puppet is moving from Ruby to JRuby... I'm not a huge JVM fanboy, but it has its benefits on occasion, especially if you can avoid all instances of legacy code and legacy APIs. (Java has done a better job of learning from their mistakes, but the mistakes linger in legacy code.)
PS: It is not well-known, but it is possible to do reified generics in Java with some hackery (with concrete anonymous abstract classes), if you really need some C++ template love in your codebase.
Nice chatting!

--

-=/\- Jizzbug -/\=-

Re:Over-emphasizing by Jizzbug · 2014-10-17 09:30 · Score: 1

PPS: Given your custom IPC for Python, could you go us one further and write an OSGi for Python using it? Pretty please! ;)

--

-=/\- Jizzbug -/\=-

Re:Over-emphasizing by lkcl · 2014-10-18 06:03 · Score: 1

PPS: Given your custom IPC for Python, could you go us one further and write an OSGi for Python using it? Pretty please! ;)
:) i'd love to but sadly it's one of the [few] contracts where i was in a proprietary environment. if i meet a software libre project some time in the future that needs that kind of stuff i'll certainly attempt to recreate it but it would need to be at least a year before i consider that.

Re:(not)perplexingly by lkcl · 2014-10-17 08:28 · Score: 1

It doesn't matter how awesome someone thinks their Python-LMDB project is. It doesn't matter how important someone thinks their Python-LMDB project is.

the mistake you've made has been raised a number of times in the slashdot comments (3 so far). the wikipedia page that was deleted was about LMDB, not python-lmdb. python-lmdb is just bindings to LMDB and that is not notable in any significant way.

Deletionists by Anonymous Coward · 2014-10-17 08:56 · Score: 1

My theory is that they have a craving to exercise power over others, and Wikipedia deletion is as close as they can get to that goal. If they were intelligent enough and introspective enough to figure out their own motives, they'd be cops, teachers, politicians or drill sergeants. But since they either aren't smart enough or aren't self-aware enough to see this, and they get a righteousness buzz whenever they delete somebody else's work, they'll look for clever-sounding rationales to justify their behavior to themselves so they can continue getting that buzz.

But maybe I'm just cynical.

Re:Submitter doesn't understand Wikipedia notabili by Pinky's+Brain · 2014-10-17 09:09 · Score: 1

From an English language point of view? Yes.

Re:perplexingly by i+kan+reed · 2014-10-17 12:03 · Score: 1

Not quite true. Very recently registered and non-registered users can't create articles. There's a page for them to suggest those articles to people with the amazing skill of creating a username and password.

Re:Oh my... by K.+S.+Kyosuke · 2014-10-17 12:38 · Score: 1

i can't go into the details because i was working in a secure environment

What?

but basically even though i was the one that wrote the code i was taken aback that *python* - a scripted programming language - was capable of such extreme processing rates

Given that these are basically FFI memory accesses, I don't see how one could find that surprising. Although it would be interesting to see it stacked up against DataDraw.

--
Ezekiel 23:20

Well, this is to be used for the benefit of all... by kefalonia · 2014-10-17 13:51 · Score: 1

... only hypothetically, via Pine Digital's business which prominently displays in wikileaks: https://www.wikileaks.org/spyf...

Re:Would it hurt ... by hyc · 2014-10-17 17:06 · Score: 1

CouchDB is a pure append-only design which means that within a few dozen write operations, 90+% of its space is filled with out-of-date records. It requires frequent periodic compaction phases, and each compaction phase has a significant negative impact on latency and throughput. LMDB requires no compaction, and provides consistent latency and throughput at all times.

They are similar in that both use COW, but the similarity ends there.

--
-- *My* journal is more interesting than *yours*...

Re:Would it hurt ... by hyc · 2014-10-17 17:33 · Score: 1

MongoDB uses mmap but the similarity ends there. It uses a journal, not COW. It suffers from a number of durability and consistency vulnerabilities. LMDB has no such weaknesses.

http://www.slideshare.net/mong...

This research group at University of Wisconsin cites 1 vulnerability for LMDB, but they were mistaken:

https://www.usenix.org/confere...

http://www.openldap.org/lists/...

--
-- *My* journal is more interesting than *yours*...

Re:Submitter doesn't understand Wikipedia notabili by hyc · 2014-10-17 17:44 · Score: 1

A couple new answers to that question have popped up in the intervening time.

https://www.usenix.org/confere...

--
-- *My* journal is more interesting than *yours*...

Re:(not)perplexingly by dotancohen · 2014-10-18 01:57 · Score: 1

Wikipedia editors aren't allowed to "know stuff" about a topic. The No Original Research policy mandates that facts and information must be Verifiable in published Reliable Sources.

Thank you. That seems to explain why Bjork's dress has its own wikipedia article.

--
It is dangerous to be right when the government is wrong.

Re:Oh my... by anon+mouse-cow-aard · 2014-10-18 02:02 · Score: 1

as someone who led a project to replace a suite of 750 kloc apps written in compiled languages with 20 klocs of python that performed between 4x and 100x better, I do not find it perplexing at all. Concision is a value in and of itself. It brings better algorithms within reach, when the implementation in C would collapse under the effort of getting it right. There is nothing that stops you from re-implementing in C after you have explored and implemented in python. In practice, I have never found it necessary to do so, but ymmv.

Re:Would it hurt ... by Lennie · 2014-10-18 03:26 · Score: 1

Loved the talk at FOSDEM about OpenLDAP and LMDB:
https://www.youtube.com/watch?...

I was hoping it would be adopted by the Influxdb developers but it seems to not be a perfect (performance) fit:
http://influxdb.com/blog/2014/...

--
New things are always on the horizon

Re:(not)perplexingly by Alsee · 2014-10-18 11:14 · Score: 1

You mixed up the policies. No Original Research is unrelated to why Bjork's Academy Awards dress has its own Wikipedia article. No Original Research is why the article doesn't contain any new ideas or opinions by the article-writers themselves. The article accurately describes what The World has to say about the dress. The article has 13 sources cited 18 times providing external documentation for almost every sentence in the article.

The policy you wanted was "Wikipedia editors aren't allowed to decide how 'important' a topic is... Wikipedia Notability means that multiple independent Reliable Sources have published significant discussion of the subject." The World decides what is and isn't Notable, not me. As a Wikipedia editor I'm not allowed the opinion that it's an embarrassment to humanity that Academy-Awards-Dresses are considered newsworthy. (I can have the opinion, but I can't delete the article based on my opinion.)

The sources include: telegraph.co.uk, shine.yahoo.com, Filmology: A Movie-a-Day Guide to the Movies You Need to Know ISBN 978-1-4405-0753-3, All about Oscar: The History and Politics of the Academy Awards ISBN 978-0-8264-1452-6, Vanity Fair magazine, Spin magazine, New York magazine, Reel Winners: Movie Award Trivia ISBN 978-1-55002-574-3, Björk: wow and flutter ISBN 978-1-55022-556-3, The Advocate magazine, today.msnbc.msn.com. And there is no doubt that there are countless other uncited sources that exist. The World has clearly decided that this topic is worthy of significant published coverage.

By the way, this particular article has been getting around 55 pageviews a day. That's a lot higher than many of our more serious minor topics. Apparently there are a fair number of people coming to Wikipedia searching for this article.

-

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.

Re:(not)perplexingly by dotancohen · 2014-10-19 01:06 · Score: 1

I see, thank you.

--
It is dangerous to be right when the government is wrong.
