MemSQL Makers Say They've Created the Fastest Database On the Planet
mikejuk writes "Two former Facebook developers have created a new database that they say is the world's fastest and it is MySQL compatible. According to Eric Frenkiel and Nikita Shamgunov, MemSQL, the database they have developed over the past year, is thirty times faster than conventional disk-based databases. MemSQL has put together a video showing MySQL versus MemSQL carrying out a sequence of queries, in which MySQL performs at around 3,500 queries per second, while MemSQL achieves around 80,000 queries per second. The documentation says that MemSQL writes back to disk/SSD as soon as the transaction is acknowledged in memory, and that using a combination of write-ahead logging and snapshotting ensures your data is secure. There is a free version but so far how much a full version will cost isn't given." (See also this article at SlashBI.)
It sounds cool, but we can get 200k iops on Raid10 SSD without degradation.
When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
Show me benchmarks vs Oracle, PostgreSQL or SQLServer. Spare me the comparison with MySQL or some other toy.
Ok, so both article and video is extremely thin on details, the explanation for the massive performance is pretty much gibberish and their argumentation for ACID compliance is bullshit.
Just leaves me with the question, what are they trying to get out of this BS?
The most over-the-top DB God I know started in Pick-land (ca 1972?). Although he does (is forced to?) use SQL nowadays, he thinks in ways that do not come out of any SQL DBA handbook. As a result he gets DBMSs to do things that are ... unnatural.
He is currently doing some data-cubing stuff for us that I didn't think could be done with something less than a DOD budget. He says his touchstone is thinking in Pick and then 'translating' to SQL.
I still think that the 2 missing courses from any CS degree program are 1) how to debug, and 2) history of computing.
Speed's fine, but what kind? Or more specifically, over what timeframe? High transaction rates are fine, but they don't do any good if you can only sustain them for a few seconds or minutes before the whole thing collapses. I want to know the transaction rate the thing can sustain over 24 hours of continuous operation. In the real world you have to be able to keep processing transactions continuously.
That long-time-period test also shows up another potential problem area: disk bottleneck. In-memory's fine, but few serious databases are small enough to fit completely in memory. And even if it will fit, you can't lose your database when you shut down to upgrade the software so eventually the data has to be written to disk. And that becomes a bottleneck. If your system can't flush to disk at least as rapidly as you're handling transactions, your disk writes start to lag behind. Sooner or later that'll cause a collapse as the buffers needed to hold data waiting to be written to disk compete for memory with the actual data. You can play algorithmic games to minimize the competition, but sooner or later you run up against the hard wall of disk throughput. And the higher your transactions rates are, the harder you're going to hit that wall.
They did have an ad to lure in "Top Coders" at http://developers.memsql.com/blog/
Apart from their ad, what they said about Top Coders was interesting - with the exception of top coders memorizing who books filled with algorithms, because top coders do not memorize nothing - top coders do not get to be top coders by memorizing.
Instead, top coders have that instinct to _know_ which algorithm to adapt and apply, and top coders know where (and how to) look for the algorithm (either from their own archive, from books, from old magazines, or from some strange corners on the Web)
Muchas Gracias, Señor Edward Snowden !
Not just that - you can get a FusionIO ramdisk device for really big databases and get performance that's somewhere between SSD and memory. Those are all battery backed and such, so no monkeying around with whether the ACID was done right or not.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
So what is the difference between MemSQL and TimesTen?
Other than the 16 years TimesTen has been out longer, the fact that Oracle now owns TimesTen, that it runs on both 32bit and 64bit Linux and Windows, that it can run in front of another database engine to give it a boost, and that it has customer installations up to the Terabyte range.
Just another lame attempt to reinvent the wheel.
It's a bit more complex. There's four main ways to do MySQL storage in RAM (which I know of because my current work project is a MySQL application).
First, the NDB Cluster system is there, which is what you've mentioned. That's basically just a MySQL frontend to a distributed, memory-based NoSQL database, though. Convenient, but not truly "MySQL".
The second is using the "Memory" storage engine, where it actually stores a normal MyISAM table in memory. However, this is a surprisingly crappy option, because it uses table-level locks for writing, so parallel write performance is only marginally faster than disk.
The third is to store regular InnoDB tables on a ramdisk. This can be crazy fast, but it also means that if your server crashes or loses power, you're *fucked*
The fourth is to use Memcached, which isn't really a MySQL thing at all. You're basically just caching data in a memory-only NoSQL database, at the application level. This is actually what we ended up doing, because all the others are pretty crappy options - Cluster is the best one, but the hardware requirements are higher than we could justify spending given our performance requirements. Shoving memcached onto the web server (which has RAM to spare) and setting certain queries to cache their results there sped things up significantly, at minimal cost.
As far as I can tell from the summary (I refuse to read the articles for such a blantant slashvertisement), this "MemSQL" doesn't do anything you can't do by configuring MySQL properly, although they likely optimized some rarely-used modules to make them faster.
The biggest issue with RAM drives are their cost.
Yes and no. If you can fit the Innodb writeahead-logs and a few of the worst bottleneck tables on, say, an 8 GB ram drive, it's a bargain.
HyperDrive: $300
2 * 4GB 240-Pin DDR2-800 SDRAM ECC: $234
16 GB CF card for backup: $30
Total: $564
That's downright cheap compared to what a RAID 10 or 50 of SSDs or short-stroked 10k/15k rpm drives would cost.
If it solves a bottleneck, it could be a big money saver.
I work on a system like that right now in a really big company. Let me tell you something- it's shit. If you need concurrent access to the files/directories by several processes, you'll have a heap of issues. Consumers pick up files before they are completely written by the producers (now fixed by file renaming, but required work). Sime directories now hold 300k files, and any file operations are extremely slow- filesystems aren't designed for this (in process of being fixed by splitting directories squid style into many subdirectories). On top of that, because we need to access the repository from multiple machines and we want reliability, it's now on NAS. This means any file open/close operation takes ~6 ms, at least 2 ping trips to the NAS server and back. This means 166 IO operations per second and no more. You want notification when a file arrives- you get to use polling. FS file modification notifications don't work on NAS. It is terrible. I wasn't there when this thing was designed, but I'd like to find the architector who thought this is a good idea and punch him in the face.
Seriously, why do people, and then I mean slashdot nerds, think 'fast database' and then think 'mysql' ? 'MySQL-compatible' equals 'bad' in my world and, in comparison with Oracle, 'not so fast at all'.
Religion is what happens when nature strikes and groupthink goes wrong.