MemSQL Makers Say They've Created the Fastest Database On the Planet
mikejuk writes "Two former Facebook developers have created a new database that they say is the world's fastest and it is MySQL compatible. According to Eric Frenkiel and Nikita Shamgunov, MemSQL, the database they have developed over the past year, is thirty times faster than conventional disk-based databases. MemSQL has put together a video showing MySQL versus MemSQL carrying out a sequence of queries, in which MySQL performs at around 3,500 queries per second, while MemSQL achieves around 80,000 queries per second. The documentation says that MemSQL writes back to disk/SSD as soon as the transaction is acknowledged in memory, and that using a combination of write-ahead logging and snapshotting ensures your data is secure. There is a free version but so far how much a full version will cost isn't given." (See also this article at SlashBI.)
It sounds cool, but we can get 200k iops on Raid10 SSD without degradation.
When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
Really? Accessing RAM is faster than accessing a disk? What a novel discovery!
It seems to me that MySQL can also be run in memory. Apparently that's how the clustered database works (or used to work). I've never tried it, but let's see some benchmarks between MemSQL and an entirely memory-based MySQL.
"You cannot simultaneously prevent and prepare for war." -- Albert Einstein
When I think of fast databases to compare to, the first thing I think of is MySQL.
/Actually, I'd rather see a comparison to Pick or other lightning fast MV dbs
Show me benchmarks vs Oracle, PostgreSQL or SQLServer. Spare me the comparison with MySQL or some other toy.
Ok, so both article and video is extremely thin on details, the explanation for the massive performance is pretty much gibberish and their argumentation for ACID compliance is bullshit.
Just leaves me with the question, what are they trying to get out of this BS?
Give me fast enough, robust, easy to administer and standards compliant. Maybe a little less fast means you throw more hardware at a problem, but it doesn't matter if overall the overall cost and risk is inflated. A platform decision boils down to three things: (1) is it good enough; (2) is it economical; (3) if we decide later this doesn't work for us, are we totally screwed.
In any case, there's no meaningful way you can make a claim that a database management system is the fastest on the planet. All you have is benchmarks, and different benchmarks apply to different use-cases.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
I wouldn't run my toaster on software engineered by someone from Facebook, let alone a database. I'd have to spend ten minutes searching for my toast, and it would show up the following week.
And then the next week, your toast would have changed from white bread to wholegrain and you're just going to have to get used to it.
As a long time SysAd/webmaster/developer, I'm certainly interested
At the risk of sounding incredibly condescending....
If you were really a sysadmin who could benefit from that kind of speed improvement, you'd know that it's possible to achieve that level of performance with MySQL already, by either running it from memory or by using a fast hard drive array. The simplest/cheapest option to drastically improve MySQL performance is to throw a large amount of RAM at a system and point MySQL at the memory. MySQL can be configured to keep the database in active memory and sync to the disk on a regular basis, which is almost exactly the kind of behaviour described for MemSQL... for an exceptionally large database that can't be stored in system memory, I imagine that the advantage that MemSQL is boasting would evapourate. There are other ways to go about doing it, such as running a fast disk array or a cluster, in order to get around the limitations of using RAM, but ultimately the prime determining factor for speed in MySQL is speed of access to the database file itself.
The most over-the-top DB God I know started in Pick-land (ca 1972?). Although he does (is forced to?) use SQL nowadays, he thinks in ways that do not come out of any SQL DBA handbook. As a result he gets DBMSs to do things that are ... unnatural.
He is currently doing some data-cubing stuff for us that I didn't think could be done with something less than a DOD budget. He says his touchstone is thinking in Pick and then 'translating' to SQL.
I still think that the 2 missing courses from any CS degree program are 1) how to debug, and 2) history of computing.
Speed's fine, but what kind? Or more specifically, over what timeframe? High transaction rates are fine, but they don't do any good if you can only sustain them for a few seconds or minutes before the whole thing collapses. I want to know the transaction rate the thing can sustain over 24 hours of continuous operation. In the real world you have to be able to keep processing transactions continuously.
That long-time-period test also shows up another potential problem area: disk bottleneck. In-memory's fine, but few serious databases are small enough to fit completely in memory. And even if it will fit, you can't lose your database when you shut down to upgrade the software so eventually the data has to be written to disk. And that becomes a bottleneck. If your system can't flush to disk at least as rapidly as you're handling transactions, your disk writes start to lag behind. Sooner or later that'll cause a collapse as the buffers needed to hold data waiting to be written to disk compete for memory with the actual data. You can play algorithmic games to minimize the competition, but sooner or later you run up against the hard wall of disk throughput. And the higher your transactions rates are, the harder you're going to hit that wall.
They did have an ad to lure in "Top Coders" at http://developers.memsql.com/blog/
Apart from their ad, what they said about Top Coders was interesting - with the exception of top coders memorizing who books filled with algorithms, because top coders do not memorize nothing - top coders do not get to be top coders by memorizing.
Instead, top coders have that instinct to _know_ which algorithm to adapt and apply, and top coders know where (and how to) look for the algorithm (either from their own archive, from books, from old magazines, or from some strange corners on the Web)
Muchas Gracias, Señor Edward Snowden !
SSD is significantly faster than HDD at both sequential and random writes. Top 15K SAS drives write ~250MB/s sequential. Top SSD write 550MB/s sequential. Write random and it gets much worse for the SAS drive. Try to even find an enterprise HDD benchmark done in the last year. No one bothers because enterprise buys SSD if they care about performance.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
I've had a love-hate relationship with MySQL for over ten years now, and have as much cause to hate it as anyone, but I have to point this out. Read the MemSQL docs carefully, and here's the killer - they only support single-query transactions, and only at isolation level READ COMMITTED.
Until those two facts change, then its hardly a fair comparison.
I work on a system like that right now in a really big company. Let me tell you something- it's shit. If you need concurrent access to the files/directories by several processes, you'll have a heap of issues. Consumers pick up files before they are completely written by the producers (now fixed by file renaming, but required work). Sime directories now hold 300k files, and any file operations are extremely slow- filesystems aren't designed for this (in process of being fixed by splitting directories squid style into many subdirectories). On top of that, because we need to access the repository from multiple machines and we want reliability, it's now on NAS. This means any file open/close operation takes ~6 ms, at least 2 ping trips to the NAS server and back. This means 166 IO operations per second and no more. You want notification when a file arrives- you get to use polling. FS file modification notifications don't work on NAS. It is terrible. I wasn't there when this thing was designed, but I'd like to find the architector who thought this is a good idea and punch him in the face.