MemSQL Makers Say They've Created the Fastest Database On the Planet
mikejuk writes "Two former Facebook developers have created a new database that they say is the world's fastest and it is MySQL compatible. According to Eric Frenkiel and Nikita Shamgunov, MemSQL, the database they have developed over the past year, is thirty times faster than conventional disk-based databases. MemSQL has put together a video showing MySQL versus MemSQL carrying out a sequence of queries, in which MySQL performs at around 3,500 queries per second, while MemSQL achieves around 80,000 queries per second. The documentation says that MemSQL writes back to disk/SSD as soon as the transaction is acknowledged in memory, and that using a combination of write-ahead logging and snapshotting ensures your data is secure. There is a free version but so far how much a full version will cost isn't given." (See also this article at SlashBI.)
It sounds cool, but we can get 200k IOPS on RAID 10 SSDs without degradation.
When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
Really? Accessing RAM is faster than accessing a disk? What a novel discovery!
It seems to me that MySQL can also be run in memory. Apparently that's how the clustered database works (or used to work). I've never tried it, but let's see some benchmarks between MemSQL and an entirely memory-based MySQL.
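A rough way to run the kind of comparison asked for here, sketched in Python with SQLite standing in for MySQL. This is an assumption for illustration only: it compares SQLite's ":memory:" database with the same schema in an on-disk file, not MySQL's MEMORY engine or MySQL Cluster. It times primary-key point lookups and reports queries per second:

```python
import os
import sqlite3
import tempfile
import time


def queries_per_second(conn: sqlite3.Connection, n_rows: int = 10_000, n_queries: int = 20_000) -> float:
    """Load n_rows into a fresh table, then time n_queries primary-key lookups."""
    conn.execute("CREATE TABLE kv (id INTEGER PRIMARY KEY, val TEXT)")
    with conn:  # one transaction for the bulk load
        conn.executemany("INSERT INTO kv VALUES (?, ?)", ((i, f"v{i}") for i in range(n_rows)))
    start = time.perf_counter()
    for i in range(n_queries):
        conn.execute("SELECT val FROM kv WHERE id = ?", (i % n_rows,)).fetchone()
    elapsed = time.perf_counter() - start
    return n_queries / elapsed


def compare() -> dict:
    """Run the same workload against an in-memory and an on-disk database."""
    results = {}
    mem = sqlite3.connect(":memory:")
    results["memory"] = queries_per_second(mem)
    mem.close()
    with tempfile.TemporaryDirectory() as d:
        disk = sqlite3.connect(os.path.join(d, "bench.db"))
        results["disk"] = queries_per_second(disk)
        disk.close()  # close before the directory is cleaned up
    return results
```

Numbers will vary by machine. For a read-only workload like this, the OS page cache may keep the on-disk copy in RAM anyway, so the gap can be small; a fair comparison with MemSQL would also need writes and matching durability settings.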
"You cannot simultaneously prevent and prepare for war." -- Albert Einstein
When I think of fast databases to compare to, the first thing I think of is MySQL.
/Actually, I'd rather see a comparison to Pick or other lightning fast MV dbs
Show me benchmarks vs Oracle, PostgreSQL or SQLServer. Spare me the comparison with MySQL or some other toy.
Ok, so both the article and the video are extremely thin on details, the explanation for the massive performance is pretty much gibberish, and their argument for ACID compliance is bullshit.
Just leaves me with the question, what are they trying to get out of this BS?
Give me fast enough, robust, easy to administer and standards-compliant. Maybe a little less speed means you throw more hardware at a problem, but that doesn't matter unless the overall cost and risk are inflated. A platform decision boils down to three things: (1) is it good enough; (2) is it economical; (3) if we decide later this doesn't work for us, are we totally screwed?
In any case, there's no meaningful way you can make a claim that a database management system is the fastest on the planet. All you have is benchmarks, and different benchmarks apply to different use-cases.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
What you have there is (or may be) the fastest database management system.
I have the world's fastest database. One table, one record, and one field (NULL).
Have gnu, will travel.
I wouldn't run my toaster on software engineered by someone from Facebook, let alone a database. I'd have to spend ten minutes searching for my toast, and it would show up the following week.
And then the next week, your toast would have changed from white bread to wholegrain and you're just going to have to get used to it.
As a long-time SysAd/webmaster/developer, I'm certainly interested.
At the risk of sounding incredibly condescending....
If you were really a sysadmin who could benefit from that kind of speed improvement, you'd know that it's possible to achieve that level of performance with MySQL already, by either running it from memory or by using a fast hard drive array. The simplest/cheapest option to drastically improve MySQL performance is to throw a large amount of RAM at the system and let MySQL keep its data there. MySQL can be configured to keep the database in active memory and sync to the disk on a regular basis, which is almost exactly the kind of behaviour described for MemSQL... for an exceptionally large database that can't be stored in system memory, I imagine that the advantage MemSQL is boasting would evaporate. There are other ways to go about it, such as running a fast disk array or a cluster, in order to get around the limitations of using RAM, but ultimately the prime determining factor for speed in MySQL is speed of access to the database file itself.
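The "keep it in memory, sync to disk on a regular basis" behaviour can be sketched generically. This is not MySQL's mechanism (that would be the InnoDB buffer pool or the MEMORY engine); it is a minimal Python illustration, with a hypothetical `WriteBackStore` class, in which every read and write hits a dict and `sync()`, which a real system would call from a timer, writes an atomic snapshot:

```python
import json
import os
import tempfile


class WriteBackStore:
    """Hypothetical sketch: serve reads/writes from a dict, persist only on sync()."""

    def __init__(self, path: str):
        self.path = path
        self.data = {}
        self.dirty = False
        if os.path.exists(path):  # reload the last snapshot, if any
            with open(path) as f:
                self.data = json.load(f)

    def get(self, key: str):
        return self.data.get(key)

    def put(self, key: str, value) -> None:
        self.data[key] = value  # memory only; not yet durable
        self.dirty = True

    def sync(self) -> None:
        """Write a full snapshot atomically: temp file, fsync, then rename over the old one."""
        if not self.dirty:
            return
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
        self.dirty = False
```

Anything written since the last `sync()` is lost on a crash, and the whole dataset must fit in RAM, which is the limitation raised above for very large databases.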
Or by its present-day name: CACHE? The best, the fastest, and the most reliable commercial database on the planet? Come on, guys, get real.
History
The ARPANET, the predecessor of the Internet, had no distributed host name database. Each network node maintained its own map of the network nodes as needed and assigned them names that were memorable to the users of the system. There was no method for ensuring that all references to a given node in a network were using the same name, nor was there a way to read the hosts file of another computer to automatically obtain a copy.
The small size of the ARPANET kept the administrative overhead small to maintain an accurate hosts file. Network nodes typically had one address and could have many names. As local area TCP/IP computer networks gained popularity, however, the maintenance of hosts files became a larger burden on system administrators as networks and network nodes were being added to the system with increasing frequency.
http://en.wikipedia.org/wiki/Hosts_(file)
They're durable and synchronously log all changes to disk, so what makes them faster? They do say this, at http://developers.memsql.com/docs/1b/durability.html:
Reconfigure the server to use a faster disk. MemSQL exclusively relies on sequential (not random) disk writes, so using an SSD will dramatically improve durability write performance.
Are SSDs better at sequential writes? I thought their advantage was random reads, and that they weren't any faster at writes than HDDs. Also, the data would become hopelessly out of order by only doing sequential writes, unless they're periodically rewriting all the data in order, which would mean a lot more I/O than a typical DB.
tomorrow who's gonna fuss
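The summary says MemSQL combines write-ahead logging with snapshotting, and that combination is one common answer to the out-of-order worry: the sequential writes go to an append-only log, while the authoritative copy lives in memory. Recovery loads the last snapshot and replays the log in order; a periodic snapshot rewrites the whole state once and lets the log be truncated. A minimal Python sketch of that general idea (the `LogStore` class and file names are hypothetical, not MemSQL's code):

```python
import json
import os


class LogStore:
    """Sketch: each write is appended to a log and fsynced; snapshot() compacts it."""

    def __init__(self, directory: str):
        self.log_path = os.path.join(directory, "wal.log")
        self.snap_path = os.path.join(directory, "snapshot.json")
        self.data = self._recover()
        self.log = open(self.log_path, "a")

    def _recover(self) -> dict:
        data = {}
        if os.path.exists(self.snap_path):
            with open(self.snap_path) as f:
                data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    try:
                        key, value = json.loads(line)
                    except json.JSONDecodeError:
                        break  # torn final record from a crash mid-append; ignore it
                    data[key] = value  # replay in log order; later entries win
        return data

    def put(self, key: str, value) -> None:
        self.log.write(json.dumps([key, value]) + "\n")  # sequential append only
        self.log.flush()
        os.fsync(self.log.fileno())  # durable before we acknowledge
        self.data[key] = value

    def snapshot(self) -> None:
        """Rewrite the full state in one pass, then truncate the log it supersedes."""
        tmp = self.snap_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snap_path)
        self.log.close()
        self.log = open(self.log_path, "w")  # the snapshot now covers everything logged

    def close(self) -> None:
        self.log.close()
```

If a crash lands between the rename and the truncation, replaying the old log over the new snapshot just re-applies the same values, so the result is unchanged.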
I'd love to see their tests when this DB needs to go into swap / pagefile. That's double the slowdown: it needs to write into the swap (disk I/O) and then sync the DB (disk I/O again).
I can't, for the life of me, understand where this will be better than the already available options.
The most over-the-top DB God I know started in Pick-land (ca 1972?). Although he does (is forced to?) use SQL nowadays, he thinks in ways that do not come out of any SQL DBA handbook. As a result he gets DBMSs to do things that are ... unnatural.
He is currently doing some data-cubing stuff for us that I didn't think could be done with something less than a DOD budget. He says his touchstone is thinking in Pick and then 'translating' to SQL.
I still think that the 2 missing courses from any CS degree program are 1) how to debug, and 2) history of computing.
Speed's fine, but what kind? Or more specifically, over what timeframe? High transaction rates are fine, but they don't do any good if you can only sustain them for a few seconds or minutes before the whole thing collapses. I want to know the transaction rate the thing can sustain over 24 hours of continuous operation. In the real world you have to be able to keep processing transactions continuously.
That long-time-period test also shows up another potential problem area: disk bottleneck. In-memory's fine, but few serious databases are small enough to fit completely in memory. And even if it will fit, you can't lose your database when you shut down to upgrade the software, so eventually the data has to be written to disk. And that becomes a bottleneck. If your system can't flush to disk at least as rapidly as you're handling transactions, your disk writes start to lag behind. Sooner or later that'll cause a collapse as the buffers needed to hold data waiting to be written to disk compete for memory with the actual data. You can play algorithmic games to minimize the competition, but sooner or later you run up against the hard wall of disk throughput. And the higher your transaction rates are, the harder you're going to hit that wall.
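The disk-bottleneck argument is simple arithmetic: whenever the transaction rate exceeds the flush rate, the unflushed backlog grows linearly with time. A minimal sketch; the 80,000/s figure comes from the MemSQL video, while the 60,000/s flush rate and 200-byte transaction size used below are purely hypothetical:

```python
def flush_backlog(tx_per_sec: float, flush_per_sec: float, seconds: int, bytes_per_tx: int) -> list:
    """Return the bytes of not-yet-flushed data at the end of each second.

    All inputs are hypothetical rates; bursts, compression and batching are ignored.
    """
    backlog_tx = 0.0
    history = []
    for _ in range(seconds):
        backlog_tx += tx_per_sec                      # new transactions buffered in RAM
        backlog_tx -= min(backlog_tx, flush_per_sec)  # the disk drains what it can
        history.append(backlog_tx * bytes_per_tx)
    return history
```

For example, `flush_backlog(80_000, 60_000, 3, 200)` returns `[4000000.0, 8000000.0, 12000000.0]`: the backlog grows by 20,000 × 200 bytes = 4 MB every second, which over the 24-hour run suggested above would be about 345.6 GB waiting for disk.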
They did have an ad to lure in "Top Coders" at http://developers.memsql.com/blog/
Apart from their ad, what they said about Top Coders was interesting, with the exception of top coders memorizing whole books filled with algorithms: top coders don't memorize anything, and top coders don't get to be top coders by memorizing.
Instead, top coders have that instinct to _know_ which algorithm to adapt and apply, and top coders know where (and how) to look for the algorithm (in their own archive, in books, in old magazines, or in some strange corners of the Web).
Many thanks, Mr. Edward Snowden!
Newsflash: servers come with up to 2 TB RAM now.
Help stamp out iliturcy.
I meant two disk accesses, one way or another. From what I read, they would never be simultaneous anyway.
Either way, this would be useful (actually it IS; some solutions already do this) in the Business Intelligence field. But the whole point of keeping everything in memory is moot when you have petabytes of information that you need to process during your ETL. What matters for this database is how well it behaves in a cluster and how it handles concurrency (ACID? Eventually consistent?).
I doubt this is all that useful for common DB applications like websites and the like. Relational DBs have proven to be enough for everything purely web-related for a while now (e.g. YouTube uses MySQL shards, or used to), so I doubt this is a game changer at all.
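Sharding, as in the YouTube example, means routing each key to one of several independent MySQL servers. A minimal sketch of hash-based routing; the key scheme and shard count are hypothetical, and this is not a claim about how YouTube actually did it:

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards).

    Uses SHA-256 rather than Python's built-in hash(), which is randomized per
    process for strings and so would route the same key differently on restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

A fixed modulo means that changing `num_shards` remaps most keys, which is why real deployments often use a lookup directory or consistent hashing instead.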
If you were really a sysadmin who could benefit from that kind of speed improvement, you'd know that it's possible to achieve that level of performance with MySQL already, by either running it from memory or by using a fast hard drive array.
The guys that wrote it are former Facebook employees, so I have to assume they know how to get the best performance out of MySQL, and that it doesn't suit their needs for whatever reason.
The article doesn't really go into much detail about why, but my point is really about not jumping to conclusions and admonishing someone because you think you know more than they do. Maybe this whole product is useless, and maybe it's brilliant and useful, but you can't determine that solely from this article.
Oh but come on. Their engineers are super leet! To work at Facebook, you have to win a drunken speed-hacking contest just to be a PHP coder!
So what is the difference between MemSQL and TimesTen?
Other than the 16 years longer that TimesTen has been out, the fact that Oracle now owns TimesTen, that it runs on both 32-bit and 64-bit Linux and Windows, that it can run in front of another database engine to give it a boost, and that it has customer installations up to the terabyte range.
Just another lame attempt to reinvent the wheel.
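Running "in front of another database engine" is the read-through cache pattern: reads are served from RAM and only misses go to the slower backing store. A minimal sketch; `load` is a hypothetical stand-in for a query against the backing database, and this says nothing about how TimesTen implements it:

```python
from typing import Any, Callable, Dict, Hashable


class ReadThroughCache:
    """In-memory tier in front of a slower store; only cache misses reach the store."""

    def __init__(self, load: Callable[[Hashable], Any]):
        self.load = load  # hypothetical: fetches one row from the backing database
        self.cache: Dict[Hashable, Any] = {}
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.load(key)  # populate on first access
        return self.cache[key]
```

There is no eviction or invalidation here, so writes that bypass the cache would leave it stale; a real caching tier has to handle both.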
Remember the good old days, when XYZ-db wasn't always available (or even desirable)? We used to use files.
Yeah, files. Novel concept: these days, mention ISAM to someone and they don't know what you're talking about!
If you really need speed, maybe a database isn't your best bet. Maybe, just maybe, you should consider structuring the data in a way that makes sense for your application using files.
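A minimal sketch of the files approach in the ISAM spirit: fixed-length records in a flat file plus an index from key to byte offset, so a lookup is one seek and one read. The record layout is a hypothetical example, and real ISAM keeps its index on disk in sorted order rather than in a Python dict:

```python
import struct

# Hypothetical record layout: 4-byte unsigned id + 20-byte name, little-endian.
RECORD = struct.Struct("<I20s")


def write_records(path: str, rows) -> dict:
    """Write fixed-size records and return an index mapping id -> byte offset."""
    index = {}
    with open(path, "wb") as f:
        for rec_id, name in rows:
            index[rec_id] = f.tell()
            f.write(RECORD.pack(rec_id, name.encode("utf-8")))  # "20s" pads with NULs
    return index


def read_record(path: str, index: dict, rec_id: int):
    """Seek straight to the record's offset instead of scanning the file."""
    with open(path, "rb") as f:
        f.seek(index[rec_id])
        got_id, raw = RECORD.unpack(f.read(RECORD.size))
    return got_id, raw.rstrip(b"\0").decode("utf-8")
```

Names longer than 20 bytes are silently truncated by the `20s` format, which is the usual trade-off of fixed-length records.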
Just say no to swap. It's pointless, except as a crutch for broken software. And it's dangerous on a server. If an application wants disk-backed VM, it can use mmap.
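The point here is that an application wanting disk-backed memory can map a file itself rather than rely on swap. A minimal Python sketch using the standard `mmap` module; the file of 8-byte counters is a hypothetical example:

```python
import mmap
import os
import struct


def open_counters(path: str, slots: int) -> mmap.mmap:
    """Create (or reopen) a file of `slots` 8-byte counters and map it into memory."""
    size = slots * 8
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)  # extend with zeros so the mapping has backing store
        return mmap.mmap(fd, size)  # the kernel pages this in and out on demand
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def increment(m: mmap.mmap, slot: int) -> int:
    """Read-modify-write one counter directly in the mapped file."""
    offset = slot * 8
    (value,) = struct.unpack_from("<Q", m, offset)
    struct.pack_into("<Q", m, offset, value + 1)
    return value + 1
```

The kernel decides which pages stay resident, much as it would with swap, but the paging is scoped to this one file, and `m.flush()` forces changes out to it on demand.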
Swap isn't just a crutch for broken software (though it can be); sufficient RAM is not always available. In a perfect world, all servers would have more RAM than their applications ever need, more cores than the processes can take advantage of, and all disks would be RAID-10 arrays of SSDs.
But back in the real world where most of us have to live, swap does come into use at times, letting a server accommodate loads that it otherwise couldn't handle due to the memory footprint of the software it's running. Swap doesn't have to be a death knell for the server - some light or even moderate swapping can mean that you're using the server more efficiently, especially when you're running on VMware and it wants to reclaim some RAM of its own via the memory balloon driver. When VMware is under memory pressure, it's better to let Linux decide what to swap out than to let VMware swap memory out from under the virtual machine.
I'm not so sure that this database would fly that fast if it was running on a beowulf cluster of Raspberry Pi with OSX.
c++;
I've had a love-hate relationship with MySQL for over ten years now, and have as much cause to hate it as anyone, but I have to point this out. Read the MemSQL docs carefully, and here's the killer - they only support single-query transactions, and only at isolation level READ COMMITTED.
Until those two facts change, it's hardly a fair comparison.
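To make the first limitation concrete: with single-query transactions, two statements that must succeed or fail together cannot be wrapped in one transaction. A minimal sketch of what that rules out, using Python's sqlite3 module as a stand-in engine (SQLite is not MySQL or MemSQL, and its isolation model differs from READ COMMITTED); the `accounts` table is hypothetical:

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Debit and credit as one multi-statement transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # also undoes the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def demo() -> dict:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    with conn:
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    transfer(conn, "alice", "bob", 30)
    try:
        transfer(conn, "alice", "bob", 500)  # fails partway through
    except ValueError:
        pass
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

`demo()` should leave alice at 70 and bob at 30: the failed transfer's debit is rolled back along with everything else in its transaction. With single-statement transactions only, that debit would already be committed.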
It seems to me that MemSQL is just an implementation of MySQL with the commit behavior changed to something like "BATCH, NO WAIT." This would normally introduce a period of time when the transaction could be lost before it is written to disk, if there were a power outage or something, but with battery backups on enterprise RAID cards, the transaction should still be saved in the RAM on the RAID card. I think it would still be possible to lose the transaction if the server crashed mid-transaction, so perhaps this is a safe implementation of BATCH, NO WAIT on MySQL?
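The "acknowledge now, write later" commit behaviour described here can be modelled in a few lines. This toy Python model (a hypothetical `AsyncCommitLog`, not MemSQL's or MySQL's code) shows the window in which acknowledged transactions exist only in RAM:

```python
class AsyncCommitLog:
    """Toy model of batched, no-wait commits."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.pending = []  # acknowledged to the client, but only in RAM
        self.durable = []  # stands in for data that has reached disk

    def commit(self, tx) -> str:
        self.pending.append(tx)
        if len(self.pending) >= self.batch_size:
            self.flush()
        return "ack"  # the client sees success before the data is durable

    def flush(self) -> None:
        self.durable.extend(self.pending)  # one batched write instead of one per tx
        self.pending.clear()

    def crash(self) -> list:
        """Simulate losing the process: anything still pending is gone."""
        lost = list(self.pending)
        self.pending.clear()
        return lost
```

One caveat on the battery-backed RAID point: the controller's battery only protects writes that have already been handed to the controller, not transactions still sitting in the database process's own memory, like `pending` here.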
Shamgunov has excellent credentials in the database world, in spite of having worked at Microsoft on SQL Server for six years.
FTFY
'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
MySQL's HandlerSocket (included since 5.5) does NoSQL-style read and write operations, bypassing the SQL engine. While it has some limitations, it will do >200,000 queries/sec on a low-spec server, and there are benchmarks of it doing >750,000 on an 8-core Nehalem (faster than Memcached!), and it's not restricted to in-memory operations. The nice thing is that you can use it for the simpler parts of your app, then use transactional SQL on the same database for more complex operations.
Another one to look at is Tokutek's TokuDB, another InnoDB drop-in replacement, which is particularly good for inserts, low disk use and low-latency replication. They ran a demo doing 1 billion indexed inserts in 7 hours when InnoDB took a week.
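Putting the demo's figures on a per-second basis (taking "a week" as exactly 7 × 24 hours, an assumption since the exact InnoDB time isn't given):

```latex
\frac{10^9\ \text{inserts}}{7 \times 3600\ \text{s}} \approx 39{,}700\ \text{inserts/s}
\qquad
\frac{10^9\ \text{inserts}}{7 \times 24 \times 3600\ \text{s}} \approx 1{,}650\ \text{inserts/s}
```

That is a 24× difference (168 hours versus 7).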
For distributed 'cloudy' apps, one of the better choices is Drizzle, which retains the nice bits of MySQL (and MySQL client compatibility) and rewrites all the rest.
I don't think I'll believe MemSQL until Percona have benchmarked it...