Facebook Trapped In MySQL a 'Fate Worse Than Death'
wasimkadak writes with this excerpt from GigaOM: "According to database pioneer Michael Stonebraker, Facebook is operating a huge, complex MySQL implementation equivalent to 'a fate worse than death,' and the only way out is 'bite the bullet and rewrite everything.' Not that it's necessarily Facebook's fault, though. Stonebraker says the social network's predicament is all too common among web startups that start small and grow to epic proportions."
I love the snippets "After all, he explained, SQL was created decades ago before the web, mobile devices and sensors forever changed how and how often databases are accessed" from the article and "We’ve been using stonge age technology to solve problems that didn’t exist 30 years ago." Yes, the problems existed 30 years ago, such as (land-line) telephone billing. I don't know how those problems were solved -- probably with a mainframe and a custom non-SQL database and not a PC running a SQL-based server -- but they were solved.
Academic purist discovers that one of the most prolific and successful database users in the world is using a system he doesn't approve of. He decides, with no insider knowledge at all, and despite all evidence to the contrary, that they should throw everything away and start over from scratch using a system that he thinks would allow them to see the performance and scalability that they've already achieved.
Presumably he's tired of Facebook being used as a counter-example to everything he's been preaching.
"With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea...."
RFC 1925
A minor difference that exists in 4,000 instances and who knows how many places in the code that's also distributed across multiple servers, isn't minor, especially when there are hundreds or even thousands of minor differences.
And no, the differences in SQL between Oracle and MySQL aren't minor. It's not just syntax, and it's not MySQL-can-Oracle-can't. It's the performance characteristics of various queries, the logic of how they're implemented, and the incredible investment in configuring a large cluster to work smoothly (which MySQL and Oracle do extremely differently. Large scale systems add a layer of complexity all their own that's a totally separate engineering challenge.
Short version: Switching from MySQL to anything else would be the equivalent to a ground-up rewrite, though this is largely true of any database system. MySQL hasn't somehow uniquely trapped them here.
Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
It's hilarious how the NoSQL fools are now constantly backpedalling these days.
It turns out that writing database queries in JavaScript is a stupid idea! Imagine that! All of their attempts to invent a better query language end up being almost identical to, guess what, SQL!
Then they realize that trying to maintain data consistency using logic written in JavaScript, Ruby or PHP doesn't work so well. Values go unconstrained, and the referential integrity gets fucked up. Soon the data is nearly worthless.
The smarter/less-ignorant ones then think that they'll just use transactions. But wait, their NoSQL database of choice doesn't support that, or doesn't support it properly. So they tell themselves that their data will become "eventually consistent", or worse, they try to implement some shitty ass "transaction" support using Ruby. Regardless of the path chosen, failure is the result.
Now they're realizing that it's mandatory to use a real relational database when working on anything remotely serious. So we see this bullshit about "no" now meaning "not only". That's funny, last month it meant "no", as in, "we will never write a SQL query again, and we will never use a relational database again."
I'm going to make a prediction: Next month, we'll get to read articles and comments from them about these amazing new database systems that they've just discovered. These new systems avoid all of the problems associated with NoSQL databases! What are their names? Oracle, DB/2, SQL Server, PostgreSQL and SQLite.
If you wan't to start a fun/interesting project that you didn't expect any revenue from, it would make more sense to use free software. MySQL is a popular choice for web applications and there is a lot of freely available documentation and examples available. Many people have been successful doing it, so it's a proven path that works.
Oracle is expensive. It would have cost a fortune to start Facebook with Oracle, and I can't imagine what it would cost them now. But even if they have to hire a ton of experts to convert to Oracle( assuming that is the best thing to do...) They can probably be funded by the money saved by not using Oracle over the past couple of years.
Maybe Oracle would have been a mistake, there are companies migrating from Oracle to DB2/DB2 to Oracle/Oracle to Sybase/Sybase to MySQL/Mainframe to AIX/AIX to Solaris/Solaris to Linux/etc.. It seems like nobody can agree to the best hardware/OS/database solution, but there are plenty of people who swear that the solution they know is the best one.
If anything, it's a success story for MySQL.
This isn't true. I just migrated an application from MySQL 4.1 to Postgresql 9.0 at work. It took me about two weeks, but certainly not a complete rewrite from scratch. It varies greatly on the application, the language it's written in, frameworks in use, and the number of product specific features in use. This was a perl / mason app.
If an application was making extensive use of stored procedures, then it would require a lot of effort to rewrite those, but not the whole application. If the application were written in C, it would be a lot of work to change. I think facebook uses PHP and that's not too hard to change out especially if they were sane and used an abstraction layer like PDO.
If the app were written in Java or .NET and using an ORM, it would be TRIVIAL to change to another database.
With my experience, the biggest problems were date functions and the fact that MySQL embeds index creation in the create table syntax whereas postgres requires it be separate and the names of indexes are global. This meant that I had some work cut out for me changing index names. There were also a few quirks with some join queries as MySQL is not picky about ordering in the from clause.
You are correct that they'll have to tune queries and things, but it's not a total rewrite if they wrote their app in a reasonable way.
For the record, Postgresql 9 is faster for many of our queries but seems slower doing INSERT. YMMV
MidnightBSD: The BSD for Everyone
The guy in the article does have some cred. He was a professor at UC Berkeley for 29 years where he was project leader on Ingres and led the creation of its follow up, Postgres.
His new database, VoltDB, based on the 'NewSQL' ideas touched on in the article, is Free Software licensed under the GPLv3.
Academic purist discovers that one of the most prolific and successful database users in the world is using a system he doesn't approve of. He decides, with no insider knowledge at all, and despite all evidence to the contrary, that they should throw everything away and start over from scratch using a system that he thinks would allow them to see the performance and scalability that they've already achieved.
Right.
Some of the key architects of Facebook have spoken at Stanford about how the system is put together, and I went to that presentation and had a chance to talk to them. They didn't consider MySQL to be a bottleneck. Their big problem was PHP performance. They were writing a PHP compiler to fix that.
Internally, the user-facing side of Facebook is in PHP. But the front end machines don't talk directly to the databases. They use an RPC system to talk to other machines that do the "business logic" parts of the system. Building a Facebook reply page may involve a hundred machines. There's heavy caching all over the system, of course, so the databases aren't hit for most read requests.
The RPC system isn't HTML, JSON, or SOAP. It's a binary system that doesn't require text parsing. Otherwise, RPC would be the bottleneck.
This makes for a flexible, easy to enhance system. New services go in new machines, which talk to existing machines.
(reposting as a logged in user) I wrote a bit longer response to this:
stonebraker trapped in stonebraker 'fate worse than death'
I think I know a bit more about database situation inside FB than Mr.Stonebraker. Go figure.