Facebook Trapped In MySQL a 'Fate Worse Than Death'
wasimkadak writes with this excerpt from GigaOM: "According to database pioneer Michael Stonebraker, Facebook is operating a huge, complex MySQL implementation equivalent to 'a fate worse than death,' and the only way out is 'bite the bullet and rewrite everything.' Not that it's necessarily Facebook's fault, though. Stonebraker says the social network's predicament is all too common among web startups that start small and grow to epic proportions."
Well, then they convert from one DB to another. So what? It's not like that would be a completely new thing, and I am sure that Oracle or any other big DB provider will send experts to help with the task.
Once they started to grow beyond being a toy, they should have redone things right then.
Waiting until you are painted in a corner is irresponsible.
---- Booth was a patriot ----
I was under the impression that there was no feasible way, performance wise, to run something as big as Facebook without using a non-relational database system.
Am I mistaken?
Database vendor says someone else's database is rubbish and they'd be much better off using theirs.
This isn't news.
Why do I feel like I read this a couple days ago?
Professor Plum: What are you afraid of, a fate worse than death?
Mrs. Peacock: No, just death, isn't that enough?
...a product called NewSQL.
Even his product name is an indication that SQL is not the evil it is professed to be.
I don't really understand what is bad about this. Facebook is this big of a site and it seems to work great. Good job, MySQL. But most information sites like this run some kind of database back end. I don't know what he would really go to after this. Oracle, perhaps? Maybe something from IBM. Forgive me, I'm not a programmer, so I wouldn't know what would be the best back end for a site like this. But any time you start out using something, you can get stuck with it in the world of software.
TTFN,
Jake_Paws
IdahoFur
When you have several billion from investors, you can rewrite your stuff later.
FYI, I plan to use PostgreSQL and perhaps a little NoSQL in the mix to reduce the need for joins across multiple tables (which is where SQL really hurts when doing ACID). I am just not a DBA, nor do I have the experience to learn how to do relational math with a NoSQL database.
Anyway, PHP is another problem for such a large-scale site. Wasn't Facebook the one that had to rewrite PHP in C++? I plan to switch to Java or C#.NET later on when I have money, and Facebook has the cash to do this. The only downside is that it will take hundreds of thousands, if not millions, of lines of code on either platform, and rushing programmers is bad: it recreates the same problem of bad design in the first place. Are there any frameworks or CMSes that require only a few months instead of years to set up and can handle the needs of a company like Facebook? I can only imagine it would take years to write a few million lines of well-written, tested Java, and that is not an option for Facebook.
http://saveie6.com/
... "Michael 'Ingres' 'Postgres' 'VoltDB' Stonebraker says 'MySQL doesn't scale'".
Delegate scalability downwards. Throw hardware at the problem.
Google would love to disagree with you.
They would probably need to build an abstraction layer on top of MySQL. Something like "FBSQL" :) . This would create the ability to move to whatever database system they want.
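To make the "FBSQL" idea concrete, here is a minimal Python sketch of such an abstraction layer, assuming a hypothetical `UserStore` interface. Application code depends only on the interface, so swapping the backend (SQLite stands in for MySQL here, a plain dict for "some other store") needs no change to callers. It illustrates the pattern only and says nothing about how Facebook's code is actually organized.

```python
import sqlite3
from typing import Optional, Protocol


class UserStore(Protocol):
    """Hypothetical interface the application codes against ("FBSQL" in spirit)."""

    def put(self, user_id: int, name: str) -> None: ...

    def get(self, user_id: int) -> Optional[str]: ...


class SQLiteUserStore:
    """Relational backend; SQLite stands in for MySQL in this sketch."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def put(self, user_id: int, name: str) -> None:
        with self.conn:  # commits on success, rolls back on an exception
            self.conn.execute(
                "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
                (user_id, name),
            )

    def get(self, user_id: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class InMemoryUserStore:
    """A different backend; callers of UserStore do not change."""

    def __init__(self) -> None:
        self.data: dict[int, str] = {}

    def put(self, user_id: int, name: str) -> None:
        self.data[user_id] = name

    def get(self, user_id: int) -> Optional[str]:
        return self.data.get(user_id)


def rename_user(store: UserStore, user_id: int, new_name: str) -> Optional[str]:
    """Application logic that only knows about the interface, not the backend."""
    store.put(user_id, new_name)
    return store.get(user_id)
```

Switching databases then means writing one new class that satisfies `UserStore`. In practice the hard part is whatever leaks through such an interface: vendor-specific SQL, stored procedures, and differing transaction semantics.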
Oracle Database should definitely work fine!!!
"NOSQL" doesn't mean "don't use SQL at all"; it means "not only Structured Query Language". It's possible to store some data in a relational database and other data in a non-relational database. It's also possible to store the authoritative copy of data in a relational database and various frequently used cached views in a non-relational manner.
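A minimal sketch of the mixed arrangement described above, assuming SQLite as the relational store holding the authoritative copy and a plain Python dict standing in for a non-relational key-value cache such as memcached. The table names and the `CachedProfiles` class are hypothetical. The cached view is built from a SQL join on a miss and invalidated when the rows behind it change.

```python
import sqlite3
from typing import Optional


class CachedProfiles:
    """Authoritative data lives in relational tables; a dict holds a
    denormalized, frequently used view of it (hypothetical sketch)."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.executescript(
            """
            CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE friends (user_id INTEGER, friend_id INTEGER);
            """
        )
        self.cache: dict[int, dict] = {}
        self.db_reads = 0  # how often a view had to be rebuilt from SQL

    def add_user(self, user_id: int, name: str) -> None:
        with self.db:
            self.db.execute("INSERT INTO users VALUES (?, ?)", (user_id, name))
        self.cache.pop(user_id, None)

    def add_friend(self, user_id: int, friend_id: int) -> None:
        with self.db:
            self.db.execute("INSERT INTO friends VALUES (?, ?)", (user_id, friend_id))
        self.cache.pop(user_id, None)  # invalidate the now-stale view

    def profile_view(self, user_id: int) -> Optional[dict]:
        if user_id in self.cache:
            return self.cache[user_id]  # served without touching SQL
        self.db_reads += 1
        row = self.db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None
        friends = [
            name
            for (name,) in self.db.execute(
                "SELECT u.name FROM friends f JOIN users u ON u.id = f.friend_id "
                "WHERE f.user_id = ? ORDER BY u.name",
                (user_id,),
            )
        ]
        view = {"name": row[0], "friends": friends}
        self.cache[user_id] = view
        return view
```

The sketch only invalidates the view of the user whose row changed; renaming a friend would leave other users' cached views stale. Working out every view a write touches is much of the real difficulty.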
The ex-Facebook developers who founded Quora think Facebook is stuck on [PHP] for legacy reasons, not because it's the best choice right now.
Bogtha Bogtha Bogtha
I'm a sysadmin at a large hosting provider, and we see this every day. MySQL replication is a nightmare -- if it's not done perfectly, it will break (and it will probably break anyway). And then there's the "skip n errors on the slave and restart replication" move we do, usually knowing deep down that we'll have to rebuild the whole slave again anyway.
But MySQL is also one of the biggest causes of site slowdowns and downtime. If, for example, a site is being cached heavily by Varnish or Nginx, the database will rarely be queried and everything works great. But introduce a header into the code that breaks or stops caching, and peak times of the day will bring regular pile-ups of MySQL queries hundreds or thousands deep. Then you have to restart the daemon, and you start risking data loss when you're doing this every day.
Really, WordPress and Drupal are the most common offenders. The sequences of database queries they come up with are sometimes downright astoundingly bad, literally searching millions upon millions of rows of data only to find one row in the end.
This behavior is not sustainable, from a systems perspective, or even from an environmental perspective.
Not mistaken, but the issue the NoSQL people face when trying to replicate something like a Facebook-sized cluster of relational databases is that they have to build the ACID features back in, which tends to negate the performance advantage.
Facebook isn't really using a relational database system either: they have a gigantic memcached layer on top of a gigantic MySQL layer. They have, effectively, a massive in-memory database that's continually being written back to MySQL for permanence. That's the only way they could get sufficient read performance.
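A rough Python sketch of the write-back arrangement described above, assuming a dict as the in-memory layer and SQLite standing in for MySQL. The `WriteBackCache` class is hypothetical and shows the trade-off rather than Facebook's implementation: reads and writes are served from memory, and dirty keys are flushed to the durable store in batches.

```python
import sqlite3
from typing import Optional


class WriteBackCache:
    """In-memory layer in front of a relational store (hypothetical sketch).

    Writes only mark keys dirty; flush() persists them in one transaction.
    A crash before flush() loses the unflushed writes, which is the price
    of write-back caching.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
        self.mem: dict[str, str] = {}
        self.dirty: set[str] = set()

    def get(self, key: str) -> Optional[str]:
        if key in self.mem:
            return self.mem[key]  # memory hit: no database round trip
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            return None
        self.mem[key] = row[0]  # populate memory on a miss
        return row[0]

    def put(self, key: str, value: str) -> None:
        self.mem[key] = value
        self.dirty.add(key)  # persisted later, not now

    def flush(self) -> int:
        """Write all dirty keys to the database; returns how many were written."""
        rows = [(k, self.mem[k]) for k in self.dirty]
        with self.conn:  # one commit for the whole batch
            self.conn.executemany(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", rows
            )
        self.dirty.clear()
        return len(rows)
```

Repeated writes to the same key between flushes collapse into one database write, which is where the relief for a write-heavy database comes from.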
Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
It's a complete fallacy that Oracle or SQL Server has any advantage over MySQL or Postgres. Having implemented both Oracle and MySQL in large-scale environments, I can say it's no easier dealing with Oracle. It's a beast to scale, and many Oracle consultants will simply recommend large multi-CPU systems instead of going to a parallel server mess... or whatever they're calling it these days.
Facebook would be in hell no matter what db they chose. It's more about poor design choices early on than the database software.
Facebook uses Cassandra for a lot of their storage requirements... I am sure that they use MySQL for some things too, but they have an amazing team of people who come up with stuff like Cassandra, Thrift, and Scribe. My guess is they will manage well enough.
I love the snippets "After all, he explained, SQL was created decades ago before the web, mobile devices and sensors forever changed how and how often databases are accessed" from the article and "We’ve been using stone age technology to solve problems that didn’t exist 30 years ago." Yes, the problems did exist 30 years ago, such as (land-line) telephone billing. I don't know how those problems were solved -- probably with a mainframe and a custom non-SQL database, not a PC running a SQL-based server -- but they were solved.
You've got to love the problems of companies that grow to epic proportions!
And this opinion has nothing to do with the fact that this is the guy who wrote PostgreSQL and has been bitching for years about how MySQL has too big a market share??
MySQL has been faster than PostgreSQL for years. It doesn't have as many features, but it is **fast**!!
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
What, you want to make 100 billion dollars through an IPO and you moan because you might actually have to WORK?
Seven puppies were harmed during the making of this post.
You people who think Oracle would be any better are hilarious. If they'd gone with Oracle, they'd be in even worse shape. Read the article. He's saying SQL is terrible at Facebook-scale, period.
When starting your company, these are the problems you wish you'll have one day. As the saying goes, "Those are good problems to have."
"all too common among web startups that start small and grow to epic proportions."
Dear editors: please review the definition of the word "common". This statement is like saying, "confusion is common among people who get hit by a meteorite while pouring hot grits on a naked and petrified Natalie Portman". 99.9% of web startups don't have to worry about this; they need to worry about running out of money...
Could it be that facebook will have vanished before any rewrite is ever done?
Reading through the interview, my reaction was: "If MySQL scales well enough to handle Facebook's load, then it can handle just about anything." Really, Facebook is one of the highest-traffic sites on the web; saying they use MySQL is a huge endorsement.
Raise your hand if you have a website the size of Facebook. It's surprising it's lasted this long if it's supposedly on a limping system. I don't see what Oracle or another larger database could offer Facebook at this point. I'm sure they are already doing replication, transactions, stored procedures, and object caching. What more is there? Secondly, who are you to even begin to criticize? There are only a few people on this planet with sites as big as theirs, so just shut your trap and watch.
Academic purist discovers that one of the most prolific and successful database users in the world is using a system he doesn't approve of. He decides, with no insider knowledge at all, and despite all evidence to the contrary, that they should throw everything away and start over from scratch using a system that he thinks would allow them to see the performance and scalability that they've already achieved.
Presumably he's tired of Facebook being used as a counter-example to everything he's been preaching.
"With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea...."
RFC 1925
MySQL is 'fast' mainly because of its lack of features and robustness. Implying that market share means quality is like implying that current crappy pop music is better than classical music because of the market share it gets.
That should do the trick, eh?
Oops, time to buy Oracle stock and short Facebook, I guess!
It's hilarious how the NoSQL fools are now constantly backpedalling these days.
It turns out that writing database queries in JavaScript is a stupid idea! Imagine that! All of their attempts to invent a better query language end up being almost identical to, guess what, SQL!
Then they realize that trying to maintain data consistency using logic written in JavaScript, Ruby or PHP doesn't work so well. Values go unconstrained, and the referential integrity gets fucked up. Soon the data is nearly worthless.
The smarter/less-ignorant ones then think that they'll just use transactions. But wait, their NoSQL database of choice doesn't support that, or doesn't support it properly. So they tell themselves that their data will become "eventually consistent", or worse, they try to implement some shitty ass "transaction" support using Ruby. Regardless of the path chosen, failure is the result.
Now they're realizing that it's mandatory to use a real relational database when working on anything remotely serious. So we see this bullshit about "no" now meaning "not only". That's funny, last month it meant "no", as in, "we will never write a SQL query again, and we will never use a relational database again."
I'm going to make a prediction: Next month, we'll get to read articles and comments from them about these amazing new database systems that they've just discovered. These new systems avoid all of the problems associated with NoSQL databases! What are their names? Oracle, DB/2, SQL Server, PostgreSQL and SQLite.
Actually, if I understand it correctly he worked on Postgres, not PostgreSQL which came later.
New things are always on the horizon
This article shows that MySQL can somehow handle possibly one of the largest databases in the world. Who would expect it to be easy? Hard to maintain, painful to manage, but after all, it works. What other option would work all the way from small-website scale to Facebook scale?
Some retards, including the ones I know in person, will read this article and start saying that mysql has poor performance.
Pentium was *fast* too, just a bit wrong sometimes :-)
If anything, it's a success story for MySQL.
Not that it's necessarily Facebook's fault
Absolute bullshit. So now people are no longer responsible for the decisions they made? Of course they are.
It has long been known that MySQL is a low-end, non-compliant (slightly better over the years) solution, which teaches poor SQL, poor solutions, and even worse design, and is generally aimed at people who don't know any better. When MySQL launched, PostgreSQL was already an option. Furthermore, Oracle even had solutions for them to grow into. Now, both solutions look dramatically better than MySQL ever has. Furthermore, commercial support and even very high-end HA/clustering PostgreSQL solutions are available. There is no valid reason to make the same dumb mistake that hundreds of people repeatedly make every day.
So hurray for MySQL. They saved 45 minutes during their installation on day one, and now they'll spend a year or two plus millions of dollars to move away from their extremely dumb and uneducated decision. That's got to be one of the most expensive 45 minutes on earth, and yet it's one of the single biggest decisions that MySQL users defend on a daily basis.
Sorry, but reality has spoken. Is it Facebook's fault? Absolutely!!!! Anyone who says otherwise has lost any and all credibility and only reinforces that they should never be in a position to pick a database solution in the first place.
If what this guy says is true and Facebook devs have to rewrite everything, the solution (as I see it) is quite simple.
Facebook user data is relatively similar across profiles. Their export application lends credence to this. Correct me if I'm wrong, but wouldn't it be simple to write a routine to port data from one product to the other? Test it thoroughly to make sure it's bulletproof, then run it until the job is done. To make this simple, purge all the bullshit like wall posts, messages, notes, and other user content. The only problem I see would be scale, and that can be mitigated by doing it a few thousand or million items at a time.
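A minimal sketch of porting data "a few thousand items at a time", assuming a hypothetical `users (id, name)` table and two SQLite connections standing in for the old and new products. It pages by primary key (keyset pagination) instead of OFFSET and commits each batch separately, so an interrupted run can resume from the last id it reported.

```python
import sqlite3


def migrate_in_batches(src: sqlite3.Connection, dst: sqlite3.Connection,
                       batch_size: int = 1000,
                       start_after: int = 0) -> tuple[int, int]:
    """Copy the hypothetical users(id, name) table from src to dst in batches.

    Returns (rows_copied, last_id). Pass last_id back in as start_after to
    resume after a failure.
    """
    dst.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
    last_id = start_after
    copied = 0
    while True:
        batch = src.execute(
            "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return copied, last_id
        with dst:  # one transaction (and one commit) per batch
            # INSERT OR REPLACE makes re-running a half-finished batch harmless.
            dst.executemany(
                "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", batch
            )
        last_id = batch[-1][0]
        copied += len(batch)
```

This only covers a static copy; a live site would also need to capture writes made during the copy, which is where most of the real migration effort goes.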
Well said, but there is no excuse for not doing things right the first time. This just goes to show you Facebook is like the Death Star, and it will only go one way.
Corporate businesses are so embroiled in FB and have put all their eggs in one basket, counting their chickens before they hatch; you can actually see what is on its way.
Twitter will survive the death of facebook. Everything is under 256 chars.
All cows eat grass!
And if you need to use memcache, then it doesn't matter what database you're using. You can scale out read only nodes with most larger database systems, but it's not always a good idea.
Usually the issue is write performance. You can only write to one MySQL server in a cluster (unless you use the newer MySQL Cluster (NDB) storage engine). Some other commercial products let you partition data across multiple servers and write to different servers.
If you have a lot of read only nodes then all that data has to get replicated back out to them and that can get very slow if you're huge like facebook. I bet they use some type of crazy manual partition scheme to pull it off.
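A toy Python model of the single-writer setup described above, assuming dict-backed nodes instead of real servers: one writable primary, read-only replicas served round-robin, and a change log that must be shipped to every replica. `ReplicatedCluster` is hypothetical. The point is that each write is applied once per replica, so replication work grows with both write volume and replica count, and reads can be stale until the log is shipped.

```python
import itertools
from typing import Optional


class Node:
    """Stand-in for one database server (hypothetical, dict-backed)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: dict[str, str] = {}
        self.applied = 0  # how far into the primary's log this node has applied


class ReplicatedCluster:
    """One writable primary plus read-only replicas fed from its log."""

    def __init__(self, n_replicas: int) -> None:
        self.primary = Node("primary")
        self.replicas = [Node(f"replica{i}") for i in range(n_replicas)]
        self.log: list[tuple[str, str]] = []
        self._next_replica = itertools.cycle(self.replicas)

    def write(self, key: str, value: str) -> None:
        # All writes go to the single primary and are appended to its log.
        self.primary.rows[key] = value
        self.log.append((key, value))

    def read(self, key: str) -> Optional[str]:
        # Reads are spread round-robin and may be stale until replicate() runs.
        return next(self._next_replica).rows.get(key)

    def replicate(self) -> int:
        """Ship pending log entries to every replica; returns rows applied."""
        applied = 0
        for node in self.replicas:
            for key, value in self.log[node.applied:]:
                node.rows[key] = value
                applied += 1
            node.applied = len(self.log)
        return applied
```

With two writes and three replicas, one replication pass applies six rows; adding replicas adds read capacity but multiplies that cost, which is why very large deployments partition the data instead.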
MidnightBSD: The BSD for Everyone
I was under the impression that there was no feasible way, performance wise, to run something as big as Facebook without using a non-relational database system.
Am I mistaken?
Plenty of financial institutions are working on a similar scale to Facebook, and they use SQL-based DBMSs. Facebook doesn't need transactional integrity for a lot of what they do. They don't have an elaborate set of regulations that they need to be in compliance with, and a large set of accounts that must all be constantly balanced. And Facebook can't charge a fee for every transaction that takes place.
Your standard SQL DBMS is doing OLTP, or online transactional processing. That usually means that it's running lots of small transactions that must follow business rules. Facebook has a lot of processes that can be extremely lossy, and they do a lot of analytics behind the scenes. So while they could, theoretically, shoehorn it all into Oracle, it would just be ludicrously expensive. The MySQL databases they have would be only one system out of many, probably managing things like logins and privacy and such.
The guy in the article does have some cred. He was a professor at UC Berkeley for 29 years where he was project leader on Ingres and led the creation of its follow up, Postgres.
His new database, VoltDB, based on the 'NewSQL' ideas touched on in the article, is Free Software licensed under the GPLv3.
The benchmarks I have seen are debatable. PostgreSQL performs slower for the first 100 users but quickly outscales MySQL when access becomes heavy with hundreds of concurrent connections. I do admit this was several years ago and MySQL is adding the extra features, but views, triggers, and stored procedures are useful for conserving memory and optimizing performance with large tables if used properly, and this is where PostgreSQL shone, as MySQL 4.0 did not have half of these. PostgreSQL also comes with the most conservative settings by default on a fresh install on any Linux distro. You need to configure it to boost performance.
Features are not just for easier development, but also to reduce the work on the RDBMS and increase performance. ... I am not a DBA, so I dunno. I am learning PostgreSQL for a project I am working on, so I am a little biased.
I wouldn't be surprised if a lot of their issues were because of their vain quest to use NoSQL. They were willing to sacrifice good design for minor speed gains, and if that was their mentality all along, who knows what else they did. The article also reads as another NoSQL plug, except that it's plugging NewSQL instead. These are fads that just need to be ignored; the problems they are intended to solve only exist because of bad code design, and they exist to give bad coders a way out of their craziness.
I don't think so. Facebook isn't known for having a lot of down time. It is known for opening up information to the public. If anything, that would be considered too much up time. I've used MySQL and PostgreSQL. I found MySQL to be limited but most limitations were easily worked around in code. PostgreSQL wasn't as limited. However, the options that it provided forced the need to vacuum the database. I would rather write code but to each his own.
Having to work for a living is the root of all evil.
While I found the story very interesting, there is no real solution at the end of it. In one or two sentences: what would Facebook have to choose as its database, and why that option? Or at least weigh the alternatives.
Academic purist discovers that one of the most prolific and successful database users in the world is using a system he doesn't approve of. He decides, with no insider knowledge at all, and despite all evidence to the contrary, that they should throw everything away and start over from scratch using a system that he thinks would allow them to see the performance and scalability that they've already achieved.
Right.
Some of the key architects of Facebook have spoken at Stanford about how the system is put together, and I went to that presentation and had a chance to talk to them. They didn't consider MySQL to be a bottleneck. Their big problem was PHP performance. They were writing a PHP compiler to fix that.
Internally, the user-facing side of Facebook is in PHP. But the front end machines don't talk directly to the databases. They use an RPC system to talk to other machines that do the "business logic" parts of the system. Building a Facebook reply page may involve a hundred machines. There's heavy caching all over the system, of course, so the databases aren't hit for most read requests.
The RPC system isn't HTML, JSON, or SOAP. It's a binary system that doesn't require text parsing. Otherwise, RPC would be the bottleneck.
This makes for a flexible, easy to enhance system. New services go in new machines, which talk to existing machines.
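To illustrate why a binary RPC encoding avoids text parsing, here is a Python sketch using the standard `struct` module with a hypothetical fixed request layout (method id, user id, result limit). This is not Thrift's wire format or Facebook's actual protocol; it only contrasts a fixed-size binary message with the same fields encoded as JSON.

```python
import json
import struct

# Hypothetical request layout, network byte order, no padding:
#   uint16 method id | uint64 user id | uint32 result limit  -> 14 bytes total.
# NOT Thrift's real wire format; it only illustrates fixed-layout binary encoding.
REQUEST = struct.Struct("!HQI")


def encode_binary(method_id: int, user_id: int, limit: int) -> bytes:
    """Pack the fields into a fixed 14-byte message."""
    return REQUEST.pack(method_id, user_id, limit)


def decode_binary(payload: bytes) -> tuple:
    """Unpack by fixed offsets: no tokenizing, no number parsing from text."""
    return REQUEST.unpack(payload)


def encode_json(method_id: int, user_id: int, limit: int) -> bytes:
    """Same fields as text, for comparison; the receiver must parse it."""
    return json.dumps({"method": method_id, "user": user_id, "limit": limit}).encode()
```

The binary form is smaller and is decoded by reading fixed offsets, which matters when building one page fans out into many RPCs across machines.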
Becoming very, very rich but being locked out from becoming extraordinarily rich is a fate worse than death? Many of us would be very happy to have problems like this.
Consciousness is an illusion caused by an excess of self consciousness.
Over and over we hear about this "scrap and start over" concept. It sounds like a great idea but you are assuming you can do a better job than the guys before and more often than not you will be wrong.
I used to suggest it, but now I know better. I have seen new devs with little experience passionately suggest so-called "total refactoring". It has never ended well.
HTML is obsolete. It's time for a new, simpler and richer markup language.
The underlying problem, according to Stonebraker:
During an interview this week, Stonebraker explained to me that Facebook has split its MySQL database into 4,000 shards in order to handle the site’s massive data volume, and is running 9,000 instances of memcached in order to keep up with the number of transactions the database must serve.
Or you could put MySQL on an IBM Power Systems LPAR and use a commercial MySQL plug-in to store the data in a DB2 database. Then you can get away with maybe a dozen database machines instead of thousands. I have to imagine, btw, that Oracle has a similar offering in the works.
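A sharded layout like the 4,000 shards quoted above needs a routing step that every app server computes identically. A minimal sketch, assuming a hypothetical scheme: hash the user id with CRC-32 into one of 4,000 logical shards, then look that shard up in a directory mapping shards to physical hosts. The host names and functions are illustrative; the article does not describe Facebook's real routing.

```python
import zlib

NUM_SHARDS = 4000  # shard count from the quote; the routing scheme is an assumption


def shard_for(user_id: int) -> int:
    """Map a user to a logical shard using a stable hash (CRC-32 of the id),
    so every app server computes the same answer."""
    return zlib.crc32(str(user_id).encode()) % NUM_SHARDS


def build_directory(hosts: list[str]) -> dict[int, str]:
    """Assign logical shards to physical hosts round-robin. Moving a shard
    later means editing this table, not rehashing every user."""
    return {shard: hosts[shard % len(hosts)] for shard in range(NUM_SHARDS)}


def host_for(user_id: int, directory: dict[int, str]) -> str:
    """Find the database host that owns this user's data."""
    return directory[shard_for(user_id)]
```

Keeping many more logical shards than machines is a common way to rebalance load by moving whole shards; the cost is that any query spanning users on different shards (a friends list, say) has to be assembled outside the database.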
Lesson: academic credentials are no match for real world experience.
Finding God in a Dog
PHP is being developed by Zend. Zend also develops one of the most widely used software frameworks for PHP (Zend Framework), a server environment (Zend Server), a PHP development environment (Zend Studio) and god knows what else... You're in for a world of pain when you have a problem and try to google for help.
Not at all. But it does have something to do with the fact that he is plugging his new product, which implements something he calls "NewSQL."
dZ.
Carol vs. Ghost
They can't switch to MongoDB because...?
It may not be a hardware problem, it may be a problem that actually has more to do with the fact that Oracle owns MySQL.
It's not unreasonable to suppose Oracle might "nudge" Facebook into the deeper end of Oracle's trough of slimy swill. But who to root for? This is a bit of a conundrum. Seeing Facebook's delicate bits getting squeezed is not an unattractive proposition, but seeing Oracle benefit therefrom would be appalling.
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
Quick re-write in GWT and host it on Google App Engine. Plenty of time left for the consumption of small woodland creatures.
his CODASYL network database engine in the backroom.
For high-throughput those mainframe databases still rule the world of business on IBM, FUJITSU, UNISYS and other platforms.
They can handle the throughput and maintain ACID compliance but they certainly are not relational. Perhaps Facebook should take a look. Indeed sometimes "a foolish consistency is the hobgoblin of little minds."
First... yes, they have a megaton of data, but really... there's not *that* much to their nifty little web app. Rewriting it from scratch with 100% hindsight would probably not be omfg difficult. And second, it's a pretty good time to start a makeover, since the wife just got on Google+ and they will be losing people in droves pretty soon...
If SQL is so old how come he still uses it in his "new" VoltDB that he is trying to sell?
(reposting as a logged in user) I wrote a bit longer response to this:
stonebraker trapped in stonebraker 'fate worse than death'
I think I know a bit more about the database situation inside FB than Mr. Stonebraker. Go figure.
"The AdWords system was initially implemented on top of the MySQL database engine. After the system had been launched, management decided to use Oracle instead. The system became much slower, so eventually it was returned to MySQL [3]. The interface has also been revamped to offer better work flow with additional new features, such as Spreadsheet Editing, Search Query Reports, and better conversion metrics."
http://en.wikipedia.org/wiki/Adwords#Technology
views, triggers, and stored procedures are useful in conserving memory and optimizing performance with large tables if used properly
They are also what lock you into one particular RDBMS. Hence the whole "being tied to" alleged fiasco. I'm not saying they are a bad thing but they can prevent you from abstracting the code away from the database layer.
Phillip.
Property for sale in Nice, France
A nobody claims that a billion dollar company doesn't know what it is doing.
I am reminded of the last tech bubble and all the companies that happily put all their investor money in Sun servers and Oracle databases and never actually delivered a working product.
Facebook used PHP and MySQL and delivered the largest social networking site with millions of daily users and you get deadbeats saying they did it wrong. The guy above even wants to use an existing CMS or framework for something like Facebook. Talk about not getting it.
There are two kinds of people on this planet: those who get things done and those who spend all their time talking about how they would do it, someday.
It might well be that Facebook will need to replace a tech someday but they reached this size already with tech that a lot of armchair developers claimed couldn't do it but did. Doesn't that tell you something?
The biggest reason PHP and MySQL and Linux are so successful compared with other solutions is a simple one: they are used by people who want to get things done. Ask on a Python forum about JSON and you get a discussion on JSON vs XML. On a BSD forum, ask about FTP and you get a lecture on plain-text passwords. MySQL just gives you the newly created key and doesn't want you to first learn to write a procedure for it. It is about getting stuff done vs talking about it.
There is a place for talking about all the latest gadgets, but it is not where investors' money is being burned trying to deliver a product. Just take a good hard look at all the new crap like Python and ask yourself this: where is the forum and webshop software for it? Non-existent. Why? Because every time a future-chasing developer comes close to delivering something, he is on to the next thing. Duke Nukem Forever development style.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
It's faster at small scale if you use a non-ACID-compliant data store (MyISAM). Use their ACID-compliant one (InnoDB), as all real RDBMSs do, and it's no faster than Postgres.
So yeah, if you don't want real data integrity, then sure, it can return the wrong data real fast. Awesome.
Facebook has two options: add very large database support to MySQL (or MariaDB, if that's your fancy), or switch to some other database. If they have been using MySQL all this time, then for most people MySQL might be big enough (considering the size of Facebook). Personally, I would prefer to see the former option, since if it's open source, you still keep the keys to your kingdom: you aren't beholden to someone else for your software, you can deal with bugs and problems on your schedule instead of someone else's, you aren't dictated to as to what you can do with the software, you aren't forced to upgrade to more expensive versions even if the current software meets your needs (no forced end-of-life), and of course there are no expensive licenses or royalties, and you can modify it on your own to better suit your needs. I'm building a set of web sites with MySQL (actually MariaDB), and if I have problems when I get to be the size of Facebook, I will worry about scalability issues then.
Very true: mod parent +Insightful.
We see the same principle when some individual acquires Sudden Wealth, as for example by winning the lottery. Sudden Wealth -- it's every man's dream, right?
On closer inspection, Sudden Wealth is not a miracle cure for unhappiness or any other problem. Quite the contrary: Sudden Wealth brings new problems, new diseases of the soul.
Example: there is, I'm told, a self-help group (somewhere in America) whose members are Sudden Wealth lottery winners, who meet to share and discuss the problems brought on by Sudden Wealth, ranging from vague and inexplicable dissatisfaction, through family crises and grasping relatives and bitter divorces, all the way to abject misery and blatant death wish.
So too with corporations and other collective enterprises. Growth without preparedness can elevate a Mom 'n' Pop storefront operation to the skyscraper heights of corporate power ... but I would keep a watchful eye for embittered alcoholics and starry-eyed madmen among the board members and executives.
-kgj
Maybe this guy's problem is that Facebook HAS created such a large and successful business without paying Oracle millions of dollars or his company millions of dollars.
Kind of sounds like that commercial for Scottrade where the fat-cat broker is trying to keep his clients so he gets his fat commissions.
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
MySQL Rulez !!!
If you look at people who use PostgreSQL, you will see that they are not successful.
Myspace uses Postgre. Who are they now ?
Not mentioned is that Facebook's implementation of MySQL uses a custom engine and can perform about 500,000 transactions a second with only 1.2Gb/sec worth of bandwidth. A far cry from 'regular' MySQL and performs better than many enterprise products like Postgre.
That's one of the primary beauties of MySQL, the engine can be changed to suit your needs.
Doesn't Oracle have a tool that will analyze a MySQL schema, (optimize), generate an Oracle schema, query MySQL and populate Oracle? Even for pretty large live databases (~20GB in RAM) that need some cycles to update the new DB to sync with the old DB then cutover, this seems like a highly automatable and deterministic procedure, at least to get started. Then DBAs to optimize the new schema more, using the superior Oracle features.
This seems like exactly why Oracle bought MySQL AB: to be the default upsell path for MySQL instances that have outgrown MySQL. Isn't this something that comes cheap or free with an Oracle database?
If not, this seems like a success-defining opportunity for EnterpriseDB to do the opposite: use Oracle's promotion of MySQL but hard path to Oracle to save MySQL users from their Oracle destiny with free/open migration tools.
--
make install -not war
So this guy Stonebraker thinks Facebook is running too many servers or something, or it's just too complex, whatever. It seems to be working for me as a user, so I'm not sure what the problem is.
Is there something that can scale to support something as big as Facebook and be done using fewer servers with less complexity? What site has actually proven that? Was it because of the underlying db engine or just a better design from the get go?
This quote, that "old SQL (as he calls it) is good for nothing" and needs to be "sent to the home for retired software", is just flame bait (of course I'm taking it, though).
His only real point is this: "the problem with MySQL and other SQL databases is that they consume too many resources for overhead tasks (e.g., maintaining ACID compliance and handling multithreading) and relatively few on actually finding and serving data. ". He has some engine he thinks does it better (VoltDB), which does a lot of stuff in memory, similar to memcached, but built into the engine. Good idea. Oracle DB has in memory tables as well.
Has anyone put this VoltDB to the test? Is it better?
Stonebraker first abandoned Ingres as hopelessly broken to build Postgres, which he then abandoned as hopelessly broken to move on to various projects like the more recent VoltDB. He left Postgres a VERY long time ago and hasn't been involved in the project since well before PostgreSQL 7.0 hit the scene.
--- It is not the things we do which we regret the most, but the things which we don't do.
If they're using memcache being fed from a database, then it doesn't matter what database they use.
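A minimal sketch of the cache-aside pattern this comment assumes, using a plain dict as a stand-in for memcached (the `CacheAside` class is hypothetical, not memcached's client API). It shows that repeated reads reach the database only on a cache miss; writes, and reads after an invalidation, still go to the database.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Reads check a memcached-like cache first and fall back to the database."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}  # stand-in for memcached
        self._load = load_from_db
        self.db_reads = 0                 # number of reads that reached the database

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]       # cache hit: database untouched
        self.db_reads += 1
        value = self._load(key)
        if value is not None:
            self._cache[key] = value      # populate the cache for later readers
        return value

    def invalidate(self, key: str) -> None:
        """Call after writing `key` to the database so the next read reloads it."""
        self._cache.pop(key, None)


if __name__ == "__main__":
    db = {"profile:42": "Jake"}
    cache = CacheAside(db.get)
    for _ in range(1000):
        cache.get("profile:42")
    print(cache.db_reads)  # 1
```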
The world's burning. Moped Jesus spotted on I50. Details at 11.
Reply from an engineer who worked at Facebook 2007-2011.
It sounds as though the author blames the problem on too small a transaction scope; in other words, they are committing every single Like from a site visitor as its own transaction. They could queue such non-essential transactions in the middleware and commit them en bloc, using a queue that commits chunks at a time, or commit them to an unindexed shard that then propagates to the other shards. He says too many resources are being used in one part of the process (data integrity) and too few in another (queries). Index much? Rebalance hardware towards the instances supporting queries? These things do seem solvable with some effort in system design and engineering. I get that the rollout is tricky to test under load and all that, but if your database grows you need to expect to engineer it; there is no such thing as scalability, since different designs are optimal at different scales. All this is independent of the DBMS. Is the message of the article just that Facebook had no idea they were going to grow so big, so they didn't design their middleware and backend with that in mind? Of course that happened. As for the book, another DBMS? We have some, thanks.
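The "queue in the middleware and commit in chunks" idea can be sketched as below. This is a hypothetical middleware class (`LikeBatcher` and the `likes` table are illustrative), with Python's built-in `sqlite3` standing in for the real database; each flush runs one `executemany` inside a single transaction instead of one commit per Like.

```python
import sqlite3
from typing import List, Tuple


class LikeBatcher:
    """Queue non-essential writes (e.g. Likes) and commit them in chunks."""

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 100) -> None:
        self.conn = conn
        self.batch_size = batch_size
        self.pending: List[Tuple[int, int]] = []  # queued (user_id, post_id) rows
        self.commits = 0                          # transactions issued so far

    def like(self, user_id: int, post_id: int) -> None:
        self.pending.append((user_id, post_id))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write all queued rows in one transaction (commit on success, rollback on error)."""
        if not self.pending:
            return
        with self.conn:
            self.conn.executemany(
                "INSERT INTO likes (user_id, post_id) VALUES (?, ?)", self.pending
            )
        self.commits += 1
        self.pending.clear()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE likes (user_id INTEGER, post_id INTEGER)")
    batcher = LikeBatcher(conn, batch_size=100)
    for i in range(250):
        batcher.like(user_id=i, post_id=7)
    batcher.flush()  # commit the final partial chunk
    print(batcher.commits)                                        # 3
    print(conn.execute("SELECT COUNT(*) FROM likes").fetchone()[0])  # 250
```

The trade-off is durability: Likes still sitting in `pending` are lost if the process dies before a flush, which is why this only suits writes you can afford to lose.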
Korma: Good
At the crippling cost of switching to Oracle, they could hire an extra small army of programmers to turn MySQL into FBSQL and give the haters the middle finger (Hell, they're already there from what I understand).
Why the hell isn't this modded up?
anyway!
So then, why not use the cheaper / open DB, and build out the thing with that room full of people?
Fair question, particularly given the extreme size in this case.
Blogging because I can...
If we say this loud enough, could it maybe (please?) Zuck up the Facebook IPO. I'd love nothing more than to know that the suits who've spent so much loot on getting control of that particular group of servers were out on the curb looking for food.
If they do indeed need to upgrade, it's SQL, so moving to a higher-performance SQL database probably isn't that big a deal.
The irony of this little battle is that all the MySQL haters are posting to /., which uses MySQL as its database, as do Twitter and Wikipedia. Priceless!
Surely they must have, at some point, refactored their code to use IoC (inversion of control) so they can just replace all their MySQL data access classes with another data store, no?
Or, as they now call it, OpenEdge ABL. Now there's a really nice RDBMS. An absolute joy to work with.
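A minimal sketch of what that IoC setup would look like: application code depends on an interface and receives a concrete store from outside, so swapping backends means passing a different object. All class names here are hypothetical, and both stores are in-memory stand-ins; a MySQL- or HBase-backed class would implement the same two methods.

```python
from typing import Dict, List, Optional, Protocol


class WallStore(Protocol):
    """The data-access interface the application depends on."""
    def save_post(self, user_id: int, text: str) -> None: ...
    def latest_post(self, user_id: int) -> Optional[str]: ...


class ListWallStore:
    """One backend: keeps every post per user."""
    def __init__(self) -> None:
        self._posts: Dict[int, List[str]] = {}

    def save_post(self, user_id: int, text: str) -> None:
        self._posts.setdefault(user_id, []).append(text)

    def latest_post(self, user_id: int) -> Optional[str]:
        posts = self._posts.get(user_id)
        return posts[-1] if posts else None


class LatestOnlyWallStore:
    """A differently built backend with the same interface."""
    def __init__(self) -> None:
        self._latest: Dict[int, str] = {}

    def save_post(self, user_id: int, text: str) -> None:
        self._latest[user_id] = text

    def latest_post(self, user_id: int) -> Optional[str]:
        return self._latest.get(user_id)


class WallService:
    """Application logic; the store is injected through the constructor."""
    def __init__(self, store: WallStore) -> None:
        self.store = store

    def post(self, user_id: int, text: str) -> None:
        self.store.save_post(user_id, text.strip())

    def show(self, user_id: int) -> str:
        return self.store.latest_post(user_id) or "(no posts)"


if __name__ == "__main__":
    for store in (ListWallStore(), LatestOnlyWallStore()):
        wall = WallService(store)   # same service code, different backend
        wall.post(1, "  hello  ")
        assert wall.show(1) == "hello"
        assert wall.show(2) == "(no posts)"
```

The catch, of course, is that this only helps if every query in the code base goes through such an interface; any hand-written SQL outside it still has to be ported.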
You know, for all the hatred of MySQL's scalability, it's not really MySQL that's at fault; it's the storage engine. MySQL is actually a pretty lightweight administrative framework for routing queries to storage engines. What it does, it does pretty well, except for the permissions part (which sucks). The real meat of MySQL is InnoDB (from Innobase) on the back end. But MySQL has many other backends, like MyISAM, BDB, NDB (Cluster), Percona's XtraDB, etc. And lately there's been some activity with Hadoop as a backend, which would probably have fit their needs perfectly.
But the problem is that they decided to roll their own synchronous transaction mechanism on top of InnoDB by patching the shit out of it. The idea is that they have datacenters on both sides of the country, and if you make an update to your "wall" on the west coast servers, it needs to post to the east coast servers as well before you can serve up subsequent wall pages from the east coast. The stuff that links it to all your friends is probably NOT a relational DB, but the publishing system is. So anyway, they patched and hacked MySQL to kinda do this, and it's just not stable. People are getting tired of the slowdowns and delays, especially when other sites are doing it better.
So, they will most likely have to rewrite that part if they move to a new backend that does the distributed synchronous stuff. The beauty of moving to a different backend is that it is totally transparent to the client. But Facebook is screwed because they didn't try to use the "standard" MySQL interface; they rolled their own. I'm sure they have some other extraordinary problems as well, given the type of site they are. Sell your Facebook stock! ;)
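For the "post to both coasts before serving" requirement described here, one textbook technique is two-phase commit. The sketch below is that generic technique with hypothetical in-memory `Replica` objects; it is not a description of Facebook's actual MySQL patches. It also makes the cost visible: every write waits for the slowest datacenter, and nothing commits while any datacenter is unreachable.

```python
from typing import Dict, List


class Replica:
    """Stand-in for one datacenter's copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.staged: Dict[str, str] = {}
        self.reachable = True

    def prepare(self, key: str, value: str) -> bool:
        """Phase 1: stage the write and vote yes, or vote no if unreachable."""
        if not self.reachable:
            return False
        self.staged[key] = value
        return True

    def commit(self, key: str) -> None:
        """Phase 2: make the staged write visible to readers."""
        self.data[key] = self.staged.pop(key)

    def abort(self, key: str) -> None:
        self.staged.pop(key, None)


def synchronous_write(replicas: List[Replica], key: str, value: str) -> bool:
    """Return True only once every replica has the write; otherwise apply it nowhere."""
    votes = [r.prepare(key, value) for r in replicas]
    if all(votes):
        for r in replicas:
            r.commit(key)
        return True
    for r in replicas:
        r.abort(key)
    return False


if __name__ == "__main__":
    west, east = Replica("west"), Replica("east")
    assert synchronous_write([west, east], "wall:1", "hello")
    east.reachable = False                       # simulate a cross-country outage
    assert not synchronous_write([west, east], "wall:1", "bye")
    assert west.data["wall:1"] == "hello"        # west did not apply the failed write
```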
Oracle, DB2, Postgres and MSSQL have more issues than MySQL with huge databases. I've worked on huge sites using all of them, and MySQL is best! Instead of sharding, they need to optimize queries better, reduce the data they maintain, and archive data a little better if they are having issues. Sharding is a temporary measure.
Facebook was always a little unstable; they need to hire some better DB engineers if this is their issue. Moving to a new DB is TRIVIAL. I've ported huge sites between all sorts of DBs.
Geez, GooberToo, did a MySQL developer kill your father or something? You've posted two giant rants about how MySQL is so unsuitable for anything that it can't possibly work for any serious project. You make it sound like simply installing MySQL causes a server to immediately explode.
You *are* aware that Facebook, Slashdot, Wikipedia, and many other sites use MySQL, yes? Maybe there are better choices (more likely, there are different tradeoffs, but whatever), but MySQL works well enough to power some of the most popular websites in the world. Proof by existence that what you claim is inaccurate.
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
When a system grows beyond its practical design limits, let's just blame our choice of tools rather than the architect and their data model.
As low as my opinion of MySQL is I consider the entire argument to be a false choice.
He is, however, still on very good terms with the community.
That's just his response to NoSQL.
Mod this up!
There's a comment about VoltDB (is the whole thing a commercial thing for VoltDB? XD)
Anyway, I looked up VoltDB and they provide additional software for a fee, software that is also distributed under the GPL v3.
Wondering why they don't just offer it for free and only ask payment for the service. It's only a matter of time before the software is distributed freely otherwise.
http://voltdb.com/products-services/voltdb-editions-pricing
Learn to identify marketing speak. Stonebraker has been at it for at least 4 years now.
File under 'M' for 'Manic ranting'
Just asking the question. It seems to be working fine.
I think the real lesson here is that you can't assume MySQL is useless and unsuitable for a major website just because Slashdotters hate it and declare it unsuitable for a major website. Even in the midst of news that it's being used successfully on the biggest website in the world, Slashdot still says it can't work.
If I were about to die and the only way I could save myself was locking in my employer to a huge, mission critical MySQL installation, I'd prefer the MySQL fate to death.
Call me weird.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Don't worry Facebook. You don't have to rewrite anything from scratch. Google already has, so don't bother. Mark? Sell while you can dude...sell while you can.
Right, it's all too common among startups that reach 600 million users... Once we reach Facebook's size we might start worrying about MySQL performance.
Practically any non-MySQL database has all of the things listed. What makes each one proprietary is not the SQL but the stupid proprietary APIs and protocols of each RDBMS that wrap every SQL statement. It is not simple SQL at all: if you use a language like C# or Java, you go through SQL Server vs. Oracle data adapter objects and the vendor APIs themselves, and only at the bottom of all that is the SQL statement. Yuck.
A few other ways to use it is nothing. I personally think all non-free RDBMSes hate SQL and secretly want their own API, protocol, and even their own language SDKs. Witness Oracle's Java tools (not the ones from Sun) as an example of tying you into their expensive products.
MySQL has been faster than PostgreSQL for years; it doesn't have as many features, but it is **fast** !!
Fast - and ugly.
MySQL is only faster if you don't need all the ACID functionality in your database.
For everything else, PostgreSQL is way faster and performs better than MySQL.
I mean he's just the guy behind Ingres, Postgres, and Vertica. And his new company VoltDB, which I haven't used, can either put up or shut up.
But I doubt very seriously he's a troll. He's a legend with a strong opinion and he's willing to put his time and energy behind his thoughts instead of just spouting off about them.
MySQL will let you choose a storage engine that doesn't pay the overhead of ACID. Sounds like the same old tired **** I've heard over and over. "Just rewrite xxx using xxx and it will solve all the problems." Sorry dude, it's a difficult problem and there aren't any easy answers.
- I've got bad karma because I won't parrot everyone else's opinion
MySQL has been faster than PostgreSQL for years; it doesn't have as many features, but it is **fast** !!
/dev/null is even faster, but I wouldn't use that for data storage, either.
Red to red, black to black. Switch it on, but stand well back.
Astonishingly, a few minutes with saint google gives a clue:
http://www.freepatentsonline.com/4468728.html
I suspect that the Bell System invented everything themselves, and did a remarkably good job.
Ever hear of Bell System outages? When I grew up the phone always worked. Always, always.
... that the customer for a SQL (or other) db is not the end user. Except for freakish entities like Facebook that write their own application layer on top of the db, most users have a commercial application, which is really what the db is serving. For instance, an application-tier software vendor will support MS-SQL or Oracle or MySQL or DB2 (or some combination of these) depending on end-user preference. But if an end user decides to be clever and wants a next-gen database architecture, they can't decide that for themselves; they need to ask their app vendor to please use and support the db we want. The app-tier vendor doesn't particularly want to incur the costs of re-architecting the app to support a different db, ESPECIALLY (horrors!) if you want a non-SQL db, which will REALLY make the re-engineering painful and create two code streams: one for your funky, odd customers and one for the sensible ones that want to stay on an SQL db.
So, you need a value proposition for app vendors to make this huge change occur. Don't try saying that the overall solution will perform better - from the app vendor's point-of-view, customers can fix that by throwing lots of hardware at it and (golly!) that also means more app tier system (cpu, core, socket or whatever) licenses for the app vendor. So existing app vendors are not likely to do this.
This is only going to happen when the performance of app solutions (probably wholly new ones) that will use non-SQL dbs is so massively superior that insurgent app vendors threaten to displace the incumbents. Don't hold your breath for this though. There are lots of fine application solutions out there with many many person-years of code which are solving real problems and for a large enterprise application, simply having a much faster db backend won't overcome lack of features so the insurgent app vendors will have to be superior there too. Tough job.
So, I'd bet on inertia (never a risky proposition) and assume that for large databases, non-SQL dbs (or even new SQL entrants in the db market) are not going to make much headway over the next 5 years or so.
Any sophisticated software will have these issues as it matures.
"...predicament is all too common among web startups that start small and grow to epic proportions."
A "common" predicament? Just how many web-whatevers of epic proportions are there? Or is this based on a sample size of 1 (facebook)?
This article may just as well be interpreted as "there's no reason not to use SQL (including MySQL) in your next web-startup, the chances of growing to a point where SQL becomes a problem are next to none."
" all too common among web startups that start small and grow to epic proportions"
How often has this happened?
You were mistaken. Which is odd, since memory shouldn't be a problem for you
When we heard at work that Oracle "might" be buying MySQL, we switched to Postgres and never looked back. I wish most hosting companies and software developers would add support for Postgres.
Why do I still come here...
How is this a failure? Oh My God, the mighty Facebook isn't running on some uber-expensive database like Oracle. This reads like an ad for VoltDB.
In fairness, /dev/null makes a strong guarantee of what it'll do with your data.
I find it bemusing the "should've done X" arguments being made here and in the referenced article.
Facebook is bigger than anything you've ever done, or likely ever will do, and despite Stonebraker's impressive cred, bigger than anything he's ever done, or will do. That it works as well as it does is enormously impressive. Articles like "oh, just throw Oracle at it, or DB2 could handle it with a couple dozen machines" are completely ludicrous. Is it that easy? Then build it. Throw down or shut up.
Strange. At FOSDEM, a Facebook developer told us that they had nearly migrated everything from MySQL to HBase.
And most financial institutions of the nature you are describing don't do everything in real time; that's why there are cut-off times, and most of the fees, etc. are added afterwards in batches. This applies to almost all transactions save cash, which requires little to no verification.
Among the few institutions I know of that deal with such large transaction volumes are stock exchanges, and just like banks they close at some point and the really heavy processing happens at night. The same applies to central banks and regional centers for almost any financial institution; I am not aware of any that processes everything in a central facility.
One nice thing most financial institutions have going for them is that the data is already localized; it is rare that someone in another country wants to see your bank account. What most banks do is use a simple hierarchy to keep the overall load down (the transaction is passed to the appropriate branch for processing).
MySQL is open source. Facebook has so much money, time, and design invested in its current DB implementation. It seems to me the cheapest and most straightforward thing to do is just fix their issues in the MySQL code base. How much could that cost? I'll bet less than 1% of the cost of a DB system change.
I looked at SQL injection techniques earlier this week, just to see (1) whether I had a reasonable understanding of the tech and (2) whether there was really big money to be made here. I'll leave it to you, my SQL heavyweight friends, to decide for yourselves. However, I think it is certainly an area of technology worth looking at. Stick with Linux and a really good intrusive language, like Python. Most operations, even financial ones, are running at minimal security. Follow Goldman Sachs' lead, rob the country blind now!
DocLazier
(Reposting as a logged-in user.) I wrote a somewhat longer response to this: stonebraker trapped in stonebraker 'fate worse than death'
I think I know a bit more about the database situation inside FB than Mr. Stonebraker. Go figure.
Are you an FB DBA? Mr. Stonebraker is a DBA at FB.
It's that Zuckerberg can fix this by the weekend...
As anyone who knows will tell you, a database may have a maximum practical size, but that does not mean you are limited to it: technically, you can have a database of databases. I can use one database to help me manage all my databases so that no single database gets overloaded with too many records. This is done in many companies with heavy accounting loads: they move the older years into their own separate databases and use a master database that knows where to look (path info, etc.).
The only rewrite Facebook would need is to alter reports and queries to include a database alias that pinpoints where the info resides; not the whole thing, and certainly not everywhere. I could even inject text into connection strings on the fly to avoid rewriting any code at all, if you actually had low-level programmers knowledgeable enough about such things, sort of like the malware packets that get injected into DLLs, etc.
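The "master database that knows where to look" can be sketched as a small directory lookup. This is a hypothetical illustration: `ShardDirectory` and the connection strings are made up, records are keyed by year as in the accounting example, and nothing actually connects to a database.

```python
from typing import Dict


class ShardDirectory:
    """Maps a year to the connection string of the database holding that year's records."""

    def __init__(self, default: str) -> None:
        self._default = default              # where non-archived (recent) years live
        self._by_year: Dict[int, str] = {}

    def archive(self, year: int, conn_str: str) -> None:
        """Record that `year` has been moved into its own database."""
        self._by_year[year] = conn_str

    def lookup(self, year: int) -> str:
        """Return where to send a query for `year`; unarchived years go to the default DB."""
        return self._by_year.get(year, self._default)


if __name__ == "__main__":
    directory = ShardDirectory("host=primary dbname=ledger_current")
    directory.archive(2009, "host=archive1 dbname=ledger_2009")
    print(directory.lookup(2009))  # host=archive1 dbname=ledger_2009
    print(directory.lookup(2011))  # host=primary dbname=ledger_current
```

Splitting by year works for accounting because old years are rarely touched; for a social site, cross-shard queries (friends spread across many databases) are the harder part that a lookup table alone doesn't solve.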
News at 11.
I'm not a fan of MySQL, but since it's an open-source system, I don't understand why Facebook wouldn't put time and money into transforming it into an ideal RDBMS for their specific purposes.
Having worked with Oracle and MS for years in the private and public sectors, I know that the primary reason IT management goes with commercial software is that it's easier to point a finger at someone outside the company when something goes wrong. The software itself is seldom going to suit your needs, and you'll end up customizing it anyway (for a major player like Facebook).
I think Google has been quite successful because they've taken this approach. It's often seen as risky by management, but if you have the right people who can get the job done right, it's well worth the investment.