Well, look at it properly: the bug is about optimizing a query that does not make much sense. Sure, it could be done better, but why would you issue such a query at all?
If you look at the problems that Oracle/MySQL engineering tackled, they are somewhat different: data compression, online DDL, parallel replication, GTIDs, InnoDB scalability, etc. These were huge efforts and get reasonable focus. Think of all the bugs that were not filed against MariaDB... :)
Count the InnoDB engineers working for Oracle and for Maria; unfortunately, the numbers will not be balanced. Even Percona's InnoDB expert Yasufumi Kinoshita recently ended up working for Oracle.
Sure, Maria can do all sorts of tricks in SQL-land, but that is not the full picture. Oracle has much more engineering power dedicated to supporting MySQL, and they also have customers who escalate bugs.
Disclaimer: I used to work at MySQL AB and am currently working on a deployment that builds upon Oracle's MySQL tree, see https://www.facebook.com/MySQLatFacebook
If there were really a push for it, all you would need is server-side strict mode enforcement. Any software engineering intern could add that feature :)
But I guess not too many users really push their vendor for it, do they?
I'm not sure if this was supposed to be funny. It is much slower to enter the same character many times (not only does the brain suck at counting fast mechanical moves, but you also don't get finger parallelism ;-)
One needs to stage a man-in-the-middle attack to hijack an existing session, whereas a broken handshake can be used to establish new connections. Not looking at crypto-analysis, keeping connections open is much more secure ;-)
Hi! We run a non-profit website that gets 100 million visitors a day on ~350 servers.
We don't even use any "clustering" technology, just replication for databases, and a software (LVS) load balancer in front of both the app servers (PHP) and the squids at the edge.
But oh well, you can always waste money on expensive hardware and clustering technology.
And you can always check how we build things
Add power costs, difficulty of traveling there, possible flooding, etc.
It is all for historical reasons; we can't just migrate datacenters at will, as that requires quite a high investment.
And the datacenter choice was simply because the founder lived there in 2001, when all we needed was a single server.
--Domas
I covered most of the Wikipedia technology bits in my presentation at last year's MySQL Conference:
http://dammit.lt/uc/workbook2007.pdf (that's quite a detailed report)
OK, the office is different, the technology people are different, the technology is different, the datacenters are different, the management is different, the communities are different, etc.
How the heck can anyone think there are strong ties between Wikia and Wikipedia? They are as strong as between Wikipedia and Mahalo, or Wikipedia and whateverothercollaborativeweb2.0 site. If a company name is made by stripping a few letters, that doesn't immediately make it closely related. Micro is not Microsoft, Goo is not Google, Wikia is not Wikipedia.
Of course, Jimmy is on the board of the WMF, but so are other people, from other companies and organizations.
Wikia is not a for-profit arm of Wikimedia. It is a completely separate organization; just the founders are the same (talk about serial entrepreneurs :).
Wikia does not pay any core MediaWiki developer, and, uhm, Wikia does not pay Jimbo or Angela _for_ Wikipedia/Wikimedia work.
I might be redundant, but the Wikia engineering team is different from the Wikipedia engineering team.
The Wikia organization is different from the Wikimedia Foundation organization.
Jimmy is working on Wikia nowadays, not Wikipedia, so of course it is easy to misinterpret.
WP is not doing anything with that (though of course, having it open-sourced is interesting).
One of the things missed by the PG camp is that MySQL/InnoDB has row visibility information in indexes. That makes index-only reads possible (which beats the crap out of any bitmap-based reads).
Also, index-based reads don't need sorting afterwards, and GROUP BYs can be done via the index too. Anyone who keeps that in mind can make really efficient apps.
Do note that with enough data scatter, bitmap-based physical-order reads stop making sense...
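A toy sketch of the covering-index point, assuming a secondary index modeled as a sorted list of (country, city, row_id) tuples; the table, columns and data are all hypothetical. It deliberately leaves out the MVCC visibility check mentioned above and only shows the two payoffs: the query never touches the base table, and entries come out already ordered, so a GROUP BY can stream over adjacent keys instead of sorting or hashing.

```python
from bisect import bisect_left
from itertools import groupby

# Hypothetical base table: row_id -> wide row that is expensive to fetch.
table = {
    1: {"country": "LT", "city": "Vilnius", "bio": "..."},
    2: {"country": "US", "city": "Tampa", "bio": "..."},
    3: {"country": "LT", "city": "Kaunas", "bio": "..."},
    4: {"country": "NL", "city": "Amsterdam", "bio": "..."},
    5: {"country": "US", "city": "Boston", "bio": "..."},
}

# Secondary index on (country, city); each entry also carries the row id,
# the way an InnoDB secondary index entry carries the primary key.
index = sorted((row["country"], row["city"], rid) for rid, row in table.items())


def cities_in(country):
    """Index-only range read: never looks at `table`, output already sorted by city."""
    i = bisect_left(index, (country,))  # (country,) sorts before (country, city, id)
    out = []
    while i < len(index) and index[i][0] == country:
        out.append(index[i][1])
        i += 1
    return out


def count_by_country():
    """GROUP BY country, COUNT(*): equal keys are adjacent in the index, so no sort/hash."""
    return [(c, sum(1 for _ in grp)) for c, grp in groupby(index, key=lambda e: e[0])]


print(cities_in("LT"))
print(count_by_country())
```

The design point is that ordering comes for free from the index structure; a heap-order or bitmap-driven scan would have to sort afterwards to produce the same output.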
A ton of money? Any contracts for the Wikipedia logo or value-added services (as in, content pushing) pay to the non-profit foundation, which has undergone audit scrutiny and has community-trusted people on its board.
Wikipedia is a much bigger operation than answers.com, serving many more users ( http://www.alexa.com/data/details/traffic_details?site0=www.answers.com&site1=wikipedia.org&site2=&site3=&site4=&y=p&z=1&h=300&w=500&range=6m&size=Medium&url=www.answers.com ), so a _share_ of answers.com revenues would not necessarily cover the costs. Of course, Wikipedia is _very_ efficient for a site of its size.
Please don't spread lies; though they are easily verifiable as such, some people may still accidentally take them for granted.
Database downloads are all available at http://download.wikimedia.org/
Image dumps are already out there; we're going to streamline image dump delivery some day. It's a terabyte of media...
Now, regarding the implosion, I can put my absolute trust in the people who have access.
On the other hand, participating in Wikipedia's "Terminal Pissing Contest" (love it) would ruin anyone's life ;-)
A bigger disaster would be some of the tech guys leaving the project than losing some data ;)
You are probably missing the fact that the servers handle many thousands of requests per second and function in a somewhat async setup. Though we'd be able to set up a sync environment at every stack level, there's no need.
MySQL fully supports data consistency, but you do not need it in a distributed database environment like the one we have at Wikipedia.
Domas @ Wikipedia
The Internet is all based on peering. You want to exchange your traffic with your neighbors. If nobody wants you, you buy transit. Now, you may try to look attractive by providing lots of content. That is how Google became a nearly-Tier 1 provider.
On the other hand, you may look attractive by providing lots of users. It becomes more complicated when a single party starts providing lots of cheap bulk traffic (video) just to say 'hey, look, I've got lots of content in terms of bytes per second', and then demands connectivity or even asks others to pay for it.
The other case is when ISPs start selling cheap bandwidth to their clients and later demand peerings because they've got "the majority of clients that need services".
It is always a game of power and of who blinks first. It happens on the world scale as well, but on regional/national scales it happens every day. Every day new bulk content appears, and every day people go into endless negotiations or simply deny each other. Oops, internet.
Distributing across the net is absolutely different from distributing inside the net. Where we control replication, on our own gear, we can achieve much higher efficiency.
I had it the other way around: once I watched a Matt Dillon movie and thought, "oh wow, now he is acting!" :-)
Don't trust Facebook's "engineers".
I don't think MySpace used PG; they were an MS SQL shop (though I hear there were attempts to switch to other engines).
(Reposting as a logged-in user) I wrote a somewhat longer response to this:
stonebraker trapped in stonebraker 'fate worse than death'
I think I know a bit more about the database situation inside FB than Mr. Stonebraker does. Go figure.
runs on ~300 servers. You don't need millions of servers to reach everyone and/or be useful.
because Jimmy moved there back then?
That happens to us maybe once every few years ;-) The fact is that servers don't go down very often.
FTP also has a protocol that advertises endpoints, which may differ...
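For context: in FTP (RFC 959) the data-connection address travels inside the control channel, either as `PORT h1,h2,h3,h4,p1,p2` from the client or in the `227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)` reply to `PASV`, with the port encoded as p1*256+p2. A NAT box in between sees that address as plain payload, which is one way the advertised endpoint can differ from the real one. A minimal decoding sketch; `parse_ftp_endpoint` is a hypothetical helper, and its regex is lenient about the surrounding reply text.

```python
import re

# Six comma-separated decimal fields: h1,h2,h3,h4,p1,p2 (RFC 959 host-port form).
_HOST_PORT = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_ftp_endpoint(text):
    """Return (ipv4_string, port) from a PORT argument or a 227 PASV reply."""
    m = _HOST_PORT.search(text)
    if m is None:
        raise ValueError(f"no h1,h2,h3,h4,p1,p2 tuple in {text!r}")
    fields = [int(g) for g in m.groups()]
    if any(n > 255 for n in fields):
        raise ValueError("each field must fit in one byte")
    h1, h2, h3, h4, p1, p2 = fields
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


print(parse_ftp_endpoint("PORT 192,168,1,10,19,137"))
print(parse_ftp_endpoint("227 Entering Passive Mode (10,0,0,5,195,80)"))
```

A client behind NAT would send its private address here, so whatever is on the other side has to rewrite or ignore it.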
Most of the time is spent in select()/poll() anyway. And there's sendfile() for web/FTP servers; hey, that saves syscalls!
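To make the sendfile() remark concrete, a minimal sketch assuming a Unix system where `os.sendfile(out_fd, in_fd, offset, count)` is available (Linux, macOS, BSDs; not Windows). Instead of a read()/write() pair per chunk, each call asks the kernel to copy file data straight into the socket, so the bytes never pass through a user-space buffer. `serve_file` is a hypothetical helper name; in everyday Python, `socket.socket.sendfile()` wraps the same mechanism with a fallback.

```python
import os
import socket


def serve_file(sock: socket.socket, path: str) -> int:
    """Push a whole file into a connected socket with sendfile(); return bytes sent."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # One syscall per chunk; the kernel copies file pages directly to the
            # socket instead of read() into user space followed by write().
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break  # file shrank underneath us; stop rather than spin
            offset += sent
    return offset
```

The loop matters because sendfile() may send fewer bytes than requested, just like write().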
Want nodelay? use UDP! :-)
Hehehe, go spend your time on serious issues, folks ;-)