Sun's Mickos Is OK With Monty's MySQL 5.1 Rant
narramissic writes "Back on November 29, MySQL developer Michael Widenius trashed Sun's decision to give MySQL 5.1 a 'generally available' designation in a now-infamous blog post. Widenius warned users to be 'very cautious about MySQL 5.1' because 'there are still many known and unknown fatal bugs in the new features that are still not addressed.' And now we get Sun's response. In an interview Monday, Marten Mickos, senior VP of Sun's database group, said, 'I learned over many years about the benefits and the painfulness of absolute transparency in open source. A little bit of debate never hurts. This is part of being an open-source company. ... People are free to blog about what they want.' Doubtless, this will do nothing to end the debate over whether Widenius will follow fellow MySQL co-founder David Axmark's lead and leave Sun."
...then how do we know they are fatal?
While I think the AC may be overstating this a bit, I do think the term 'infamous' is being a bit overused here. Ask any random person on the street about this issue and you're probably going to get a response along the lines of "What's MySQL?"
This is my sig. There are many like it but this one is mine.
prepare to hear about known unknowns
A while ago, while browsing Sun's downtown website, and had to take the piss. As the java plugin loaded, I
So tell us, what exactly IS yourSQL?
If code's too transparent you see right through it to the bloody programmers and their coffee and Monster drinks. In fact, you might see into their dark hearts. You'll see little Guitar Heroes and Battlestar Gallacta action figures. Maybe even, gasp!, some Natalie!
Transparency? Shudder!
How to Download YouTube Videos
What's MySQL?
What is the debate? MySQL releases with known crashing bugs. Noone is disputing that.
Is the debate over whether or not it is okay to ship a database with known crashing bugs?
It really surprises me to hear someone from Sun saying that one can debate the merits of a crashing database. If this is the expected level of performance from MySQL, no wonder people shun it. At the very least they could have called it a beta or rc release, that would set the expectation level at something approaching reality.
We use MySQL in a number of critical aspects of our company. I'd rather have a company be honest and let me know I might have some issues with this new release than pretend there are no issues. That lets me stay with my current version and upgrade later.
Rather be on the stable blunt edge in critical infrastructure, not the bleeding edge.
iSQL
50% more pretentious, runs all the coolest sites in the world.
I like postgreSQL, though mySQL is certainly very nice.
It's all fun and games till someone divides by 0. Then it's hilarious.
and this is exactly what can happen when you sellout (your soul) to <large failing company> *
* read: Sun, Microsoft, Oracle, EA, etc.
MySQL has never been a stable database program. I've never had any other database system that just blows a database table at random. Nothing is more exciting then having a website go down because one of the tables got marked "corrupt" and you have to go "REPAIR TABLE". The damn thing might not even have a load on it and it will blow up!
First of all, what is MySQL doing that corrupts tables during normal operation and second of all? Seriously, a database shouldn't crash like that, ever.
Second of all, it might as well try to auto-repair the damn table. I mean, I've never had it loose data, only somehow decide the table was "corrupt" and then taken offline. And who cares if you do it automatically and it looses data, this is MySQL we are talking about here! They make no claim about data integrity and the user base doesn't even know what that means (must be a car or some "enterprise" feature used only by NASA and Fortune 50 companies)! I mean, 0000-00-00 is a valid date according to them!
But alas, this is MySQL we are talking about here. I mean, it isn't like you are putting any sensitive data on it right? I mean, surely only a fool would use it for anything besides storing data like "number of shoes in my closet" or "number of purses owned by the wife", right?
Good 'ol MySQL. I mean, what fun is a database server that is consistent or predictable?
If there really was an iSQL it would cost money, decide which SQL statements you wanted to run and how to run them, have no export to other databases but come in a really really flashy box.
Uh... I didn't know yourSQL was actually a thing, I was trying to make a joke.
So tell us, what exactly IS yourASL?
Fixed that for ya.
I'm not really touching this potato, maybe you're running into some quirks / unfortunate query. Just some quick questions:
- Why don't you have a PK / any index in the address table?
- Did you try a different syntax (e.g. WHERE vs. JOIN ON)?
- Did you try setting different indexes? Tried forcing a specific index?
Why yes, there is an iSQL, though it's a command-line client tool for MS SQL. But it does meet all of your criteria.
Sun buying MySQL is no different than Microsoft buying FoxSoft. Fox Software created Foxpro, a dBase-like database system. Not a true database, but something you could put a lot of data into. Fox Software sold Foxpro to Microsoft. Microsoft already had (has) MSSQL. First release after Microsoft got their hands on it: it could only store 1/500 as much data as before, and suddenly it got slower. It isn't completely dead yet, but no development has been done in 14 years. So its exactly like Sun and MySQL -- Sun bought MySQL to kill it.
What's MySQL?
A child Process!
Puns: Never explain, never apologize.
If your database is crashed and is no longer capable of accepting data, how is that different from losing data? Go ahead and explain that with a straight face. Do they have another data store where you can keep your data until the database is fixed?
Sun should be ashamed of themselves for even calling this abomination a database in the first place. The word 'database' carries a whole host of expectations that the product simply does not live up to. A text file makes a better database than MySQL.
Only if you pronounce it MySEQUEL.
http://www.zombieapocalypse.tv/
It's impressive that a company ships junk and admits it? Since when is that impressive? Boy are you setting the bar low.
I mean this is mysql here, not a real relational database. Kind of like sloppy cowboy VB coders of yore, MySQL has the same kind of attitude. "If it works, who cares if it is right".
I mean, sure people site "Well, Slashdot, FaceBook, and BIGCO use it, so MySQL is okay". But have those people ever realized how easy it is to lock yourself into MySQL? MySQL is so full of non-standard behavior and gotchas that it can be very painful and difficult to migrate to a real database. So what to companies do? Layer on a huge pile of Memcached and crazy "archive databases" to scale when if they had started with a more standard, scalable database system maybe they could have allocated their developer time to something more productive.
Anyway, I rant. I just think MySQL is only used by large companies because either they don't understand how much extra developer hours are spent working around MySQLisms or they know MySQL sucks, but know that it is to costly to migrate to something better.
But that has nothing to do with your post or my original post does it? I'll conclude with the main problem--Like VB, MySQL grew a whole crop of developers who dont know any better. While I dont know if you can blame that on the database or a programming language, I chuckle when I see MySQLisms in code (like never using a "JOIN" because it mysql is "faster if you give it small SELECT statements).
But what happens if you want to do full text search? Besides, your nice ACID InnoDB kinda backfires when half the tables are using MyISAM, doesn't it? And good thing MySQL lets you know when your nice happy transaction will not roll back properly because half the tables are MyISAM, right?
As I said, what fun is a database server that is consistent or predictable?
"...And the bugs got fixed and then we moved on. We moved on."
What?
You're probably not going to like this answer but....
The data is not in an optimal form for MySQL. Consider storing the IP address as a BINARY CHAR field, and not as a number. Order the bytes so that it is in big-endian byte order. Now MySQL can use it's indexes.
The problem is that MySQL treats index keys as a binary string so if you are using a function to join two tables, MySQL does things the hard way.
No sig. Move along - nothing to see here.
Okay... so what's your query?
The problem with this scenario and why it will always bother people who are used to non-MySQL RDBMS, is that really, you haven't had to think about things like that in a decade (more if you were giving your first born to Oracle).
Equivalent where vs joins should give similar query plans. If not, since the SQL standard where JOINs are first class citizen state that its what you should use for linking tables (no matter how exotic the JOIN), it should handle that better, and having to force an index is usually a crutch (even Microsoft will often consider it a bug, and the logical scenarios get fixed between versions... in 2000 you had to force em every so often, in 2005 they solved most of them, in 2008 I haven't seen an occurance where the analyzer got it wrong...).
The lack of index in the address table is indeed fairly illogical here, but for such a simple query, most RDBMS will be able to do it fine anyway, -especially- with table statistics. In this case, my pragmatic self would never expect it to be fast, but in most RDBMS, it will still be zippy. The only ones I've personally tried that will choke (even with gigs of data) are MySQL and PervasiveSQL (Pervasive makes MySQL look like the holy grail, thats for sure). I've had douzens of databases with up to 50-100 gigs of data (though it was spread out over at least 75-100 tables, sometimes up to a thousand) with no indexes aside for the primary keys and the systems were fast, on MSSQL, Oracle and Postgres (not saying indexes wouldn't have helped a ton, but it wasn't my decision to take), so its a bit of a culture shock to many when you have to spell out your intent to the database that much.
please do.
Sure, innoDB gives you transactions, but at the cost of a lot of MySQL's vaunted speed; half of the reason it took so much of the early open source DB business, lo those many years ago (the other being ease of app development.)
I'm a database rube, but even I've left MySQL for PostgreSQL. Try PostgreSQL, just try it. This isn't your old 7.3 postgres anymore, no siree. ACID all the way, kicks InnoDB's butt and is probably faster than MyISAM most of the time for most of your stuff.
8.4 is nearing completion and it's going to be fabulous. Try it.
The problem is, MySQL hasn't had a stable, crash-free release in MANY years. The version you think is stable is only stable with your data set and queries.
Finally, somebody who can explain when the MySQL binary package installer is broken on a bone-stock Solaris 10 sparc machine.
And, yes, it's totally broken, and yes, I reported it... last year. It simply does not, and CANNOT work unless you chmod a directory in /var/tmp from another window halfway through running.
My God People.
Do daemons dream of electric sleep()?
.
Does anyone remember those Windows 2000 source code comments that leaked a few years back?
We should not punish Open Source for being Open Source. We are a community. OK, more like a family at Thanksgiving, bickering and such.
I was wondering why that joke didn't make any sense.
By what name do you wish to be mourned?
A text file makes a better database than MySQL.
A text file does not make a better database than MySQL. No queries. No transactions. No way.
Bzzzt. It exports to Oracle just fine by default, thanks for playing.
Don't worry, it still doesn't. I'm just trying to rationalise the EPIC FAIL of a joke to stop my brain from exploding :)
http://www.zombieapocalypse.tv/
the address table is ~2000 rows, ~10 bytes per row. Unless it has a clustered index (google tells me Innodb uses clustered indexes, MyIsam doesn't) the index should be ignored in favor of in memory sorting.Storing the data in separate xml files, loading, parsing, and writing code to do the joining is probably faster than mysql.
Having experience with this problem, I can tell you that the problem is that MySQL's implementation of b-tree indices doesn't work well for ranges (specifically, it can only eliminate rows on one side of the inequality). The solution is to use rtree indices (GIS functions, 'SPATIAL INDEX').
I didn't come up with the technique, but I can't find the webpage where I found it. I did end up using it for a geolocation system though.
I'd suggest looking into the polygon type. This article may be of some use.
The basic idea is that you create a polygon column and create an entry that corresponds to the start/end points for each row in your table, then you can run a query like this:
SELECT * FROM your_table WHERE MBRContains(polyfield, POINTFROMWKB(POINT(INET_ATON('1.2.3.4'), 0)));
As a point of reference, the above query runs in my local DB here in 0.02 seconds for any IP I can throw at it.
HTH.
Thank you for this interesting suggestion. I want it to work but I tried what you suggest and I don't see any difference in MySQL's query plan. I created exactly the same tables, except all columns of type int(10) unsigned converted to binary(4). The query plan is identical to the original.
Ahh, but is that unacceptable use of "infamous"?. If "random on the street" is your gauge, very little could be labeled such. Personally, I think your analogy is overstating it, rather than the summary. Ask a person using Mysql for production code about this issue. That is the gauge of infamy in this particular case.
I record my sleeptalking
I take it you're nluv4hs? In that case, if its only 2000 rows, this whole thing should be instant even without index indeed (on an average RDBMS), so this is quite the funny situation (though I'm not surprised).
No GP wasn't me. MySQL 5.0.45 takes 10 mins 45 seconds on 2124 rows (vs. 0 mins 1.3 seconds for PostgreSQL 8.1.11). Slow enough? Also GP misstated: an int(10) unsigned isn't 10 bytes wide, it's 4 bytes wide.
Ahh ok, my mistake. 10 mins and 45 seconds on 2124 rows? Wow... ok, I take my back previous comments stating PervasiveSQL was worse. I had similar queries (with a few orders of magnitude more rows) on there and I was bitching that it took over 2 minutes, thinking it was insane.
10 minutes and 45 seconds? What is that thing doing? Loading Crysis as part of the query analysis process?
The whole thing:
select range.id_country from address join range on address.address between range.begin_num and range.end_num
Yeah, I've seen it... I was mostly just joking. Its such a simple query. I noticed one of the other poster showed a query that actually worked with some changes, but it was far from being a standard query, thats for sure.
It's spelled "aristocats". Yeah, for sure!
Nerds all over world would try to replicate it with gnuSQL, hack together only half the functions and when a user messaged them with a bug or feature request scream at them "Doesn't happen on my computer, its open source fix it yourself!"
That's silly; IPs are 32 bit numbers, and MySQL has functions INET_NTOA and INET_ATON specifically to allow IPs to be easily stored as integers.
Looking at his query I'm not really sure what he's trying to do, but it's a full join without join clause or an index on one of the tables which throws up a few red flags. Whatever the guy was trying to do can probably be done in a much better way.
If MySQL can't do inefficient queries efficiently I don't care. It does efficient queries like the ones that run my site (and this one, and google) quick enough.
Not a fanboy, can't comment on 5.1, but "someone optimize my query" isn't a good database criticism.
// MD_Update(&m,buf,j);
We just use binary files in a number of critical aspects of our company, we don't need full data integrity, just pure speed.
Our homemade "database" is ten thousand times faster than MySQL's MyISAM db engine.
I would love to see that query!
And please post something more advanced than 10000 inserts
What I really don't understand with Sun is why they would not want to "play by the rules" by forcing the contributors to give them their GPL rights... If they were a fondation like the FSF, right, no pb... but since they are a big company with not a such good reputation, it would be more reassuring to let the contributors keep their rights, like the Linux spirit and near 100% of the other GPL projects.
Yeah, tell us why Finns name their programs that way?
Linus - Linux
My - MySQL
http://en.wikipedia.org/wiki/Michael_Widenius and http://en.wikipedia.org/wiki/Linus_Torvalds
I fail to see the problem.
MySQL 5.1 isn't upto snuff, just ditch it entirely and just use Postgre.
If someone is passing you on the right, you are an asshole for driving in the wrong lane.
The query is stripped down to the bare essentials, but I'd guess he's trying to track what countries his web visitors are from. The table in question (address) has ~2000 rows, each row with 1 32-bit number. That works out to 3-4 pages. At that size, using an index is detrimental to performance. Maybe it doesn't affect you, or most MySQL users, but MySQL's planner/optimizer sucks when you get beyond sql 101.
How about this one? The queries ACID makes (no relation to ACID of DB studies) against the database data generated by Snort are significantly faster in MySQL than Postgresql with certain operations.
PS, that's really old data but I knew where to find it quickly :-).
- Michael T. Babcock (Yes, I blog)
Don't forget the iSQL Shuffle. It's smaller and costs less, but you have to load in all your SQL queries at the beginning, and it runs them in a random order.
MySQL is a decent row store.
PostgreSQL is a robust relational database.
Please use and evaluate accordingly.
I'm actually as excited about Drizzle as I am hard on MySQL. MySQL has tried to grow into something it just wasn't designed for. Drizzle is a project to return MySQL to its strengths of being a simple, fast row store with SQL interface. I have no problem with that, I just object to MySQL masquerading as an "enterprise class" RDBMS.
Blessed are the pessimists, for they have made backups.
Okay... so what's your query?
Postgres is horrible it can't even handle this extremely basic query!
When I have a kid, I want to put him in one of those strollers for twins and then run around the mall looking frantic.
Really, Transact SQL, identity columns, and bit data types export to Oracle? I've always had to translate them to PL/SQL, sequences, and numbers. Yay, I win.
Fuck off is this flamebait, just because I saw the 'known knowns and unknown knowns' joke before anyone else.
PS, that's really old data but I knew where to find it quickly :-).
MySQL doesn't support transactions. I know that's a really old complaint but I knew where to find it quickly :-).
Dewey, what part of this looks like authorities should be involved?
Instead of testing whether 2000 random IPs are in a given range, then moving onto the next range, why not order the IPs using an index and use an outer join to output their country (because they'll already be sorted into ranges).
If you want to sort a bunch of people into different height ranges you usually order them all first and then split them, instead of measuring the height of each person.
// MD_Update(&m,buf,j);
The default tuning parameters for InnoDB are completely braindead. I don't know why they can't do a better job with that. For instance, the default InnoDB buffer pool size is 8MB. This is woefully inadequate for any nontrivial database operation, because that means your database will only allowed to be able to use 8MB of RAM for data operations.
Have you experienced performance issues with a properly-tuned InnoDB?
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
Thank you for the link and your example code, that helped a lot. I want this to work, but I don't see enough performance improvement yet. Did I do this the way you imagined? I didn't try to finish the query on the full set because it took too long. Here I ran the join on 16 rows.
mysql> create table `geomaddress` (`address` geometry NOT NULL, SPATIAL KEY `address` (`address`)) ENGINE=MyISAM select GeomFromText( concat( 'point(', address, ' 0)' ) ) as `address` from address;
Query OK, 2124 rows affected (0.08 sec)
mysql> CREATE TABLE `polyrange` (`poly` geometry NOT NULL, `id_country` tinyint(3) unsigned default NULL, SPATIAL KEY `poly` (`poly`), KEY `id_country` (`id_country`) ) ENGINE=MyISAM select GeomFromText( concat( 'polygon(( ', begin_num, ' 0, ', end_num, ' 0, ', begin_num, ' 0 ))' ) ) as `poly`, id_country from range;
Query OK, 105920 rows affected (24.07 sec)
mysql> create table `geoma2` (`address` geometry NOT NULL, SPATIAL KEY `address` (`address`)) ENGINE=MyISAM select address from geomaddress limit 16;
Query OK, 16 rows affected (0.00 sec)
mysql> select r.id_country from geoma2 a join polyrange r on MBRContains(r.poly,a.address);
16 rows in set (6.08 sec)
According to EXPLAIN MySQL isn't using the spatial index. It doesn't matter whether I use on or where.
mysql> explain select r.id_country from geoma2 a join polyrange r on MBRContains(r.poly,a.address)\G
*** 1. row ***
id: 1
select_type: SIMPLE
table: a
type: ALL
possible_keys: address
key: NULL
key_len: NULL
ref: NULL
rows: 16
Extra:
*** 2. row ***
id: 1
select_type: SIMPLE
table: r
type: ALL
possible_keys: poly
key: NULL
key_len: NULL
ref: NULL
rows: 105920
Extra: Range checked for each record (index map: 0x1)
2 rows in set (0.00 sec)
6.08 secs/16 = 0.38 secs/row. That would be 13 mins 22 seconds for 2124 rows. Can I do better?
Congratulations! your planner is better than mysql's! That's actually what the db should be doing when it performs that query. (The index is irrelevant with such a small amount of data, though).
The polygon is a representation of the range in dotted-quad representation of the network address. For example, 3520957440 to 3520958463 (209.221.140.0 to 209.221.143.255).
The table definition is as follows:
The poly_ip column is generated as follows:
GEOMFROMWKB(POLYGON(LINESTRING(POINT(3520957440, -1), POINT(3520958463, -1), POINT(3520958463, 1), POINT(3520957440, 1), POINT(3520957440, -1))))
Note that we're describing a rectangle here. Doing it on a line doesn't seem to work, hence the -1,1.
If that doesn't help. Feel free to e-mail me to continue this discussion.
Want to search with full-text indexing? Replicate the table to MyISAM. It's a pain to configure - but when it's done you get ACID and FT search.
How many records in each table? If both address and range have 2000 records then that is 4M combinations, which it might be running through (and discarding the ones that don't meet the join criteria). That isn't the smart way to do it, but I could see how it could happen.
Next thing you'll say something irritating like 'past performance is no indication of future behavior'.
You guys make me sick, with your logic and stuff.
Yeah, right.
Oh, and they shouldn't forget to include the triggers in their custom DB interface. That way they'll know they got it right...until they get it wrong.
Yeah, right.
select count(*) from twobillionrows;
Most of my posts are marked funny, as I am usually in a fairly irreverant mood when reading posts on /.
I have been using mysql for 5 years(Before that 3 years in Oracle), in hundreds of websites and a few intranets. Some driving huge queries. Some driving simple high volume queries. A few enormous transaction numbers in a few.
I have never had mysql "Blow a table" once, or any other issue, the exception being failing drives.
In post Patriot Act America, the library books scan you.
Anyway, shouldn't an industrial-strength RDBMS be able to interpret and optimize the simplest possible range join written as such?
HAHA!
After all, I am strangely colored.
I'm sorry, I forgot that many developers expect their storage engine to BE their application instead of writing good code themselves.
Shouldn't you be validating your dates or numbers or other values BEFORE sending them to your storage system? Shouldn't the database's job be to store your data in a logical fashion so its easy to find later, and then find it when you query it?
I don't understand people who expect the database to replace the middleware of their application.
I'm sorry, I forgot that many developers expect their storage engine to BE their application instead of writing good code themselves.
Has nothing to do with that, and you (should) know it.
I suppose that Oracle, Microsoft SQL Server, Sybase, IBM DB2, Informix (R.I.P.), etc. etc. have it all wrong? Not to mention PostgreSQL and now even MySQL 5.0/5.1, etc.
The purpose of data validation in the database is to ensure that the database preserves valid data. After all, we all know that your middleware is not filled with bugs that may silently corrupt the data that is being stored....
I like Open Source software because its free of the constraints of Proprietary Software. One of those constraints is software excellence. Commercial software manufacturers will only fix/debug software to the point where it becomes uneconomically viable. At that point, they stop. Open Source software developers will keep tilting at the windmill till its right. Someone somewhere will keep hammering away at it till their peeve is gone. Sun saying 'Its not just good, its good enough' is the Krusty-the-Klown(tm) commercial software mantra. Open Source users are accustomed to a much higher standard and are justifiably angry with the 'barely-barely, good-enough' effort. Sun took good software and screwed it up. I'm sticking to 5.0 (last non-Sun release) until I see something better.
In all fairness, I have to comment that I've been using MySQL for all of my projects for years, and I haven't had a crash yet. Because my software is database agnostic I tend to stick to simple inserts, updates, and deletes, and somewhat complex selects. But to date MySQL has been the best open source solution I've come across for flexibility, speed, and ease of use.
*** *** You're just jealous 'cause the voices talk to me... ***
Except, you're wrong? Transaction support has been in MySQL for many years now. I actually gave a valid performance metric to the questioner.
- Michael T. Babcock (Yes, I blog)
The point being that you can't use 8-year-old data to demonstrate one point any more than you can use 8-year-old data to demonstrate its opposite.
Dewey, what part of this looks like authorities should be involved?