So while there is a theoretical benefit to implementing I/O buffer management inside the RDBMS itself, you need to do a lot of work just to equal the implementation you get "for free" with a modern Unix system, before you even see any performance improvement.
Sounds about right. In my specific case I'm talking about Sybase ASE on Linux. From what I understand they are using new Red Hat 7.x runtimes which include async I/O for filesystem devices. So now we can have the best of both worlds -- OS buffered writes (as we all know non-buffered == slow;)) and asynch I/O. And moving RDBMS installs from one box to another is easier with files than raw partitions.
Based upon that example I would tend to guess the raw I/O would be 'built-in' and then filesystem access would utilize native libs of the OS, but it's all guesswork.:D
If you mean fdatasync(), then yeah -- I'd expect that would be faster on most systems. I was just using fsync() for clarity -- PostgreSQL can actually use 4 different but similar methods for forcing the OS to flush it's buffers (see the wal_sync_method GUC var).
Oh ok I wasn't sure. I thought d-synch was the 'best' which is why I though it was odd. IIRC in Solaris they're actually dsynch, fsynch and fsynch does some extra stuff (synchs file info stuff as well as the data which you wouldn't need if you kept device sizes from changing).
IIRC for raw disks it was far easier to implement asynchronous I/O which is a significant performance boost over synchronous I/O. With filesystems you had to rely on the OS which may not have it or implement it differently, and it tends to blow away the disk cache anyway.
I'm not a C/Linux hacker (just a DBA) so I don't know if that is true or not, but that was the feeling I got 'back in the day' when I was asked why asynch I/O was availible only on raw disks and not filesystem ones (although they now have it on filesystem files so I guess either they found a way to do it on Linux or Linux provided the tools to do so)?
p.s. isn't dsync a slightly better method to use than fsync if you're using pre-defined/sized filesystem devices?
Uh.. Duh?:D The parent poster was talking data warehousing and I thought my post was more than clear on that.:)
I've never used IQ for OLTP but I would guess it would puke as you suggested. ASE is more OLTP than DSS but it does have partitioning, parallelism, cost-based optimization, and unions. IQ has all that plus bitmaps and more DSS type functions.
I really like Sybase's choice in segregating the different products -- it's a lot nicer than Oracle's 'all in one' approach which encourages bloat, bugs, and security holes.:D
EXACTLY! It's amazing -- consider their latest 'innovation', the "read cache". Basically every SELECT query's results will be stored in a special hunk of memory. When updates occur, it invalidates the cache. So, this is a feature designed soley for mostly-read applications.
My question is -- "WHY?" If the query cache provides significant performance improvements for your application then you're, by definition, using MySQL as a glorified HTML generator. You can certainly get much better peformance if you took the query results and dumped to a flat text file and had clients read from there. Why go through the overhead of dynamically generating STATIC html? It seems like a vast waste of time and memory. If you're willing to put up with MySQL's limitations to achieve marginal performance improvements then you certainly should have no problem going the extra mile and putting up with just a little bit more inconvenience for another chunk of performance.
If MySQL's memory management was more fine-grained (e.g. setting up caches for specific objects with specific block sizes, etc.) you most likely wouldn't need a query cache. I can set up a dedicated memory cachelet for a certain table or index and know that it will reside in memory anyway, so I can save RAM and get high performance from ALL queries, not just SELECTs. Seems like a 'band-aid' fix for a gunshot wound.
Visit DBDebunk.com. According to Fabian Pascal virtually all SQL based DBMS are not Relational.
Specifically: *MySQL (and Oracle and DB2 etc.) is a SQL DBMS, or in other words, it is a DBMS based on the SQL data language;
*a true RDBMS is a DBMS truly, fully and correctly based on the relational data model;
*in the industry there is a common mistake that SQL-based DBMS are RDBMSs: SQL is not a true, full and correct implementation of the relational model;
So no, once and for all, MySQL is not a Relational Database Management System. It is a SQL-based management system. But then again, none of the SQL-based management systems are relational in the strictest sense either, but once you read Pascal/Date/etc. on that site you realize how much better your life would be as a SQL developer and DBA if they really *were*.
Another thing is to partner with big name 3rd party app vendors. Most shops choose Oracle, DB2, MS SQL, Sybase, etc. not because they necessarily 'need' it, but because it is what runs their SAP, PeopleSoft, Billing, etc. application. Get SAP, et al running on MySQL and they can switch without affecting their entrenched users.
They've already invested significantly into those apps, they're sure as hell not going to switch to something else that runs on MySQL. If a suitable product doesn't exist then they'd have to create it somehow. The cost to re-tool and re-train would far outweigh another check to Oracle.
Boy it seems not a couple weeks ago we were discussing something along these lines here on SlashDot. I think I've said everything I need to say on the lacking features of MySQL, so maybe I'll chat about something else.
I think everyone 'knows' of an Oracle-dependant piece of software in your shop (or school, or website, etc.) which really doesn't 'need' Oracle. We look at apps with 100 users and say - "Heck! Even flat-files would be fine for this app!". Those sorts of Oracle installs are certainly feeling the IT budget crunch. No longer can management write $80K checks to Oracle each year for support and product upgrades. They're looking for a way out and I think some of the smaller more niche products (Sybase ASE, PostgreSQL, MySQL, etc.) need to be ready to step in and take them (so listen up MySQL developers and zealots!).
When we discussed this before I brought up what I thought were valid complaints (no hot backups, no binary dumps, EXPLAIN output is cryptic and could be cleaner, replication is still a little immature, etc. etc.). Regardless, some 'rebuttals' I received were the Open Source Party Line - e.g. 'Why don't you code it yourself?' or 'There's a workaround for that.' That is certainly not a healthy attitude to tell potential customers of your product. These guys dropping $100K on a RDBMS and are feeling the pain would love to pay MySQL support contracts, however if you have the attitude that somehow the end-user, who would like to actually pay you money for your product, needs to keep their mouth shut and gratefully take what you charitably give out you're not going to retain them as a customer. Like it or not, they are used to getting what they pay for. And if you're really focused on getting more Enterprise-ish people to use it then you'd best start acting like it.:)
Give them what they want, treat them like CLIENTS (e.g. deserving of some respect) and not simply another l33t hax0r ("Code it yourself, n00b!") and they'll beat a path to your door.
Problem is things like Zend Accelerator do not work with Apache 2 yet.:(
As soon as PHP4 and Apache 2 play nicely (it wasn't setting cookies correctly on some browers for some reason??) and Zend (et al) solutions work on it we can't use it, although we are REALLY looking forward to it (the hybrid thread model is neat-oh).
Well, I feel that it is NOT suitable for enterprise use, and I thought I explained why. Perhaps I'm spoiled with Oracle, Sybase, etc. in that I don't have to code workarounds (which is what they are) to have hot backups, or spread over multiple dumps, etc.. At least to me that is what the Enterprise world knows, and loves, and expects. If you want MySQL to have a chance, you can't tell a company that they're going to have to write utilities (or custom code, or set up a dedicated replication backup server, etc.) to perform routine things that their RDBMS has been doing for them. They don't have to write custom application code to handle subselects, joined updates, hot backups, etc. and telling them they will have to simply 'deal with it' is not a good way to win the hearts and minds of developers, DBAs, etc.:D
Not to really focus heavily on this but I believe MySQL.com does mislead: designed for speed, power and precision in mission critical, heavy load use
Power, to me, implies much more SQL options (e.g. subselects, joined updates, etc.) which yes, IS BEING worked on, but is not there yet (nor stable from what the docs tell me). Power, to me as a DBA, means that I can tweak server settings without dropping and restarting the DB server. It means that I can see more than what I have now.:D
Stored procedures are typically faster than ad hoc SQL, so it would be faster to have precompiled SQL capabilities.
Mission critical implies that I can have fail-safe replication and failover when something goes wrong. It says that I don't need to have 'workarounds' to hot dump my database (say I can't afford to have a dedicated backup replication guy somewhere).
Is MySQL getting there? Of course - but as I said 'Not yet'.:D
The misunderstanding re: indexes came from SlashDot parsing out my less than in the query, so it cut it off. This is what I wrote (sometimes slashdot times out and I lose the post so I write them in textpad first): Given an example query - 'SELECT bob FROM sometable WHERE somecol = 45 and somecol2 less than 44' - if I have two indexes on sometable (somecol and somecol2) it can join the two indexes together and use the 'virtual' index to find rows.
I don't think the generalizations were overly broad, certainly you can achive hot backups via a roundabout way, or that you can 'sort of' work with data as you're loading it via load but in PRACTICE I think it was accurate. We could split hairs over how to achive hot backups but in the end, the enterprise customers just want non-corrupt backups (is there a way to achive something like a rolling transaction dump to achive up-to-the-minue backups without backing up the entire database? I know we can't live with day old dumps if we have data loss, again replication can help but if I issue 'delete table;' then that will be replicated, too:D).
As I said, it's sort of a trade-off between the costs of, say Oracle (which is typically obscenely high;)) and having to 'put up' with a lot of deficiencies which certainly are 'slated' to be addressed but the only roadmap we have is 'next version' with no real timetable (and many of us recall many years ago the docs stating 'real soon' for things which are still not implemented). At our site we started on MySQL and encountered major problems and eventually purchased Sybase ASE for very cheap (under $15,000) which includes 24/7 support. And we have hot backups, subselects, triggers, stored procs, etc. and all sorts of other things which we didn't have to spend time developing in house.
I suppose it comes down to can you afford to not have the enterprise features? The problem is most people don't know until they encounter a situation in which they need the feature.:D
Yes, I mis-spoke. 30 million pg view. Which many of them are writes, unfortunately.:D If they were reads we wouldn't have hit MySQL locking problems, and live would've been good.
Thanks for replying to my post. I'm only trying to help make MySQL better - I don't use it but I know many who do, and they have to struggle with issues which I think they shouldn't have to.:)
Re: replication Correct, my gripe was that the DBA now has to do something to fix a problem which should've never occurred.:D With my DBA hat on, I have to deal with a lot of issues, and 'worrying' about replication (it's supposed to put my mind at ease, right?:D) failing shouldn't happen.
Re: File system buffering Again, I am aware that is something you should 'know' when you're setting up your DB, but it would be nice to be able to control it on a table or at least a database level. Again I think system tables are really important and should always be non-buffered and so should several of my production DBs, but there could be a test DB that I don't care about and buffering will provide a performance advantage.
Does turning off buffering also kill the read cache, too?
Re: indexing I think everyone is mis-reading, the docs here explain what I'm getting at: mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2; If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimiser tries to find the most restrictive index by deciding which index will find fewer rows and using that index to fetch the rows. What I would like it to do would be to read BOTH indexes and then create some sort of a 'dynamic' index (really just the AND-ing of RowIDs after considering both indexes) to limit the results further.
Re: Optimizer I will attempt to assemble some test data, but in the field I've seen many-table joins (inner, outer, etc.) being performed noticeably slower on MySQL than in other RDBMs. I haven't checked postgresql, though.
I actually browsed through the C code for the optimizer but couldn't exactly find what I was looking for - from what I saw it sets up a query class which does the optimization inside of it? I thought the optimizer would be a separate 'entity'. Maybe I was looking in the wrong place.:D
I read the InfoWorld benchmark thing, and also downloaded the code they used. MySQL's performance gains came from the query cache which is quite cool, although on systems in which you're updating/inserting you're going to hose the cache and lose the benefits (this was tested informally on a couple million row table, and performance with the cache when being updated frequently was roughly equal to without).
Again - I'm not saying there's no use for MySQL - but the original thread starter was looking at a high trafficked system. It's kind of like trying to figure out the Total Cost of Ownership of a piece of software. When you have a high trafficked site not only is performance important but reliability, maintainability, disaster preparation and recovery, etc. all need to be factored in, which I attempted to illustrate in the 'gaps'.
You are not seeing my point -- obviously you either didn't read what I said or have no clue what I was talking about.
I never said you cannot create more than one index on a table. But for a single select, on a single table, you can only use a single index. I'm not making this up, read the docs: SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimiser tries to find the most restrictive index by deciding which index will find fewer rows and using that index to fetch the rows.
HI peyote, I'm glad you took the time to intelligently reply to my post. I think we're not quite on the same page in a couple places, certainly MySQL may be 'stable' (see my other post about what I consider to be 'stable') but I think the story starter was concerned with high trafficked sites which is a horse of a different color from, say, my personal site or a lightly-used corporate intranet which certainly doesn't require the types of things I illustrated in my post. I run several high-trafficked sites (upwards of 100 million imp/mo.) and those were 'gaps' in MySQL which our team isolated as ranging from 'really helpful' to 'required'.
Rebuttal: Somehow that fact doesn't impede Yahoo!'s extensive use of it. (See Jeremy Zawodny's presentations at the recently-held OSCON.) If the query hangs on the slave halfway through, it isn't marked on the slave as having completed. When the slave becomes available again, it notes its pointer in the transaction log and catches up automatically. Oh, and replication (at least in 4.x, possibly also in 3.23.x) is transaction-safe.
Right, and the slave goes down. So I have to manually intervene to fix it. That is unacceptable. It should do something better than simply quit and say 'oops'. Yahoo's extensive use is, as I see from MySQL.com, fairly weak: Size of database: 25 GB Average number of concurrent connections: 60 Max number of concurrent connections: 250
That is not worth writing home about.
Rebuttal: As another poster wrote, leave the OS out of this. If you cannot properly configure your OS to not buffer writes, you probably shouldn't be running a 'mission critical' ANYTHING.
Correct, knowing what your filesystem does is your problem. The filesystem buffer will say 'Ok I wrote that' and MySQL can move on, even if it is living in a buffer before being written. Not a big deal if you're aware of it, but MySQL doesn't give me an easy way to change it, or to specify WHICH tables/databases NEED to be O_SYNC'ed or not. That is what I wanted - recompiling MySQL and having EVERYTHING or NOTHING buffered is not acceptable. There are certain tables or databases I KNOW will benefit from buffering, but with MySQL it's all or nothing.
Rebuttal: So set up a dedicated slave for backups. Turn off the slave while backups are running, it catches up when backups are done and it is brought back up. A *simple* solution to a *simple* problem. If you really feel the need to do a hot backup of your live server, you can check into using InnoDB's tool at http://www.innodb.com/hotbackup.html.
Ok, so I have to now get another boxen set up just to backup my database? How is getting another box, setting up replication, pausing replication, dumping, etc. etc. 'simpler' than saying 'dump database bob' and having MySQL do the rest for me? And I did point out InnoDB has a hot backup but that does work only with InnoDB.
Rebuttal: You mean like backing up the raw MyISAM files? Of course, that doesn't work with InnoDB databases, so you can use their hot backup tool for that as well, if this is a REAL (rather than IMAGINED) problem.
That only works if you offline your DB while your copy is being done. Again, not acceptable and what hot backups for ALL MySQL tables would fix.
Argument: I have to wait for 4 million INSERTs to be performed... Rebuttal: RTFM. No you don't.
Well, certainly you can always use 'CONCURRENT' but in practice that is not terribly useful. Why? Well my 4 million load operation will take LONGER to perform, and sure people can query it now, but I have no indexes on the table so all queries will now table scan. So I suppose you're right, you don't have to lock the table for the whole operation, but in reality, your app won't run very well at all (are writes allowed to the table while you're loading?).
Rebuttal: Again, is this a real or imagined problem? It's likely that whatever you are trying to do, there's a better way. Unfamiliarity with a particular tool usually results in this type of problem.
It can be - depending on your OS and whether or not you have it configured for large file support. But again striping it can have significant performance advantages since I could dump AND load to multiple disks that leads to a significantly higher throughput rate.
Rebuttal: So because YOU don't like the output of EXPLAIN you're saying MySQL isn't ready for production? WTF are you talking about?! As for a more *in-depth* EXPLAIN, I agree there, and I found PostgreSQL's mechanism kind of cool. Of course, in four years of running MySQL in a production environment, I'm not sure I would have used it more than once or twice; MySQL's EXPLAIN has always been sufficient, if you actually know what you're doing.
Not necessarily 'production' use, but production use in a highly trafficked environment.
RTFM. You are obviously unfamiliar with the slow queries log where MySQL gives you EXACTLY the information you are looking for. As for the data cache and whatnot, I don't know if that is actually available or not.
Yes I've read the manual and seen the slow-query-log portion. However the log does not log the currently running SQL, only after it has completed. If my server is slow NOW I have to wait till whatever is blocking everything to complete BEFORE I knew what happened. I'd like a way to see RIGHT NOW what is going on, so if I see a rouge query performing a Cartesian product I can easily kill it and life can move on. The slow query log does not fit the bill.
No, there isn't a way to manage the data cache aside from setting config options on sizes of key caches and the like. I can't create dedicated caches to certain tables which is quite useful, nor can I change the I/O sizes of the caches if I know queries being performed on them would like 16K I/O vs 4K or whatnot.
Rebuttal: And your evidence? Oh wait, you didn't actually provide any. You just brought up a tangential issue...
Part of it comes into the fact that I can't manage index statistics (or even see them), can't tell what types of joins are being performed, etc. and the fact that the optimizer is somewhat 'young'. Now, simply because I can't see statistics doesn't mean the optimizer isn't using them, but I can't gauge how well they are being used, and I suspect the optimizer isn't as advanced as others. Optimizers are weird creatures, and with time it will certainly improve. However many-table joins are consistently slower in MySQL than many others - I'll see if I can't get test data up and run some quasi-useful benchmarks but I've seen this time and time again in properly normalized, indexed tables queries which run on Sybase ASE, Oracle, etc. with nontrivial amounts of data do not run as well on MySQL (which touts SELECT speed as their main advantage, I guess complicated SELECTs are not as nice).
Rebuttal: If you are that concerned, you have some good options here. (a) Pay the developers to hold your hand and explain to you what has happened. (b) Use the source and do your own friggin' diff. This is Unix; stop acting so helpless.
We don't use MySQL in a production environment, so I'm certainly not going to pay or waste my time coding. But MySQL.com seemingly touts that they are #1 in so many aspects and conveniently glosses over some issues which are important to those who write and manage large applications. I'm simply pointing out where they need work, and if they're serious about their claims then they should have no problem whatsoever implementing them.
Dude, have you not even looked at MySQL since 3.21 or something? Row locking is available in InnoDB, as are transactions. Stored procedures and triggers are planned for 5.x IIRC, but so many applications DON'T need them that the MySQL folks simply haven't cared to add them. Ditto for views (which are also slated for 5.x).
As I said in another post, I do lots of consulting and yes, I read the MySQL docs quite often. InnoDB does provide row locking, but at a cost of performance and memory. If you know much about how large-scale applications are built (which is what I think the comment author asked about) you know how well stored procedures and triggers are used. He asked if MySQL 4 (over something like PostGRES) was suitable in a high-trafficked environment. Those features are a serious part of any non-trivial application.
and is it stable enough to run on a high traffic site
I count most of what I wrote as part of being 'stable' on a high traffic site.
Requiring your DB to be offline while you dump is not acceptable for a high traffic site. Fail-safe replication is required for high stability on a high-trafficked site. Backups which are flat-file based and require indexes to be created are not acceptable for a high-traffic site. Having a poorly performing system and you not being able to diagnose why because the MySQL server provides inadequate tools is not acceptable.
I suppose it comes down to how you define 'stable'. Stable as in 'not crashing' may be true (although it still can and does crash). Stable as in 'highly-available' is not true.
You misread (or I mis-wrote) the multi-index statement. Multiple indexes on a SINGLE TABLE (which is clear from my SQL) is NOT implemented in MySQL. Multiple indexes on a single query works just fine if you have more than one table, but there still is the limit of one index per table. Either that, or it is so buggy it does not work when it is supposed to.
I don't use MySQL in production but I work with many people who do, and I also do significant consulting to clients who run MySQL. Perhaps you should know what *you're* talking about before posting -- I run several highly available, high trafficked sites (30+ million hits a week). But I guess you don't have the guts to stand behind your comments with your real moniker.
Why MySQL is Not Suitable for Enterprise or High-Volume Use or MySQL.com misleads you about it's capabilities
Replication in MySQL is a joke for 'mission critical' use. As I understand it, the binary log records SQL modification statements which are executed on a master, not the data which was changed. This is involves significant assumptions beforehand, such that the master and slave(s) must be 100% identical. If I perform an UPDATE on the master, the changes are not replicated, but the query. This is what I would call the 'easy way out'. Who knows what happens to the query once it is replicated out - what if it hangs halfway through? I can't roll back and be in a consistent state, I have half-completed changes which makes my database inconsistent and now I'm forced to dump-and-load. Keep track of which rows are modified, to what from what, and ensure that those transactions are replicated to my slaves. Anything less is simply useless for high availability.
I would also be willing to bet that a significant number of installations that have transitioned to MySQL replication are doing so due to table-lock induced latency. A suitable system with a capable RDBMS could probably handle all of the load given to it and not need 'many slaves' to handle the extra traffic. They would have a single failover for high availability and that's it.
Filesystem buffered writes. Transactional support is great - it allows me to roll-back aborted transactions. However, due to the inability to control whether or not my tables are write-buffered means that MySQL may *think* it has performed a write even though it is still in the write-cache. I can then turn off the system and voila -- corruption! Part of the fault lies in the OS who tells MySQL it was written even though it is in the cache, but I have a simple solution. Devise a way to selectively turn off buffered writes for certain tables / databases. This way if I know I have a critical table which has a lot of writes I can turn buffering off and be ASSURED that writes will be performed when asked. I suspect a lot of 1040 and other table corruptions are caused by something like this. Yes, performance will take a hit but I think it is a very acceptable trade-off for data corruption. Obviously all system tables should NOT be buffered.
Inability to use more than one index on a table in a query -- most enterprise RDBMS' can use more than one index on a table for a query. This can easily save a table scan or the use of a single, less-efficient index. Given an example query - 'SELECT bob FROM sometable WHERE somecol = 45 and somecol2
Clustered indexes. These basically physically sort the table based on particular columns. This allows you to ORDER BY username ASC without using anything special since the rows are already sorted on username (if you have a users table and cluster the username col). This also greatly speeds up BETWEEN clauses. And yes, to people who know a little bit of SQL but don't know as much about clustered indexes -- you can create an index with a bobcol ASC but clustering the actual data is faster and more efficient if you are grabbing data which is not on the index. For example, SELECT * FROM table ORDER BY username ASC will not be as efficient as the same query clustered on the username. If you had a sorted index on username it will probably read the index sequentially and then visit the table. That extra operation = more disk seeks = more time / cpu to execute (and it really adds up as the table size increases). However, if you are doing something like 'select username, password from user order by username' it would be better to create a sorted index on username ASC, password. That way it will read the index only and not visit the table at all.
On-line backups. In today's internet world your site has to be 24/7. This means you cannot have significant performance problems (or even offline-ing your dB!) when you make a dump -- Sybase, etc. have done this from as far back as I can remember. Postgres can do this with an add-on which is well worth the money. As far as I know MySQL can only do this with InnoDB tables and is a for-pay feature (since it has a MVC log to use in the meantime).
Backups to something other than CSV files. MS SQL, Sybase, Oracle, they all dump to a compressed binary file. Saves a TON of space and is MUCH FASTER to dump and load. I can dump a 12GB Sybase DB in under 20 minutes. Loading it all (from scratch) and then bringing the DB online is about the same amount of time. MySQL stupidly logs the CREATE TABLE / INSERT statements. What does this mean? That I have to wait for 4 million INSERTs to be performed when loading my table, and FURTHER I have to wait for the INDEXES to be re-created on the new data. Dump the indexes, too! (Remember that full-text indexing is just another index, so if you use that and have to load from a dump be in store for SIGNIFICANT downtime).
Ability to specify the number of files to dump to. What happens if you have a dump which is larger than 2GB? Some linux distros cannot handle a single file of 2GB or more without recompiling the kernel. Give users a way to, within the dump statement, split the dump over two files. Not only will that help avoid the 2GB limit, but it can speed up dump/loads since I can dump to a bunch of different disks to improve throughput. Sybase has the 'STRIPE ON' clause (originally to dump to two tape drives at once but works fine on filesystem files as well) to split the dump equally over an unlimited number of files. This also impacts the fact that MySQL tables and indexes are stored in filesystem files that are also subject to a 2GB limit.
Cleaner way to view query plans of statements. EXPLAIN... is great and all, but the resulting table is a PAIN to read. What I want to know is simple:
Query is using XYZ, ABC tables. Table XYZ is using index 123 which is sorted so I do not need to create a temp table to sort ASC.
Since you have all the columns in your select statement in the index I do not have to visit the actual table - I can pull it all from the index. Because of this, I will read the index from start to finish.
ABC is using index 23dsf which is not sorted so I must create a temp table to sort that. Also, since it is a join, I do not need to perform an index scan but a positioned search (table scan is to a WHERE clause with no index AS index scan is to an index which is not selective enough or needs to read all columns.)
Simple, easy and pretty much even a NOVICE can see that their query is a good performer or a bad performer.
Along with more in-depth EXPLAIN, also provide me with a way to see what the optimizer is doing with the query. In MS SQL and Sybase you have 'trace flags' which you can turn on before your query to see EXACTLY what Sybase is doing - why does it think this index is better than this other one, why is it table scanning when you think it should index sort, etc. Give me an easy way to say 'verbose on; explain xxx;'.
Ability to delve deeply into performance of the system. If there is one job a DBA must know it's how to tell what the heck is going on when something is slow. Currently MySQL gives you meaningless info like 'slow queries'. Great, I see 200,000 of them. What queries are they? What good is it in a large application which may contain 3000 lines of SQL to tell me the raw number of queries which are slow? I want to know the EXACT SQL of the query(s) which are slow and I want to find the one taking up the most CPU time and blocking all the rest. I want to know how MySQL is managing it's data cache so I can see if I need more ram (e.g. it is swapping lots of data to/from the cache) or if I am I/O bound. Don't tell me to look at 'free' or 'top' - half the time it is wrong because you (MySQL) tell it misleading figures. I want *you* to tell me exactly what you are doing since you would know best! If you've ever seen a sp_sysmon output from Sybase ASE you'd know what I'm talking about.
MySQL's query optimizer is PISS POOR. If I see another changelog entry like this I'm going to scream:
Optimized queries of type: SELECT DISTINCT * from table_name ORDER by key_part1 LIMIT # So does that mean these queries were NOT AT ALL optimized before? It doesn't read 'FURTHER optimized'.
"ORDER BY... DESC can now use keys." Does that mean it was table scanning each time? Jebus! Hands down the query optimizer is one of the most important things in the database -- knowing how to use the database statistics and knowing when to use a merge-join vs. a hash-join etc. are CRITIAL to database performance.
Of course, the usuals: integrated row (or in the least page) locking, full support of subqueries, stored procedures, views, triggers, referential integrity, transactions, etc. etc. etc.'
PostGRES and virtually 100% of 'for pay' RDBMs have this. There simply is no reason to use MySQL for anything sufficiently non-trivial.
P.S. I'm not the coordinator or anything, I certainly didn't make the decision to use Virgin. I just thought I'd post the location so that people who didn't register could find their way.
Parking was a bitch, to say the least. I saw you guys walking to B&N (or whatever the bookstore next to Virgin is) but by the time my friend and I parked and got there no one was in B&N (at least no one from the group that I could see). I think we caught sight of the group one more time in front of Brio but couldn't meet up.:(
We drove around for a while but couldn't find you guys again. My suggestion would be for the meet spot to be the FINAL spot, so that if people show up late or what not there's no worries.
I agree Virgin won't be a lot of fun... But I voted for Virgin simply because there are a lot of other options (food, beer, gaming, etc.) in the same 'block' which will make choosing something else quite easy.
To those geeks in Columbus who don't want to give your email out (can't you register a hotmail addy for this? Sheesh!):
Virgin Megastore 3965 Townfair Way Columbus, OH
Thursday, July 25 @ 7:00PM EST
That's the Virgin Megastore in Easton Town center. Easton is huge and leet, so if you don't want to hang out in Virgin (ha!) Megastore there's Adobe Gillas, Fados, etc -- lots of pubs and such all around there.
29 people signed up for Columbus, OH -- 6 confirmed!:D Not sure what we're going to do in Virgin Megastore (browse the DVD section?) but there's lots to do in Easton.
As the article poster touched on, this won't do anything if you're concerned with RDBMS integrity (and have a site which requires write access to your RDBMS).
For static content, it sounds like a cool idea, even if they get root all they can do is view things and not touch. Of course, if that compromised boxen is attached to an internal network to your RDBMS, then they can go to hax0ring the heck out of your DB, they just have to use whatever tools you have installed on the web server.
Batman: I finally found the people that killed my parents! Prepare to die!:holds up BatGun: Superman, swooping in: I'm sorry Bruc-- I mean 'Bat-Man'. I can't let you kill them, that would be wrong. Batman: Oh shut up you tight-wearing pansy!:looks at own tights: Ok, well at least I make this look good! Superman: After all we've been through together!:wipes tear away: Prepare to die!:turns on heat vision: Batman: Ah-hah! 'BatAntiSupermanHeatShield' activate!:heat vision deflects and blows up a nearby building: Superman: You bastard! I'll crush you with my pinky!:lunges at the Caped Crusader: Batman: Not so fast, Superdork!:whips out glowing green kryptonite rock: Superman: Noo, my one weakness! My Achilles rock, as it were!:grabs Bruce's parents killers: Now I have YOUR weakness, the vengeance for your parent's killers!:laughs heartily:
...Lather, rinse, repeat...
2 hours later
Superman::breathless and weary from a long fight: How did... how did Lex Luthor and The Joker team up to defeat us both? Batman::also beat-up: I don't know, but I suspect it is the scene for Batman V Superman II: "Two super-heads are better than one!"
Sounds about right. In my specific case I'm talking about Sybase ASE on Linux. From what I understand they are using new Red Hat 7.x runtimes which include async I/O for filesystem devices. So now we can have the best of both worlds -- OS buffered writes (as we all know non-buffered == slow
Based upon that example I would tend to guess the raw I/O would be 'built-in' and then filesystem access would utilize native libs of the OS, but it's all guesswork.
Oh ok I wasn't sure. I thought d-synch was the 'best' which is why I though it was odd. IIRC in Solaris they're actually dsynch, fsynch and fsynch does some extra stuff (synchs file info stuff as well as the data which you wouldn't need if you kept device sizes from changing).
IIRC for raw disks it was far easier to implement asynchronous I/O which is a significant performance boost over synchronous I/O. With filesystems you had to rely on the OS which may not have it or implement it differently, and it tends to blow away the disk cache anyway.
I'm not a C/Linux hacker (just a DBA) so I don't know if that is true or not, but that was the feeling I got 'back in the day' when I was asked why asynch I/O was availible only on raw disks and not filesystem ones (although they now have it on filesystem files so I guess either they found a way to do it on Linux or Linux provided the tools to do so)?
p.s. isn't dsync a slightly better method to use than fsync if you're using pre-defined/sized filesystem devices?
Uh.. Duh? :D The parent poster was talking data warehousing and I thought my post was more than clear on that. :)
:D
I've never used IQ for OLTP but I would guess it would puke as you suggested. ASE is more OLTP than DSS but it does have partitioning, parallelism, cost-based optimization, and unions. IQ has all that plus bitmaps and more DSS type functions.
I really like Sybase's choice in segregating the different products -- it's a lot nicer than Oracle's 'all in one' approach which encourages bloat, bugs, and security holes.
Sybase IQ and ASE have all those features as well (if you're again looking for Oracle features at a MUCH LESS cost).
Sybase IQ recently employed the largest data warehouse on record of something like 75TB of data and peformed faster than Oracle with 1TB.
EXACTLY! It's amazing -- consider their latest 'innovation', the "read cache". Basically every SELECT query's results will be stored in a special hunk of memory. When updates occur, it invalidates the cache. So, this is a feature designed soley for mostly-read applications.
My question is -- "WHY?" If the query cache provides significant performance improvements for your application then you're, by definition, using MySQL as a glorified HTML generator. You can certainly get much better peformance if you took the query results and dumped to a flat text file and had clients read from there. Why go through the overhead of dynamically generating STATIC html? It seems like a vast waste of time and memory. If you're willing to put up with MySQL's limitations to achieve marginal performance improvements then you certainly should have no problem going the extra mile and putting up with just a little bit more inconvenience for another chunk of performance.
If MySQL's memory management was more fine-grained (e.g. setting up caches for specific objects with specific block sizes, etc.) you most likely wouldn't need a query cache. I can set up a dedicated memory cachelet for a certain table or index and know that it will reside in memory anyway, so I can save RAM and get high performance from ALL queries, not just SELECTs. Seems like a 'band-aid' fix for a gunshot wound.
Visit DBDebunk.com. According to Fabian Pascal virtually all SQL based DBMS are not Relational.
Specifically:
*MySQL (and Oracle and DB2 etc.) is a SQL DBMS, or in other words, it is a DBMS based on the SQL data language;
*a true RDBMS is a DBMS truly, fully and correctly based on the relational data model;
*in the industry there is a common mistake that SQL-based DBMS are RDBMSs: SQL is not a true, full and correct implementation of the relational model;
So no, once and for all, MySQL is not a Relational Database Management System. It is a SQL-based management system. But then again, none of the SQL-based management systems are relational in the strictest sense either, but once you read Pascal/Date/etc. on that site you realize how much better your life would be as a SQL developer and DBA if they really *were*.
Another thing is to partner with big name 3rd party app vendors. Most shops choose Oracle, DB2, MS SQL, Sybase, etc. not because they necessarily 'need' it, but because it is what runs their SAP, PeopleSoft, Billing, etc. application. Get SAP, et al running on MySQL and they can switch without affecting their entrenched users.
They've already invested significantly into those apps, they're sure as hell not going to switch to something else that runs on MySQL. If a suitable product doesn't exist then they'd have to create it somehow. The cost to re-tool and re-train would far outweigh another check to Oracle.
Boy it seems not a couple weeks ago we were discussing something along these lines here on SlashDot. I think I've said everything I need to say on the lacking features of MySQL, so maybe I'll chat about something else.
:)
I think everyone 'knows' of an Oracle-dependant piece of software in your shop (or school, or website, etc.) which really doesn't 'need' Oracle. We look at apps with 100 users and say - "Heck! Even flat-files would be fine for this app!". Those sorts of Oracle installs are certainly feeling the IT budget crunch. No longer can management write $80K checks to Oracle each year for support and product upgrades. They're looking for a way out and I think some of the smaller more niche products (Sybase ASE, PostgreSQL, MySQL, etc.) need to be ready to step in and take them (so listen up MySQL developers and zealots!).
When we discussed this before I brought up what I thought were valid complaints (no hot backups, no binary dumps, EXPLAIN output is cryptic and could be cleaner, replication is still a little immature, etc. etc.). Regardless, some 'rebuttals' I received were the Open Source Party Line - e.g. 'Why don't you code it yourself?' or 'There's a workaround for that.' That is certainly not a healthy attitude to tell potential customers of your product. These guys dropping $100K on a RDBMS and are feeling the pain would love to pay MySQL support contracts, however if you have the attitude that somehow the end-user, who would like to actually pay you money for your product, needs to keep their mouth shut and gratefully take what you charitably give out you're not going to retain them as a customer. Like it or not, they are used to getting what they pay for. And if you're really focused on getting more Enterprise-ish people to use it then you'd best start acting like it.
Give them what they want, treat them like CLIENTS (e.g. deserving of some respect) and not simply another l33t hax0r ("Code it yourself, n00b!") and they'll beat a path to your door.
Wrong, re: "Doom map for practice":o m.htm
http://www.snopes.com/spoons/noose/do
I'm checking out Caudium right now. How does PHP speeds compare from Caudium and Apache? I am also giving Zeus a spin. :D
Problem is things like Zend Accelerator do not work with Apache 2 yet. :(
As soon as PHP4 and Apache 2 play nicely (it wasn't setting cookies correctly on some browers for some reason??) and Zend (et al) solutions work on it we can't use it, although we are REALLY looking forward to it (the hybrid thread model is neat-oh).
Well, I feel that it is NOT suitable for enterprise use, and I thought I explained why. Perhaps I'm spoiled with Oracle, Sybase, etc. in that I don't have to code workarounds (which is what they are) to have hot backups, or spread over multiple dumps, etc.. At least to me that is what the Enterprise world knows, and loves, and expects. If you want MySQL to have a chance, you can't tell a company that they're going to have to write utilities (or custom code, or set up a dedicated replication backup server, etc.) to perform routine things that their RDBMS has been doing for them. They don't have to write custom application code to handle subselects, joined updates, hot backups, etc. and telling them they will have to simply 'deal with it' is not a good way to win the hearts and minds of developers, DBAs, etc. :D
:D
:D
:D).
;)) and having to 'put up' with a lot of deficiencies which certainly are 'slated' to be addressed but the only roadmap we have is 'next version' with no real timetable (and many of us recall many years ago the docs stating 'real soon' for things which are still not implemented). At our site we started on MySQL and encountered major problems and eventually purchased Sybase ASE for very cheap (under $15,000) which includes 24/7 support. And we have hot backups, subselects, triggers, stored procs, etc. and all sorts of other things which we didn't have to spend time developing in house.
:D
Not to really focus heavily on this but I believe MySQL.com does mislead:
designed for speed, power and precision in mission critical, heavy load use
Power, to me, implies much more SQL options (e.g. subselects, joined updates, etc.) which yes, IS BEING worked on, but is not there yet (nor stable from what the docs tell me). Power, to me as a DBA, means that I can tweak server settings without dropping and restarting the DB server. It means that I can see more than what I have now.
Stored procedures are typically faster than ad hoc SQL, so it would be faster to have precompiled SQL capabilities.
Mission critical implies that I can have fail-safe replication and failover when something goes wrong. It says that I don't need to have 'workarounds' to hot dump my database (say I can't afford to have a dedicated backup replication guy somewhere).
Is MySQL getting there? Of course - but as I said 'Not yet'.
The misunderstanding re: indexes came from SlashDot parsing out my less than in the query, so it cut it off. This is what I wrote (sometimes slashdot times out and I lose the post so I write them in textpad first):
Given an example query - 'SELECT bob FROM sometable WHERE somecol = 45 and somecol2 less than 44' - if I have two indexes on sometable (somecol and somecol2) it can join the two indexes together and use the 'virtual' index to find rows.
I don't think the generalizations were overly broad, certainly you can achive hot backups via a roundabout way, or that you can 'sort of' work with data as you're loading it via load but in PRACTICE I think it was accurate. We could split hairs over how to achive hot backups but in the end, the enterprise customers just want non-corrupt backups (is there a way to achive something like a rolling transaction dump to achive up-to-the-minue backups without backing up the entire database? I know we can't live with day old dumps if we have data loss, again replication can help but if I issue 'delete table;' then that will be replicated, too
As I said, it's sort of a trade-off between the costs of, say Oracle (which is typically obscenely high
I suppose it comes down to can you afford to not have the enterprise features? The problem is most people don't know until they encounter a situation in which they need the feature.
Yes, I mis-spoke. 30 million pg view. Which many of them are writes, unfortunately. :D If they were reads we wouldn't have hit MySQL locking problems, and live would've been good.
Zak,
:)
:D With my DBA hat on, I have to deal with a lot of issues, and 'worrying' about replication (it's supposed to put my mind at ease, right? :D) failing shouldn't happen.
:D
Thanks for replying to my post. I'm only trying to help make MySQL better - I don't use it but I know many who do, and they have to struggle with issues which I think they shouldn't have to.
Re: replication
Correct, my gripe was that the DBA now has to do something to fix a problem which should've never occurred.
Re: File system buffering
Again, I am aware that is something you should 'know' when you're setting up your DB, but it would be nice to be able to control it on a table or at least a database level. Again I think system tables are really important and should always be non-buffered and so should several of my production DBs, but there could be a test DB that I don't care about and buffering will provide a performance advantage.
Does turning off buffering also kill the read cache, too?
Re: indexing
I think everyone is mis-reading, the docs here explain what I'm getting at:
mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimiser tries to find the most restrictive index by deciding which index will find fewer rows and using that index to fetch the rows.
What I would like it to do would be to read BOTH indexes and then create some sort of a 'dynamic' index (really just the AND-ing of RowIDs after considering both indexes) to limit the results further.
Re: Optimizer
I will attempt to assemble some test data, but in the field I've seen many-table joins (inner, outer, etc.) being performed noticeably slower on MySQL than in other RDBMs. I haven't checked postgresql, though.
I actually browsed through the C code for the optimizer but couldn't exactly find what I was looking for - from what I saw it sets up a query class which does the optimization inside of it? I thought the optimizer would be a separate 'entity'. Maybe I was looking in the wrong place.
I read the InfoWorld benchmark thing, and also downloaded the code they used. MySQL's performance gains came from the query cache which is quite cool, although on systems in which you're updating/inserting you're going to hose the cache and lose the benefits (this was tested informally on a couple million row table, and performance with the cache when being updated frequently was roughly equal to without).
Again - I'm not saying there's no use for MySQL - but the original thread starter was looking at a high trafficked system. It's kind of like trying to figure out the Total Cost of Ownership of a piece of software. When you have a high trafficked site not only is performance important but reliability, maintainability, disaster preparation and recovery, etc. all need to be factored in, which I attempted to illustrate in the 'gaps'.
Really? Do you have an example?
You are not seeing my point -- obviously you either didn't read what I said or have no clue what I was talking about.
I never said you cannot create more than one index on a table. But for a single select, on a single table, you can only use a single index. I'm not making this up, read the docs:
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimiser tries to find the most restrictive index by deciding which index will find fewer rows and using that index to fetch the rows.
http://www.mysql.com/doc/M/y/MySQL_indexes.html
HI peyote, I'm glad you took the time to intelligently reply to my post. I think we're not quite on the same page in a couple places, certainly MySQL may be 'stable' (see my other post about what I consider to be 'stable') but I think the story starter was concerned with high trafficked sites which is a horse of a different color from, say, my personal site or a lightly-used corporate intranet which certainly doesn't require the types of things I illustrated in my post. I run several high-trafficked sites (upwards of 100 million imp/mo.) and those were 'gaps' in MySQL which our team isolated as ranging from 'really helpful' to 'required'.
.
Rebuttal: Somehow that fact doesn't impede Yahoo!'s extensive use of it. (See Jeremy Zawodny's presentations at the recently-held OSCON.) If the query hangs on the slave halfway through, it isn't marked on the slave as having completed. When the slave becomes available again, it notes its pointer in the transaction log and catches up automatically. Oh, and replication (at least in 4.x, possibly also in 3.23.x) is transaction-safe.
Right, and the slave goes down. So I have to manually intervene to fix it. That is unacceptable. It should do something better than simply quit and say 'oops'. Yahoo's extensive use is, as I see from MySQL.com, fairly weak:
Size of database: 25 GB
Average number of concurrent connections: 60
Max number of concurrent connections: 250
That is not worth writing home about.
Rebuttal: As another poster wrote, leave the OS out of this. If you cannot properly configure your OS to not buffer writes, you probably shouldn't be running a 'mission critical' ANYTHING.
Correct, knowing what your filesystem does is your problem. The filesystem buffer will say 'Ok I wrote that' and MySQL can move on, even if it is living in a buffer before being written. Not a big deal if you're aware of it, but MySQL doesn't give me an easy way to change it, or to specify WHICH tables/databases NEED to be O_SYNC'ed or not. That is what I wanted - recompiling MySQL and having EVERYTHING or NOTHING buffered is not acceptable. There are certain tables or databases I KNOW will benefit from buffering, but with MySQL it's all or nothing.
Rebuttal: So set up a dedicated slave for backups. Turn off the slave while backups are running, it catches up when backups are done and it is brought back up. A *simple* solution to a *simple* problem. If you really feel the need to do a hot backup of your live server, you can check into using InnoDB's tool at http://www.innodb.com/hotbackup.html
Ok, so I have to now get another boxen set up just to backup my database? How is getting another box, setting up replication, pausing replication, dumping, etc. etc. 'simpler' than saying 'dump database bob' and having MySQL do the rest for me? And I did point out InnoDB has a hot backup but that does work only with InnoDB.
Rebuttal: You mean like backing up the raw MyISAM files? Of course, that doesn't work with InnoDB databases, so you can use their hot backup tool for that as well, if this is a REAL (rather than IMAGINED) problem.
That only works if you offline your DB while your copy is being done. Again, not acceptable and what hot backups for ALL MySQL tables would fix.
Argument: I have to wait for 4 million INSERTs to be performed...
Rebuttal: RTFM. No you don't.
Well, certainly you can always use 'CONCURRENT' but in practice that is not terribly useful. Why? Well my 4 million load operation will take LONGER to perform, and sure people can query it now, but I have no indexes on the table so all queries will now table scan. So I suppose you're right, you don't have to lock the table for the whole operation, but in reality, your app won't run very well at all (are writes allowed to the table while you're loading?).
Rebuttal: Again, is this a real or imagined problem? It's likely that whatever you are trying to do, there's a better way. Unfamiliarity with a particular tool usually results in this type of problem.
It can be - depending on your OS and whether or not you have it configured for large file support. But again striping it can have significant performance advantages since I could dump AND load to multiple disks that leads to a significantly higher throughput rate.
Rebuttal: So because YOU don't like the output of EXPLAIN you're saying MySQL isn't ready for production? WTF are you talking about?! As for a more *in-depth* EXPLAIN, I agree there, and I found PostgreSQL's mechanism kind of cool. Of course, in four years of running MySQL in a production environment, I'm not sure I would have used it more than once or twice; MySQL's EXPLAIN has always been sufficient, if you actually know what you're doing.
Not necessarily 'production' use, but production use in a highly trafficked environment.
RTFM. You are obviously unfamiliar with the slow queries log where MySQL gives you EXACTLY the information you are looking for. As for the data cache and whatnot, I don't know if that is actually available or not.
Yes I've read the manual and seen the slow-query-log portion. However the log does not log the currently running SQL, only after it has completed. If my server is slow NOW I have to wait till whatever is blocking everything to complete BEFORE I knew what happened. I'd like a way to see RIGHT NOW what is going on, so if I see a rouge query performing a Cartesian product I can easily kill it and life can move on. The slow query log does not fit the bill.
No, there isn't a way to manage the data cache aside from setting config options on sizes of key caches and the like. I can't create dedicated caches to certain tables which is quite useful, nor can I change the I/O sizes of the caches if I know queries being performed on them would like 16K I/O vs 4K or whatnot.
Rebuttal: And your evidence? Oh wait, you didn't actually provide any. You just brought up a tangential issue...
Part of it comes into the fact that I can't manage index statistics (or even see them), can't tell what types of joins are being performed, etc. and the fact that the optimizer is somewhat 'young'. Now, simply because I can't see statistics doesn't mean the optimizer isn't using them, but I can't gauge how well they are being used, and I suspect the optimizer isn't as advanced as others. Optimizers are weird creatures, and with time it will certainly improve. However many-table joins are consistently slower in MySQL than many others - I'll see if I can't get test data up and run some quasi-useful benchmarks but I've seen this time and time again in properly normalized, indexed tables queries which run on Sybase ASE, Oracle, etc. with nontrivial amounts of data do not run as well on MySQL (which touts SELECT speed as their main advantage, I guess complicated SELECTs are not as nice).
Rebuttal: If you are that concerned, you have some good options here. (a) Pay the developers to hold your hand and explain to you what has happened. (b) Use the source and do your own friggin' diff. This is Unix; stop acting so helpless.
We don't use MySQL in a production environment, so I'm certainly not going to pay or waste my time coding. But MySQL.com seemingly touts that they are #1 in so many aspects and conveniently glosses over some issues which are important to those who write and manage large applications. I'm simply pointing out where they need work, and if they're serious about their claims then they should have no problem whatsoever implementing them.
Dude, have you not even looked at MySQL since 3.21 or something? Row locking is available in InnoDB, as are transactions. Stored procedures and triggers are planned for 5.x IIRC, but so many applications DON'T need them that the MySQL folks simply haven't cared to add them. Ditto for views (which are also slated for 5.x).
As I said in another post, I do lots of consulting and yes, I read the MySQL docs quite often. InnoDB does provide row locking, but at a cost of performance and memory. If you know much about how large-scale applications are built (which is what I think the comment author asked about) you know how well stored procedures and triggers are used. He asked if MySQL 4 (over something like PostGRES) was suitable in a high-trafficked environment. Those features are a serious part of any non-trivial application.
and is it stable enough to run on a high traffic site
I count most of what I wrote as part of being 'stable' on a high traffic site.
Requiring your DB to be offline while you dump is not acceptable for a high traffic site.
Fail-safe replication is required for high stability on a high-trafficked site.
Backups which are flat-file based and require indexes to be created are not acceptable for a high-traffic site.
Having a poorly performing system and you not being able to diagnose why because the MySQL server provides inadequate tools is not acceptable.
I suppose it comes down to how you define 'stable'. Stable as in 'not crashing' may be true (although it still can and does crash). Stable as in 'highly-available' is not true.
You misread (or I mis-wrote) the multi-index statement. Multiple indexes on a SINGLE TABLE (which is clear from my SQL) is NOT implemented in MySQL. Multiple indexes on a single query works just fine if you have more than one table, but there still is the limit of one index per table. Either that, or it is so buggy it does not work when it is supposed to.
I don't use MySQL in production but I work with many people who do, and I also do significant consulting to clients who run MySQL. Perhaps you should know what *you're* talking about before posting -- I run several highly available, high trafficked sites (30+ million hits a week). But I guess you don't have the guts to stand behind your comments with your real moniker.
Let me repeat the same things I say every time:
... is great and all, but the resulting table is a PAIN to read. What I want to know is simple:
... DESC can now use keys."
Why MySQL is Not Suitable for Enterprise or High-Volume Use
or
MySQL.com misleads you about it's capabilities
Replication in MySQL is a joke for 'mission critical' use. As I understand it, the binary log records SQL modification statements which are executed on a master, not the data which was changed. This is involves significant assumptions beforehand, such that the master and slave(s) must be 100% identical. If I perform an UPDATE on the master, the changes are not replicated, but the query. This is what I would call the 'easy way out'. Who knows what happens to the query once it is replicated out - what if it hangs halfway through? I can't roll back and be in a consistent state, I have half-completed changes which makes my database inconsistent and now I'm forced to dump-and-load. Keep track of which rows are modified, to what from what, and ensure that those transactions are replicated to my slaves. Anything less is simply useless for high availability.
I would also be willing to bet that a significant number of installations that have transitioned to MySQL replication are doing so due to table-lock induced latency. A suitable system with a capable RDBMS could probably handle all of the load given to it and not need 'many slaves' to handle the extra traffic. They would have a single failover for high availability and that's it.
Filesystem buffered writes. Transactional support is great - it allows me to roll-back aborted transactions. However, due to the inability to control whether or not my tables are write-buffered means that MySQL may *think* it has performed a write even though it is still in the write-cache. I can then turn off the system and voila -- corruption! Part of the fault lies in the OS who tells MySQL it was written even though it is in the cache, but I have a simple solution. Devise a way to selectively turn off buffered writes for certain tables / databases. This way if I know I have a critical table which has a lot of writes I can turn buffering off and be ASSURED that writes will be performed when asked. I suspect a lot of 1040 and other table corruptions are caused by something like this. Yes, performance will take a hit but I think it is a very acceptable trade-off for data corruption. Obviously all system tables should NOT be buffered.
Inability to use more than one index on a table in a query -- most enterprise RDBMS' can use more than one index on a table for a query. This can easily save a table scan or the use of a single, less-efficient index. Given an example query - 'SELECT bob FROM sometable WHERE somecol = 45 and somecol2
Clustered indexes. These basically physically sort the table based on particular columns. This allows you to ORDER BY username ASC without using anything special since the rows are already sorted on username (if you have a users table and cluster the username col). This also greatly speeds up BETWEEN clauses. And yes, to people who know a little bit of SQL but don't know as much about clustered indexes -- you can create an index with a bobcol ASC but clustering the actual data is faster and more efficient if you are grabbing data which is not on the index. For example, SELECT * FROM table ORDER BY username ASC will not be as efficient as the same query clustered on the username. If you had a sorted index on username it will probably read the index sequentially and then visit the table. That extra operation = more disk seeks = more time / cpu to execute (and it really adds up as the table size increases). However, if you are doing something like 'select username, password from user order by username' it would be better to create a sorted index on username ASC, password. That way it will read the index only and not visit the table at all.
On-line backups. In today's internet world your site has to be 24/7. This means you cannot have significant performance problems (or even offline-ing your dB!) when you make a dump -- Sybase, etc. have done this from as far back as I can remember. Postgres can do this with an add-on which is well worth the money. As far as I know MySQL can only do this with InnoDB tables and is a for-pay feature (since it has a MVC log to use in the meantime).
Backups to something other than CSV files. MS SQL, Sybase, Oracle, they all dump to a compressed binary file. Saves a TON of space and is MUCH FASTER to dump and load. I can dump a 12GB Sybase DB in under 20 minutes. Loading it all (from scratch) and then bringing the DB online is about the same amount of time. MySQL stupidly logs the CREATE TABLE / INSERT statements. What does this mean? That I have to wait for 4 million INSERTs to be performed when loading my table, and FURTHER I have to wait for the INDEXES to be re-created on the new data. Dump the indexes, too! (Remember that full-text indexing is just another index, so if you use that and have to load from a dump be in store for SIGNIFICANT downtime).
Ability to specify the number of files to dump to. What happens if you have a dump which is larger than 2GB? Some linux distros cannot handle a single file of 2GB or more without recompiling the kernel. Give users a way to, within the dump statement, split the dump over two files. Not only will that help avoid the 2GB limit, but it can speed up dump/loads since I can dump to a bunch of different disks to improve throughput. Sybase has the 'STRIPE ON' clause (originally to dump to two tape drives at once but works fine on filesystem files as well) to split the dump equally over an unlimited number of files. This also impacts the fact that MySQL tables and indexes are stored in filesystem files that are also subject to a 2GB limit.
Cleaner way to view query plans of statements. EXPLAIN
Query is using XYZ, ABC tables. Table XYZ is using index 123 which is sorted so I do not need to create a temp table to sort ASC.
Since you have all the columns in your select statement in the index I do not have to visit the actual table - I can pull it all from the index. Because of this, I will read the index from start to finish.
ABC is using index 23dsf which is not sorted so I must create a temp table to sort that. Also, since it is a join, I do not need to perform an index scan but a positioned search (table scan is to a WHERE clause with no index AS index scan is to an index which is not selective enough or needs to read all columns.)
Simple, easy and pretty much even a NOVICE can see that their query is a good performer or a bad performer.
Along with more in-depth EXPLAIN, also provide me with a way to see what the optimizer is doing with the query. In MS SQL and Sybase you have 'trace flags' which you can turn on before your query to see EXACTLY what Sybase is doing - why does it think this index is better than this other one, why is it table scanning when you think it should index sort, etc. Give me an easy way to say 'verbose on; explain xxx;'.
Ability to delve deeply into performance of the system. If there is one job a DBA must know it's how to tell what the heck is going on when something is slow. Currently MySQL gives you meaningless info like 'slow queries'. Great, I see 200,000 of them. What queries are they? What good is it in a large application which may contain 3000 lines of SQL to tell me the raw number of queries which are slow? I want to know the EXACT SQL of the query(s) which are slow and I want to find the one taking up the most CPU time and blocking all the rest. I want to know how MySQL is managing it's data cache so I can see if I need more ram (e.g. it is swapping lots of data to/from the cache) or if I am I/O bound. Don't tell me to look at 'free' or 'top' - half the time it is wrong because you (MySQL) tell it misleading figures. I want *you* to tell me exactly what you are doing since you would know best! If you've ever seen a sp_sysmon output from Sybase ASE you'd know what I'm talking about.
MySQL's query optimizer is PISS POOR. If I see another changelog entry like this I'm going to scream:
Optimized queries of type: SELECT DISTINCT * from table_name ORDER by key_part1 LIMIT #
So does that mean these queries were NOT AT ALL optimized before? It doesn't read 'FURTHER optimized'.
"ORDER BY
Does that mean it was table scanning each time? Jebus! Hands down the query optimizer is one of the most important things in the database -- knowing how to use the database statistics and knowing when to use a merge-join vs. a hash-join etc. are CRITIAL to database performance.
Of course, the usuals: integrated row (or in the least page) locking, full support of subqueries, stored procedures, views, triggers, referential integrity, transactions, etc. etc. etc.'
PostGRES and virtually 100% of 'for pay' RDBMs have this. There simply is no reason to use MySQL for anything sufficiently non-trivial.
P.S. I'm not the coordinator or anything, I certainly didn't make the decision to use Virgin. I just thought I'd post the location so that people who didn't register could find their way.
Parking was a bitch, to say the least. I saw you guys walking to B&N (or whatever the bookstore next to Virgin is) but by the time my friend and I parked and got there no one was in B&N (at least no one from the group that I could see). I think we caught sight of the group one more time in front of Brio but couldn't meet up. :(
We drove around for a while but couldn't find you guys again. My suggestion would be for the meet spot to be the FINAL spot, so that if people show up late or what not there's no worries.
I agree Virgin won't be a lot of fun... But I voted for Virgin simply because there are a lot of other options (food, beer, gaming, etc.) in the same 'block' which will make choosing something else quite easy.
To those geeks in Columbus who don't want to give your email out (can't you register a hotmail addy for this? Sheesh!):
Virgin Megastore
3965 Townfair Way
Columbus, OH
Thursday, July 25 @ 7:00PM EST
That's the Virgin Megastore in Easton Town center. Easton is huge and leet, so if you don't want to hang out in Virgin (ha!) Megastore there's Adobe Gillas, Fados, etc -- lots of pubs and such all around there.
29 people signed up for Columbus, OH -- 6 confirmed! :D Not sure what we're going to do in Virgin Megastore (browse the DVD section?) but there's lots to do in Easton.
As the article poster touched on, this won't do anything if you're concerned with RDBMS integrity (and have a site which requires write access to your RDBMS).
For static content, it sounds like a cool idea, even if they get root all they can do is view things and not touch. Of course, if that compromised boxen is attached to an internal network to your RDBMS, then they can go to hax0ring the heck out of your DB, they just have to use whatever tools you have installed on the web server.
Batman: I finally found the people that killed my parents! Prepare to die! :holds up BatGun: :looks at own tights: Ok, well at least I make this look good! :wipes tear away: Prepare to die! :turns on heat vision: :heat vision deflects and blows up a nearby building: :lunges at the Caped Crusader: :whips out glowing green kryptonite rock: :grabs Bruce's parents killers: Now I have YOUR weakness, the vengeance for your parent's killers! :laughs heartily:
...Lather, rinse, repeat...
:breathless and weary from a long fight: How did... how did Lex Luthor and The Joker team up to defeat us both? :also beat-up: I don't know, but I suspect it is the scene for Batman V Superman II: "Two super-heads are better than one!"
Superman, swooping in: I'm sorry Bruc-- I mean 'Bat-Man'. I can't let you kill them, that would be wrong.
Batman: Oh shut up you tight-wearing pansy!
Superman: After all we've been through together!
Batman: Ah-hah! 'BatAntiSupermanHeatShield' activate!
Superman: You bastard! I'll crush you with my pinky!
Batman: Not so fast, Superdork!
Superman: Noo, my one weakness! My Achilles rock, as it were!
2 hours later
Superman:
Batman:
fin.