> It's kind of like Catholics during the Inquisition.
I think you mean "like the christians during the inquisition".
The inquisition was far from catholic-only. On the catholic side you had the initial enablement via the papal bull of 1484 and it was executed primarily via their secular lawyer and monk (first dominican then jesuit) proxies.
But as far as the protestants go:
- luther said that witches should be burnt
- luther believed in incubus, sucubus, witches flying at night, etc, etc, etc
- calvin said the bible teaches us that that there are witches and that they must be slain
- calvin said that God expressly commands that all witches and enchantresses shall be put to death
- lutheran preachers brought the witch hunt to denmark, germany, sweden, etc
- calvinist missionaries brought it to transylvania, scottland, england, etc
- the catholic-protestant religious wars vastly increased the witch-burnings
So, while the catholics started the ball rolling, the protestants were equally guilty of keeping it going.
> What you do is make sure that your: > a) database(s) are set to utf8 by default > b) table(s) are set to utf8 by default > c) column(s) are set to utf8 by default > d) connection defaults to utf8
Please tell me that if I specify a database is UTF8 that I don't also have to tell it that each column is utf8 as well!
And why should I have to tell the connections? Doesn't that get resolved automatically when the client connects to the server?
And why should I have to tell a utility what the target database codepage is? Can't they talk? Why the opportunity to manually do something trivial and screw it up?
And...'by default'? Doesn't mysql know how to automatically perform codepage conversions?
And, please tell me that MySQL wouldn't let somebody insert utf8, say asian characters, into a latin1 database!
> That way, it'll "Just Work"(tm) ugh, i don't think that expression means what you think it means
does mysql have a list of prioritized missing basic features and functionality so that you get can a sense of when this kind of stupidity will be resolved?
> You are the second one I see make fun of this, and I fail to see why this is funny. If your database is used primarily > for looking up stuff and seldom has inserts/updates, it makes perfect sense to dump your tables and reloading them once in > a while in order to speed up sequential reads.
No it doesn't - that's what online reorgs are for. Why should I take a database offline merely to resolve fragmentation? Does mysql need vacuum?
And sure, I could double my number of servers and use replication to get a failover. But why should I have to buy another large server, possibly license mysql once again, deal with a more complex environment when every other database product out there has embraced a far simpler solution?
> a simplistic statement like "never use MySQL for a project involving more than 10000 rows per table" is obviously false.
Sure, mysql can provide adequate performance for many apps with more rows than that - but only if you've got highly selective queries that the btree indexes can support.
> Incidentally, MySQL has partitioning now and can parallelize queries across partitions.
Not in a general availability release - you're dealing with non-production code if you use mysql partitioning. And you're living with about a hundred limitations.
> I would like to know what slashdotters think is a good level of stuff to be putting into stored procedures.
Here's an example:
On a current project I've got a separate department of users who have written a reporting app in actuate against a db2 reporting database (a datamart). Along the way we've found that this team is not highly skilled in sql and often had to rewrite some of the sql to fit it into their tool. Note that this can be relatively complex sql - involving temp tables, etc - in order to show trends. The results have been queries that returned incorrect data, that scanned 4 *trillion* rows over 6 hours rather than 1 million in 5 seconds, etc, etc, etc.
The solution that we settled upon was to encapsulate all of their sql within stored procedures. These procedures then: 1. validated all arguments - to ensure that they didn't mix them up and ask for the wrong data 2. logged each call along with argument values, rows returned, and time to return 3. returned the result set along with some useful metadata 4. processed everything in a highly consistent way
Now, I don't typically use stored procedures heavily - and often prefer to encapsulate the physical data model in views to save time. But in this case the availability of this option was really a life-saver. And note that these stored procedures are also allowing us to more easily change the underlaying data model, measure and tune each query, maintain the queries, etc, etc.
> I'm currently looking at a table on a production MySQL db with 176 million rows in it. It has no problems at all, > other than being a huge amount of data. Your ideas about MySQL's limitations are not based on fact.
Without partitioning you face the choice of selecting data only via a btree index (only typically works if you need to select less than 3% of the data) or scanning it all. With a more typical alternative (certainly db2, oracle, etc) you can partition the data. This can result in 10:1 differences in performance between mysql & db2/oracle for typical reporting queries.
Without query parallelism you're single-threading all of your queries, and unable to take advance of those extra processors. Since oracle/db2 get near-linear performance benefits from parallelism, you're again suffering a 4:1 performance penalty on a four-way smp.
Without a robust optimizer you will choke on complex queries against even moderate data - so a query that joins a dozen tables together will inevitately go into the ditch with poorly chosen nested-loop joins. Performance penalty? could easily be 100:1 in some situations.
Of course, this doesn't mean that you can't keep 200 million rows in msyql - you certainly can. Just don't plan to get the same kind of performance out of a lot of typically complex queries against it - that you would get from oracle/db2/informix or even sql server. Not unless you spend 20x as much on the hardware anyway.
> Your ideas about MySQL's limitations are not based on fact. Unfortunately, your ideas about mysql's capabilities are based upon insufficient experience
> Put it to you this way: If the features of MySQL are "good enough" for the application you want -- which arguably goes for > most of the Web apps out there -- why would you not choose MySQL? There are times when swimming upstream is a noble > effort, but generally all it gets you is tired.
Great point. Aside from asking what exactly "good enough" means - and pointing out how odd it is that some would insist on picking a tool because it's "good enough" while others insist on picking "the best tool for the job", a few possible reasons:
1. maybe you don't want to waste time testing for exceptions that should be reported in a more robust fashion 2. maybe you need a large reporting database - and don't want to waste $100k+ on extra hardware to make up for mysql's lack of partitioning, parallelism, automatic summarization and mature optimization. 3. maybe you need multiple databases, and one of them is a reporting database - so decide to go for a consistent other option to save on labor (which is more expensive that licensing costs these days) 4. maybe you want free online backups 5. maybe you want to avoid licensing costs 6. maybe you want to avoid having to talk to a lawyer to deal with mysql's obfuscated license 7. maybe you need better optimization for complex queries 8. maybe you want to ensure that clients can't override your data quality constraints 9. maybe you find that there are many great programmers who would prefer not to work with mysql 10. etc,etc,etc
So, quite a few reasons why a person might think that mysql has a way to go before being good enough for their project.
> How on earth, in year 2007, is a book the quickest reference to anything. MySQL provides both instant manual > lookup online, and indexed searchable CHM help manual reference.
1. it is small enough to be always handy (keep it in your laptop bag, next to your puter, etc)
2. since all it has are concise explanations - you'll spend less time wallowing in stuff you don't care about (when you want is something concise)
3. it is formatted specifically for quick syntax & example checks, rather than for general purpose documentation
4. there are benefits to books - you can easily add comments to them, you more (sometimes) more easily scan for information, etc, etc.
I generally avoid mysql, but if I had to use it a lot I might consider this book. Or especially if I had to hit the road supporting some product that used four different databases, mysql just being one. In that case one of these for each database might be useful. But this is the older version (2003), and the author said that you'll want to wait for the new one due this summer to get all the standard stuff that was missing in 2003 (view, stored procedures, triggers, ansi-compatible sql, whatever).
You do realize that the paper that really blew this is The Detroit News - a highly conservative paper. A paper that seems to criticize every single expenditure on education excepting only vouchers for religious schools.
So, it's not unlikely that there's another side to this story.
And $38m is a lot, but the benefits of the plan were conveniently left out of the editorials - and it's just possible that some of that cost will be offset by savings.
> If we have a genetic disposition to need God, why is atheism more common among the young people that I have known and still know?
When you're young you're usually too distracted by pursuing a mate, sex, adventure, drugs, whatever to really spend a lot of time pursuing philosophy, spirituality and mysticism.
But when you're older, the distractions behind to slip away, you start to fear death, and fall prey to urban legends that you need a church to teach your children moral values.
I tried Pandora for a week but grew bored with it. It seemed fine for exploring a type of music that you aren't very familiar with, but didn't seem to be very good at presenting music that I was likely to enjoy that was further outside the narrow scope of what I was listening to.
But perhaps this is just due to how their feedback mechanism works: you get three options:
- no feedback
- "i like it"
- "i don't like it" So, what then what happens when you like a couple of qualities about a given song, but dislike a even more. You're listening to accoustic blues, then hear an electric rendition of a Mississippi John Hurt tune. You like the tune, but you don't like the artists phrasing, voice or tone of his instrument. So, if you disapprove will you see fewer Mississippi John Hurt tunes? That certainly isn't what you want.
What if you're open to electric renditions of these old accoustic tunes, but just don't like that one in particular due to some other aspect? How long before you get enough chances to educate Pandora that you're really ok with electric as well? Just not that particular artist? Or absolutely not if it includes a harmonica?
Anhow, it wasn't clear what the impacts of these choices were and there was no way to be more specific in feedback. So, I kept ending up with very narrowly-scoped "stations" that would just play a single genre of music. Which became boring and hardly provided any broadening of my horizons.
Maybe I should have followed a different strategy in my feedback, but the documention on Pandora doesn't make that clear at all. In the meanwhile, I've got a local public radio station that constantly surprises me with new music that I never would have discovered through Pandora. So, that's what I spend my time listening to.
ok, so the test examined *correlation* between gaming skills and: "Laparoscopy and related surgeries involve manipulating instruments through a small incision or body opening where the surgeon's movements are guided by watching a television screen."
What's not clear: 1. if there is causation (games improved some skills) - do each activity just attract slightly obsessive individuals that happen to do best at each profession? 2. if just as much benefit could be gained from a few hours a week of using a mouse - sounds like the major skill here is
just being able to relate activity on a monitor to activity with your hand. 3. how much of a surgeon's professional skills this represents - that is, what about diagnostics, knowledge of techniques,
communication, etc? I'm guessing that playing WoW doesn't contribute much to a surgeon's knowledge of anatomy...
The GA (general availability) or production release does not support partitioning. Once they put it into production and gets serious reviews we'll find out whether or not they've got a worthwhile solution. Until then it's not applicable.
> Whether you think the optimizer is any good is up to you, basically, but it, too has improved lately.
Yes, it has improved - it used to completely tank on joins of 4+ tables just a couple of years ago. However, I've seen no indications that it is yet in the league of any other options.
> Parallelism on the other hand i did not understand... what do you mean by that? The ability to split a single query that you submit into components, run them separately under the covers, join the data together and present you with the results. Sometimes it may just be setting up multiple agents to handle reading the data - so that pulling data off disk happens in parrall, with the rest of the operations happending serially. With db2, for example, we typically see linear performance improvements by allowing parallelism up to 4 or 8 way on 4 or 8 way servers with good IO subsystems. In other words a table scan (ie, no use of an index) of 100 million rows on a 4-way server that takes 60 seconds can be shortened to 15 seconds with parallelism.
> Grid enabled - This means the DBMS can make use of a large distributed group of computers and potentially have access > to a huge amount of computing power. The typical DBMS runs on at beat a multi-processor server. Thi sis kind of like a DBMS > server running a a "seti at home" type network.
Or like teradata in around, what? 1992? Informix around 1994? db2 around 1995? Oracle isn't there yet since their grid solution is more about failover than partitioning.
This is now lower-end functionality in the high-end database market. The typical database only runs best on an SMP if you mean postgesql or mysql when you say typical. The large commercial databases can easily split your data across 2,4 or 500 servers for handling 1 second queries that require complex queries across billions of rows of data.
> Seriously though, I look after a system that has to run on oracle, postgresql (very tuned) and mysql. The reporting > nature results over 10,000 grouping queries each time it's run. mysql takes less than a minute using InnoDB tables. > The others crawl home closer to 20.
The problem is that mysql doesn't have partitioning, parallelism or a good optimizer. So even if in your specific application you're lucky enough to have a situation in which you've got a ton of both highly selective and very simple queries - that's not the normal reporting or business intelligence scenario. This is making the huge assumption that you've even tuned oracle correctly.
A more typical scenario is one in which you've got to read 5-10% of the total data volume (and your b-tree indexes don't work so you need partitioning), the queries *aren't* highly repetitive and are complex. In this scenario mysql sucks, and needs a few years to catch up. Meanwhile, expect query times 10-40x the length of oracle or db2.
> The fact that people ranging from kids to Slashdotters to more or less typical adults have enjoyed the books seems > like an excellent tribute to their quality.
Nah, more like she tapped into some interests, got some early promotion and then rode that momentum for the real bucks in the later books. The fact that adults enjoy the books is largely driven by the lack of literary skills of most adults.
Consider for a moment:
1. extremely primitive writing style
- excessive use of adverbs (everything is "said angrily" or "said hostly", or whatever)
- exhausting back & forth dialogs with excessive labeling "he said, she said, he said, she said"
- etc
2. inconsistencies/plot holes/unbelievability
- so rather than just mail a portalkey to harry potter, voltmort devises a complex plan in which he has to plant a fake in the school
who then has to somehow get Harry to win a tournament and touch the key at the end.
- does it appear that the school doesn't teach any kind of arts? and the kids are somehow completely unaware of music?
- since duals between wizards are just contests of speed then magical skill is almost useless - all you need is one spell and speed.
- most of the tension of the book would be easily eliminated if just one of the three kids would tell an adult about the problem
- etc
3. predictability
- every book starts with a blow-up at home
- then there's something odd going on in the begining of the year
- then it is ignored and the kids focus on some kind of sports thing in the middle of the year
- then it blows up into a climax at the end of the year
- etc
I enjoy reading children and young adult literature to my boys. But this is really some of the worst stuff. After the forth book I refused to read any more of it aloud - it's just too obnoxious. And if anyone really believes it's the best, then they should probably read a book every now and then.
> I read these books aloud to my wife... I guess I should clear my calendar on August evenings!
You can cut the time in half by just omitting the adverbs:
"Malfoy really makes me angry", Harry said [angrily].
"Will you stop complaining and just focus on your homework?", Hermione said [testily].
"Why don't you understand that there's more to life than homework?", Ron said [hotly].
etc, etc, etc.
Ok, so maybe the adverbs are just 5-10% of the sentence - but the preceding "harry said", "ron said", etc - can also usually be removed - it's generally obvious to anyone paying attention who said each line in a dialog between two people. So, I'd remove those two words as well.
And this is how I read it to my boys. Of course, eventually I started additional words and entire sentences - and they never realized. Anything to make these poorly written books go more quickly. Unfortunately, there's no way to simply omit plot holes.
Ah, I don't think Oracle wants to provide MySQL support for any of the following reasons:
- because there is a lot of money to be made
- because they think it is a better product than oracle
- because they want to "endorse" mysql
No, why would Larry Ellison want to provide support for mysql?
Really, just one reason: to further injure MySQL AB.
Look, he already bought InnoDB and Sleepycat out from under MySQL which robs them of important persistence layers (oh sure, but MySQL AB can spend a couple of years and millions of dollars to develop their own backends just proves the point). And now he's going straight for their income stream.
Oracle clearly sees MySQL as a revenue threat that they want to eliminate. This doesn't mean it is a great product - just a revenue threat. And this strategy doesn't apply to postgresql (and EnterpriseDB, etc) - but that's ok for now. Postgresql is probably 2+ years away from being the revenue threat that MySQL AB is today.
Re:anyone know of a good "schema cookbook"
on
SQL Hacks
·
· Score: 0
I think I like David C. Hays "Data Modeling Pattern" a little more.
Not only is it an apples and oranges comparison between an old mainframe and a new custom-built grid, but the software was completely different.
According to this the original software was probably poorly designed: > the mainframe took on the persona of a lumbering behemoth. This was especially the case when the IT staff had to accommodate new > business requirements such as a car dealership adding a new type of vehicle to its inventory. Each update required a > major rework of the program Hmmm, massive software reworks to handle new vehicles types? Yikes.
And who knows what the author of the article meant by this: > Isiminger said his IT staff moved from the mainframe to a grid using high-performance code developed in Assembler and PL/1, > business logic in COBOL, proprietary mainframe development tools, flat file processing, VSAM and some IMS. Most of the above actually sounds like what would be running on the mainframe.
In any event, it sounds like the your typical scenario - the data requirements grew wildly beyond what was originally envisioned and designed for. The new solution (both hardware and software) were designed for a far larger amount of data.
The more interesting question now is how well would a new mainframe compare?
- initial cost: mainframes can be hand for as little as $100k, and may cost less than 50 dual-cpu dell servers
- labor cost: one mainframe is most likely cheaper to support than 50 servers - especially when you consider requirement for absolute consistency between the 50
- adaptability: the grid has some benefits here with easy additions and subtractions from the gird - but grid performance relies upon problems that can be easily partitioned. The mainframe has extremely mature virtualization - easily allowing new lpars (separate hosts) to be created and managed without so much of the overhead of pc based virtualization products.
> "once a week scanning contract" - do they make core architectural changes that often?
Think of it this way - do they tell you when they make changes to their systems? Answer: of course not.
So, either you scan monthly or quarterly - and leave vulnerabilities undetected, unreported and wide open to exploit for weeks or months. Or you scan much more frequently, and catch it when it happens.
Just need a nice way to identify deltas so that they don't constantly have to wade through false positives.
> I'm guessing the administration overhead of oracle or db2 would make it not worth while. Due to the fact he's asking slashdot about > his problem, the expensive db's are probably out of reach for him. This is especially true because he says "cost matters a lot."
Note that administration of oracle or db2 may or may not be greater than administration of mysql - if you find yourself having to manage a half-dozen mysql databases all replicating large amounts of data and struggling to support slow & complex queries.
And the licensing cost is not necessarily greater either - if you can go with a smaller server (with db2, I think it's free for less than 4 gbytes of memory), and it's especially cheaper when you consider the hardware savings over mysql.
> I made the assumption that the reporting application already exists and runs against the current schema. If it doesn't, > ETL is an option with MySQL just as much as any other database. Again, I doubt it will be a problem if cost matters > -- they probably have very simple reporting needs.
Note that transactional models (using normalized models) are much more expensive to report against than analytical models (using star-schemas): they cost far more to write the reports against, support only limited reporting functionality, and take more hardware to get the equivilent speed. It's only worth going this route if you plan to deliver almost no reporting.
> 60k queries/day is less than 1 per second. Even 6-7 times that is still only 6-7 a second. MySQL 5.1 has partitioning. > MySQL was originally designed for data warehousing applications (that is what the isam table structure was all about) > and there are many documented > 100GB databases in existence.
A few thoughts here: 1. a single query that requires a tablescan of 100 million rows on mysql can take 60 seconds, 6000 seconds if the optimizer gets confused (see references regarding impacts of simplistic optimizer2). 2. 60,000 queries probably don't come at one a second, they probably come in an uneven workload with large bursts. So, perhaps you're using 100 queries a second on monday morning, etc. 3. MySQL's partitioning in 5.1 doesn't matter now - since only 5.0.27 is GA. No idea when 5.1 goes GA - or more importantly, when their partitioning is mature enough to want to depend upon in a serious production environment. 4. The fact that mysql claims to originally have been oriented towards data warehousing and yet failed to include any basic data warehousing features is completely consistent with their attempts to market themselves more recently as an enterprise database without basic transactions. Yes, I know they have had transactions and RI for a few years now - but prior to that time they were telling people that nobody really needed it. 5. A 100GB MySQL database is nothing to brag about. I've seen plenty of gigantic databases - that due to misdesign or use of the wrong product could support almost no queries, had very poor query speed, very poor load speed, or whatever. Keeping 100 gbytes of data in a single-threaded, non-partitioning database is hardly an attractive solution - when so much better alternatives exist.
Sorry - but relational databases are complex beasts - and too complex for such simplifications. MySQL can be fast - if you're on limited hardware with simple and highly-selective repetitive read-only queries hitting its MyIsam storage engine.
But:
- if you're doing reporting and querying more than 2-4% of the rows in the database then it wonly use a btrieve-index, and so will do table scans. This will suck, and be far slower than using a database like db2, oracle, informix or sql server.
- if you're using complex queries (5+ tables) - then its simple optimizer is likely to take the wrong turn and use a nested loop join when it should use a merge join, etc. Then your peformance will suck and you'll end up continually rewriting your query until you get some 400-line monstrousity that barely performs.
- if you've got a 2+ way SMP - then MySQL's singled-threaded design will fail to split the processing for a single query across multiple processors. That'll suck compared to db2, oracle, sql server, etc, etc, etc.
- if you're not running the same content-query repetitively - then its content cache won't work for you and you'll be hitting the database all the time. This is of course what most databases do, and what it sounds like this app in particular will do.
- if you're not using MyISAM - lets say because you want support for transactions, referential integrity, etc. Then in that case you don't get the speed benefit of MyISAM at all, and are operating at roughtly the same speed as Postgresql for all operations. Except when queries get complex or the load gets high - then you're dragging.
- if you're doing quite a few concurrent writes - in which case MyISAM's table locking will kill your performance and even InnoDB's performance sucks if the load gets high.
Bottom line: MySQL's speed reputation has been discovered for the most part to not apply to most real-world scenarios. Unless you're running a content-management site (like slashdot) then you won't get it. Even then there are quite a few functions that are so slow that you will feel the pain (remember 4 hours to reindex slashdot's 20 million rows?).
> MySQL can definitely do whatever you want it to. Why switch?
because it is obviously a poor contender?
> If mysql is working for you now, you should look at mysql scaling options. For example, if you are worried about reporting queries, > replicate the database to a second machine for running the reports against. Mysql replication works great for this sort of application.
Ah, that doesn't address partitioning, parallelism, query optimization, automatic summarization pr query rewriting, does it?
Note that these are standard reporting database features - and are responsible for easily providing reporting queries 40-400x performance improvements when moving from mysql to db2 or oracle.
So, replicating the exact same transactional model to another server - which also doesn't provide partitioning or parallelism is an extremely sad solution. Ideally, you move the data through a lightweight ETL solution to remodel it and then move it into a database that provides all these features. Then you'll not only have performance and scalability - you'll also have additional *standard* reporting functionality (time series queries, point in time reporting, etc).
> Also, MySQL Cluster was designed by/for the telecomm industry -- the original commissioners were performaning analysis on > call records or something of the such.
Sure, if you want to limit the amount of data you use to what will fit in memory. Other problems are likely (like compatibility with mysql's standard product since they just purchased cluster a few years ago) - but since so few people who know what they're doing are using mysql's cluster product this hasn't yet been well documented.
> It is not playing in the US media because no law was broken when those attorneys were fired.
h tml?ex=1331956800&en=dfab854c91a51b4b&ei=5088&part ner=rssnyt&emc=rsst -greg-palast-reports-on-the-firing-of-new-mexico-a ttorney-david-iglesias/l l-of-pete-domenici_b_43006.html0 03699882_webmckayforum09m.html?syndication=rss/
1. pete domenici (r-nm) tried to force attorney generals to indict democrats for voter-fraud
2. alberto gonzales (ag) almost certainly lied under oath
http://www.nytimes.com/2007/03/19/opinion/19mon4.
http://www.gregpalast.com/investigative-journalis
http://www.tpmmuckraker.com/archives/002677.php
http://www.huffingtonpost.com/mark-kleiman/the-fa
http://seattletimes.nwsource.com/html/localnews/2
> It's kind of like Catholics during the Inquisition.
I think you mean "like the christians during the inquisition".
The inquisition was far from catholic-only. On the catholic side you had the initial enablement via the papal bull of 1484 and it was executed primarily via their secular lawyer and monk (first dominican then jesuit) proxies.
But as far as the protestants go:
- luther said that witches should be burnt
- luther believed in incubus, sucubus, witches flying at night, etc, etc, etc
- calvin said the bible teaches us that that there are witches and that they must be slain
- calvin said that God expressly commands that all witches and enchantresses shall be put to death
- lutheran preachers brought the witch hunt to denmark, germany, sweden, etc
- calvinist missionaries brought it to transylvania, scottland, england, etc
- the catholic-protestant religious wars vastly increased the witch-burnings
So, while the catholics started the ball rolling, the protestants were equally guilty of keeping it going.
> What you do is make sure that your:
> a) database(s) are set to utf8 by default
> b) table(s) are set to utf8 by default
> c) column(s) are set to utf8 by default
> d) connection defaults to utf8
Please tell me that if I specify a database is UTF8 that I don't also have to tell it that each column is utf8 as well!
And why should I have to tell the connections? Doesn't that get resolved automatically when the client connects to the server?
And why should I have to tell a utility what the target database codepage is? Can't they talk? Why the opportunity to manually do something trivial and screw it up?
And...'by default'? Doesn't mysql know how to automatically perform codepage conversions?
And, please tell me that MySQL wouldn't let somebody insert utf8, say asian characters, into a latin1 database!
> That way, it'll "Just Work"(tm)
ugh, i don't think that expression means what you think it means
does mysql have a list of prioritized missing basic features and functionality so that you get can a sense of when this kind of stupidity will be resolved?
> You are the second one I see make fun of this, and I fail to see why this is funny. If your database is used primarily
> for looking up stuff and seldom has inserts/updates, it makes perfect sense to dump your tables and reloading them once in
> a while in order to speed up sequential reads.
No it doesn't - that's what online reorgs are for. Why should I take a database offline merely to resolve fragmentation? Does mysql need vacuum?
And sure, I could double my number of servers and use replication to get a failover. But why should I have to buy another large server, possibly license mysql once again, deal with a more complex environment when every other database product out there has embraced a far simpler solution?
> a simplistic statement like "never use MySQL for a project involving more than 10000 rows per table" is obviously false.
Sure, mysql can provide adequate performance for many apps with more rows than that - but only if you've got highly selective queries that the btree indexes can support.
> Incidentally, MySQL has partitioning now and can parallelize queries across partitions.
Not in a general availability release - you're dealing with non-production code if you use mysql partitioning. And you're living with about a hundred limitations.
> I would like to know what slashdotters think is a good level of stuff to be putting into stored procedures.
Here's an example:
On a current project I've got a separate department of users who have written a reporting app in actuate against a db2 reporting database (a datamart). Along the way we've found that this team is not highly skilled in sql and often had to rewrite some of the sql to fit it into their tool. Note that this can be relatively complex sql - involving temp tables, etc - in order to show trends. The results have been queries that returned incorrect data, that scanned 4 *trillion* rows over 6 hours rather than 1 million in 5 seconds, etc, etc, etc.
The solution that we settled upon was to encapsulate all of their sql within stored procedures. These procedures then:
1. validated all arguments - to ensure that they didn't mix them up and ask for the wrong data
2. logged each call along with argument values, rows returned, and time to return
3. returned the result set along with some useful metadata
4. processed everything in a highly consistent way
Now, I don't typically use stored procedures heavily - and often prefer to encapsulate the physical data model in views to save time. But in this case the availability of this option was really a life-saver. And note that these stored procedures are also allowing us to more easily change the underlaying data model, measure and tune each query, maintain the queries, etc, etc.
All good stuff, though your mileage may vary.
> I'm currently looking at a table on a production MySQL db with 176 million rows in it. It has no problems at all,
> other than being a huge amount of data. Your ideas about MySQL's limitations are not based on fact.
Without partitioning you face the choice of selecting data only via a btree index (only typically works if you need to select less than 3% of the data) or scanning it all. With a more typical alternative (certainly db2, oracle, etc) you can partition the data. This can result in 10:1 differences in performance between mysql & db2/oracle for typical reporting queries.
Without query parallelism you're single-threading all of your queries, and unable to take advance of those extra processors. Since oracle/db2 get near-linear performance benefits from parallelism, you're again suffering a 4:1 performance penalty on a four-way smp.
Without a robust optimizer you will choke on complex queries against even moderate data - so a query that joins a dozen tables together will inevitately go into the ditch with poorly chosen nested-loop joins. Performance penalty? could easily be 100:1 in some situations.
Of course, this doesn't mean that you can't keep 200 million rows in msyql - you certainly can. Just don't plan to get the same kind of performance out of a lot of typically complex queries against it - that you would get from oracle/db2/informix or even sql server. Not unless you spend 20x as much on the hardware anyway.
> Your ideas about MySQL's limitations are not based on fact.
Unfortunately, your ideas about mysql's capabilities are based upon insufficient experience
> Put it to you this way: If the features of MySQL are "good enough" for the application you want -- which arguably goes for
> most of the Web apps out there -- why would you not choose MySQL? There are times when swimming upstream is a noble
> effort, but generally all it gets you is tired.
Great point. Aside from asking what exactly "good enough" means - and pointing out how odd it is that some would insist on picking a tool because it's "good enough" while others insist on picking "the best tool for the job", a few possible reasons:
1. maybe you don't want to waste time testing for exceptions that should be reported in a more robust fashion
2. maybe you need a large reporting database - and don't want to waste $100k+ on extra hardware to make up for mysql's lack of partitioning, parallelism, automatic summarization and mature optimization.
3. maybe you need multiple databases, and one of them is a reporting database - so decide to go for a consistent other option to save on labor (which is more expensive that licensing costs these days)
4. maybe you want free online backups
5. maybe you want to avoid licensing costs
6. maybe you want to avoid having to talk to a lawyer to deal with mysql's obfuscated license
7. maybe you need better optimization for complex queries
8. maybe you want to ensure that clients can't override your data quality constraints
9. maybe you find that there are many great programmers who would prefer not to work with mysql
10. etc,etc,etc
So, quite a few reasons why a person might think that mysql has a way to go before being good enough for their project.
> How on earth, in year 2007, is a book the quickest reference to anything. MySQL provides both instant manual
> lookup online, and indexed searchable CHM help manual reference.
1. it is small enough to be always handy (keep it in your laptop bag, next to your puter, etc)
2. since all it has are concise explanations - you'll spend less time wallowing in stuff you don't care about (when you want is something concise)
3. it is formatted specifically for quick syntax & example checks, rather than for general purpose documentation
4. there are benefits to books - you can easily add comments to them, you more (sometimes) more easily scan for information, etc, etc.
I generally avoid mysql, but if I had to use it a lot I might consider this book. Or especially if I had to hit the road supporting some product that used four different databases, mysql just being one. In that case one of these for each database might be useful. But this is the older version (2003), and the author said that you'll want to wait for the new one due this summer to get all the standard stuff that was missing in 2003 (view, stored procedures, triggers, ansi-compatible sql, whatever).
You do realize that the paper that really blew this is The Detroit News - a highly conservative paper. A paper that seems to criticize every single expenditure on education excepting only vouchers for religious schools.
/ profiles/umich.html
e &site=achapin-ipod
So, it's not unlikely that there's another side to this story.
teachers podcasting lectures:
- http://www.apple.com/education/solutions/itunes_u
learn chinese in michigan schools - while the schools save money on teachers:
- http://education.zdnet.com/?p=934
info about ipods for educational institutions:
- http://campustechnology.com/articles/40744/
- https://segue.middlebury.edu/index.php?action=sit
And $38m is a lot, but the benefits of the plan were conveniently left out of the editorials - and it's just possible that some of that cost will be offset by savings.
> If we have a genetic disposition to need God, why is atheism more common among the young people that I have known and still know?
When you're young you're usually too distracted by pursuing a mate, sex, adventure, drugs, whatever to really spend a lot of time pursuing philosophy, spirituality and mysticism.
But when you're older, the distractions behind to slip away, you start to fear death, and fall prey to urban legends that you need a church to teach your children moral values.
I tried Pandora for a week but grew bored with it. It seemed fine for exploring a type of music that you aren't very familiar with, but didn't seem to be very good at presenting music that I was likely to enjoy that was further outside the narrow scope of what I was listening to.
But perhaps this is just due to how their feedback mechanism works: you get three options:
- no feedback
- "i like it"
- "i don't like it"
So, what then what happens when you like a couple of qualities about a given song, but dislike a even more. You're listening to accoustic blues, then hear an electric rendition of a Mississippi John Hurt tune. You like the tune, but you don't like the artists phrasing, voice or tone of his instrument. So, if you disapprove will you see fewer Mississippi John Hurt tunes? That certainly isn't what you want.
What if you're open to electric renditions of these old accoustic tunes, but just don't like that one in particular due to some other aspect? How long before you get enough chances to educate Pandora that you're really ok with electric as well? Just not that particular artist? Or absolutely not if it includes a harmonica?
Anhow, it wasn't clear what the impacts of these choices were and there was no way to be more specific in feedback. So, I kept ending up with very narrowly-scoped "stations" that would just play a single genre of music. Which became boring and hardly provided any broadening of my horizons.
Maybe I should have followed a different strategy in my feedback, but the documention on Pandora doesn't make that clear at all. In the meanwhile, I've got a local public radio station that constantly surprises me with new music that I never would have discovered through Pandora. So, that's what I spend my time listening to.
ok, so the test examined *correlation* between gaming skills and:
"Laparoscopy and related surgeries involve manipulating instruments through a small incision or body opening where the surgeon's movements are guided by watching a television screen."
What's not clear:
1. if there is causation (games improved some skills) - do each activity just attract slightly obsessive individuals that happen to do best at each profession?
2. if just as much benefit could be gained from a few hours a week of using a mouse - sounds like the major skill here is
just being able to relate activity on a monitor to activity with your hand.
3. how much of a surgeon's professional skills this represents - that is, what about diagnostics, knowledge of techniques,
communication, etc? I'm guessing that playing WoW doesn't contribute much to a surgeon's knowledge of anatomy...
> Nowadays MySQL even does partitioning.
The GA (general availability) or production release does not support partitioning. Once they put it into production and gets serious reviews we'll find out whether or not they've got a worthwhile solution. Until then it's not applicable.
> Whether you think the optimizer is any good is up to you, basically, but it, too has improved lately.
Yes, it has improved - it used to completely tank on joins of 4+ tables just a couple of years ago. However, I've seen no indications that it is yet in the league of any other options.
> Parallelism on the other hand i did not understand... what do you mean by that?
The ability to split a single query that you submit into components, run them separately under the covers, join the data together and present you with the results. Sometimes it may just be setting up multiple agents to handle reading the data - so that pulling data off disk happens in parrall, with the rest of the operations happending serially. With db2, for example, we typically see linear performance improvements by allowing parallelism up to 4 or 8 way on 4 or 8 way servers with good IO subsystems. In other words a table scan (ie, no use of an index) of 100 million rows on a 4-way server that takes 60 seconds can be shortened to 15 seconds with parallelism.
> Grid enabled - This means the DBMS can make use of a large distributed group of computers and potentially have access
> to a huge amount of computing power. The typical DBMS runs on at beat a multi-processor server. Thi sis kind of like a DBMS
> server running a a "seti at home" type network.
Or like teradata in around, what? 1992? Informix around 1994? db2 around 1995? Oracle isn't there yet since their grid solution is more about failover than partitioning.
This is now lower-end functionality in the high-end database market. The typical database only runs best on an SMP if you mean postgesql or mysql when you say typical. The large commercial databases can easily split your data across 2,4 or 500 servers for handling 1 second queries that require complex queries across billions of rows of data.
> Seriously though, I look after a system that has to run on oracle, postgresql (very tuned) and mysql. The reporting
> nature results over 10,000 grouping queries each time it's run. mysql takes less than a minute using InnoDB tables.
> The others crawl home closer to 20.
The problem is that mysql doesn't have partitioning, parallelism or a good optimizer. So even if in your specific application you're lucky enough to have a situation in which you've got a ton of both highly selective and very simple queries - that's not the normal reporting or business intelligence scenario. This is making the huge assumption that you've even tuned oracle correctly.
A more typical scenario is one in which you've got to read 5-10% of the total data volume (and your b-tree indexes don't work so you need partitioning), the queries *aren't* highly repetitive and are complex. In this scenario mysql sucks, and needs a few years to catch up. Meanwhile, expect query times 10-40x the length of oracle or db2.
> The fact that people ranging from kids to Slashdotters to more or less typical adults have enjoyed the books seems
> like an excellent tribute to their quality.
Nah, more like she tapped into some interests, got some early promotion and then rode that momentum for the real bucks in the later books. The fact that adults enjoy the books is largely driven by the lack of literary skills of most adults.
Consider for a moment:
1. extremely primitive writing style
- excessive use of adverbs (everything is "said angrily" or "said hostly", or whatever)
- exhausting back & forth dialogs with excessive labeling "he said, she said, he said, she said"
- etc
2. inconsistencies/plot holes/unbelievability
- so rather than just mail a portalkey to harry potter, voltmort devises a complex plan in which he has to plant a fake in the school
who then has to somehow get Harry to win a tournament and touch the key at the end.
- does it appear that the school doesn't teach any kind of arts? and the kids are somehow completely unaware of music?
- since duals between wizards are just contests of speed then magical skill is almost useless - all you need is one spell and speed.
- most of the tension of the book would be easily eliminated if just one of the three kids would tell an adult about the problem
- etc
3. predictability
- every book starts with a blow-up at home
- then there's something odd going on in the begining of the year
- then it is ignored and the kids focus on some kind of sports thing in the middle of the year
- then it blows up into a climax at the end of the year
- etc
I enjoy reading children and young adult literature to my boys. But this is really some of the worst stuff. After the forth book I refused to read any more of it aloud - it's just too obnoxious. And if anyone really believes it's the best, then they should probably read a book every now and then.
> I read these books aloud to my wife... I guess I should clear my calendar on August evenings!
You can cut the time in half by just omitting the adverbs:
"Malfoy really makes me angry", Harry said [angrily].
"Will you stop complaining and just focus on your homework?", Hermione said [testily].
"Why don't you understand that there's more to life than homework?", Ron said [hotly].
etc, etc, etc.
Ok, so maybe the adverbs are just 5-10% of the sentence - but the preceding "harry said", "ron said", etc - can also usually be removed - it's generally obvious to anyone paying attention who said each line in a dialog between two people. So, I'd remove those two words as well.
And this is how I read it to my boys. Of course, eventually I started additional words and entire sentences - and they never realized. Anything to make these poorly written books go more quickly. Unfortunately, there's no way to simply omit plot holes.
Ah, I don't think Oracle wants to provide MySQL support for any of the following reasons:
- because there is a lot of money to be made
- because they think it is a better product than oracle
- because they want to "endorse" mysql
No, why would Larry Ellison want to provide support for mysql?
Really, just one reason: to further injure MySQL AB.
Look, he already bought InnoDB and Sleepycat out from under MySQL which robs them of important persistence layers (oh sure, but MySQL AB can spend a couple of years and millions of dollars to develop their own backends just proves the point). And now he's going straight for their income stream.
Oracle clearly sees MySQL as a revenue threat that they want to eliminate. This doesn't mean it is a great product - just a revenue threat. And this strategy doesn't apply to postgresql (and EnterpriseDB, etc) - but that's ok for now. Postgresql is probably 2+ years away from being the revenue threat that MySQL AB is today.
I think I like David C. Hays "Data Modeling Pattern" a little more.
Not only is it an apples and oranges comparison between an old mainframe and a new custom-built grid, but the software was completely different.
According to this the original software was probably poorly designed:
> the mainframe took on the persona of a lumbering behemoth. This was especially the case when the IT staff had to accommodate new
> business requirements such as a car dealership adding a new type of vehicle to its inventory. Each update required a
> major rework of the program
Hmmm, massive software reworks to handle new vehicles types? Yikes.
And who knows what the author of the article meant by this:
> Isiminger said his IT staff moved from the mainframe to a grid using high-performance code developed in Assembler and PL/1,
> business logic in COBOL, proprietary mainframe development tools, flat file processing, VSAM and some IMS.
Most of the above actually sounds like what would be running on the mainframe.
In any event, it sounds like the your typical scenario - the data requirements grew wildly beyond what was originally envisioned and designed for. The new solution (both hardware and software) were designed for a far larger amount of data.
The more interesting question now is how well would a new mainframe compare?
- initial cost: mainframes can be hand for as little as $100k, and may cost less than 50 dual-cpu dell servers
- labor cost: one mainframe is most likely cheaper to support than 50 servers - especially when you consider requirement for absolute consistency between the 50
- adaptability: the grid has some benefits here with easy additions and subtractions from the gird - but grid performance relies upon problems that can be easily partitioned. The mainframe has extremely mature virtualization - easily allowing new lpars (separate hosts) to be created and managed without so much of the overhead of pc based virtualization products.
> "once a week scanning contract" - do they make core architectural changes that often?
Think of it this way - do they tell you when they make changes to their systems? Answer: of course not.
So, either you scan monthly or quarterly - and leave vulnerabilities undetected, unreported and wide open to exploit for weeks or months. Or you scan much more frequently, and catch it when it happens.
Just need a nice way to identify deltas so that they don't constantly have to wade through false positives.
> I'm guessing the administration overhead of oracle or db2 would make it not worth while. Due to the fact he's asking slashdot about
> his problem, the expensive db's are probably out of reach for him. This is especially true because he says "cost matters a lot."
Note that administration of oracle or db2 may or may not be greater than administration of mysql - if you find yourself having to manage a half-dozen mysql databases all replicating large amounts of data and struggling to support slow & complex queries.
And the licensing cost is not necessarily greater either - if you can go with a smaller server (with db2, I think it's free for less than 4 gbytes of memory), and it's especially cheaper when you consider the hardware savings over mysql.
> I made the assumption that the reporting application already exists and runs against the current schema. If it doesn't,
> ETL is an option with MySQL just as much as any other database. Again, I doubt it will be a problem if cost matters
> -- they probably have very simple reporting needs.
Note that transactional models (using normalized models) are much more expensive to report against than analytical models (using star-schemas): they cost far more to write the reports against, support only limited reporting functionality, and take more hardware to get the equivilent speed. It's only worth going this route if you plan to deliver almost no reporting.
> 60k queries/day is less than 1 per second. Even 6-7 times that is still only 6-7 a second. MySQL 5.1 has partitioning.
> MySQL was originally designed for data warehousing applications (that is what the isam table structure was all about)
> and there are many documented > 100GB databases in existence.
A few thoughts here:
1. a single query that requires a tablescan of 100 million rows on mysql can take 60 seconds, 6000 seconds if the optimizer gets confused (see references regarding impacts of simplistic optimizer2).
2. 60,000 queries probably don't come at one a second, they probably come in an uneven workload with large bursts. So, perhaps you're using 100 queries a second on monday morning, etc.
3. MySQL's partitioning in 5.1 doesn't matter now - since only 5.0.27 is GA. No idea when 5.1 goes GA - or more importantly, when their partitioning is mature enough to want to depend upon in a serious production environment.
4. The fact that mysql claims to originally have been oriented towards data warehousing and yet failed to include any basic data warehousing features is completely consistent with their attempts to market themselves more recently as an enterprise database without basic transactions. Yes, I know they have had transactions and RI for a few years now - but prior to that time they were telling people that nobody really needed it.
5. A 100GB MySQL database is nothing to brag about. I've seen plenty of gigantic databases - that due to misdesign or use of the wrong product could support almost no queries, had very poor query speed, very poor load speed, or whatever. Keeping 100 gbytes of data in a single-threaded, non-partitioning database is hardly an attractive solution - when so much better alternatives exist.
> MySQL is fast.
Sorry - but relational databases are complex beasts - and too complex for such simplifications. MySQL can be fast - if you're on limited hardware with simple and highly-selective repetitive read-only queries hitting its MyIsam storage engine.
But:
- if you're doing reporting and querying more than 2-4% of the rows in the database then it wonly use a btrieve-index, and so will do table scans. This will suck, and be far slower than using a database like db2, oracle, informix or sql server.
- if you're using complex queries (5+ tables) - then its simple optimizer is likely to take the wrong turn and use a nested loop join when it should use a merge join, etc. Then your peformance will suck and you'll end up continually rewriting your query until you get some 400-line monstrousity that barely performs.
- if you've got a 2+ way SMP - then MySQL's singled-threaded design will fail to split the processing for a single query across multiple processors. That'll suck compared to db2, oracle, sql server, etc, etc, etc.
- if you're not running the same content-query repetitively - then its content cache won't work for you and you'll be hitting the database all the time. This is of course what most databases do, and what it sounds like this app in particular will do.
- if you're not using MyISAM - lets say because you want support for transactions, referential integrity, etc. Then in that case you don't get the speed benefit of MyISAM at all, and are operating at roughtly the same speed as Postgresql for all operations. Except when queries get complex or the load gets high - then you're dragging.
- if you're doing quite a few concurrent writes - in which case MyISAM's table locking will kill your performance and even InnoDB's performance sucks if the load gets high.
Bottom line: MySQL's speed reputation has been discovered for the most part to not apply to most real-world scenarios. Unless you're running a content-management site (like slashdot) then you won't get it. Even then there are quite a few functions that are so slow that you will feel the pain (remember 4 hours to reindex slashdot's 20 million rows?).
> MySQL can definitely do whatever you want it to. Why switch?
because it is obviously a poor contender?
> If mysql is working for you now, you should look at mysql scaling options. For example, if you are worried about reporting queries,
> replicate the database to a second machine for running the reports against. Mysql replication works great for this sort of application.
Ah, that doesn't address partitioning, parallelism, query optimization, automatic summarization pr query rewriting, does it?
Note that these are standard reporting database features - and are responsible for easily providing reporting queries 40-400x performance improvements when moving from mysql to db2 or oracle.
So, replicating the exact same transactional model to another server - which also doesn't provide partitioning or parallelism is an extremely sad solution. Ideally, you move the data through a lightweight ETL solution to remodel it and then move it into a database that provides all these features. Then you'll not only have performance and scalability - you'll also have additional *standard* reporting functionality (time series queries, point in time reporting, etc).
> Also, MySQL Cluster was designed by/for the telecomm industry -- the original commissioners were performaning analysis on
> call records or something of the such.
Sure, if you want to limit the amount of data you use to what will fit in memory. Other problems are likely (like compatibility with mysql's standard product since they just purchased cluster a few years ago) - but since so few people who know what they're doing are using mysql's cluster product this hasn't yet been well documented.