> So, given your rankings above, PostgreSQL, tends to find a middle ground between Oracle's performance/scalability ranking. > In other words, PostgreSQL tends to scale less than Oracle yet tends to perform better.
While I agree with your definitions of scaleability & performance, what would lead you to the above conclusion?
Both postgresql and mysql lack query parallelism, fine memory tuning, any kind of partitioning, etc. Without these features postgresql is *at best* only faster than oracle on index-oriented extremely transactional applications. And I'd say it was pretty safe bet that Oracle would stomp postgresql in that space as well on reasonable hardware (ie, not desktops, etc).
Note, of the two databases I would still rather work with postgresql - it's a fine product and allows me to avoid dealing with oracle personnel. It's just not as fast.
> MySQL, on the other hand, performs fairly fast for read-only databases but scales very poorly.
Scaleability should also consider economics, and mysql can cost. It all depends on how you use it. At what? $500/year/database it can easily be more expensive than a small oracle/db2/etc database (which after the first year will drop to 18% or so maintenance). Given that cost postgresql could pull ahead - even assuming equal performance.
And compared to db2 or oracle, then for the classic read-only applications (reporting) mysql will fall *far* behind. Again, the lack of parallelism & partitioning can easily lead to a 40x difference in performance between db2 & mysql. And while hardware costs are dropping, they aren't dropping so far that you want to spend 40x as much on a server in order to support mysql.
> I did hear a rumor around the water cooler that it costed $20k / Year / Processor.
No, it could cost $32k/cpu list plus something like 15%/year for maintenance. Of course, that's if you want a beowulf-cluster like solution.
You could just run the workstation edition on a single 4-way and still support vast amounts of data. At between $1500 & $7500 / CPU list that can easily handle a data warehouse with billions of rows of data - using MDC, summary tables, etc, etc. Note that db2 (and Oracle) can be often be cheaper than MySQL on smaller systems.
> I once heard an urban legend that MySQL is faster than DB2 with queries on tables that are less than 250k rows.
Could be - mysql could be faster than db2 on extremely small hardware. Then again, db2 has better memory tuning - so you could probably tune db2 smaller. Not positive of that. And mysql won't be faster on writes and certainly not on large selects on a SMP.
> Installing DB2 (across multiple machines for paralell processing) was a little bit like having my teeth pulled
Yep, until a few years ago all the databases were pretty nasty to install. SQL Server was the first one that was really easy - around 2000. It took Oracle and IBM years to clean up their installs. They're pretty easy now though, wizard-driven if you want it, other wise just a few simple commands. EEE is still more than that - since you're installing it on sometimes dozens of separate machines, but it's also easier.
> heh, rather than burning all your mod points on my posts, why dont you just foe me? it'd probably be far more satisfying.
You don't appear irrational, so I assume that you provide value in non-mysql contexts. And who knows? Maybe you'll open up and take an objective view at mysql. Assuming that you don't work for them.
> and appear much less silly than just modding down every post i make. modding down posts which have nothing to do with > you makes you appear comically spiteful...
Hmmm, I almost never mod down. Do you think that there's a conspiracy to mod you down? If so, then what? Not sure what to tell you. I suppose it happens occasionally. I hope most people have too many other more interesting things to do! What makes you think that I'm modding you down?
> No, you really don't want to like mysql. Why would you want your favorite object of derision to go away? > Besides, it's a moving target anyway. Even if all your objections were solved, you'd just come up with new ones.
Is this how you justify ignoring current problems with the product? By tar & feathering database experts that point to issues?
Personally, I can't wait for mysql to address these issues - there's a considerable list of other products, technologies and methodologies in which the ratio of hype to reality is nearly as bad.
You've over-extended your analogy: nobody considers bitkeeper a threat to linux licensing or sco a threat to the linux future. This is in direct contrast to mysql licensing & innodb concerns.
Why the effort to whitewash the mysql issues and ignore obvious challenges they face?
Even if you love the company & product you've got to face the innodb issue.
Even if you don't care about data quality problems you've got to want it to catch up to competitors here.
Even if you don't pay for licensing you've got to be aware of the mixed-message they're sending on this subject.
I hope some day to really like MySQL AB and their product. But it won't happen until they catch up with the rest of the industry. Or clean up the product and shrink its mission down to something along the lines of sqlite.
No, not every disagreement about methodologies, technologies or products can be easily dismissed as preferences of an extremist:
- a ford taurus really is better than a yugo
- cars really are safer today than they were forty years ago
- mp3s really don't sound as good as cds
- mysql ab really has tried to convince users that transactions, views, subselects, etc aren't useful
- mysql really has unacceptable data quality problems due to silent errors
- mysql ab has really been deliberately deceptive about its licensing
- mysql's future really is in question due to oracle's purchase of Innodb
None of the above are worth arguing about. The reason discussion needs to be about what mysql can do to mitigate or correct the above and stay relevant.
> I know it is fashionable for "real" computer scientists and DBAs to sneer at MySQL. But that's actually a sign of insecurity. Real > mechanics don't sneer at zinc plated steel bolts because 316 is available: they just don't use zinc plate under salt spray conditions.
No, but when the vendor of these zinc plated bolts insist that:
- 99% of aquatic applications don't require stainless
- iso standards for bolts aren't really that important
- zinc is simply "good enough"
- zinc is stronger than stainless
- zinc bolts are free
Then the vendor loses credibility in the eyes of professionals who've worked with a half-dozen different products. Experience with products that:
- are more standards compliant
- provide *far* better data quality safeguards
- are freer (postgresql, etc)
- and don't push "worst practices" propaganda
hmmm, in thinking about that last point and ms sql/oracle, let me revise this one: aren't guilty of pushing bizarre and shocking "worst practices" propaganda. While these vendors might exaggerate the ease of use, performance, etc of their products - they never tried to tell people in 2002-2003 that they really didn't need transactions!
MySQL is definitely improving. But they started out far behind the competition, and even now rely on an Oracle product (Innodb) for their most important functionality. If we didn't have better alternatives like Postgresql, Firebird, etc then their fate might be more important.
> A lot of people have religious problems with the whitespace significance in python
I'd advise to give it a try. I understand that there are a few inconveniences. But after a few years of writing python code I find that these are very minor issues. On the plus side:
1. there is more runnable code / line of text: Just like an ACM article from many years ago suggested, I find that my comprehension of code is aided if I can see entire code blocks, functions, classes, methods, etc all at once. Python is more vertically compressed in comparison to perl, java, etc - and so easier to see more code at once.
2. no misleading indentation. The code runs the way it looks at first glance. Again, just another little feature that helps you in quickly understanding the code.
I often have to look back at code I haven't seen for 6+ months. I settled on python because it was the easiest for me to pick up again later without having to invest more time in relearning the code.
> Those who are incompetent, in their vast majority, do not back up OR pay attention to exploit bulletins. > These will either get fired, killed or their companies will fail. Either way, they do not need to be worried > because they are hopeless.
That *might* make sense if only 1-5% of the dbas out their were incompetent. Unfortunately, that number is probably more like 50-60%. And those databases that do fortunately have a competent dba often don't have competent management - that would pay to test out the backup strategy. So your recovery is going to depend on an *untested* backup process.
> They do not need to be worried because they can easily recover.
Right, so long as:
- their experienced dba is available, not the junior guy - remember, the company may have hundreds of databases to fix
- so long as they're media turns out to be valid
- so long as the attack didn't take a while to be noticed, and contaminate all backups
- so long as the application can roll back its other persisted data stored within XML, etc to the
same point in time as the database (don't count on it)
- so long as the backups actually backed *everything* in the database up - remember, this recovery was probably never tested!
Sure, but look at all the non-skilled labor now doing this work. Whether due to outsourcing, attempts to save labor money, etc - I see a ton of very junior people doing installs. The major databases now have install wizards that are pretty good - and allow complete amateurs to install the product. This is great. But these folks have no idea how to lock down the database once they've installed it.
And think beyond oracle to mysql - where almost no dbas install the product, but rather junior programmers. I'm sure that if you looked you could find some very scary mysql installs!
> If you were a Sr. DBA or DB architect being paid triple figures. Deleting those accounts would be done > with [i]your[/i] schema setup/install scripts. It would be second nature to secure the DB at the > install (as taught in many Oracle adv. DBA classes).
Sure, but many databases are installed by extremely junior personnel. Sometimes somewhat-embedded within an application, at other times stand-alone.
I've found as a sr. dba that setting up a dozen secure databases to be a challenge without tools to help automate some of the inspections.
This attack relies on default userids & passwords, not on any vulnerability. Oracle used to use default scott/tiger userid/passwd. I think it still does in 10g, but I'm not positive.
Given enough databases, someone will forget to change these. For that reason any shop with more than a half-dozen databases should be using some kind of application policy-checker that will automatically test for this kind of a policy violation.
> If you use your full name for an email address...You're a fool.
Keep in mind that if you used that email address to register for a service that required your full name they've probably got both anyway. Or used your email address in a purchase with a visa card. Or etc.
That's what makes data integration efforts so dangerous - data mining on its own isn't really that useful. Data mining on pre-integrated and cleansed data *is*.
I don't think that it's necessarily that difficult to build this, just need the right people, and they are as you mentioned, difficult to find.
But it'll be the politics that will kill the project. Horrific waterfall processes requiring Big Upfront Designs - that are notorious death-traps for data warehousing, reporting, searching, and any implementation of fragile technology like federated databases.
SAIC tried to pin the blame for the Virtual CaseFile project on the FBI's changing requirements. But changing requirements always happen, the real culprit was antiquated methodologies used by both the FBI and SAIC.
no:
* Google is (primarily) a search engine for unstructured data (documents, web pages, etc)
* Data warehousing is a method of consolidating & distributing primarily structured data
* Federated searching of databases is a method of spreading a search across multiple databases
So, data warehousing would be used to consolidate explicit data from multiple sources like:
- financial, credit, and purchasing info
- legal history
- travel info
- demographics & psychographics
- personal relationship data (friends, family members, friends of friends, etc)
- organizational memberships Once together within a data warehouse you can easily ship that integrated set of data to data marts for further analysis:
- trend analysis
- searching (not freeform like google with tons of false positives, but contextual)
- scoring - for degree of match between people & organizations and established profiles
This is considerably more powerful than google for this application, though a google-like solution would also be useful for all the unstructured documents. The warehouse could even incorporate links to documents with the integrated personal info.
Federated searching is a completely separate solution that overlaps warehousing: in which you use modern capabilities of db2 or oracle (but db2 especially) to create a virtual database that maps to possibly hundreds of databases behind it. One query will be sent to all that it applies to (the optimiser is smart enough to know which to send it to usually). The reason for federation is that it is supposedly easier to implement than warehousing (don't have to move data into another platform). The downsides though include:
- performance will generally stink
- aggregate operations like scoring or trending can't be done on large sets of data
- it's fragile, and prone to break easily But perhaps as an initial deliverable it could allow you to provide narrowly scoped searches across a variety of databases very quickly.
I'm not surprised that they're planning to do this. Ok, well a little surprised that they didn't do it at least four years ago - the data warehousing at least is a set of very mature methodologies & technologies. But they'll probably blow it - remember the $170m fiasco with SAIC over their "Virtual Case File" project? http://it.slashdot.org/article.pl?sid=05/01/13/145 5234&tid=185&tid=103&tid=218
Note that the Virtual Case File project also included a data warehouse. That wsa probably flushed though, so no code reuse I suspect.
> Do you have any evidence to support your statement that AIX is somehow better than linux for databases or critical apps?
Sure, it's the reference platform for db2, has a great logical volume manager, and has far fewer patches than linux. Note that if I was running a lot of critical oracle I'd be leaning towards Solaris for similar reasons. Note also that the difference in patch frequency can result in a cost difference of a few thousand dollars a year in admin labor.
> Sure, you can use AIX for those things - but linux is certainly a viable choice. With 30 million customers and 24x7 > operation, amazon.com seems to indicate that linux does a fine job powering web services and back end databases.
You're confusing issues. Of course you can support mission critical applications on linux. But that doesn't mean it is the easiest and currently best way to go for everyone.
> AIX somehow "better"? Sure it's mature and stable, but it is showing its age, and it is a bit eccentric.
It's always been a bit eccentric. As linux gets more popular AIX will have to change to keep up - you can already feel minor differences with simple utilities falling behind - like cron (doesn't support time equations, etc). But since gnu utilities are now provided on aix it is helping. And I can generally live with these issues in return for the mature and stable parts. Especially for a database server.
> And what about the fact that linux is dominating the supercomputing space? can anyone really imagine that we are still talking about "low end x86"?
Linux is now dominating the supercomputing space when it comes to MPP architectures - machines that can be built from a highspeed internal network along with thousands of share-nothing nodes. First off, this architecture has existed almost unchanged since 1994 or so. Secondly, this only really helps satisfy some supercomputing needs - where you can divide the problem nicely into separate pieces. Third, it's a simple matter of economics. Note that this architecture was initially developed for super-computing, but then sold most of its units for large-scale databases. Where it can make more sense to use a two-way P5 than a two-way opteron.
But today when I need to support a financial database on a pair of redundant servers, I'm going to heavily lean towards aix or solaris (assuming db2 or oracle). What works in an MPP environment doesn't apply here. This will probably change in five years, but for now it holds.
> I think some dinosaurs see their way of life threatened by linux, so the dismissal of linux' capabilities is a > perfecly understandable psychological defense mechanism. I trust that time will teach these folks a lesson or two.
I think that it's unfortunate that some linux advocates are unable to accept the idea that any OS can be better at anything than Linux. Even admitting that linux will probably surpass unix in five years isn't enough - they demand that you admit that it is in all ways better than unix right now.
> Solaris 10 kicks AIX butt in features and performance.
Do you know of any public benchmarks that would show this? I know that AIX 5.3 on Power5 is almost 50% faster than AIX 5.2...
And more to the point - IBM is much healthier than Sun, and much more likely to be around in 10 years.
> AIX is not free, Solaris 10 and Linux are free to users (maintenance is extra, unless you chose to get it > via the LUGs or other sources)
That's not really accurate. You *can* get a free linux distribution, but quite often you'll find that applications require RedHat or Suse. And RHEL4 is more expensive than AIX.
> Linux is a very good choice for a Small/Medium Business environment and even for some desktops.
Sure, I'd actually recommend both. Want to run websevers or print servers? Linux works fine. Want to run critical app or database servers? AIX is a better fit. Obviously linux will be better on desktops or laptops.
> Right now Linux does not scale really well once you get past about 16 CPUs.
That seems to be a diminishing issue: the big driver of massive SMPs was Oracle, and they've headed towards a more distributed architecture. It certainly doesn't matter at all for db2 or informix - where they have for years supported more of a beowulf-like clustering approach.
> It will be interesting to watch them duke it out - done right, both may benefit. If IBM really wants to compete, > they need to start selling low cost workstations to encourage further development on AIX.
There's not much of a price difference between these and intel boxes. And they can run either AIX or Linux. In fact, with the P510 (around $5k) you should be able to run multiple lpars, including linux lpars. Pretty cool stuff.
> Having worked with both, I must say that Linux is way better, in all respects.
What are you comparing? AIX 4.3 to RHEL 3?
AIX is now on v5.2 & v5.3.
I find that I keep my db2 servers on aix even though db2 also supports linux. The better logical volume management, firmware reporting, etc makes it a slightly better platform. I haven't benchmarked the two but would expect AIX to be the fastest. And perhaps most importantly - the fewer patches means that it costs far less to support.
Why did you find linux better? Even the gnu tools are now commonly installed on aix.
> Sure, Bush has shown he's capable of seriously overstepping his bounds, raping the Constitution, > and being a very naughty boy who needs to be punished....but the Democrats hardly had a Utopia going.
What? Almost all of your points are crazy-rush-limbaugh talking points:
- economy? it was great under clinton
- nanny state? guess what? FEMA actually worked under Clinton
- foreign adventures? guess what? Clinton was given no flexibility in trying to stop genocide by
the far right (who attacked him relentlessly). And was generally successful.
- wako? so a couple of hundred died accidently - that was a shame. but also a tiny incident.
- perjury? first off, what kind of people would have even asked that question? secondly,
it's a shame that clinton didn't just say "fuck you", third, besides Hillary and some crotch-obsessed
conservatives nobody else really gives a damn.
So, aside from Hillary and the wacko cult everyone else did great. Now compare that to:
- 30k-100k dead in Iraq, unknown numbers of injured
- 2k dead, 20k+ injured americans in Iraq
- a completely ruined reputation with the rest of the world
- *unknown* numbers of people arrested for being terrorists then sent to secret detention centers to be tortured!
- billions of dollars wasted on a war that will undoubtably be lost in the end
- loss of civil rights for americans
- destruction of previously well-run government departments (FEMA, etc)
- insane tax-cutting and defense spending that has resulted in a federal deficit being financed by the chinese
- willfully breaking the congressional legislation that requires congressional approval for wiretaping. Then lying about it. Several times.
The conservatives in the US were very hot to impeach Clinton over that blowjob. Now, what about an impeachment over something that *really* affects our lives?
> At 60,000 per CPU, you could hire some more people to make things run a little smoother.
you'd think so, but it seldom really ends up that way. Say you've got 10 production CPUs - so $600k. Now that's list, so you'll really probably just pay $400k. With $400k you can hire two contractors for one year (assuming $75 / hour plus office space, etc).
Would two people for one year really make mysql run as well as oracle or db2? ah, no. They might make some minor improvements, or might be able to write a lot of code around the database to do what oracle/db2 do out of the box. But then how well will it really work? And what's your labor cost to maintain that code after that first year?
There's some code in the database that really doesn't need to be there. And there's code that does - that you shouldn't try to reinvent. The ability to run queries in parallel, to spread data across a hundred separate servers - and allow them to cooperatively process the data, etc. You don't want to try to rewrite that. Eventually postgresql & mysql will support this natively. Then the economics will completely change.
> Plus, you're only looking at the production stuff. > If you want to do development, you also have to pay for the developer versions, so that people can actually develop. But then when you > put it on the production server, it runs differently, so ideally, you'd want to develop on the exact same software that you run in production, > which is much easier with MySQL or PostgreSQL.
Yep, mysql or postgresql are easier - just because you get to avoid the procurement process. But development environments are usually free these days. It's the large test infrastructure that isn't. But this isn't necessarily in favor of mysql or postgresql either: since they lack parallelism & partitioning, you can easily be forced to buy larger production environments in order to match what you could do with db2/oracle. Now you'll have to buy all that extra hardware a second time if you want identical environments.
SAME (Stripe and Mirror Everything) trades off hardware costs for labor: you end up with twice as much storage as you really need. Further, it works better with expensive heavily-cached storage such as EMC, IBM Shark, etc rather than low-cost direct attached. Sometimes that's a good deal - if you really can't get a good dba, it's probably a safe choice.
On the flip side, it's just plain expensive and you seldom get to really buy as much storage as you'd like. I've often spoke with people who've had to implement SAME on top of insufficient hardware - since they couldn't get approval for all that they needed, or outgrew their initial configuration and were waiting for the budget to buy a lot more.
In my opinion SAME is fine if you have a small & simple little database. But anything worth benchmarking deserves an intelligent compromise. My data warehouse (db2) has two sets of arrays: fast and slow. Each has their own set of dedicated disk adapters.
- the slow set of arrays are each configured with raid5 on top of 10k rpm drives. Data, backups, etc are stored here.
- the fast set of arrays are each configured with raid10 on top of 15k rpm drives. The temp space and log files are stored here.
- eventually, I'll add a second set of fast arrays to hold indexes, small dimension tables, small summary tables, and possibly current-only detail data. This could go into the same pool as the existing fast arrays holding temp & logs, but performance will probably suffer - since those two sets of data are almost guaranteed to be used simultaneously.
Note that the fast drives cost almost exactly *twice* what the slow drives cost. By splitting the data up this way we've saved about $50k. On a single server.
No, i'm not really interested in seeing benchmarks in which people try to save pennies by buying mysql to then blow it on excessive hardware costs.
> you may want to look at the mysql website again, the remaining differences between their ANSI > compliant mode and the standard seem like slim pickings for a critic.
Yeah, they're covering their butts in the documentation much better than the old days in which they blatantly stated that transactions and pk/fk constraints were bad. Now, it's much harder to find the things that they are embarressed about - like their old licensing faq, or compatibility issues. The url you provided only shows those compatibility problems that they have worked out. It does not list the existing ones for which there is no good solution. And I'm not in the mood to go through their documentation to find it all.
Which says only:
"Chapter 15. MySQL Partitioning
This is the beginning of the Partitioning chapter. "
Maybe you're thinking of their separate clustering product? Which being limited to your available memory, isn't really the samething. Partitioning splits your data by either hash key or ranges into multiple physical tables. These tables are kept on either the same server (range partitioning via oracle, mdc via db2) or on multiple servers (hash partitioning via db2, think beowulf). In either case your application is unaware, the database handles the details. You just get speed. MySQL doesn't do this.
> as has memory tuning.
Not really. And keeping in mind that memory tuning is one of the most critical features of a database - one that allows you to eke out the maximum benefit from your expensive hardware...
But this looks more like a haphazzard collection that has grown over time, rather than anything methodical. I couldn't actually find any recommendations for how to tune these in the documentation. Maybe it was hidden somewhere, but searches on 'memory' didn't turn it up. Moreover:
1. it fails to give you the flexibility you need: it appears to still be impossible to set up multiple buffer pools for read caching that are shared between sets of like tables. This is typically the *first* thing you should plan & tune. It allows you to ensure that some large tables get 99.9% of their hits from cache rather than memory - without having to move them into the memory storage engine in mysql - which you may not have sufficient memory to do.
2. it inefficiently uses memory: buffers can't be dedicated to a functional purpose, they often have to be dedicated to a storage engine. This means that if you are using both innodb & myisam you will pay a penalty in your memory usage.
MySQL does offer a lot of knobs to twist - really as many as db2 or oracle. It just doesn't seem to document what they do or how to analyze their result. And they aren't well thought out or complete, they're bizarre. Maybe duct-table and bailing wire to catch problems like:
"Note that if you use any other option to ALTER TABLE than RENAME, MySQL always creates a
temporary table, even if the data wouldn't strictly need to be copied (such as when you change the
name of a column). For MyISAM tables, you can speed up the index re-creation operation (which is the
slowest part of the alteration process) by setting the myisam_sort_buffer_size
system variable to a high value."
Here you can see that mysql has some mysterious problem doing ALTER TABLE - in which i
if you're moving data in & out frequently, then you should look at partitions rather than indexes.
BTree indexes are usually only useful when you're selecting less than 2% of the data. Partitioning works great with 10%, 20% or even 80%. And can often allow you to move partitions in & out of the database within a transaction.
I agree that updating a snowlake is tougher, but it depends on what you want to optimize for. If you want to optimize for fast & easy retrieval for analysis, you really want to use a star schema most of the time.
> So, given your rankings above, PostgreSQL, tends to find a middle ground between Oracle's
performance/scalability ranking.
> In other words, PostgreSQL tends to scale less than Oracle yet tends to perform better.
While I agree with your definitions of scaleability & performance, what would lead you to the above conclusion?
Both postgresql and mysql lack query parallelism, fine memory tuning, any kind of partitioning, etc. Without these features postgresql is *at best* only faster than oracle on index-oriented extremely transactional applications. And I'd say it was pretty safe bet that Oracle would stomp postgresql in that space as well on reasonable hardware (ie, not desktops, etc).
Note, of the two databases I would still rather work with postgresql - it's a fine product and allows me to avoid dealing with oracle personnel. It's just not as fast.
> MySQL, on the other hand, performs fairly fast for read-only databases but scales very poorly.
Scaleability should also consider economics, and mysql can cost. It all depends on how you use it. At what? $500/year/database it can easily be more expensive than a small oracle/db2/etc database (which after the first year will drop to 18% or so maintenance). Given that cost postgresql could pull ahead - even assuming equal performance.
And compared to db2 or oracle, then for the classic read-only applications (reporting) mysql will fall *far* behind. Again, the lack of parallelism & partitioning can easily lead to a 40x difference in performance between db2 & mysql. And while hardware costs are dropping, they aren't dropping so far that you want to spend 40x as much on a server in order to support mysql.
> I did hear a rumor around the water cooler that it costed $20k / Year / Processor.
No, it could cost $32k/cpu list plus something like 15%/year for maintenance. Of course, that's if you want a beowulf-cluster like solution.
You could just run the workstation edition on a single 4-way and still support vast amounts of data. At between $1500 & $7500 / CPU list that can easily handle a data warehouse with billions of rows of data - using MDC, summary tables, etc, etc. Note that db2 (and Oracle) can be often be cheaper than MySQL on smaller systems.
> I once heard an urban legend that MySQL is faster than DB2 with queries on tables that are less than 250k rows.
Could be - mysql could be faster than db2 on extremely small hardware. Then again, db2 has better memory tuning - so you could probably tune db2 smaller. Not positive of that. And mysql won't be faster on writes and certainly not on large selects on a SMP.
> Installing DB2 (across multiple machines for paralell processing) was a little bit like having my teeth pulled
Yep, until a few years ago all the databases were pretty nasty to install. SQL Server was the first one that was really easy - around 2000. It took Oracle and IBM years to clean up their installs. They're pretty easy now though, wizard-driven if you want it, other wise just a few simple commands. EEE is still more than that - since you're installing it on sometimes dozens of separate machines, but it's also easier.
> heh, rather than burning all your mod points on my posts, why dont you just foe me? it'd probably be far more satisfying.
You don't appear irrational, so I assume that you provide value in non-mysql contexts. And who knows? Maybe you'll open up and take an objective view at mysql. Assuming that you don't work for them.
> and appear much less silly than just modding down every post i make. modding down posts which have nothing to do with
> you makes you appear comically spiteful...
Hmmm, I almost never mod down. Do you think that there's a conspiracy to mod you down? If so, then what? Not sure what to tell you. I suppose it happens occasionally. I hope most people have too many other more interesting things to do! What makes you think that I'm modding you down?
> No, you really don't want to like mysql. Why would you want your favorite object of derision to go away?
> Besides, it's a moving target anyway. Even if all your objections were solved, you'd just come up with new ones.
Is this how you justify ignoring current problems with the product? By tar & feathering database experts that point to issues?
Personally, I can't wait for mysql to address these issues - there's a considerable list of other products, technologies and methodologies in which the ratio of hype to reality is nearly as bad.
> no really, freebsd IS better than linux.
You've over-extended your analogy: nobody considers bitkeeper a threat to linux licensing or sco a threat to the linux future. This is in direct contrast to mysql licensing & innodb concerns.
Why the effort to whitewash the mysql issues and ignore obvious challenges they face?
Even if you love the company & product you've got to face the innodb issue.
Even if you don't care about data quality problems you've got to want it to catch up to competitors here.
Even if you don't pay for licensing you've got to be aware of the mixed-message they're sending on this subject.
I hope some day to really like MySQL AB and their product. But it won't happen until they catch up with the rest of the industry. Or clean up the product and shrink its mission down to something along the lines of sqlite.
> and freebsd is better than linux.
No, not every disagreement about methodologies, technologies or products can be easily dismissed as preferences of an extremist:
- a ford taurus really is better than a yugo
- cars really are safer today than they were forty years ago
- mp3s really don't sound as good as cds
- mysql ab really has tried to convince users that transactions, views, subselects, etc aren't useful
- mysql really has unacceptable data quality problems due to silent errors
- mysql ab has really been deliberately deceptive about its licensing
- mysql's future really is in question due to oracle's purchase of Innodb
None of the above are worth arguing about. The reason discussion needs to be about what mysql can do to mitigate or correct the above and stay relevant.
> I know it is fashionable for "real" computer scientists and DBAs to sneer at MySQL. But that's actually a sign of insecurity. Real
> mechanics don't sneer at zinc plated steel bolts because 316 is available: they just don't use zinc plate under salt spray conditions.
No, but when the vendor of these zinc plated bolts insist that:
- 99% of aquatic applications don't require stainless
- iso standards for bolts aren't really that important
- zinc is simply "good enough"
- zinc is stronger than stainless
- zinc bolts are free
Then the vendor loses credibility in the eyes of professionals who've worked with a half-dozen different products. Experience with products that:
- are more standards compliant
- provide *far* better data quality safeguards
- are freer (postgresql, etc)
- and don't push "worst practices" propaganda
hmmm, in thinking about that last point and ms sql/oracle, let me revise this one: aren't guilty of pushing bizarre and shocking "worst practices" propaganda. While these vendors might exaggerate the ease of use, performance, etc of their products - they never tried to tell people in 2002-2003 that they really didn't need transactions!
MySQL is definitely improving. But they started out far behind the competition, and even now rely on an Oracle product (Innodb) for their most important functionality. If we didn't have better alternatives like Postgresql, Firebird, etc then their fate might be more important.
> A lot of people have religious problems with the whitespace significance in python
I'd advise to give it a try. I understand that there are a few inconveniences. But after a few years of writing python code I find that these are very minor issues. On the plus side:
1. there is more runnable code / line of text: Just like an ACM article from many years ago suggested, I find that my comprehension of code is aided if I can see entire code blocks, functions, classes, methods, etc all at once. Python is more vertically compressed in comparison to perl, java, etc - and so easier to see more code at once.
2. no misleading indentation. The code runs the way it looks at first glance. Again, just another little feature that helps you in quickly understanding the code.
I often have to look back at code I haven't seen for 6+ months. I settled on python because it was the easiest for me to pick up again later without having to invest more time in relearning the code.
> Those who are incompetent, in their vast majority, do not back up OR pay attention to exploit bulletins.
> These will either get fired, killed or their companies will fail. Either way, they do not need to be worried
> because they are hopeless.
That *might* make sense if only 1-5% of the dbas out their were incompetent. Unfortunately, that number is probably more like 50-60%. And those databases that do fortunately have a competent dba often don't have competent management - that would pay to test out the backup strategy. So your recovery is going to depend on an *untested* backup process.
> They do not need to be worried because they can easily recover.
Right, so long as:
- their experienced dba is available, not the junior guy - remember, the company may have hundreds of databases to fix
- so long as they're media turns out to be valid
- so long as the attack didn't take a while to be noticed, and contaminate all backups
- so long as the application can roll back its other persisted data stored within XML, etc to the
same point in time as the database (don't count on it)
- so long as the backups actually backed *everything* in the database up - remember, this recovery was probably never tested!
> If you were a serious Oracle DBA you have
Sure, but look at all the non-skilled labor now doing this work. Whether due to outsourcing, attempts to save labor money, etc - I see a ton of very junior people doing installs. The major databases now have install wizards that are pretty good - and allow complete amateurs to install the product. This is great. But these folks have no idea how to lock down the database once they've installed it.
And think beyond oracle to mysql - where almost no dbas install the product, but rather junior programmers. I'm sure that if you looked you could find some very scary mysql installs!
> If you were a Sr. DBA or DB architect being paid triple figures. Deleting those accounts would be done
> with [i]your[/i] schema setup/install scripts. It would be second nature to secure the DB at the
> install (as taught in many Oracle adv. DBA classes).
Sure, but many databases are installed by extremely junior personnel. Sometimes somewhat-embedded within an application, at other times stand-alone.
I've found as a sr. dba that setting up a dozen secure databases to be a challenge without tools to help automate some of the inspections.
This attack relies on default userids & passwords, not on any vulnerability. Oracle used to use default scott/tiger userid/passwd. I think it still does in 10g, but I'm not positive.
Given enough databases, someone will forget to change these. For that reason any shop with more than a half-dozen databases should be using some kind of application policy-checker that will automatically test for this kind of a policy violation.
> If you use your full name for an email address...You're a fool.
Keep in mind that if you used that email address to register for a service that required your full name they've probably got both anyway. Or used your email address in a purchase with a visa card. Or etc.
That's what makes data integration efforts so dangerous - data mining on its own isn't really that useful. Data mining on pre-integrated and cleansed data *is*.
Randygj2,
I don't think that it's necessarily that difficult to build this, just need the right people, and they are as you mentioned, difficult to find.
But it'll be the politics that will kill the project. Horrific waterfall processes requiring Big Upfront Designs - that are notorious death-traps for data warehousing, reporting, searching, and any implementation of fragile technology like federated databases.
SAIC tried to pin the blame for the Virtual CaseFile project on the FBI's changing requirements. But changing requirements always happen, the real culprit was antiquated methodologies used by both the FBI and SAIC.
> You mean google?
5 5234&tid=185&tid=103&tid=218
no:
* Google is (primarily) a search engine for unstructured data (documents, web pages, etc)
* Data warehousing is a method of consolidating & distributing primarily structured data
* Federated searching of databases is a method of spreading a search across multiple databases
So, data warehousing would be used to consolidate explicit data from multiple sources like:
- financial, credit, and purchasing info
- legal history
- travel info
- demographics & psychographics
- personal relationship data (friends, family members, friends of friends, etc)
- organizational memberships
Once together within a data warehouse you can easily ship that integrated set of data to data marts for further analysis:
- trend analysis
- searching (not freeform like google with tons of false positives, but contextual)
- scoring - for degree of match between people & organizations and established profiles
This is considerably more powerful than google for this application, though a google-like solution would also be useful for all the unstructured documents. The warehouse could even incorporate links to documents with the integrated personal info.
Federated searching is a completely separate solution that overlaps warehousing: in which you use modern capabilities of db2 or oracle (but db2 especially) to create a virtual database that maps to possibly hundreds of databases behind it. One query will be sent to all that it applies to (the optimiser is smart enough to know which to send it to usually). The reason for federation is that it is supposedly easier to implement than warehousing (don't have to move data into another platform). The downsides though include:
- performance will generally stink
- aggregate operations like scoring or trending can't be done on large sets of data
- it's fragile, and prone to break easily
But perhaps as an initial deliverable it could allow you to provide narrowly scoped searches across a variety of databases very quickly.
I'm not surprised that they're planning to do this. Ok, well a little surprised that they didn't do it at least four years ago - the data warehousing at least is a set of very mature methodologies & technologies. But they'll probably blow it - remember the $170m fiasco with SAIC over their "Virtual Case File" project?
http://it.slashdot.org/article.pl?sid=05/01/13/14
Note that the Virtual Case File project also included a data warehouse. That wsa probably flushed though, so no code reuse I suspect.
> Do you have any evidence to support your statement that AIX is somehow better than linux for databases or critical apps?
Sure, it's the reference platform for db2, has a great logical volume manager, and has far fewer patches than linux. Note that if I was running a lot of critical oracle I'd be leaning towards Solaris for similar reasons. Note also that the difference in patch frequency can result in a cost difference of a few thousand dollars a year in admin labor.
> Sure, you can use AIX for those things - but linux is certainly a viable choice. With 30 million customers and 24x7
> operation, amazon.com seems to indicate that linux does a fine job powering web services and back end databases.
You're confusing issues. Of course you can support mission critical applications on linux. But that doesn't mean it is the easiest and currently best way to go for everyone.
> AIX somehow "better"? Sure it's mature and stable, but it is showing its age, and it is a bit eccentric.
It's always been a bit eccentric. As linux gets more popular AIX will have to change to keep up - you can already feel minor differences with simple utilities falling behind - like cron (doesn't support time equations, etc). But since gnu utilities are now provided on aix it is helping. And I can generally live with these issues in return for the mature and stable parts. Especially for a database server.
> And what about the fact that linux is dominating the supercomputing space? can anyone really imagine that we are still talking about "low end x86"?
Linux is now dominating the supercomputing space when it comes to MPP architectures - machines that can be built from a highspeed internal network along with thousands of share-nothing nodes. First off, this architecture has existed almost unchanged since 1994 or so. Secondly, this only really helps satisfy some supercomputing needs - where you can divide the problem nicely into separate pieces. Third, it's a simple matter of economics. Note that this architecture was initially developed for super-computing, but then sold most of its units for large-scale databases. Where it can make more sense to use a two-way P5 than a two-way opteron.
But today when I need to support a financial database on a pair of redundant servers, I'm going to heavily lean towards aix or solaris (assuming db2 or oracle). What works in an MPP environment doesn't apply here. This will probably change in five years, but for now it holds.
> I think some dinosaurs see their way of life threatened by linux, so the dismissal of linux' capabilities is a
> perfecly understandable psychological defense mechanism. I trust that time will teach these folks a lesson or two.
I think that it's unfortunate that some linux advocates are unable to accept the idea that any OS can be better at anything than Linux. Even admitting that linux will probably surpass unix in five years isn't enough - they demand that you admit that it is in all ways better than unix right now.
There's really no room for dogma in IT.
> Solaris 10 kicks AIX butt in features and performance.
Do you know of any public benchmarks that would show this? I know that AIX 5.3 on Power5 is almost 50% faster than AIX 5.2...
And more to the point - IBM is much healthier than Sun, and much more likely to be around in 10 years.
> AIX is not free, Solaris 10 and Linux are free to users (maintenance is extra, unless you chose to get it
> via the LUGs or other sources)
That's not really accurate. You *can* get a free linux distribution, but quite often you'll find that applications require RedHat or Suse. And RHEL4 is more expensive than AIX.
> Linux is a very good choice for a Small/Medium Business environment and even for some desktops.
Sure, I'd actually recommend both. Want to run websevers or print servers? Linux works fine. Want to run critical app or database servers? AIX is a better fit. Obviously linux will be better on desktops or laptops.
> Right now Linux does not scale really well once you get past about 16 CPUs.
That seems to be a diminishing issue: the big driver of massive SMPs was Oracle, and they've headed towards a more distributed architecture. It certainly doesn't matter at all for db2 or informix - where they have for years supported more of a beowulf-like clustering approach.
> It will be interesting to watch them duke it out - done right, both may benefit. If IBM really wants to compete,
r /
> they need to start selling low cost workstations to encourage further development on AIX.
They've got them, and as low as $5,575. Though I think they are primarily intended for unix graphics packages like Catia:
http://www-03.ibm.com/servers/intellistation/powe
They've also got very low cost power5 servers that start at $3700:
http://www-03.ibm.com/systems/p/hardware/entry
There's not much of a price difference between these and intel boxes. And they can run either AIX or Linux. In fact, with the P510 (around $5k) you should be able to run multiple lpars, including linux lpars. Pretty cool stuff.
> Better compare that to RHEL 4 instead of 3 while you're at it :-P
dang! i hate mistakes like that.
> Having worked with both, I must say that Linux is way better, in all respects.
What are you comparing? AIX 4.3 to RHEL 3?
AIX is now on v5.2 & v5.3.
I find that I keep my db2 servers on aix even though db2 also supports linux. The better logical volume management, firmware reporting, etc makes it a slightly better platform. I haven't benchmarked the two but would expect AIX to be the fastest. And perhaps most importantly - the fewer patches means that it costs far less to support.
Why did you find linux better? Even the gnu tools are now commonly installed on aix.
ken
> Sure, Bush has shown he's capable of seriously overstepping his bounds, raping the Constitution,
> and being a very naughty boy who needs to be punished....but the Democrats hardly had a Utopia going.
What? Almost all of your points are crazy-rush-limbaugh talking points:
- economy? it was great under clinton
- nanny state? guess what? FEMA actually worked under Clinton
- foreign adventures? guess what? Clinton was given no flexibility in trying to stop genocide by
the far right (who attacked him relentlessly). And was generally successful.
- wako? so a couple of hundred died accidently - that was a shame. but also a tiny incident.
- perjury? first off, what kind of people would have even asked that question? secondly,
it's a shame that clinton didn't just say "fuck you", third, besides Hillary and some crotch-obsessed
conservatives nobody else really gives a damn.
So, aside from Hillary and the wacko cult everyone else did great. Now compare that to:
- 30k-100k dead in Iraq, unknown numbers of injured
- 2k dead, 20k+ injured americans in Iraq
- a completely ruined reputation with the rest of the world
- *unknown* numbers of people arrested for being terrorists then sent to secret detention centers to be tortured!
- billions of dollars wasted on a war that will undoubtably be lost in the end
- loss of civil rights for americans
- destruction of previously well-run government departments (FEMA, etc)
- insane tax-cutting and defense spending that has resulted in a federal deficit being financed by the chinese
- willfully breaking the congressional legislation that requires congressional approval for wiretaping. Then lying about it. Several times.
The conservatives in the US were very hot to impeach Clinton over that blowjob. Now, what about an impeachment over something that *really* affects our lives?
> At 60,000 per CPU, you could hire some more people to make things run a little smoother.
you'd think so, but it seldom really ends up that way. Say you've got 10 production CPUs - so $600k. Now that's list, so you'll really probably just pay $400k. With $400k you can hire two contractors for one year (assuming $75 / hour plus office space, etc).
Would two people for one year really make mysql run as well as oracle or db2? ah, no. They might make some minor improvements, or might be able to write a lot of code around the database to do what oracle/db2 do out of the box. But then how well will it really work? And what's your labor cost to maintain that code after that first year?
There's some code in the database that really doesn't need to be there. And there's code that does - that you shouldn't try to reinvent. The ability to run queries in parallel, to spread data across a hundred separate servers - and allow them to cooperatively process the data, etc. You don't want to try to rewrite that. Eventually postgresql & mysql will support this natively. Then the economics will completely change.
> Plus, you're only looking at the production stuff.
> If you want to do development, you also have to pay for the developer versions, so that people can actually develop. But then when you
> put it on the production server, it runs differently, so ideally, you'd want to develop on the exact same software that you run in production,
> which is much easier with MySQL or PostgreSQL.
Yep, mysql or postgresql are easier - just because you get to avoid the procurement process. But development environments are usually free these days. It's the large test infrastructure that isn't. But this isn't necessarily in favor of mysql or postgresql either: since they lack parallelism & partitioning, you can easily be forced to buy larger production environments in order to match what you could do with db2/oracle. Now you'll have to buy all that extra hardware a second time if you want identical environments.
> Ever heard of S.A.M.E.?2 000_same.pdf
> http://otn.oracle.com/deploy/availability/pdf/oow
> It's trivial to implement Mysql or Postgresql on top of S.A.M.E.
SAME (Stripe and Mirror Everything) trades off hardware costs for labor: you end up with twice as much storage as you really need. Further, it works better with expensive heavily-cached storage such as EMC, IBM Shark, etc rather than low-cost direct attached. Sometimes that's a good deal - if you really can't get a good dba, it's probably a safe choice.
On the flip side, it's just plain expensive and you seldom get to really buy as much storage as you'd like. I've often spoke with people who've had to implement SAME on top of insufficient hardware - since they couldn't get approval for all that they needed, or outgrew their initial configuration and were waiting for the budget to buy a lot more.
In my opinion SAME is fine if you have a small & simple little database. But anything worth benchmarking deserves an intelligent compromise. My data warehouse (db2) has two sets of arrays: fast and slow. Each has their own set of dedicated disk adapters.
- the slow set of arrays are each configured with raid5 on top of 10k rpm drives. Data, backups, etc are stored here.
- the fast set of arrays are each configured with raid10 on top of 15k rpm drives. The temp space and log files are stored here.
- eventually, I'll add a second set of fast arrays to hold indexes, small dimension tables, small summary tables, and possibly current-only detail data. This could go into the same pool as the existing fast arrays holding temp & logs, but performance will probably suffer - since those two sets of data are almost guaranteed to be used simultaneously.
Note that the fast drives cost almost exactly *twice* what the slow drives cost. By splitting the data up this way we've saved about $50k. On a single server.
No, i'm not really interested in seeing benchmarks in which people try to save pennies by buying mysql to then blow it on excessive hardware costs.
> you may want to look at the mysql website again, the remaining differences between their ANSI
> compliant mode and the standard seem like slim pickings for a critic.
Yeah, they're covering their butts in the documentation much better than the old days in which they blatantly stated that transactions and pk/fk constraints were bad. Now, it's much harder to find the things that they are embarressed about - like their old licensing faq, or compatibility issues. The url you provided only shows those compatibility problems that they have worked out. It does not list the existing ones for which there is no good solution. And I'm not in the mood to go through their documentation to find it all.
> Also, partitioning has been in there forever
Well, there's this: http://dev.mysql.com/doc/refman/5.0/en/partitionin g.html
Which says only:
"Chapter 15. MySQL Partitioning
This is the beginning of the Partitioning chapter. "
Maybe you're thinking of their separate clustering product? Which being limited to your available memory, isn't really the samething. Partitioning splits your data by either hash key or ranges into multiple physical tables. These tables are kept on either the same server (range partitioning via oracle, mdc via db2) or on multiple servers (hash partitioning via db2, think beowulf). In either case your application is unaware, the database handles the details. You just get speed. MySQL doesn't do this.
> as has memory tuning.
Not really. And keeping in mind that memory tuning is one of the most critical features of a database - one that allows you to eke out the maximum benefit from your expensive hardware...
The mysql documentation barely mentions memory tuning at all. It does give you a large set of pools, heaps, etc to tweak (about 28):
- http://dev.mysql.com/doc/refman/5.0/en/server-para meters.html
But this looks more like a haphazzard collection that has grown over time, rather than anything methodical. I couldn't actually find any recommendations for how to tune these in the documentation. Maybe it was hidden somewhere, but searches on 'memory' didn't turn it up. Moreover:
1. it fails to give you the flexibility you need: it appears to still be impossible to set up multiple buffer pools for read caching that are shared between sets of like tables. This is typically the *first* thing you should plan & tune. It allows you to ensure that some large tables get 99.9% of their hits from cache rather than memory - without having to move them into the memory storage engine in mysql - which you may not have sufficient memory to do.
2. it inefficiently uses memory: buffers can't be dedicated to a functional purpose, they often have to be dedicated to a storage engine. This means that if you are using both innodb & myisam you will pay a penalty in your memory usage.
MySQL does offer a lot of knobs to twist - really as many as db2 or oracle. It just doesn't seem to document what they do or how to analyze their result. And they aren't well thought out or complete, they're bizarre. Maybe duct-table and bailing wire to catch problems like:
"Note that if you use any other option to ALTER TABLE than RENAME, MySQL always creates a
temporary table, even if the data wouldn't strictly need to be copied (such as when you change the
name of a column). For MyISAM tables, you can speed up the index re-creation operation (which is the
slowest part of the alteration process) by setting the myisam_sort_buffer_size
system variable to a high value."
Here you can see that mysql has some mysterious problem doing ALTER TABLE - in which i
if you're moving data in & out frequently, then you should look at partitions rather than indexes.
BTree indexes are usually only useful when you're selecting less than 2% of the data. Partitioning works great with 10%, 20% or even 80%. And can often allow you to move partitions in & out of the database within a transaction.
I agree that updating a snowlake is tougher, but it depends on what you want to optimize for. If you want to optimize for fast & easy retrieval for analysis, you really want to use a star schema most of the time.