> If people would use PostgresSQL, most companies' OLTP systems would be thrown, performance-wise, back
> into the stone ages. No matter how you cut it, DB2 (and some of the other commercial RDBMSs) are simply
> light years ahead of open source software.
Yeah, there are still many reasons to choose a commercial dbms. Like:
1. db2 just set the world record for transaction speed - at about 50,000 transactions a second. Last I heard mysql was trumpeting 800 transactions a second with innodb. Not sure about postgresql.
2. with partitioning, parallelism, and clustering, you can get subsecond response time from db2 *adhoc* queries against tables containing over a terabyte of data. Postgresql, Mysql, and Firebird aren't even in the ballpark here. Note: mysql "speed" will end up requiring you to index every single column, which will kill your insert speed and double the size of the data, and their optimizer won't use the indexes anyway whenever you want to access more than 5% of the data.
3. Mature, proven high-availability solutions.
4. Mature, proven replication solutions.
5. Cost. Really - cost is a reason (sometimes) to use commercial software. Here's how this works: let's say you've got a critical business problem in which 1 minute of downtime = a loss of $10,000 in revenue. Add to this a development team of 20 people ($1,000,000/year). Add hardware costs ($500,000). Now, that commercial database license may run you $50,000 (vs $500+ mysql, free for postgresql). But $50k is nothing compared to the costs at risk:
- online changes to db2 vs recycling mysql & postgresql
- robust ha on db2 vs replication for mysql
- standard sql functionality & productivity on db2 vs mysql
- less hardware for db2 than mysql/postgresql to get same performance
- etc, etc, etc
So, on a big project where the database is critical - you will actually *save* money going with a commercial database. Well, on large & critical applications anyway.
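To put rough numbers on that: here's a back-of-the-envelope sketch using only the figures from the example above ($10,000 per minute of downtime, $50,000 license, $500 for mysql, free for postgresql). The function name is made up for illustration, and it deliberately ignores labor and hardware - it only shows how little avoided downtime it takes to pay back the license difference.

```python
# Figures come from the example above; everything else is illustrative.
DOWNTIME_COST_PER_MINUTE = 10_000   # dollars of lost revenue per minute
COMMERCIAL_LICENSE = 50_000         # dollars
MYSQL_LICENSE = 500                 # dollars
POSTGRESQL_LICENSE = 0              # dollars


def break_even_minutes(license_cost, cheaper_license_cost, cost_per_minute):
    """Minutes of avoided downtime needed to pay back the license difference."""
    return (license_cost - cheaper_license_cost) / cost_per_minute


vs_mysql = break_even_minutes(COMMERCIAL_LICENSE, MYSQL_LICENSE,
                              DOWNTIME_COST_PER_MINUTE)
vs_postgresql = break_even_minutes(COMMERCIAL_LICENSE, POSTGRESQL_LICENSE,
                                   DOWNTIME_COST_PER_MINUTE)

print(f"vs mysql:      {vs_mysql:.2f} minutes of avoided downtime")
print(f"vs postgresql: {vs_postgresql:.2f} minutes of avoided downtime")
```

In other words, under these assumptions the license pays for itself if it avoids about five minutes of outage over its lifetime.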
6. Consistency: most organizations will require a commercial database for their most demanding applications - and they can benefit from a complexity reduction by using the same database on all applications. This way they've got just one set of skills to get all dbas on, they can get by with a smaller dba team (read: less labor = less cost), and when a new version or patch arrives they can get up to speed with it much faster, etc.
Not to say that the open source solutions aren't great: they are, and can pick up much of the database work these days. But there's still a huge case to be made for commercial products - and will be for a while, since the functionality missing in mysql & postgresql needed to compete at the top-end will be very difficult to implement.
> reinventing the wheel should be a last resort, not a first instinct.
yeah, but this guy doesn't know enough about IT to understand that most commercial IT wheels come attached to a commercial IT locomotive engine.
For example, when you only wanted a utility to copy data between databases, and you end up stuck with a $250,000 commercial ETL solution - you are hosed:
- It's going to cost you $75,000 in annual licensing fees
- will cost more to add to more servers
- will probably require training
- configuration for a simple task will take longer than a simpler custom solution
- etc, etc, etc
> The first thing you should think about before deciding to develop software rather than purchase it
> is: Is our organization a software company? If you aren't a software company, what makes you think you
> can successfully deploy a software project?
Right, likewise - nobody should make their own food at home: there are plenty of restaurants out there that can make it for you. Think about it - you can save $20,000 for the cost of a kitchen on your next house - and then spend an extra hour a day at work instead of making dinner.
Hell, why even wipe your own ass? Are you a programmer or an ass-wiper? You can probably find someone to wipe your ass for $20.
Of course, there are benefits to sometimes doing this stuff yourself:
- less likely to get screwed by a software integration company
- ability to exploit internal agile methods, rather than being stuck with more expensive waterfall methods due to contract language limitations.
- ability to build a solution that is exactly what you need, rather than being forced to make do with a COTS app that may be very different from what you need.
- ability to ensure that your hamburger wasn't dropped on the floor
- ability to ensure that your ass is wiped nicely.
Of course, if you are going to do your own software development - it is essential that you understand the risks associated with this. Get a few really senior, experienced, and talented people. Don't even bother writing your own solutions if you just want to hire idiots. Of course, there's nothing revolutionary here either - don't skimp on talent in engineering brake systems either...
> Any freshman course in Engineering economics can tell you that 99% of the time it's better to buy than to make
And any graduate course in engineering economics can tell you that there is no choice of buy vs make for large enterprise apps: the choice is buy & customize vs make.
And given that most enterprise apps have been built as monolithic software rather than truly distributed components - customization is expensive, risky, time-consuming, and difficult to maintain.
Not to say that COTS software is all bad - but I have seen a trail of carnage over the past twenty years that probably exceeds that of custom software development:
- consistent underestimation of the cost of customization
- lack of required functionality - due to expected cost of customization.
- inability to keep skilled IT personnel necessary to maintain & continue to customize code: since no developer worth his salt wants that job.
- inability to easily upgrade - since required customizations interfere.
I don't think that this will change much until we begin to architect solutions differently: web services & SOAs should be a good start, along with commodity components & protocols that allow tons of off the shelf components to be plugged into an enterprise service bus.
Right, and this solution has its own limitations within this context: namely that if you crunch your data real time, rather than read it from a data store:
1. if you decide to add a new analytic you have to start with new data - you can't deploy a new analytical component and run it against historical data.
2. if your machine crashes - it takes all your accumulated analytical data along with it. Maintaining a distribution of activity calculated every 5 minutes over 90 days? Great, but after the server comes back up your data starts all over.
3. if your analytical component needs to run against a lot of history each time (ex: total number of unique telephone numbers accessed by day, calculate rolling median) then you'll have to maintain that detail data in memory. As you can imagine - you can *easily* identify calculations that will exceed your memory. So, to tune you'll be forced to keep your calculations to relatively recent data only.
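A toy sketch of points 2 and 3, assuming a purely in-memory analytic (the class and the sample data are hypothetical, not anybody's actual product): counting unique telephone numbers per day means holding every distinct number in RAM, and a restart wipes the lot.

```python
from collections import defaultdict


class UniqueNumbersPerDay:
    """Hypothetical streaming analytic: distinct phone numbers seen per day."""

    def __init__(self):
        # day -> set of numbers; all of the detail lives in memory only
        self._seen = defaultdict(set)

    def observe(self, day, number):
        self._seen[day].add(number)

    def count(self, day):
        return len(self._seen.get(day, ()))

    def detail_items_held(self):
        # Grows with the number of distinct (day, number) pairs retained
        return sum(len(numbers) for numbers in self._seen.values())


events = [
    ("2005-01-01", "555-0100"),
    ("2005-01-01", "555-0101"),
    ("2005-01-01", "555-0100"),   # repeat, not counted twice
    ("2005-01-02", "555-0100"),
]

tracker = UniqueNumbersPerDay()
for day, number in events:
    tracker.observe(day, number)

before_crash = tracker.count("2005-01-01")
held = tracker.detail_items_held()

# Simulate a crash and restart: a fresh process starts with empty state
restarted = UniqueNumbersPerDay()
after_crash = restarted.count("2005-01-01")

print(before_crash, held, after_crash)
```

With the detail rows in a data store instead, the same count could be recomputed after a restart, or computed for a brand-new analytic over old data.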
>> I've seen these techniques used to save companies million, even hundreds of millions of dollars.
> I will give you the "million" number without question/issue...
> But I am going to have to call your bluff on "hundreds of millions of dollars". I just don't
> think thats a reasonable estimate at all.
ha, yeah I thought that later - it didn't sound credible. However, back in 1998 I did save a company $150 million using a warehouse that used these techniques. The savings came from comparing all financial data for a fortune 10 company to all inventory data - and finding lost assets.
The partitioning & parallelism methods I mentioned earlier weren't directly responsible for the savings. But this system was originally built in 1995 - supporting a terabyte of data on 4.5 gbyte 7200 rpm drives and on 66 mhz cpus with 256 mbyte of memory - and gave 30 second response time to adhoc queries that had to scan 300+ gbytes of data. Without these database features it wouldn't have been practical to compare such large volumes of data - and the application just wouldn't have been built. And the savings wouldn't have been made.
Since then I've been on projects to automate reports that drove the workload of 6,000 engineers at quest - with a warehouse that loaded 50 million rows a day, and produced reports in 6 minutes on a 4-way using oracle that replaced a 12-way e6500 running sas that took 14 hours a day to run.
I've built another warehouse that improved service manageability so much that it ended up becoming the #1 selling point for one (huge) managed service provider.
etc, etc. But the $150 mil is really the biggest tangible benefit I can point to I guess. Not quite the same as saving hundreds of millions in database licensing costs. But an indirect savings anyway.
> Materialized views can be implemented in PostgreSQL with triggers, and often are.
Right, that's a fine workaround. Of course, you'll also want user-maintainable ones as well (in order to coordinate around loads, etc) but you can drive that at the application level also.
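For anyone who hasn't seen the trigger approach, here's a minimal sketch. It uses SQLite through Python's `sqlite3` module as a stand-in (the thread is about PostgreSQL, where you'd write the trigger bodies in a procedural language instead), and the `sales` / `sales_by_region` tables are invented for the example. It only handles inserts and deletes; a real one would need an update trigger too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT NOT NULL, amount INTEGER NOT NULL);

-- The summary table that plays the role of the materialized view
CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total INTEGER NOT NULL);

CREATE TRIGGER sales_after_insert AFTER INSERT ON sales
BEGIN
    -- Make sure a summary row exists, then add the new amount to it
    INSERT OR IGNORE INTO sales_by_region (region, total) VALUES (NEW.region, 0);
    UPDATE sales_by_region SET total = total + NEW.amount
     WHERE region = NEW.region;
END;

CREATE TRIGGER sales_after_delete AFTER DELETE ON sales
BEGIN
    UPDATE sales_by_region SET total = total - OLD.amount
     WHERE region = OLD.region;
END;
""")

conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("west", 40), ("east", 25)])
conn.execute("DELETE FROM sales WHERE region = 'west'")

# Queries must name the summary table explicitly; nothing rewrites them for you
summary = dict(conn.execute("SELECT region, total FROM sales_by_region"))
print(summary)
```

Note that the application has to know to query `sales_by_region`. That is exactly the gap being discussed: the summary is kept current, but the planner never picks it on its own.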
> It's the same thing as far as I can tell, but it is missing the ability for the planner to
> automatically select the view instead of the table for a certain query.
Right, and this is both the critical & difficult to implement aspect of this functionality. Note also that there's no direct relationship to rolap here - other than rolap is one type of application that often heavily leverages this functionality.
And as far as rolap is concerned, I almost always implement summary tables for analytical apps. When we write these apps from scratch, then I typically have them read directly from user-maintained summary tables. When using canned applications (Cognos, Business Objects, Microstrategy) then the only choice is to go with materialized views (Materialized Query Tables in db2) & query rewrite.
>> I still believe that MySQL is the Access of the OSS databases
> Perhaps, except that it is n* times better. I've run some pretty intense, badly-written db-abusive
> e-commerce sites (we are talking $5 million per year in cash flow) using MySQL without problems.
Hmmm, but wouldn't it be better to use a database that doesn't silently truncate numbers so that you could be a $50 million per year company instead?
> I'm not convinced. SQL is supposed to be a standard, so you can move from one database server to another
> with not much effort. This is a big step away from that. Much like the features you'd find in Oracle or
> MS SQL.
Well, every other mature dbms vendor is convinced, as are most major application vendors: the ability to write simple procedures can be of enormous value to some applications. And as long as you don't go nuts, it's typically easy to port as well.
Using a stored procedure language you can:
1. only expose views (rather than tables), even allowing writes to views with joins via an 'instead of' trigger + stored procedure.
2. completely encapsulate tables & views behind stored procedures. This provides enormous benefits to rapidly-evolving applications, since the dba can make whatever changes are needed on the backend (for performance, security, functionality, etc) in parallel to application developers on the application layer.
3. automatically maintain multiple copies of data: when using recursive data structures you often need to keep a second copy for massive scans. The stored procedure + trigger can easily manage this for you. You can likewise use this method to automatically maintain a transaction history table. This can be done much more reliably & productively than you'd ever manage in an application.
4. encapsulate logic for use within queries. Need to convert ip addresses between strings for presentation & integers for storage? A simple stored procedure function can do that job for you. And the value of doing it in the database, is that you won't have to write application logic around simple adhoc queries.
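As a rough illustration of point 4, here's the ip address conversion done as functions callable from SQL. This is a sketch only: it registers Python functions with SQLite via `sqlite3`'s `create_function` rather than using a real stored procedure language, and the `hits` table and the function names `ip_to_int` / `int_to_ip` are made up for the example.

```python
import ipaddress
import sqlite3


def ip_to_int(text):
    """Dotted-quad string -> integer for storage."""
    return int(ipaddress.IPv4Address(text))


def int_to_ip(value):
    """Stored integer -> dotted-quad string for presentation."""
    return str(ipaddress.IPv4Address(value))


conn = sqlite3.connect(":memory:")
# Expose both conversions to SQL so adhoc queries can use them directly
conn.create_function("ip_to_int", 1, ip_to_int)
conn.create_function("int_to_ip", 1, int_to_ip)

conn.execute("CREATE TABLE hits (src_ip INTEGER NOT NULL)")
conn.execute("INSERT INTO hits VALUES (ip_to_int('192.168.1.10'))")

stored, shown = conn.execute(
    "SELECT src_ip, int_to_ip(src_ip) FROM hits"
).fetchone()
print(stored, shown)
```

The point is the last query: someone poking at the table by hand gets readable addresses without any application code in between.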
Postgresql is ready to take on the smaller, non-critical databases that oracle used to get. This is a significant proportion of the databases out there, and will take revenue away from oracle. (Mysql will actually probably take more revenue away, but it has too many quality problems and functionality gaps to really deserve to.)
But there are many other, more demanding databases that postgresql isn't yet ready for. Oracle, DB2, and even SQL Server 2005 all have very mature & solid: optimizers, replication, partitioning solutions, parallelism, failover/clustering support, etc.
Here are two examples:
Using db2 for example, you can create a view which is automatically populated by the database like a table (MQT). Then any queries against the base tables that could be sped up by hitting this view will be rewritten by the engine to hit the view. Now, this might seem like needless fluff if you're just writing a hobby php app. But if you need to implement a commercial app like SAP with its 6,000 tables - and you have performance issues - you can make adjustments in the database layer this way. Also, if you're providing adhoc reporting for hundreds of users on a terabyte of data - this technique can provide *dramatic* performance benefits.
Another example is partitioning. Back to db2 (which I work with the most): you can spread a database across a dozen separate servers using a hashkey. Now, every query will have all dozen servers working independently, each on its own fraction of the data. On each of those servers, you can then partition again, this time using ranges or values (MDC) - so that data that doesn't apply to a query will be skipped in tablescans of that table. Using these techniques you can get sub-second response to *adhoc* queries against a terabyte of data - without indexes (notoriously unreliable here).
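The two-level idea can be sketched in a few lines. To be clear, this is a toy model and not how db2 implements it: the node count, the crc32 hash, the `customer_id` / `month` columns and the in-memory "cluster" are all assumptions for illustration. Rows are spread across nodes by a hash of the key, then grouped by a range value on each node so a query only touches the slices it needs.

```python
import zlib
from collections import defaultdict

NODES = 4  # hypothetical number of servers

# node number -> month -> rows stored in that slice
cluster = defaultdict(lambda: defaultdict(list))


def node_for(key):
    """Hash-partitioning: a stable hash of the key picks the server."""
    return zlib.crc32(str(key).encode()) % NODES


def insert(row):
    # First level: hash on customer_id. Second level: range/value on month.
    cluster[node_for(row["customer_id"])][row["month"]].append(row)


def total_for_month(month):
    """Every node scans only its own slice for that month, then results merge."""
    partials = []
    for node in range(NODES):
        rows = cluster[node].get(month, [])   # other months are never scanned
        partials.append(sum(r["amount"] for r in rows))
    return sum(partials)


for i in range(1000):
    insert({"customer_id": i,
            "month": "2005-01" if i % 2 == 0 else "2005-02",
            "amount": 1})

january_total = total_for_month("2005-01")
rows_stored = sum(len(rows) for months in cluster.values()
                  for rows in months.values())
print(january_total, rows_stored)
```

In a real shared-nothing setup each node's partial sum would be computed in parallel on its own hardware; the loop here just stands in for that.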
Lots more examples where the above came from. Sure, you will pay real money for licensing, hardware, and labor to implement these. Then again, the two above features actually save you in hardware costs. Additionally, some problems are big enough that they can easily justify the cost of licensing a product like this. I've seen these techniques used to save companies million, even hundreds of millions of dollars.
Re:Geographic Information Systems on MySQL CEO Interview · Score: 2, Insightful
> Have you tried doing bounds-checking in whatever scripting language your frontend application is
> written in, before passing it to MySQL?
yes, we used to exclusively rely on the application to manage data quality back in the 70s and early 80s (when using hierarchical databases, flat files, and ISAM). Of course, then we discovered that the procedural application code did a *horrible* job of consistently performing these checks, for various reasons like:
1. checks changed over time, but the application programmer failed to revalidate 100% of the historical data.
2. multiple application interfaces implemented checks differently (j2ee client vs .net client vs etc, etc).
So, as of about 1984, I've been using these capabilities pretty extensively. Not to say that I don't also perform simple constraint-checking in the app - there are some usability benefits there. But the database provides a redundant, declarative, and failure-proof assurance of many constraints.
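A small sketch of what "declarative" buys you, again using SQLite via Python's `sqlite3` as a convenient stand-in (the `orders` table and its limits are invented for the example). The rule is stated once in the table definition, and every client that writes to the table gets the same check whether or not its own code bothers to validate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE orders (
    id       INTEGER PRIMARY KEY,
    quantity INTEGER NOT NULL CHECK (quantity BETWEEN 1 AND 1000),
    status   TEXT    NOT NULL CHECK (status IN ('new', 'shipped', 'cancelled'))
)""")


def try_insert(quantity, status):
    """Return True if the database accepted the row, False if it refused."""
    try:
        conn.execute("INSERT INTO orders (quantity, status) VALUES (?, ?)",
                     (quantity, status))
        return True
    except sqlite3.IntegrityError:
        # The constraint failed loudly instead of storing a bad value
        return False


results = [
    try_insert(5, "new"),     # valid
    try_insert(0, "new"),     # quantity out of range
    try_insert(5, "lost"),    # status not in the allowed list
]
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(results, row_count)
```

The bad rows raise an error instead of being quietly adjusted, which is the behavior the rest of this thread is arguing about.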
> MySQL just assumes you're smart enough to deal with stuff like that your own way if you don't
> like the way it's going to.
No, MySQL suffers from quick development focused on marketing rather than engineering. These errors look more like oversights than deliberate engineering; they are misleading & inconsistent. Further, mysql is the *only* product I know of that claims to be a RDBMS that has these issues. How is any of this a good thing?
> Fortunately you do get to see exactly how MySQL deals with exceptions, and you can even change it
> if you don't like it.
Oh sure, you read about these documented bugs - but you still won't get an exception for a numeric or string overflow, or invalid date. So, if you want your app to run on five different databases - you've got to write extra code for mysql - around its bugs.
This isn't good engineering, it's sloppiness. And MySQL shouldn't get a free pass just because they're open source. We'd expect more from oracle, sql server, or postgresql. MySQL will fix these problems - since they're now courting commercial application developers they have no choice. But it's disingenuous to pretend that these bugs are deliberate.
Re:Geographic Information Systems on MySQL CEO Interview · Score: 5, Interesting
get in line...
MySQL is still implementing functionality common twenty years ago. And many of their enhancements of the last few years have left major gaps (innodb/replication awkwardness, etc).
Additionally, they still haven't addressed their problem with silent exceptions (quietly truncating strings that don't fit, quietly converting numbers that don't fit, allowing invalid dates, etc, etc).
So, yeah, it would be nice for them to pick up some OORDBMS functionality that postgresql has like spatial awareness, ip functions, etc - but I hope that they clean the product up first instead.
MySQL certainly has a lot to fear from software patents: it's a commercial company that could be easily sued.
And it's just now implementing functionality that other vendors put into their products 10-20 years ago. Many of these vendors have patents that cover some of the better approaches.
Any idea which dbms patents mysql is stepping on most blatantly? Does oracle have multi-version-consistency patented?
2 relevant sentences out of 38 on MySQL CEO Interview · Score: 4, Informative
Not much here:
What do you think was the top story in the Linux and open source arena in 2004?
Marten Mickos: None of the legal attacks on open source or Linux have been successful. None of that stuff has gone anywhere. That's the biggest story.
On that subject, MySQL has come to the conclusion that software patents will ultimately be demonstrated to be harmful to the industry. So, we are sponsoring a campaign in the European Union today to educate politicians and decision makers on the negative impact of software patents.
> 1. What does that have to do with anything? This is a discussion about a 64-bit version of Windows.
read the parent post, it may make sense then.
> 2. You already can use 8, 12, or 16 Gigs of memory via a 32 bit OS. Look up Intel's PAE.
Yes, if I felt like rewriting application & database servers, I *could* do that. But I have no plans to develop my own database management software just so that I could use flaky 32-bit extensions.
Both Oracle & DB2 support this - but can only use the memory for buffer caching, not for sorting, or other memory needs. So, it has some value - and can be the strategy to get you out of a tight spot - but having personally seen OS patches cause problems with this functionality, I always avoid it. Especially since 64-bit is about all that we buy anymore in the unix world anyway.
Most project management methods push waterfall development - with its huge reliance on time-consuming and error-prone upfront analysis & requirements gathering.
Of course, they hate requirements changes. And of course, their initial requirements are usually wrong - and fail to meet the need.
The answer isn't to stop changes - but to use methods that aren't so vulnerable to impact of change - like patterns, agile methods, passionate & highly skilled staff, etc, etc.
> these arguments of companies not "losing" anything from piracy are "complete bullshit". to use
> your own words...they may not "lose" anything, but that doesn't mean it doesn't hurt them.
Actually, most of the time the people hurt the most by piracy are the 2nd & 3rd tier players: if x millions of people who can't afford MS Office just pirate it, then they won't buy the more affordable OO, SmartSuite, etc, etc.
The chief reason that MS Office is a de facto corporate standard today - is due to piracy. So yeah, it's helped out these companies enormously.
Now that they've become the monopoly however, they'd like to chase those dollars, so they're cracking down. Well, perhaps the flip side is that it'll help encourage the use of free software.
> You seem to have a problem with reality. Companies will not change because you told them this new app
> is OSS, they don't care. They want what works with minimum fuss. Exchange/Outlook works, there are
> clear upgrade paths, and it plugs in nicely with other MS business apps, Sharepoint being the one
> that comes up most in this discussion.
*Some* companies won't leave exchange. Sure. Then again *some* companies are still using software that sucked ten years ago. No matter how good the alternatives, large government and private bureaucracies will be unable to change until they're forced to.
But many new/smaller/more agile companies will make the jump:
1. They don't have millions of Excel macros to worry about (because they weren't so foolish to create them in the first place), and for the small number they have they can gradually migrate them over time.
2. They don't have tons of access databases for the same reason, the ones they have are mostly unsupported, undocumented, and inaccurate anyway. They'll migrate them gradually.
3. They don't have complete idiots for staff, so if things aren't 100% identical to microsoft - that's ok. Their staff can migrate from IE to Firefox without having nervous breakdowns, they can migrate from Office to OO as well.
4. They've avoided a complete microsoft solution - in order to give themselves more flexibility. They aren't using sharepoint. They are using linux, apache, postgresql, php, Wikis, gForge, etc
This profile isn't everyone, but what an increasing number of small sharp companies look like. And as the microsoft alternatives grow - based upon these users, eventually they'll build up more credible exchange-alternative functionality, and more companies will gradually move over.
The fact that many of these often large & cumbersome organizations (Social Security Administration, etc) are struggling with strategic relationships with microsoft made ten years ago, before open source took off, doesn't mean that they want to continue paying $10+ million each year to Microsoft. And it doesn't mean that they won't eventually move off it.
Hell, many of these companies still have users connecting to mainframes over 3270. So what? Given the rate of improvement & adoption of open source - these companies will eventually move off microsoft. It's inevitable.
>> Oracle's partitioning is cool - very manageable. But only really works great when you only want a
>> small amount of the data - ie, doesn't help distribute the load when you want to access all of it.
> Really? Has 10g altered how this might work? What about the "parallel query option"? I haven't played
> with it that much, just worked on a system where it was set up.
I haven't worked with 10g yet, it sounds like little more than a refined version of OPS. OPS was so labor-intensive and fragile that I always avoided it. And since I primarily work on BI, a HA cluster environment is seldom needed anyway.
And Parallel Query Option certainly helps (and can give a near-linear performance improvement with small numbers of cpus with many large queries). But that's not the same as the hash-partitioning you'd do on Teradata, Informix, or DB2 - where you could spread the data out evenly across 20 servers - and when querying a large table set 20-80 CPUs to work on quickly resolving the query.
Oracle's equiv would involve getting something like an E1500 ($3m or so) and then letting its 64 CPUs work in parallel on the query. Problem is that it doesn't scale as well as a shared-nothing architecture using hash-partitioning (where you can have 200 CPUs), and it's far more expensive than 32 two-CPU Athlon blade servers running a hash-partitioning database.
Also note that the hash-partitioning method doesn't preclude also doing oracle-style range-partitioning. You can easily combine both in Informix or DB2 to get really stunning performance figures. Teradata as well I suspect.
> Distributed Transactions are one of the most over-hyped features of expensive DB's and used as a huge red-herring in most every DB evaluation process.
Haven't noticed actually: in the course of dozens of db evals it really hasn't ever scored very highly as far as I recall.
Parallelism, partitioning, clustering, replication, materialized views, query rewrite, bitmap indexes, etc - these all have. And are areas in which the open source databases still lag behind enormously.
So, the open source databases (esp. postgresql & firebird) are great candidates for small applications. But are still years away from deploying credible solutions in the mission-critical and large database arena. Once they get close to delivering in this spot, then the database vendors will have to get seriously nervous. Until then - the large commercial solutions will continue to have a key differentiator that will justify a substantial pricetag.
Since SQL Server is also catching up in this space, I assume that it's going to be in deep financial trouble starting in 2005. Sybase is basically a fringe product already. Informix has been sucked into db2. What's going to stay strong in the commercial database space? Oracle & DB2.
And in the open source space? MySQL (based on momentum), Postgresql (based on merit), and maybe Firebird (based on loyal users). Ingres and SAP-DB (adabas) are probably goners, as they should be. Most of the rest are just fringe-players.
> You shouldn't have to do much with vertical partitioning of tables to make things invisible to the users.
It depends (for a couple of reasons):
- you'll want the queries that access that partitioned table to have the partitioning criteria in the where clause - otherwise you'll just scan the entire structure each time you access it without an index.
- you'll want a partitioning key that has the *right* granularity ('right' depends on a lot of factors). In order to do this you may need to generate a new column from an old one via a trigger, have the app populate another column, etc, etc.
Oracle's partitioning is cool - very manageable. But only really works great when you only want a small amount of the data - ie, doesn't help distribute the load when you want to access all of it.
SQL Server's partitioning seems pretty lame. And both Oracle and DB2 can do the same thing. Just wondering - have you had good success in distributing these tables across multiple servers & managing (say) 400 daily partitions?
DB2 has several partitioning options - one like Oracle's (but not quite as nice), one like SQL Server's (but nicer, but so what?), and one that distributes the data across many servers (very nice, but more complex to manage).
These are huge issues to get right when you decide you want fast access to tons of data.
> If people would use PostgresSQL, most companies' OLTP systems would be thrown, performance-wise, back
> into the stone ages. No matter how you cut it, DB2 (and some of the other commercial RDBMSs) are simply
> light years ahead of open source software.
Yeah, there are still many reasons to choose a commercial dbms. Like:
1. db2 just set the world record for transaction speed - at about 50,000 transactions a second. Last I heard mysql was trumpeting 800 transactions a second with innodb. Not sure about postgresql.
2. with partitioning, parallelism, and clustering, you can get subsecond response time from db2 *adhoc* queries against tables containing over a terrabyte of data. Postgresql, Mysql, and Firebird aren't even in the ballpark here. Note: mysql "speed" will end up requiring you to index every single column, which will kill your insert speed, double the size of the data, and their optimizer won't use the indexes anyway whenever you want to access more than 5% of the data.
3. Mature, proven high-availability solutions.
4. Mature, proven replication solutions.
5. Cost. Really - cost is a reason (sometimes) to use commercial software. Here's how this works: lets say you've got a critical business problem in which 1 minute of downtime = a loss of $10,000 dollars in revenue. Add to this a development team of 20 people ($1,000,000/year). Add hardware costs ($500,000). Now, that commercial database license may run you $50,000 (vs $500+ mysql, free for postgresql). But $50k is nothing compared to the costs at risk:
- online changes to db2 vs recycling mysql & postgresql
- robust ha on db2 vs replication for mysql
- standard sql functionality & productivity on db2 vs mysql
- less hardware for db2 than msyql/postgresql to get same performance
- etc, etc, etc
So, on a big project where the database is critical - you will actually *save* money going with a commercial database. Well, on large & critical applications anyway.
6. Consistency: since most organizations will require a commercial database for their most demanding applications - and they can benefit from a complexity reduction by using the same database on all applications. This way they've got just one set of skills to get all dbas on, they can get by with a smaller dba team (read: less labor = less cost), when a new version, patches, etc - they can get up to speed with it much faster, etc.
Not to say that the open source solutions aren't great: they are, and can pick up much of the database work these days. But there's still a huge case to be made for commercial products - and will be for a while, since the functionality missing in mysql & postgresql needed to compete at the top-end will be very difficult to implement.
> reinventing the wheel should be a last resort, not a first instinct.
yeah, but this guy doesn't know enough about IT to understand that most commercial IT wheels come attached to a commercial IT locomotive engine.
For example, when you only wanted a utility to copy data between databases, and you end up stuck with a $250,000 commercial ETL solution - you are hosed:
- It's going to cost you $75,000 in annual licensing fees
- will cost more to add to more servers
- will probably require training
- configuration for a simple task will take longer than a simpler custom solution
- etc, etc, etc
> The first thing you should think about before deciding to develop software rather than purchase it
> is: Is our organization a software company? If you aren't a software company, what makes you think you
> can successfully deploy a software project?
Right, likewise - nobody should make their own food at home: there are plenty of restaurants out there that can make it for you. Think about it - you can save $20,000 for the cost of a kitchen on your next hour - and then spend an extra hour a day at work instead of making dinner.
Hell, why even wipe your own ass? Are you a programmer or an ass-wiper? You can probably find someone to wipe your ass for $20.
Of course, there are benefits to sometimes doing this stuff yourself:
- less likely to get screwed by a software integration company
- ability to exploit internal agile methods, rather than being stuck with more expensive waterfall methods due to contract language limitations.
- ability to build a solution that is exactly what you need, rather than forced to make do with a COTS app - that may be very different than what you need.
- ability to ensure that your hamburger wasn't dropped on the floor
- ability to ensure that your ass is wiped nicely.
Of course, if you are going to do your own software development - it is essential that you understand the risks associated with this. Get a few really senior, experienced, and talented people. Don't even bother writing your own solutions if you just want to hire idiots. Of course, there's nothing revolutionary here either - don't skimp on talent in engineering brake systems either...
> Any freshman course in Engineering economics can tell you that 99% of the time it's better to buy than to make
And any graduate course in engineering economics can tell you that there is no choice of buy vs make for large enterprise apps: the choice is buy & customize vs make.
And given that most enterprise apps have been built as monolithic software rather than truly distributed components - customization is expensive, risky, time-consuming, and difficult to maintain.
Not to say that COTS software is all bad - but I have seen a trail of carnage over the past twenty years that probably exceeds that of custom software development:
- consistent underestimation of the cost of customization
- lack of required functionality - due to expected cost of customization.
- inability to keep skilled IT personnel necessary to maintain & continue to customize code: since no developer worth his salt wants that job.
- inability to easily upgrade - since required customizations interfere.
I don't think that this will change much until we begin to architect solutions differently: web services & SOAs should be a good start, along with commodity components & protocols that allow tons of off the shelf components to be plugged into an enterprise service bus.
Right, and this solution has its own limitations within this context: namely that if you crunch your data real time, rather than read it from a data store:
1. if you decide to add a new analytic you have to start with new data - you can't deploy a new analytical component and run it against historical data.
2. if your machine crashes - it takes all your accumulated analytical data along with it. Maintaining a distribution of activity calculated every 5 minutes over 90 days? Great, but after the server comes back up your data starts all over.
3. if your analytical component needs to run against a lot of history each time (ex: total number of unique telephone numbers accessed by day, calculate rolling median) then you'll have to maintain that detail data in memory. As you can imagine - you can *easily* identify calculations that will exceed your memory. So, to tune you'll be forced to keep your calculations to relatively recent data only.
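To make point 3 concrete, here's a rough sketch in python (class and variable names are made up for illustration, this isn't anybody's actual product) of a rolling median computed purely in memory. The thing to notice is that every value inside the window has to be retained, so memory grows with the amount of history you want to cover:

```python
from bisect import bisect_left, insort
from collections import deque


class RollingMedian:
    """Rolling median over the last `window` values, held entirely in memory.

    Hypothetical illustration: every value in the window must be retained,
    so memory use grows linearly with the history covered.
    """

    def __init__(self, window):
        self.window = window
        self.arrivals = deque()  # values in arrival order, used for eviction
        self.ordered = []        # the same values kept sorted, used for the median

    def add(self, value):
        self.arrivals.append(value)
        insort(self.ordered, value)
        if len(self.arrivals) > self.window:
            oldest = self.arrivals.popleft()
            del self.ordered[bisect_left(self.ordered, oldest)]

    def median(self):
        n = len(self.ordered)
        if n == 0:
            raise ValueError("no data yet")
        mid = n // 2
        if n % 2:
            return self.ordered[mid]
        return (self.ordered[mid - 1] + self.ordered[mid]) / 2


rm = RollingMedian(window=3)
for v in [5, 1, 9, 2]:
    rm.add(v)
print(rm.median())  # the window now holds 1, 9, 2
```

Both structures live in the process, so a restart empties them (point 2), and a window covering 90 days of detail has to fit in memory in full.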
ken
>> I've seen these techniques used to save companies million, even hundreds of millions of dollars.
> I will give you the the "million" number without question/issue...
> But I am going to have to call your bluff on "hundreds of millions of dollars". I just don't
> think thats a reasonable estimate at all.
ha, yeah I thought that later - it didn't sound credible. However, back in 1998 I did save a company $150 million using a warehouse that used these techniques. The savings came from comparing all financial data for a fortune 10 company to all inventory data - and finding lost assets.
The partitioning & parallelism methods I mentioned earlier weren't directly responsible for the savings. But this system was originally built in 1995 - supporting a terabyte of data on 4.5 gbyte 7200 rpm drives and on 66 mhz cpus with 256 mbyte of memory - and gave 30 second response time to adhoc queries that had to scan 300+ gbytes of data. Without these database features it wouldn't have been practical to compare such large volumes of data - and the application just wouldn't have been built. And the savings wouldn't have been made.
Since then I've been on projects to automate reports that drove the workload of 6,000 engineers at quest - with a warehouse that loaded 50 million rows a day, and produced reports in 6 minutes on a 4-way using oracle that replaced a 12-way e6500 running sas that took 14 hours a day to run.
I've built another warehouse that improved service manageability so much that it ended up becoming the #1 selling point for one (huge) managed service provider.
etc, etc. But the $150 mil is really the biggest tangible benefit I can point to I guess. Not quite the same as saving hundreds of millions in database licensing costs. But an indirect savings anyway.
ken
> Materialized views can be implemented in PostgreSQL with triggers, and often are.
Right, that's a fine workaround. Of course, you'll also want user-maintainable ones as well (in order to coordinate around loads, etc) but you can drive that at the application level also.
> It's the same thing as far as I can tell, but it is missing the ability for the planner to
> automatically select the view instead of the table for a certain query.
Right, and this is both the critical & difficult to implement aspect of this functionality. Note also that there's no direct relationship to rolap here - other than rolap is one type of application that often heavily leverages this functionality.
And as far as rolap is concerned, I almost always implement summary tables for analytical apps. When we write these apps from scratch, then I typically have them read directly from user-maintained summary tables. When using canned applications (Cognos, Business Objects, Microstrategy) then the only choice is to go with materialized views (Materialized Query Tables in db2) & query rewrite.
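For anyone who hasn't seen the trigger-maintained summary table approach, here's a minimal sketch. I'm using python's sqlite3 module as a stand-in for postgresql just so it runs anywhere, and the table and trigger names are invented. Note what's missing: there's no query rewrite, so the application has to know to read from the summary table itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT NOT NULL, amount INTEGER NOT NULL);

    -- Summary table standing in for a materialized view.
    CREATE TABLE sales_by_region (
        region TEXT PRIMARY KEY,
        total  INTEGER NOT NULL
    );

    -- Keep the summary current on every insert into the detail table.
    CREATE TRIGGER sales_ai AFTER INSERT ON sales
    BEGIN
        INSERT OR IGNORE INTO sales_by_region (region, total)
            VALUES (NEW.region, 0);
        UPDATE sales_by_region
           SET total = total + NEW.amount
         WHERE region = NEW.region;
    END;
""")

conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 100), ("west", 40), ("east", 25)],
)

# The application reads the summary directly; nothing redirects a query
# against `sales` to it automatically.
summary = dict(conn.execute("SELECT region, total FROM sales_by_region"))
detail = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(summary)
```

This only covers inserts. Updates and deletes on the detail table would need their own triggers, and a user-maintained version would drop the trigger and refresh the summary after each load instead.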
>> I still believe that MySQL is the Access of the OSS dtabases
> Perhaps, except that it is n* times better. I've run some pretty intense, badly-written db-abusive
> e-commerce sites (we are talking $5 million per year in cash flow) using MySQL without problems.
Hmmm, but wouldn't it be better to use a database that doesn't silently truncate numbers so that you could be a $50 million per year company instead?
> I'm not convinced. SQL is supposed to a standard, so you can move from one database server to another
> with not much effort. This is a big step away from that. Much like the features you'd find in Oracle or
> MS SQL.
Well, every other mature dbms vendor is convinced, as are most major application vendors: the ability to write simple procedures can be of enormous value to some applications. And as long as you don't go nuts, it's typically easy to port as well.
Using a stored procedure language you can:
1. only expose views (rather than tables), even allowing writes to views with joins via an 'instead of' trigger + stored procedure.
2. completely encapsulate tables & views behind stored procedures. This provides enormous benefits to rapidly-evolving applications, since the dba can make whatever changes are needed on the backend (for performance, security, functionality, etc) in parallel to application developers on the application layer.
3. automatically maintain multiple copies of data: when using recursive data structures you often need to keep a second copy for massive scans. The stored procedure + trigger can easily manage this for you. You can likewise use this method to automatically maintain a transaction history table. This can be done much more reliably & productively than you'd ever manage in an application.
4. encapsulate logic for use within queries. Need to convert ip addresses between strings for presentation & integers for storage? A simple stored procedure function can do that job for you. And the value of doing it in the database, is that you won't have to write application logic around simple adhoc queries.
etc, etc, etc, etc.
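As a sketch of item 4 (the ip address conversion), here's roughly what it looks like when the conversion lives in the database rather than in each application. Assumptions: I'm using python's sqlite3 user-defined functions as a stand-in for a real stored procedure language, and the function and table names (ip_to_int, int_to_ip, hits) are hypothetical.

```python
import ipaddress
import sqlite3


def ip_to_int(text):
    """Dotted-quad IPv4 string -> integer for storage."""
    return int(ipaddress.IPv4Address(text))


def int_to_ip(number):
    """Stored integer -> dotted-quad IPv4 string for presentation."""
    return str(ipaddress.IPv4Address(number))


conn = sqlite3.connect(":memory:")
# Register the functions so any query on this connection can call them.
conn.create_function("ip_to_int", 1, ip_to_int)
conn.create_function("int_to_ip", 1, int_to_ip)

conn.execute("CREATE TABLE hits (src_ip INTEGER NOT NULL)")
conn.execute("INSERT INTO hits VALUES (ip_to_int('10.0.0.1'))")
conn.execute("INSERT INTO hits VALUES (ip_to_int('192.168.1.20'))")

# An adhoc query can filter on the integer range and still present strings.
rows = conn.execute(
    "SELECT int_to_ip(src_ip) FROM hits "
    "WHERE src_ip BETWEEN ip_to_int('10.0.0.0') AND ip_to_int('10.255.255.255')"
).fetchall()
print(rows)
```

In a server dbms the functions would be defined once in the database, so every client and every adhoc query tool gets the same conversion without any application code.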
Postgresql is ready to take on the smaller, non-critical databases that oracle used to get. This is a significant proportion of the databases out there, and will take revenue away from oracle. (Mysql will actually probably take more revenue away, but it has too many quality problems and functionality gaps to really deserve to.)
But there are many other, more demanding databases that postgresql isn't yet ready for. Oracle, DB2, and even SQL Server 2005 all have very mature & solid: optimizers, replication, partitioning solutions, parallelism, failover/clustering support, etc.
Here are two examples:
Using db2 for example, you can create a view which is automatically populated by the database like a table (MQT). Then any queries against the base tables that could be sped up by hitting this view will be rewritten by the engine to hit the view. Now, this might seem like needless fluff if you're just writing a hobby php app. But if you need to implement a commercial app like SAP with its 6,000 tables - and you have performance issues - you can make adjustments in the database layer this way. Also, if you're providing adhoc reporting for hundreds of users on a terabyte of data - this technique can provide *dramatic* performance benefits.
Another example is partitioning. Back to db2 (which I work with the most): you can spread a database across a dozen separate servers using a hashkey. Now, every query will have all dozen servers working independently, each on its own fraction of the data. On each of those servers, you can then partition again, this time using ranges or values (MDC) - so that data that doesn't apply to a query will be skipped in tablescans of that table. Using these techniques you can get sub-second response to *adhoc* queries against a terabyte of data - without indexes (notoriously unreliable here).
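A toy sketch of the two-level idea, in python. To be clear about assumptions: this is not how db2 implements it, the crc32 hash is just a stand-in for the engine's hashing, the names are invented, and the "nodes" here are just in-process dictionaries that get visited one after another rather than servers working in parallel.

```python
import zlib
from collections import defaultdict

NODES = 12  # assumed cluster size, matching the "dozen servers" above


def node_for(key):
    """Pick a node by hashing the distribution key (crc32 is just a stand-in)."""
    return zlib.crc32(str(key).encode()) % NODES


# node -> month -> rows; the month level plays the role of range partitioning
cluster = [defaultdict(list) for _ in range(NODES)]


def insert(customer_id, month, amount):
    cluster[node_for(customer_id)][month].append((customer_id, month, amount))


def total_for_month(month):
    """Each node scans only its own slice of the requested month."""
    partials = [sum(row[2] for row in node.get(month, [])) for node in cluster]
    return sum(partials)  # a coordinator combines the per-node partial sums


for cid in range(1000):
    insert(cid, "2004-11", 1)
    insert(cid, "2004-12", 2)

print(total_for_month("2004-12"))
```

The hash spreads rows across nodes so each one holds a fraction of the table, and the month level means a query for one month never touches the other month's rows on any node.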
Lots more examples where the above came from. Sure, you will pay real money for licensing, hardware, and labor to implement these. Then again, the two above features actually save you in hardware costs. Additionally, some problems are big enough that they can easily justify the cost of licensing a product like this. I've seen these techniques used to save companies millions, even hundreds of millions of dollars.
> Have you tried doing bounds-checking in whatever scripting language your frontend application is
> written in, before passing it to MySQL?
yes, we used to exclusively rely on the application to manage data quality back in the 70s and early 80s (when using hierarchical databases, flat files, and ISAM). Of course, then we discovered that the procedural application code did a *horrible* job of consistently performing these checks, for various reasons like:
1. checks changed over time, but the application programmer failed to revalidate 100% of the historical data.
2. multiple application interfaces implemented checks differently (j2ee client vs .net client vs etc, etc).
So, as of about 1984, I've been using these capabilities pretty extensively. Not to say that I don't also perform simple constraint-checking in the app - there are some usability benefits there. But the database provides a redundant, declarative, and failure-proof assurance of many constraints.
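Here's a small sketch of what I mean by declarative constraints, again using python's sqlite3 as a convenient stand-in (the table and its rules are made up). The checks are stated once in the table definition, and every client that writes to the table gets them whether or not its own code bothers to validate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        quantity INTEGER NOT NULL CHECK (quantity BETWEEN 1 AND 1000),
        status   TEXT    NOT NULL CHECK (status IN ('open', 'shipped', 'closed'))
    )
""")

conn.execute("INSERT INTO orders (quantity, status) VALUES (5, 'open')")

# Three bad rows: out-of-range quantity, unknown status, missing quantity.
rejected = []
for quantity, status in [(0, "open"), (5, "lost"), (None, "open")]:
    try:
        conn.execute(
            "INSERT INTO orders (quantity, status) VALUES (?, ?)",
            (quantity, status),
        )
    except sqlite3.IntegrityError as exc:
        rejected.append((quantity, status, str(exc)))

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count, len(rejected))
```

Each bad row raises an error instead of being quietly stored, which is the behavior you want to be able to count on from the database.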
> MySQL just assumes you're smart enough to deal with stuff like that your own way if you don't
> like the way it's going to.
No, MySQL suffers from quick development focused on marketing rather than engineering. These errors look more like oversights than deliberate engineering: they are misleading & inconsistent. Further, mysql is the *only* product I know of that claims to be a RDBMS that has these issues. How is any of this a good thing?
> Fortunately you do get to see exactly how MySQL deals with exceptions, and you can even change it
> if you don't like it.
Oh sure, you read about these documented bugs - but you still won't get an exception for a numeric or string overflow, or invalid date. So, if you want your app to run on five different databases - you've got to write extra code for mysql - around its bugs.
This isn't good engineering, it's sloppiness. And MySQL shouldn't get a free pass just because they're open source. We'd expect more from oracle, sql server, or postgresql. MySQL will fix these problems - since they're now courting commercial application developers they have no choice. But it's disingenuous to pretend that these bugs are deliberate.
get in line...
MySQL is still implementing functionality common twenty years ago. And many of their enhancements of the last few years have left major gaps (innodb/replication awkwardness, etc).
Additionally, they still haven't addressed their problem with silent exceptions (quietly truncating strings that don't fit, quietly converting numbers that don't fit, allowing invalid dates, etc, etc).
So, yeah, it would be nice for them to pick up some OORDBMS functionality that postgresql has like spatial awareness, ip functions, etc - but I hope that they clean the product up first instead.
MySQL certainly has a lot to fear from software patents: it's a commercial company that could be easily sued.
And it's just now implementing functionality that other vendors put into their products 10-20 years ago. Many of these vendors have patents that cover some of the better approaches.
Any idea which dbms patents mysql is stepping on most blatantly? Does oracle have multi-version-consistency patented?
Not much here:
What do you think was the top story in the Linux and open source arena in 2004?
Marten Mickos: None of the legal attacks on open source or Linux have been successful. None of that stuff has gone anywhere. That's the biggest story.
On that subject, MySQL has come to the conclusion that software patents will ultimately be demonstrated to be harmful to the industry. So, we are sponsoring a campaign in the European Union today to educate politicians and decision makers on the negative impact of software patents.
LOL, you're right - I misinterpreted the parent.
Sorry about that. No more coffee for me today.
ken
> 1. What does that have to do with anything? This is a discussion about a 64-bit version of Windows.
read the parent post, it may make sense then.
> 2. You already can use 8, 12, or 16 Gigs of memory via a 32 bit OS. Look up Intel's PAE.
Yes, if I felt like rewriting application & database servers, I *could* do that. But I have no plans to develop my own database management software just so that I could use flaky 32-bit extensions.
Both Oracle & DB2 support this - but can only use the memory for buffer caching, not for sorting, or other memory needs. So, it has some value - and can be the strategy to get you out of a tight spot - but having personally seen OS patches cause problems with this functionality, I always avoid it. Especially since 64-bit is about all that we buy anymore in the unix world anyway.
> *awaits justifications for why 64-bit linux platforms are better*
Because on a large & busy database server 8, 12, or 16 gbytes of memory is invaluable, and you can't get that via a 32-bit OS.
And frankly, if I'm going to spread a db2 ice cluster across twenty 64-bit blade servers, I'd much rather put it on aix or linux than windows.
they must be writing to a table...
Most project management methods push waterfall development - with its huge reliance on time-consuming and error-prone upfront analysis & requirements gathering.
Of course, they hate requirements changes. And of course, their initial requirements are usually wrong - and fail to meet the need.
The answer isn't to stop changes - but to use methods that aren't so vulnerable to impact of change - like patterns, agile methods, passionate & highly skilled staff, etc, etc.
Both my boys became extremely familiar with the keyboard by the age of eight with this game.
;-)
Of course, it helps that they're limited to old hardware and games - so nethack seems pretty sexy to them.
> these arguements of companies not "losing" anything from piracy are "complete bullshit". to use
> your own words...they may not "lose" anything, but that doesn't mean it doesn't hurt them.
Actually, most of the time the people hurt the most by piracy are the 2nd & 3rd tier players: if x millions of people who can't afford MS Office just pirate it, then they won't buy the more affordable OO, SmartSuite, etc, etc.
The chief reason that MS Office is a de facto corporate standard today - is due to piracy. So yeah, it's helped out these companies enormously.
Now that they've become the monopoly however, they'd like to chase those dollars, so they're cracking down. Well, perhaps the flip side is that it'll help encourage the use of free software.
> You seem to have a problem with reality. Companies will not change because you told them this new app
> is OSS, they don't care. They want what works with minimum fuss. Exchange/Outlook works, there are
> clear upgrade paths, and it plugs in nicely with other MS business apps, Sharepoint being the one
> that comes up most in this disscussion.
*Some* companies won't leave exchange. Sure. Then again *some* companies are still using software that sucked ten years ago. No matter how good the alternatives, large government and private bureaucracies will be unable to change until they're forced to.
But many new/smaller/more agile companies will make the jump:
1. They don't have millions of Excel macros to worry about (because they weren't so foolish to create them in the first place), and for the small number they have they can gradually migrate them over time.
2. They don't have tons of access databases for the same reason, the ones they have are mostly unsupported, undocumented, and inaccurate anyway. They'll migrate them gradually.
3. They don't have complete idiots for staff, so if things aren't 100% identical to microsoft - that's ok. Their staff can migrate from IE to Firefox without having nervous breakdowns, they can migrate from Office to OO as well.
4. They've avoided a complete microsoft solution - in order to give themselves more flexibility. They aren't using sharepoint. They are using linux, apache, postgresql, php, Wikis, gForge, etc
This profile isn't everyone, but it's what an increasing number of small sharp companies look like. And as the microsoft alternatives grow - based upon these users, eventually they'll build up more credible exchange-alternative functionality, and more companies will gradually move over.
The fact that many of these often large & cumbersome organizations (Social Security Administration, etc) are struggling with a strategic relationship with microsoft made ten years ago, before open source took off, doesn't mean that they want to continue paying $10+ million each year to Microsoft. And it doesn't mean that they won't eventually move off it.
Hell, many of these companies still have users connecting to mainframes over 3270. So what? Given the rate of improvement & adoption of open source - these companies will eventually move off microsoft. It's inevitable.
>> Oracle's partitioning is cool - very managable. But only really works great when you only want a
>> small amount of the data - ie, doesn't help distribute the load when you want to access all of it.
> Really? Has 10g altered how this might work? What about the "parallel query option"? I haven't played
> with it that much, just worked on a system where it was set up.
I haven't worked with 10g yet; it sounds like little more than a refined version of OPS. OPS was so labor-intensive and fragile that I always avoided it. And since I primarily work on BI, an HA cluster environment is seldom needed anyway.
And Parallel Query Option certainly helps (and can give a near-linear performance improvement with small numbers of cpus with many large queries). But that's not the same as the hash-partitioning you'd do on Teradata, Informix, or DB2 - where you could spread the data out evenly across 20 servers - and, when querying a large table, set 20-80 CPUs to work on quickly resolving the query.
Oracle's equiv would involve getting something like an E1500 ($3m or so) and then letting its 64 CPUs work in parallel on the query. Problem is that it doesn't scale as well as a shared-nothing architecture using hash-partitioning (where you can have 200 CPUs), and it's far more expensive than 32 two-CPU Athlon blade servers running a hash-partitioning database.
Also note that the hash-partitioning method doesn't preclude also doing oracle-style range-partitioning. You can easily combine both in Informix or DB2 to get really stunning performance figures. Teradata as well I suspect.
> Distributed Transactions are one of the most over-hyped features of expensive DB's and used as a huge red-herring in most every DB evaluation process.
Haven't noticed actually: in the course of dozens of db evals it really hasn't ever scored very highly as far as I recall.
Parallelism, partitioning, clustering, replication, materialized views, query rewrite, bitmap indexes, etc - these all have. And are areas in which the open source databases still lag behind enormously.
So, the open source databases (esp. postgresql & firebird) are great candidates for small applications. But are still years away from deploying credible solutions in the mission-critical and large database arena. Once they get close to delivering in this spot, then the database vendors will have to get seriously nervous. Until then - the large commercial solutions will continue to have a key differentiator that will justify a substantial pricetag.
Since SQL Server is also catching up in this space, I assume that it's going to be in deep financial trouble starting in 2005. Sybase is basically a fringe product already. Informix has been sucked into db2. What's going to stay strong in the commercial database space? Oracle & DB2.
And in the open source space? MySQL (based on momentum), Postgresql (based on merit), and maybe Firebird (based on loyal users). Ingres and SAP-DB (adabas) are probably goners, as they should be. Most of the rest are just fringe-players.
> You shouldn't have to do much with vertical partitioning of tables to make things invisible to the users.
It depends (for a couple of reasons):
- you'll want the queries that access that partitioned table to have the partitioning criteria in the where clause - otherwise you'll just scan the entire structure each time you access it without an index.
- you'll want a partitioning key that has the *right* granularity ('right' depends on a lot of factors). In order to do this you may need to generate a new column from an old one via a trigger, have the app populate another column, etc, etc.
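Both points can be shown with a toy model in python (nothing vendor-specific, all names invented): a day-granularity partitioning key is derived from a full timestamp, and a query that supplies the key reads one partition while a query that doesn't has to read them all.

```python
from collections import defaultdict
from datetime import datetime

partitions = defaultdict(list)  # partition key (a day) -> rows


def partition_key(ts):
    """Derive a day-granularity key from the full timestamp column."""
    return ts.strftime("%Y-%m-%d")


def insert(ts, payload):
    partitions[partition_key(ts)].append((ts, payload))


def scan(predicate, day=None):
    """Return (matching rows, rows examined).

    With `day` supplied only that partition is read; without it every
    partition has to be scanned.
    """
    if day is not None:
        candidates = [partitions.get(day, [])]
    else:
        candidates = list(partitions.values())
    examined = sum(len(p) for p in candidates)
    hits = [row for p in candidates for row in p if predicate(row)]
    return hits, examined


# Ten days of hourly rows.
for day in range(1, 11):
    for hour in range(24):
        insert(datetime(2005, 1, day, hour), "event")


def is_noon(row):
    return row[0].hour == 12


pruned_hits, pruned_cost = scan(is_noon, day="2005-01-03")
full_hits, full_cost = scan(is_noon)
print(pruned_cost, full_cost)
```

The derived key is the part a trigger or the app would populate in a real database. Pick the granularity too fine and you're managing a huge number of partitions, too coarse and each scan still reads far more than it needs.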
Oracle's partitioning is cool - very manageable. But only really works great when you only want a small amount of the data - ie, doesn't help distribute the load when you want to access all of it.
SQL Server's partitioning seems pretty lame. And both Oracle and DB2 can do the same thing. Just wondering - have you had good success in distributing these tables across multiple servers & managing (say) 400 daily partitions?
DB2 has several partitioning options - one like Oracle's (but not quite as nice), one like SQL Server's (but nicer, but so what?), and one that distributes the data across many servers (very nice, but more complex to manage).
These are huge issues to get right when you decide you want fast access to tons of data.