> Speaking of I/O performance. Is it possible I wonder to have an IO intensive Python script run faster than C?
I think your point is valid:
The first performance technique I usually pursue is to just increase read buffering so that you're reading 4k+ at a time from disk. And that's the same in c as it is in python.
The next is to locally cache data: the top 10-100 or so entries are kept in a local list. This is far easier to do in python than c, and can result in a big boost.
The next is to consider other more complex options like preprocessing the data, processing in steps with separate sorts, etc. All of these take time, which I'm more likely to be able to afford if I'm writing in python than c.
Beyond those techniques I'm probably going to spend most of my time on hardware and the data store itself, which is irrelevant to programming language.
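A minimal sketch of the first two techniques (bigger read buffer, small local cache), assuming a comma-separated input file. The 64k buffer size, the `lookup` function, and its upper-casing body are hypothetical stand-ins for whatever expensive per-key work the real script does:

```python
from functools import lru_cache

BUFFER_SIZE = 64 * 1024  # assumed value; anything comfortably above 4k

# Counts how often the "expensive" work really runs (illustration only).
EXPENSIVE_CALLS = {"count": 0}


def read_keys(path, buffer_size=BUFFER_SIZE):
    """Yield the first comma-separated field of each line, using a large read buffer."""
    with open(path, "r", buffering=buffer_size) as f:
        for line in f:
            yield line.split(",", 1)[0]


@lru_cache(maxsize=100)  # keep roughly the top 100 entries in memory
def lookup(key):
    """Hypothetical stand-in for an expensive lookup (database hit, parse, etc)."""
    EXPENSIVE_CALLS["count"] += 1
    return key.upper()


def process(path):
    """Resolve every key in the file; repeated keys are served from the cache."""
    return [lookup(key) for key in read_keys(path)]
```

Calling `process("events.csv")` (a hypothetical file name) only pays for the expensive lookup once per distinct key while the key stays in the cache.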
> Second - Sweet Hog of Prague! Oracle 10g costs $24 grand Per CPU!?!?!?!?
Oh, it can be *far* more expensive than that. The enterprise version is $40k/CPU, and that doesn't even include partitioning. To get Partitioning (and yes, you want it for any large database) you're looking at an extra $10k/CPU. And there are other extra charges as well. You can easily end up at $60k/CPU.
On the flip side, you can also get away with $5k/CPU if you know what you're doing, and if what you're doing is small. On the large side where you'd pay $60k/CPU you've probably also got $600k in hardware and a staff of at least a half-dozen. Guess what? The software & hardware almost always end up as a rounding-error compared to the labor costs. Doesn't really matter if the application is custom or commercial, they both seem to have about the same labor costs.
The reviewers know databases about as well as my grandma knows sports cars. They seem to mean well, and admit that this comparison was complex and hard. In the end they were unfortunately in over their heads.
PRODUCT SELECTION
1. where's postgresql? This is the product that the commercial vendors need to be the most nervous about. Sure, they're losing more low-end revenue to mysql right now, but postgresql is getting picked up by some big players. It is far more mature than MySQL, doesn't have the quality issues, isn't partially owned by Oracle, etc.
2. where's at least a mention of all the various other solutions - from Firebird to Derby (Cloudscape)?
FUTURE PROOFING
1. They mistakenly say that mysql doesn't require scaling up to enterprise versions like db2/oracle do. This is incorrect because mysql lags behind oracle & db2 for performance in many situations:
- since it doesn't support query parallelism (which provides near-linear performance improvements to db2/oracle)
- since it doesn't support partitioning (which can provide 10x performance improvements to db2/oracle)
- since it doesn't have a mature optimizer (which means that queries with 5 table joins can tank)
- since it lacks memory tuning flexibility
Together this means that as your data increases you have to continue moving a mysql database to larger & larger hardware.
In other words, if you need to scan a table with 10 million rows in it, then join that data against 6 other tables - db2/oracle can:
- leverage partitioning to scan only 1mil rows or so instead of 10mil
- split the scan across four cpus
- leverage more efficiently tuned memory (ensuring little tables & indexes stay in memory)
- use the best possible join
and probably complete the query in 1/60th the time that mysql would take. And that means that you could get better performance from db2/oracle on a $25,000 four-way smp than from mysql on a $2,000,000 32-way.
2. They fail to mention that Oracle now owns the most valuable parts of the MySQL solution (Innodb). Oracle has obviously purchased this component (which is how mysql supports transactions, pk/fk constraints, etc) in order to harm MySQL. Since there is no other viable replacement for Innodb the MySQL future is in serious doubt.
3. They probably weren't aware that MySQL is the least ANSI-SQL compliant database in the market. This means that porting mysql code to another database is a royal pain in the butt compared to code supporting postgresql, db2, etc. Though, to be fair, it is getting much better.
LICENSING COSTS:
1. mysql isn't necessarily free, and can cost more than the commercial alternatives for small distributed commercial apps
2. db2 licensing was only provided for DB2 Express - which is the low-cost 2-cpu model. That's often ok, but it hardly compares to the Oracle standard edition that was also included. Also, I think they may have gotten their db2 costs mixed up between express & workgroup editions.
CONCLUSIONS & MISC
They mentioned some of the great mysql features like clustering and fault tolerance. Sorry, but mysql cluster solution is a separate telecom product that they purchased, that stores your data in memory - limiting your database size to however much memory you can afford. Not a practical solution for very many.
The mysql fault tolerance is really just replication. That's sad.
They mention one strength of mysql is their maximum database size of 64TB - which is nonsense, just because its internal registers and pointers can handle a theoretical maximum of 64TB doesn't mean that it would ever make sense to put more than 20 GB on it. DB2 & Oracle can go to 64TB, but today almost nobody is going beyond 10 TB just due to backup performance, cp
Re:Python vs. compiled Java and C, on Guido Goes Google · Score: 2, Interesting
> It is true that as a scripting language Python is slower than (byte)compiled languages. But it is slower by a constant factor.
Python is just as fast as c or java when it comes to io-intensive applications: http://www.osnews.com/story.php?news_id=5602&page=3 Code that cannot be optimized by Psyco is considerably slower, though the above benchmark exaggerates it through errors in their use of python.
My application processes over twenty million events a day through python - which includes transforming each event, and then applying a metadata-driven validation against each as well. The application is designed to handle a billion events a day. Performance is not a problem. Though at some point we might end up rewriting a few functions in c. We originally thought we would need to by 10 million events a day, but now realize that we should be fine until about 100 million a day.
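To make "metadata-driven validation" concrete, here is a rough sketch of the general idea, not the actual application. The field names and the rule keys (`type`, `required`, `allowed`, `min`) are all hypothetical; the point is only that the rules live in data rather than in code:

```python
# Hypothetical rule metadata: in a real system this might be loaded from a table or file.
RULES = {
    "event_id": {"type": int, "required": True},
    "source": {"type": str, "required": True, "allowed": {"web", "batch"}},
    "bytes": {"type": int, "required": False, "min": 0},
}


def transform(raw, rules=RULES):
    """Convert raw string fields to the types named in the metadata."""
    event = {}
    for field, value in raw.items():
        rule = rules.get(field)
        if rule is None or value == "":
            continue  # drop unknown and empty fields
        try:
            event[field] = rule["type"](value)
        except ValueError:
            event[field] = value  # leave it as-is so validation can flag it
    return event


def validate(event, rules=RULES):
    """Return a list of problems found by applying the metadata rules to one event."""
    errors = []
    for field, rule in rules.items():
        if field not in event:
            if rule.get("required"):
                errors.append(f"{field}: missing")
            continue
        value = event[field]
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: {value!r} not allowed")
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: below minimum {rule['min']}")
    return errors
```

Adding a new check for a field then means editing `RULES`, not the validation code.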
And the benefits? Very low labor costs, quick time to market, easy maintenance, almost no data quality problems due to code defects. The small performance trade off has been an extremely good compromise in my experience.
Re:So Turing Machine is your language of choice, on Larry Wall on Perl 6 · Score: 1
> So Turing Machine is your language of choice right?
come on, how about a little proportionality...
> I prefer a tool which best matches my problem domain. If the domain is complex, the tool should be complex.
how about no more complex than necessary?
How about if you are handed a simple log reporting application that someone wrote you don't have to buy a book or two, or spend a day googling to figure out what the original programmer was doing? Or how the language features that they used really work.
Generally, I'm more worried about a language failing to support a project due to internal manageability issues - rather than lack of functionality. Knowing that I can point to any piece of code on some of our systems and immediately recognize all language features and approaches to problems greatly simplifies my job.
And that job involves data mining, scoring frameworks, queuing systems, metadata-driven interfaces between multiple operational systems and a data warehouse, publishing & subscribing between a data warehouse and multiple redundant data marts. In short, hardly a simplistic domain.
> If someone feels that using the full scope of Perl results in messiness, they aren't forced by any means to use that full scope. There are many Perl coders who limit themselves to the "C subset" of Perl. But unlike certain other unnamed languages, Perl doesn't try to play the role of parent in telling you what you can and can't express so those who are more comfortable with a wider breadth of linguistic forms can take advantage of that and make code that is, in a word, elegant.
Hmmm, I find that there is more elegance in a solution that uses just a few components consistently and well, than in a solution that uses a vast number of components in a variety of ways.
'One way to do things' is a language philosophy that may occasionally increase implementation costs, but usually shrinks learning curves and maintenance costs. That's usually a good deal.
> I made it to page 149 where it says "Python uses the indentation of statements under a header to group the statements in a nested block." I stopped reading and tossed the book on my bookshelf on a shelf full of unused & unloved technical manuals.
One of the best things a programmer can do is try different languages. Try lisp, sql, haskell. Play with xml and yaml. Compare J2EE to Ruby on Rails. Try a language that doesn't use ALGOL-inherited code blocks. Just like an 80s ACM article said, the single best way to evaluate a programmer is by the number of languages they're fluent in.
At the end of the day Python's indentation causes a few problems, but seems to solve more. It makes it hard to share source code via email. It rules out the use of tabs. I can live with those limits. On the flip side it helps reinforce readable code. That's a very good thing - and consistent with the fundamental philosophy of the language: the code must be easy to maintain.
But if you really can't get your head around that, then try Ruby. Like Python it's a well-designed, easily maintained language with a great community and future.
> Sounds like a moving target to me. No matter what mysql does (or doesn't do), it will never be "good enough", because elitists will always need something to bash. Even if it was just the postgresql codebase renamed. It would still "suck" because it's "mysql".
nah, once the capability = the hype, then there will be other targets for scorn.
> Sorta like the (open|free|net)bsd zealots who bash linux. They're so insecure in their choice of OS that they need to put down something else in order to feel better.
nice, a faith-based argument in which facts don't matter - and even pointing to short-comings in a product just proves you're wrong.
Kind of like:
Brian: I'm not the Messiah! Will you please listen? I am not the Messiah, do you understand? Honestly!
Woman: Only the true Messiah denies His divinity!
Brian: What?! Well, what sort of chance does that give me? All right... I AM the Messiah!
Followers, en masse: He is! He is the Messiah!
> I'm also you don't like the licensing, but I'm not going to argue philosophy when I have the practical experience of NEVER coming up against it, over the years. I use it as a tool, not as something to repackage and resell.
Well, somebody is coming up against it - it pays their bills, and the possibility of making this revenue is what got them their investment dollars.;-) But, if you can commit to staying completely GPL you're probably fine. Not everybody can. Nor given the changes to their license in the past is there any guarantee that future inconvenient changes won't be made.
> Please expand on why Innodb is a valid reason to reject MySQL or even make it unattractive.
Oracle now owns Innodb. They compete with mysql for low-end database revenue, and they certainly didn't buy it to do MySQL any favors. Their solution is to either increase licensing fees and gain revenue off mysql, to harm MySQL by GPLing Innodb, or just to undermine MySQL growth by injecting uncertainty into its future. There are other possibilities but none seem credible.
This is a very real threat, one that took the MySQL folks almost a month to even respond to directly. Sure, they can fork Innodb, but they lack the personnel & skills to pull that off. And there are no other comparable products for them to go to in the market.
Given that Innodb is where about 80% of the innovation and must-have capabilities within MySQL come from (transactions, foreign key constraints, etc) their future is seriously in question. Until this is resolved, I would not use this database for a new project unless my shop was already 100% committed to MySQL.
> For websites, MySQL still seems like a good choice. MySpace uses MySQL. I wonder what they would have to say?
I think it was a good choice three years ago, but not really since then. The thing about using mysql for your website is that you'll probably want some other product for other applications internally. And then what? You've got to now learn and support multiple database products. Given the cost of labor, that's generally not a great strategy. It's generally cheaper to stick with a single product.
> MySQL is fine for the vast majority of applications out there.
Ya, I've heard that line of bs from mysql for about a half-dozen years:
- they said it when they didn't have transactions - and it wasn't true
- they said it when they didn't have unions or subselects - and it wasn't true
- they said it when they didn't have referential integrity - and it wasn't true
- they said it when they didn't have triggers, stored procs, and views - and it wasn't true
Now, they've resolved *most* of the problems, and it's *almost* true. Sure, you can build robust applications with it. Of course, you can build robust applications with msql as well - it's just the extra effort that is required to achieve "robustness" when you're up against:
- silent errors and data corruption problems, current and historical
- frequent deviations from ansi sql (comments, nulls, etc)
- simple optimizer that is notorious for performance problems on 5+ way joins
- if you're planning on having your app run at various isps, most don't support the current version - leaving you stuck with historical issues (no views, etc)
- lack of parallelism or partitioning features - giving it about 2-5% of the speed of oracle/db2/informix when it comes to large table scans (reporting, analytics, etc)
So, sure. You can build robust apps with it. But man, it is so much more work than using postgresql. Let alone db2 or oracle. Maybe this makes sense for somebody (asp model targeting large number of isps) where you can afford the economics of re-inventing the wheel since most isps are running back-level versions.
Now, this might change in two years. Assuming that MySQL comes up with a substitute for Innodb (no attractive options yet), simplifies their licensing, and resolves the most significant existing issues. Then yes, it will be a reasonable option, right up there with postgresql, etc. Until then save your licensing dollars for something better and freer.
> The US is a great nation because it has taken risks. For better or worse, those risks have propelled us forward.
and cleverly positioned ourselves thousands of miles from Hirohito & Hitler! Those fools that set up countries closer were completely overrun, then while they were rebuilding we took over much of the economy around the world.
> I would rather live proudly in a country that isn't afraid to face issues than to live in a state of mediocrity.
Right, I personally admire our great leadership for the way they decisively tackle:
- governmental incompetence brought about by new old boys network
- foreign debt that threatens our entire economy
- foreign oil dependency that threatens our entire economy
- growing polarization between the faith-based & reality-based parts of the country
- high cost of health care that leaves many without *any* health care
- attacks on US soil by immediately attacking an uninvolved third-party (that has huge amounts of oil)
As I watch us lose yet another industry (auto), and watch us put engineers out to pasture while we outsource to india and china(!) I'm left just happy & glad to know that our leadership isn't afraid to face issues.
> Take this as cocky if you must but where would the world be without the U.S. involved in the past 100 years?
Right, I'm sure the rest of the world is just jealous of our great successes in Viet Nam, Iraq, and Afghanistan.
That's what is really fabulous about 'merica: it really doesn't matter what the rest of the world thinks. We've got our own reality goin' on: http://www.warblogging.com/archives/000935.php
> For read-only, or even read-mostly, MySQL is blisteringly fast.
I think you mean that when doing lookups of a very small (less than 1%) set of data from a single table, with simple queries that mysql understands, the b-tree index in myisam or oracle's innodb is as fast as any other database. In the case of myisam maybe a little more, in the case of innodb maybe a little less.
I'm sure you don't mean that when selecting 10% of the data of a single table of the database (thereby unable to do b-tree lookups) and doing table scans instead that it is very fast at all. It might be competitive with postgresql, firebird, and sqlite there, but falls *completely* behind oracle, db2, informix, sqlbase, and now sql server when using partitioning. Or parallelism.
And you probably didn't mean that it was fast when handling complex queries. It's notoriously bad about handling them.
> On Linux, with a disk caching policy of "Never, ever commit anything unless you have to swap something from RAM, or are about to umount the file system" and enough RAM to cache the whole table file, MySQL writes almost as fast as it reads. OSes with more conservative policies, such as insisting to decache often and verifying before releasing the RAM, obviously won't be so fast {but who'll be laughing at who when the power comes back on?}.
Wouldn't this be better resolved through a storage adapter with 128 mbytes or more of battery-backed disk cache, and then turning on write-caching - and having your storage system handle it? In this scenario you are very unlikely to corrupt or lose data due to a power outage or crash.
And you had mentioned large files - what if you've got a 10 gbyte file? Doing lots of concurrent writing to it? This won't fit into memory, so now you're back to the writing-at-the-speed-of-a-snail speed.
> What is it with the MySQL bashers around here?
- too much hype
- company leadership that covered up missing *basic* features in the product for years, insisting people don't need them anyway
- unsubstantiated claims (blisteringly fast) that end up being gross exaggerations at best
- most non-ansi implementation in the marketplace
- deliberately complex dual-licensing scheme that doesn't comply with GPL
- inability to handle even moderately complex queries
- absolutely bizarre exception handling issues (silent truncations, etc)
MySQL is a success story, but mostly a marketing success story. It started as a sql layer on top of flat files - not intended to be a database management system - just a file management system. And then people applied this tool to database management - without even the most basic of features (views, transactions, etc).
This isn't to say that people can't successfully use it for database management. Of course you can. You can also pull stumps with a ford explorer. It's just that the explorer wasn't really designed for pulling stumps, and a tractor does it *so* much better.
> o DB2 has a different locking model that has lock escalation, which causes frequent deadlocks (i.e. on concurrent DDL).
Actually, I find that genuine deadlocks are very rare, and are usually resolved by simply getting the apps to use the same table access sequence. Waits are much more common - in which one app has to wait for another to commit its transaction in order to access the same object, and the transaction can die if it has to wait too long. In my experience it's usually a symptom of bad tuning that's easy to fix on all but the busiest app designed specifically for oracle.
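The "same table access sequence" fix can be sketched outside of any database. This uses Python threading locks as stand-ins for table locks (an assumption for illustration, not how DB2 itself locks), with hypothetical table names. Both "apps" ask for the tables in opposite orders, but the helper always acquires them in one agreed order, so neither can end up holding one lock while waiting on the other:

```python
import threading

# Hypothetical stand-ins for locks on two tables.
LOCKS = {"customer": threading.Lock(), "orders": threading.Lock()}


def update_tables(tables, work):
    """Lock the named tables in sorted order, whatever order the caller asked for."""
    ordered = sorted(tables)
    for name in ordered:
        LOCKS[name].acquire()
    try:
        return work()
    finally:
        for name in reversed(ordered):
            LOCKS[name].release()


def run_demo():
    """Run two 'apps' that name the tables in opposite order; both should finish."""
    results = []

    def app_a():
        update_tables(["customer", "orders"], lambda: results.append("a"))

    def app_b():
        update_tables(["orders", "customer"], lambda: results.append("b"))

    threads = [threading.Thread(target=app_a), threading.Thread(target=app_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return sorted(results)
```

One app may still wait for the other to finish, which is the "wait" case described above rather than a deadlock.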
> DB2 does not expose many statistics for tuning in an easy-to-use format like Oracle does (V$ tables). Sure, you can set event monitors, but they are cumbersome, and they don't provide enough information. And where are the timed event (or wait) statistics?
yeah, db2 could use better runtime stats. Personally, I never bother with the event monitors, especially since they're only good for finding the needle in a haystack. Have you tried the snapshot functions & commands? I find them much easier to work with and more valuable.
> o DB2's Java-based tools are slow and bulky.
Yep, like oracle, you want a client with 512 mbytes of memory to use them. Even then they're occasionally a pita. There are a few other tools you can get tho - toad (expensive), quest (very expensive), aqt (very cheap), etc. Also, note that you really don't want the 8.1.0 version of the admin client - that was very slow.
> o DB2 doesn't expose space usage statistics for objects such as tables and indexes. The documentation literally says to _calculate_ the object size!
don't know about that - by performing runstats you'll have the number of rows, the average row size, the number of pages physically allocated to each table. Now, you could calculate size logically based on row num * row size, or get the exact physical allocation through total pages * page size for any table or index. That isn't bad.
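The two calculations are just multiplication. A small sketch, where the function names are mine and the 4096-byte default page size is an assumption you would replace with the page size of the tablespace in question:

```python
def logical_size_bytes(row_count, avg_row_bytes):
    """Logical size: number of rows times average row size."""
    return row_count * avg_row_bytes


def physical_size_bytes(pages_allocated, page_size_bytes=4096):
    """Physical allocation: total pages times page size (page size is an assumed default)."""
    return pages_allocated * page_size_bytes
```

For example, with made-up statistics of 1,000,000 rows at 120 bytes each and 40,000 allocated pages, `logical_size_bytes(1_000_000, 120)` gives 120,000,000 bytes and `physical_size_bytes(40_000)` gives 163,840,000 bytes; the gap between the two is free space and overhead within the pages.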
DB2 definitely has a few warts. Many of them are the legacy of supporting various communication protocols or fashionable technologies over the years (typed tables, appc, etc). Others are just the occasional failure to achieve consistency in the interface (db2batch vs db2 cli arguments, etc).
Still, the improvements in the last four years have been great:
- mdc
- mqt
- most admin tasks can now be done online
- etc
administration is now quite easy:
- add a storage device to a tablespace and db2 automatically rebalances all your data across all devices, online
- various wizards can recommend configuration & tuning settings
- client tools, though bulky, can generate almost any command you want
- I've found it easy to train new dbas - without any formal training - just through OTJ and a few months these guys have become fully proficient.
Right now I'm running a mission-critical data warehouse and set of marts on db2. I could fully license the warehouse for $15k total list. The cost for oracle would be $80k. This project is now supporting hundreds of customers, well over a tb of data, 100,000 queries a day on mostly very old hardware. And it was a complete snap to set up - without even a db2 dba at the time. Caveat - I did have plenty of dba & warehousing experience on other databases, not saying no dba required.
But if I used oracle for this project I would have had to spend a lot more time trying to convince people to fund it because of the greater licensing cost and it never would have flown without a dedicated oracle dba from the beginning.
> So for more agility in your database designs, you endorse LESS normalization? I can't imagine a less normalized database ever being more agile than a properly normalized one. Either I'm missing what you mean by dynamic model, or you don't understand the benefits of normalization.
Right - I'm not talking about 'denormalization' - in the way that you would denormalize a model to simplify sql and improve performance on a reporting application. I'm talking about not applying that set of database modeling rules at all.
> You do know one of the main goals of the relational model was to allow agility right?
Yep, and it has done that well: relational databases are far more agile than the hierarchical ones that preceded them. But - they aren't agile enough for some problems.
For example, let's say that you have a bicycle-shop-management application that you sell to small shops. You sell it for, what? $5,000 plus 18% annual maintenance. It handles bicycle inventory, sales, some light marketing, etc. Well, one day one of your customers decides to sell books about bicycles. Well, perhaps you've got a generic inventory table that he can describe things in - but if you've got a 3-5NF model - it isn't that generic. There are no columns specific to books in it. And he really can't afford to spend $10-50k on an update to support that.
So, ideally you've got a model in which some attributes of items are kept in key-value pair tables. This isn't wonderful for a lot of reasons - but it does give the application owner the ability to define new kinds of attributes that were unforeseen by the dba. And, if done well, he can even define (in the database) rules for when some of these attributes are required, what their domain is, what their type is, what their default is, etc. These "dynamic attributes" would give the user the ability to create whatever new columns they want to describe the entity "book".
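A rough sketch of what such "dynamic attributes" could look like, using Python's sqlite3 module purely for convenience. Every table, column, and function name here is hypothetical, and the rule metadata is cut down to a type, a required flag, and a default:

```python
import sqlite3


def build_db():
    """Create an in-memory database with an item table plus key-value attribute tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (item_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- One row per attribute the shop owner has defined, with its rules.
        CREATE TABLE attribute_def (
            attr_name     TEXT PRIMARY KEY,
            attr_type     TEXT NOT NULL,            -- 'int' or 'text' in this sketch
            required      INTEGER NOT NULL DEFAULT 0,
            default_value TEXT
        );
        -- The key-value pairs themselves.
        CREATE TABLE item_attribute (
            item_id   INTEGER NOT NULL REFERENCES item(item_id),
            attr_name TEXT NOT NULL REFERENCES attribute_def(attr_name),
            value     TEXT,
            PRIMARY KEY (item_id, attr_name)
        );
    """)
    return conn


def define_attribute(conn, name, attr_type, required=False, default=None):
    """Register a new attribute without any schema change."""
    conn.execute("INSERT INTO attribute_def VALUES (?, ?, ?, ?)",
                 (name, attr_type, int(required), default))


def set_attribute(conn, item_id, name, value):
    """Store one attribute value for one item (values are kept as text)."""
    conn.execute("INSERT OR REPLACE INTO item_attribute VALUES (?, ?, ?)",
                 (item_id, name, str(value)))


def get_item(conn, item_id):
    """Return the item's attributes as a dict, typed and with defaults filled in."""
    casts = {"int": int, "text": str}
    rows = conn.execute("""
        SELECT d.attr_name, d.attr_type, COALESCE(a.value, d.default_value)
        FROM attribute_def d
        LEFT JOIN item_attribute a
               ON a.attr_name = d.attr_name AND a.item_id = ?
    """, (item_id,)).fetchall()
    return {name: casts[kind](value) for name, kind, value in rows if value is not None}
```

The trade-off shows up immediately: values are stored as text and typed on the way out, and the database itself is no longer enforcing much, which is part of why this isn't wonderful.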
Additionally, you could design the model to support the concept of "dynamic entities": in which concepts such as book, bike, helmet, wrench, tire can be logical subtypes of inventory item. Not just identified through a single simple tag - these concepts can be related through many-to-many relationships to one another, to multiple stores, to customers, etc. The relationships between these entities can be dated, prioritized, weighted, and the entities can inherit from multiple parents in this case. Now when the store owner wants to add the concept of book they can *easily* also create overlapping sub-categories below it (mountain biking, road biking, family biking, competitive biking, history, etc) - and then relate these items to other inventory items that share that category. End result - you click on the bike shop's web site and look at a heading called "winter biking" - and see everything remotely related to this concept. And - it was easy to set up, and there's nothing specific to "winter biking" in the structure of the data.
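And a similarly rough sketch of "dynamic entities", again with sqlite3 and entirely hypothetical names: every concept is a row in one entity table, and a many-to-many relationship table (with a weight and a date) lets an entity sit under multiple parents. It assumes a SQLite build new enough to support recursive queries:

```python
import sqlite3

SCHEMA = """
CREATE TABLE entity (
    entity_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
-- Many-to-many: a child can have several parents, and each link can be weighted and dated.
CREATE TABLE relationship (
    child_id   INTEGER NOT NULL REFERENCES entity(entity_id),
    parent_id  INTEGER NOT NULL REFERENCES entity(entity_id),
    weight     REAL NOT NULL DEFAULT 1.0,
    valid_from TEXT,
    PRIMARY KEY (child_id, parent_id)
);
"""


def connect():
    """Create an in-memory database holding the two generic tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_entity(conn, name):
    """Add a concept (category, book, tire, ...) and return its id."""
    return conn.execute("INSERT INTO entity (name) VALUES (?)", (name,)).lastrowid


def relate(conn, child, parent, weight=1.0, valid_from=None):
    """Link a child entity under a parent entity."""
    conn.execute("INSERT INTO relationship VALUES (?, ?, ?, ?)",
                 (child, parent, weight, valid_from))


def descendants(conn, name):
    """Names of everything related, directly or indirectly, under the named entity."""
    rows = conn.execute("""
        WITH RECURSIVE tree(entity_id) AS (
            SELECT entity_id FROM entity WHERE name = ?
            UNION
            SELECT r.child_id
            FROM relationship r JOIN tree t ON r.parent_id = t.entity_id
        )
        SELECT e.name
        FROM tree JOIN entity e ON e.entity_id = tree.entity_id
        WHERE e.name <> ?
        ORDER BY e.name
    """, (name, name)).fetchall()
    return [row[0] for row in rows]
```

A "winter biking" page is then just `descendants(conn, "winter biking")`; nothing in the table structure knows that winter biking exists.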
Sort of similar to what the topic maps community is trying to do with XML: http://www.topicmaps.org/
Though in my opinion they are only shooting for a subset of what we should be trying to do at this time, and what we can do via relational databases or whatever. Still, with strong db2 support for topic maps that may be the easiest way to go for now.
> You are not quite correct. There will be one database engine, and one optimizer. There will be two query languages. XML will be stored on the disk in a different format than the tabular storage already. The parser, optimizer, and database engine have been enhanced to understand XQuery. As a matter of fact, you can combine query languages, for example using XQuery in a subquery of a SQL query.
hmmm, I wonder how this will interact with everything else?
It's difficult sometimes to explain to people why they might not want to implement fringe features (thinking OORDBMS stuff like typed tables, etc) - various wonderful things just don't support them (online reorgs, whatever). Of course, no vendor ever likes to discuss these issues in a marketing campaign;-)
Any thoughts of the usefulness of XQuery compared to SQL?
> I don't think that this functionality is a category killer. But I can imagine why some people love the idea. Lots of people would like to be able to define records in their RDBMS that have arbitrary fields that the designer of the schema did not know about when the database was built. SQL does not cope with this scenario at all.
Well, relational databases can handle this situation - you just have to avoid relational *modeling* within the database. And the challenge you get into at that point is that you lose some valuable features such as foreign key support, etc. But it is doable, just performs slowly and is labor-intensive to create. Still, it has its place.
> However in my view correct normalisation solves most of these issues and makes the need for native XML unnecessary.
Hmmm, not sure about that. Dynamic models weren't really an issue thirty years ago when Codd was coming up with these ideas: business changed much more slowly. Today we're changing business rules so quickly - and expect to modify major systems in the blink of an eye. As mentioned above, we can support this using relational systems, but we end up heading away from normalization, not towards it.
> The complexity of such an implementation would be high, particularly within the context of a database that still has good indexing, table management and performance. Foreign keys would be an intriguing challenge. There is nothing about the problem that is inherently unsolvable but performance would be a real challenge.
Until people determine what the 'best practices' are for such a database they can get into trouble with it: how do you convert the data when the application changes? Is it easy as it is today with relational databases? Or do you have to write entire data conversion apps (like you did with hierarchical databases twenty years ago). How do you design & tune for performance? How do you handle data quality? Well, it will probably be best to start with some very small projects;-)
> With Oracle's purchase of Inno-DB and their recent release of a free version of their database software, it looks like a war will be shaping up over the low end of the database market.
I think you're completely right there - the big vendors know that the little databases generate cash too - and mindshare. They can't afford to lose it. This is a market-protection plan for them.
> Besides for being open-source, what advantages do PostgreSQL and MySQL have over Oracles' 10g Express, Microsoft's SQL Server 2005 Express, and IBM's proposed DB2 Express?
Well, MySQL is tarnished for a few reasons now:
- future is uncertain due to innodb buy-out
- history of inexplicable data quality and exception handling issues
- dual-licensing complexity
Postgresql is looking much better now:
- they had some performance problems 3-4 years ago, but are now well beyond that
- is completely free
- starting to get picked up within very large commercial applications
In comparison SQL Server express and Oracle express offer:
- a free database for very small applications
- the opportunity to deploy a tiny database, then replace it with a larger one without any application code changes
- opportunity for vendors and shops to reduce the number of supported databases
DB2 Express offers:
- a low-cost database (last I looked it was around $750/server)
- with much more scalability than sqlserver/oracle express versions:
- no storage limitations
- partitioning is included (via mdc)
- just two cpus (don't know if they can be multi-core or not)
- I think 64-bit memory is supported, but is still limited to 4GB
So, Oracle & SQL Server have one strategy (offer an extremely limited product for free), while DB2 has another (offer a slightly-limited product for less than or about the cost of MySQL). IBM might change the DB2 strategy, but I hope they just add an extremely-limited free version, and keep the existing express version.
And this strategy works: I've got oracle, sql server, db2, postgresql, and mysql in our organization, and am standardizing on db2. When we get a small database it uses a cheap db2 license. This keeps my labor costs down (which are far more than the software costs). If it wasn't for the cheaper licensed versions I'd probably be putting all of the small databases on postgresql - and growing that skillbase within the organization.
> Both now have native XML support with XQuery, both do stored procedures although SQL Server has for some time.
So has DB2 - though SQL Server stored procedures (inherited from Sybase) are the easiest to work with in the industry. And have almost no exception handling, though perhaps (I doubt) they've improved this in 2005.
> What's interesting is that db2 can have the Zend core bolted on as the equivalent of .NET and that db2 does very nice document store handling but that's always been a selling point for a while now. I really like the notion of using it for a document store.
DB2 is very focused on this, with entire sets of add-on products for content management. Never used them, no idea how good they are or how expensive though.
> I wonder what the price point for Viper is going to be in comparison. I already know what it is for the
> various versions of SQL Server 2005. Ouch!
Note that db2 will now have two database engines: the xml engine, and the relational engine. Both are getting upgraded in Viper. The relational engine is picking up:
- Oracle-style partitioning (to add to its own three other types)
- Label-based access controls (to handle more complex security requirements)
- adaptive self-tuning memory (to automatically configure itself)
I've got no idea how they plan to price the XML engine within Viper. But I'm assuming that the existing database engine will be priced similarly to how it is priced today.
For example, the top-end db2 product costs $32k/CPU, with some potential add-ons if you want spatial analysis or whatever. Oracle is $50k/CPU and the add-ons tend to be much more critical ($10k/CPU for partitioning, etc). SQL Server's top end product is now $25k/CPU. These are all list prices - and subject to huge discounts.
But of the three DB2 includes a lot more in their base product - so you can actually use the lowest-end products at about $1000/*server* and still get a partitioning solution that can handle a terabyte data warehouse. Or $7.5k/CPU and out-perform the SQL Server database at $25k/CPU on warehousing & reporting apps.
SQL Server is getting a lot of press on their analysis services and new reporting services features. DB2 partners with essbase for olap engine equiv to analysis services. I think this would also be an equiv to reporting services, but I haven't yet seen a good detailed description of exactly what that is. It looks like little more than crystal reports or brio from what I've seen so far. If that's the case then there's little there of real value. Still, it's obvious that db2 & oracle both need to shore up this area of functionality.
> It still sucks for real time applications. DB2 is a good warehouse DB, good for batch processes and such.
The differences between oracle & db2 for transactional apps are mostly:
- db2 is about 1/3rd the cost of oracle
- db2 is faster
- db2 includes some warehousing features (range-partitioning via MDC) for free which are often also useful in these applications
- db2 is simpler to administer
- oracle has a locking interface that's easier to use (MVCC instead of row-locks)
- db2 likes to use static sql that requires binds (pita, but optional)
> I must admit those IBM guys know how to butter the sales to the management with all those golf subscriptions,
> hockey tickets what have you.
Hmmm, I've worked with sales staff from quite a few different companies. But I've never worked with people as nasty as at oracle. They go *way* beyond mere buttering up of management all the way to stabbing the technical staff in the back when they want their professional services team to get their work, or when the oracle product fails to deliver the labor savings that sales promised. Oh, and then there's the famous oracle trick of leaving vital pieces of the product out of the discounted original deal, and slaying the customer when they discover that these are required...
> Now, Sql Server 2005 and Oracle have excellent Xml support right now, not next year. Which means IBM, you are late.
That's incorrect - DB2 has supported XML for probably two years now. What they're rolling out is a database engine that has much improved support for XML. Prior to Viper the existing database engine would convert the xml to/from tabular format within the database.
Now, what's the value? Well, this should allow more functionality and flexibility for XML queries, and should also allow for queries against XML data to also include non-XML data.
1. The client was a defense contractor. defense contractors are some of the most absolutely incompetent companies I've ever worked with. Just as bad as telecom (old at&t) and government.
2. The client apparently went with a waterfall project plan, in which there were few if any milestones. And surprise, they discover at the very end that there are problems. Duh.
3. According to the poster, the client wasn't capable of simple math: didn't know that the contracting run rate would consume their budget before the project was complete. Again, duh.
4. According to the poster, IBM was charging $325 for everyone. That doesn't sound accurate in my experience with IBM (and other large consulting companies) - in which a couple of top people would be at $325, and the shock troops anywhere from $150-$225.
5. Also, the customer hired programmers for a small project from a large system integrator. That's never a good way to save money, it's a good way to assemble a team overnight.
6. The poster doesn't really understand knowledge management, business intelligence, or customer relationship marketing. By simply dismissing these domains as over-hyped, he's just revealing ignorance. This isn't to say that everyone needs everything that all vendors claim they can deliver, but these are huge domains full of history and detail. And can deliver a lot *if* you understand them and their best practices. If you don't, then you're probably buying/building the wrong solution anyway.
On the flip side, I do agree that IBM has a hard time holding onto top talent. They don't pay enough, and their bureaucracy can be a pain in the butt. When you get a team you should absolutely interview every member, and put milestones in the project where you can jettison the team if they suck. But, this isn't an IBM-thing, it's something you should do for whatever team you work with.
> But the RPG hobby has become seriously consumerrhoidic.
yeah, I think that transition occurred around 1990 when TSR started to spew books for every stupid combination: the left-handed gnome handbook, etc.
The review above mentioned page after page of prestige classes. Same tactic: force everyone to buy hundreds of books.
> Playing the game should be the point of the hobby . . . not collecting books.
ideally, unless of course the books are just great reads. Like the GURPS worldbooks - even if you don't play gurps, they are often well-researched and useful on topics from steampunk to ancient egypt to voodoo to cliffhangers: http://www.sjgames.com/gurps/books/
Plus, the Goblins campaign book is one of the funniest things I've ever read in RPG. Just absolutely fabulous.
I suspect that they dealt with mysql hoping for a cheaper alternative to oracle. Right now, IBM's the only real game in town for them: both other major commercial database owners are competitors to SAP. MySQL was probably the best option - since a few years ago anyway they had the best performance, and they're really the least open source of the open source options.
And this desire for independence from Oracle is probably how they ended up with MaxDB as well - which was SAPDB, which was Adabas - an early 1980s pre-relational database. It would be about rock bottom in any kind of database ranking.
But in the end both of these efforts have failed: Adabas was antiquated and probably not very useful, and MySQL fatally stumbled by letting Oracle buy their best part out from under them (Innodb). I'd think that at this point SAP is looking for a new strategy - Oracle appears to be getting stronger, not weaker, and their opensource partner is nearly useless. Maybe they'll reconsider their Microsoft merge?
> The reason we worry about fan traps is that we know our end-users would get themselves into trouble
> with them if they had to report off of the relational schema directly and this in itself would stop us
> from rolling out tools to let them make their own reports. With a semantic layer in between the
> relational schema and the user and with neat graphical tools for them to use, allowing them direct
> access to the data becomes feasible. We hope.
cool - anything that can help you get there iteratively is probably a good thing.
> Speaking of I/O performance. Is it possible I wonder to have an IO intensive Python script run faster than C?
I think your point is valid:
The first performance technique i usually pursue is to just increase read buffering so that you're reading 4k+ in each time from disk. And that's the same in c as it is in python.
The next is to locally cache data: in which the top 10-100 or so entries are kept in a local list. This is far easier to do in python than c, and can result in a big boost.
The next is to consider other more complex options like preprocessing the data, processing in steps with separate sorts, etc. All of these take time, which I'm more likely to be able to afford if I'm writing in python than c.
Beyond those techniques I'm probably going to spend most of my time on hardware and the data store itself, which is irrelevant to programming language.
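To make the first two techniques concrete, here's a rough sketch in python. It's only an illustration: `slow_lookup` is a made-up stand-in for whatever expensive lookup you're doing, the 64k buffer size is just an example, and I'm using the standard library's `functools.lru_cache` in place of a hand-rolled local list of the most-used entries.

```python
from functools import lru_cache

# Counts how often the expensive path is really taken (for illustration only).
CALLS = {"slow_lookup": 0}

def slow_lookup(key):
    """Hypothetical stand-in for an expensive disk or database lookup."""
    CALLS["slow_lookup"] += 1
    return key.upper()

@lru_cache(maxsize=100)
def cached_lookup(key):
    """Keep roughly the 100 most recently used entries in local memory."""
    return slow_lookup(key)

def process(path, buffer_size=64 * 1024):
    """Read a file of keys with a large read buffer and resolve each key."""
    results = []
    # buffering= sets the size in bytes of the underlying read buffer
    with open(path, "r", buffering=buffer_size) as f:
        for line in f:
            results.append(cached_lookup(line.strip()))
    return results
```

With a file of mostly repeated keys, `slow_lookup` only runs once per distinct key; everything else is served from the cache.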
> Second - Sweet Hog of Prague! Oracle 10g costs $24 grand Per CPU!?!?!?!?
oh, it can be *far* more expensive than that. The enterprise version is $40k/CPU, and that doesn't even include partitioning. To get Partitioning (and yes, you want it for any large database) you're looking at an extra $10k/CPU. And there are other extra charges as well. You can easily end up at $60k/CPU.
On the flip side, you can also get away with $5k/CPU if you know what you're doing, and if what you're doing is small. On the large side where you'd pay $60k/CPU you've probably also got $600k in hardware and a staff of at least a half-dozen. Guess what? The software & hardware almost always end up as a rounding-error compared to the labor costs. Doesn't really matter if the application is custom or commercial, they both seem to have about the same labor costs.
The reviewers know databases about as well as my grandma knows sports cars. They seem to mean well, and admit that this comparison was complex and hard. In the end they were unfortunately over their head.
PRODUCT SELECTION
1. where's postgresql? This is the product that the commercial vendors need to be the most nervous about. Sure, they're losing more low-end revenue to mysql right now, but postgresql is getting picked up by some big players. It is far more mature than MySQL, doesn't have the quality issues, isn't partially owned by Oracle, etc.
2. where's at least a mention of all the various other solutions - from Firebird to Derby (Cloudscape)
FUTURE PROOFING
1. They mistakenly say that mysql doesn't require scaling up to enterprise versions like db2/oracle do. This is incorrect because mysql lags behind oracle & db2 for performance in many situations:
- since it doesn't support query parallelism (which provides near linear performance improvements to db2/oracle)
- since it doesn't support partitioning (which can provide 10x performance improvements to db2/oracle)
- since it doesn't have a mature optimizer (which means that queries with 5 table joins can tank)
- since it lacks memory tuning flexibility
Together this means that as your data increases you have to continue moving a mysql database to larger & larger hardware.
In other words, if you need to scan a table with 10 million rows in it, then join that data against 6 other tables - db2/oracle can:
- leverage partitioning so only scan 1mil rows or so instead of 10mil
- split the scan across four cpus
- leverage more efficiently tuned memory (ensuring little tables & indexes stay in memory)
- use the best possible join
and probably complete the query in 1/60th the time that mysql would take. And that means that you could get better performance from db2/oracle on a $25,000 four-way smp than from mysql on a $2,000,000 32-way.
2. They fail to mention that Oracle now owns the most valuable parts of the MySQL solution (Innodb). Oracle has obviously purchased this component (which is how mysql supports transactions, pk/fk constraints, etc) in order to harm MySQL. Since there is no other viable replacement for Innodb the MySQL future is in serious doubt.
3. They probably weren't aware that MySQL is the least ANSI-SQL compliant database in the market. This means that porting mysql code to another database is a royal pain in the butt compared to code supporting postgresql, db2, etc. Though, to be fair, it is getting much better.
LICENSING COSTS:
1. mysql isn't necessarily free, and can cost more than the commercial alternatives for small distributed commercial apps
2. db2 licensing is only provided for DB2 Express - which is the low-cost 2-cpu model. That's often ok, but it hardly compares to the Oracle standard edition that was also included. Also, I think they may have gotten their db2 costs mixed up between express & workgroup editions.
CONCLUSIONS & MISC
They mentioned some of the great mysql features like clustering and fault tolerance. Sorry, but mysql cluster solution is a separate telecom product that they purchased, that stores your data in memory - limiting your database size to however much memory you can afford. Not a practical solution for very many.
The mysql fault tolerance is really just replication. That's sad.
They mention one strength of mysql is their maximum database size of 64TB - which is nonsense, just because its internal registers and pointers can handle a theoretical maximum of 64TB doesn't mean that it would ever make sense to put more than 20 GB on it. DB2 & Oracle can go to 64TB, but today almost nobody is going beyond 10 TB just due to backup performance, cp
> It is true that as a scripting language Python is slower than (byte)compiled languages. But it is slower
> by a constant factor.
Python is just as fast as c or java when it comes to io-intensive applications:
http://www.osnews.com/story.php?news_id=5602&page
Code that cannot be optimized by Psyco is considerably slower, though the above benchmark exaggerates it through errors in their use of python.
My application processes over twenty million events a day through python - which includes transforming each event, and then applying a metadata-driven validation against each as well. The application is designed to handle a billion events a day. Performance is not a problem. Though at some point we might end up rewriting a few functions in c. We originally thought we would have to by 10 million events a day, but now realize that we should be fine until about 100 million a day.
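For anyone wondering what "metadata-driven validation" looks like, here's a minimal sketch - not our actual code. The field names and rules in `FIELD_RULES` are invented for illustration; the point is only that the rules live in data, so adding a check doesn't mean writing new code.

```python
# Hypothetical metadata: one small rule set per event field.
FIELD_RULES = {
    "event_id": {"type": int, "required": True},
    "severity": {"type": str, "required": True,
                 "allowed": {"low", "medium", "high"}},
    "source_ip": {"type": str, "required": False},
}

def validate(event, rules=FIELD_RULES):
    """Return a list of problems found in one event dict (empty if valid)."""
    problems = []
    for field, rule in rules.items():
        value = event.get(field)
        if value is None:
            # Missing fields only matter when the metadata says so.
            if rule.get("required"):
                problems.append(f"{field}: missing")
            continue
        if not isinstance(value, rule["type"]):
            problems.append(f"{field}: expected {rule['type'].__name__}")
            continue
        allowed = rule.get("allowed")
        if allowed is not None and value not in allowed:
            problems.append(f"{field}: {value!r} not allowed")
    return problems
```

Something like `validate({"event_id": 1, "severity": "low"})` comes back empty, while an event with a bad severity or a missing id comes back with one line per problem.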
And the benefits? Very low labor costs, quick time to market, easy maintenance, almost no data quality problems due to code defects. The small performance trade off has been an extremely good compromise in my experience.
> So Turing Machine is your language of choice right?
come on, how about a little proportionality...
> I prefer a tool which best matches my problem domain. If the domain is complex, the tool should be complex.
how about no more complex than necessary?
How about if you are handed a simple log reporting application that someone wrote you don't have to buy a book or two, or spend a day googling to figure out what the original programmer was doing? Or how the language features that they used really work.
Generally, I'm more worried about a language failing to support a project due to internal manageability issues - rather than lack of functionality. Knowing that I can point to any piece of code on some of our systems and immediately recognize all language features and approaches to problems greatly simplifies my job.
And that job involves data mining, scoring frameworks, queuing systems, metadata-driven interfaces between multiple operational systems and a data warehouse, publishing & subscribing between a data warehouse and multiple redundant data marts. In short, hardly a simplistic domain.
> If someone feels that using the full scope of Perl results in messiness, they aren't forced by any means to use that full scope. There are
> many Perl coders who limit themselves to the "C subset" of Perl. But unlike certain other unnamed languages, Perl doesn't try to
> play the role of parent in telling you what you can and can't express so those who are more comfortable with a wider breadth of
> linguistic forms can take advantage of that and make code that is, in a word, elegant.
Hmmm, I find that there is more elegance in a solution that uses just a few components consistently and well, than in a solution that has a vast number of components in a variety of ways.
'One way to do things' is a language philosophy that may occasionally increase implementation costs, but usually shrinks learning curves and maintenance costs. That's usually a good deal.
> I made it to page 149 where it says "Python uses the indentation of statements under a header to group the statements in a nested block."
> I stopped reading and tossed the book on my bookshelf on a shelf full of unused & unloved technical manuals.
One of the best things a programmer can do is try different languages. Try lisp, sql, haskell. Play with xml and yaml. Compare J2EE to Ruby on Rails. Try a language that doesn't use ALGOL-inherited code blocks. Just like an 80s ACM article said, the single best way to evaluate a programmer is by the number of languages they're fluent in.
At the end of the day Python's indentation causes a few problems, but seems to solve more. It makes it hard to share source code via email. It rules out the use of tabs. I can live with those limits. On the flip side it helps reinforce readable code. That's a very good thing - and consistent with the fundamental philosophy of the language: the code must be easy to maintain.
But if you really can't get your head around that, then try Ruby. Like Python it's a well-designed, easily maintained language with a great community and future.
> Sounds like a moving target to me. No matter what mysql does (or doesn't do), it will never be "good enough", because
> elitists will always need something to bash. Even if it was just the postgresql codebase renamed. It would still
> "suck" because it's "mysql".
nah, once the capability = the hype, then there will be other targets for scorn.
> Sorta like the (open|free|net)bsd zealots who bash linux. They're so insecure in their choice of OS
> that they need to put down something else in order to feel better.
nice, a faith-based argument in which facts don't matter - and even pointing to short-comings in a product just proves you're wrong.
Kind of like:
Brian: I'm not the Messiah! Will you please listen? I am not the Messiah, do you understand? Honestly!
Woman: Only the true Messiah denies His divinity!
Brian: What?! Well, what sort of chance does that give me? All right... I AM the Messiah!
Followers, en masse: He is! He is the Messiah!
> I'm also you don't like the licensing, but I'm not going to argue philosophy when I have the
;-) But, if you can commit to staying completely GPL you're probably fine. Not everybody can. Nor given the changes to their license in the past is there any guarantee that future inconvenient changes won't be made.
> practical experience of NEVER coming up against it, over the years. I use it as a tool, not
> as something to repackage and resell.
Well, somebody is coming up against it - it pays their bills, and the possibility of making this revenue is what got them their investment dollars.
> Please expand on why Innodb is a valid reason to reject MySQL or even make it unattractive.
Oracle now owns Innodb. They compete with mysql for low-end database revenue, and they certainly didn't buy it to do MySQL any favors. Their solution is to either increase licensing fees and gain revenue off mysql, to harm MySQL by GPLing Innodb, or just to undermine MySQL growth by injecting uncertainty into its future. There are other possibilities but none seem credible.
This is a very real threat, one that took the MySQL folks almost a month to even respond to directly. Sure, they can fork Innodb, but they lack the personnel & skills to pull that off. And there are no other comparable products for them to go to in the market.
Given that Innodb is where about 80% of the innovation and must-have capabilities within MySQL come from (transactions, foreign key constraints, etc) their future is seriously in question. Until this is resolved, I would not use this database for a new project unless my shop was already 100% committed to MySQL.
> For websites, MySQL still seems like a good choice. MySpace uses MySQL. I wonder what they would have to say?
I think it was a good choice three years ago, but not really since then. The thing about using mysql for your website is that you'll probably want some other product for other applications internally. And then what? You've got to now learn and support multiple database products. Given the cost of labor, that's generally not a great strategy. It's generally cheaper to stick with a single product.
> MySQL is fine for the vast majority of applications out there.
Ya, I've heard that line of bs from mysql for about a half-dozen years:
- they said it when they didn't have transactions - and it wasn't true
- they said it when they didn't have unions or subselects - and it wasn't true
- they said it when they didn't have referential integrity - and it wasn't true
- they said it when they didn't have triggers, stored procs, and views - and it wasn't true
Now, they've resolved *most* of the problems, and it's *almost* true. Sure, you can build robust applications with it. Of course, you can build robust applications with msql as well - it's just the extra effort that is required to achieve "robustness" when you're up against:
- silent errors and data corruption problems current and historical
- frequent deviations from ansi sql (comments, nulls, etc)
- simple optimizer that is notorious for performance problems on 5+ way joins
- if you're planning on having your app run at various isps, most don't support the current version - leaving you stuck with historical issues (no views, etc)
- lack of parallelism or partitioning features - giving it about 2-5% of the speed of oracle/db2/informix when it comes to large table scans (reporting, analytics, etc)
So, sure. You can build robust apps with it. But man, it is so much more work than using postgresql. Let alone db2 or oracle. Maybe this makes sense for somebody (asp model targeting large number of isps) where you can afford the economics of re-inventing the wheel since most isps are running back-level versions.
Now, this might change in two years. Assuming that MySQL comes up with a substitute for Innodb (no attractive options yet), simplifies their licensing, and resolves the most significant existing issues. Then yes, it will be a reasonable option, right up there with postgresql, etc. Until then save your licensing dollars for something better and freer.
...unless you've got a build process that ensures it or testing tools that compare them for you.
> The US is a great nation because it has taken risks. For better or worse, those risks have propelled us forward.
and cleverly positioned ourselves thousands of miles from Hirohito & Hitler! Those fools that set up countries closer were completely overrun, then while they were rebuilding we took over much of the economy around the world.
> I would rather live proudly in a country that isn't afraid to face issues than to live in a state of mediocrity.
Right, I personally admire our great leadership for the way they decisively tackle:
- governmental incompetence brought about by new old boys network
- foreign debt that threatens our entire economy
- foreign oil dependency that threatens our entire economy
- growing polarization between the faith-based & reality-based parts of the country
- high cost of health care that leaves many without *any* health care
- attacks on US soil by immediately attacking an uninvolved third-party (that has huge amounts of oil)
As I watch us lose yet another industry (auto), and watch us put engineers out to pasture while we outsource to india and china(!) I'm left just happy & glad to know that our leadership isn't afraid to face issues.
> Take this as cocky if you must but where would the world be without the U.S. involved in the past 100 years?
Right, I'm sure the rest of the world is just jealous of our great successes in Viet Nam, Iraq, and Afghanistan.
That's what is really fabulous about 'merica: it really doesn't matter what the rest of the world thinks. We've got our own reality goin' on:
http://www.warblogging.com/archives/000935.php
> For read-only, or even read-mostly, MySQL is blisteringly fast.
I think you mean that when doing lookups of a very small (less than 1%) set of data from a single table, with simple queries that mysql understands, the b-tree index in myisam or oracle's innodb is as fast as any other database. In the case of myisam maybe a little faster, in the case of innodb maybe a little slower.
I'm sure you don't mean that when selecting 10% of the data of a single table of the database (thereby unable to do b-tree lookups) and doing table scans instead that it is very fast at all. It might be competitive with postgresql, firebird, and sqlite there, but falls *completely* behind oracle, db2, informix, sqlbase, and now sql server when using partitioning. Or parallelism.
And you probably didn't mean that it was fast when handling complex queries. It's notoriously bad about handling them.
> On Linux, with a disk caching policy of "Never, ever commit anything unless you have to swap something
> from RAM, or are about to umount the file system" and enough RAM to cache the whole table file, MySQL writes
> almost as fast as it reads. OSes with more conservative policies, such as insisting to decache often and
> verifying before releasing the RAM, obviously won't be so fast {but who'll be laughing at who when the power
> comes back on?}.
Wouldn't this be better resolved through a storage adapter with 128 mbytes or more of battery-backed disk cache, and then turning on write-caching - and having your storage system handle it? In this scenario you are very unlikely to corrupt or lose data due to a power outage or crash.
And you had mentioned large files - what if you've got a 10 gbyte file? Doing lots of concurrent writing to it? This won't fit into memory, so now you're back to the writing-at-the-speed-of-a-snail speed.
> What is it with the MySQL bashers around here?
- too much hype
- company leadership that covered up missing *basic* features in the product for years insisting people don't need them anyway
- unsubstantiated claims (blisteringly fast) that end up being gross exaggerations at best
- most non-ansi implementation in the marketplace
- deliberately complex dual-licensing scheme that doesn't comply with GPL
- inability to handle even moderately complex queries
- absolutely bizarre exception handling issues (silent truncations, etc)
MySQL is a success story, but mostly a marketing success story. It started as a sql layer on top of flat files - not intended to be a database management system - just a file management system. And then people applied this tool to database management - without even the most basic of features (views, transactions, etc).
This isn't to say that people can't successfully use it for database management. Of course you can. You can also pull stumps with a ford explorer. It's just that the explorer wasn't really designed for pulling stumps, and a tractor does it *so* much better.
> o DB2 has a different locking model that has lock escalation, which causes frequent deadlocks (i.e. on concurrent DDL).
Actually, I find that genuine deadlocks are very rare, and are usually resolved by simply getting the apps to use the same table access sequence. Waits are much more common - in which one app has to wait for another to commit its transaction in order to access the same object, and the transaction can die if it has to wait too long. In my experience it's usually a symptom of bad tuning that's easy to fix on all but the busiest app designed specifically for oracle.
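The "same table access sequence" fix is easy to show outside of a database. Here's a toy sketch using python threads, where each lock stands in for a lock on one table - the table names and the counters are made up, and a real database does its own locking, so treat this purely as an illustration of the ordering idea.

```python
import threading

# Hypothetical stand-ins: each lock plays the role of a lock on one table.
LOCKS = {"inventory": threading.Lock(), "orders": threading.Lock()}
counters = {"inventory": 0, "orders": 0}

def update_tables(tables, times=1000):
    """Update several 'tables', always locking them in one agreed (sorted) order."""
    ordered = sorted(tables)
    for _ in range(times):
        for name in ordered:
            LOCKS[name].acquire()
        try:
            for name in ordered:
                counters[name] += 1
        finally:
            for name in reversed(ordered):
                LOCKS[name].release()

def run_demo():
    """Run two workers that name the tables in opposite orders."""
    for name in counters:
        counters[name] = 0
    workers = [
        threading.Thread(target=update_tables, args=(["orders", "inventory"],)),
        threading.Thread(target=update_tables, args=(["inventory", "orders"],)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return dict(counters)
```

Because both workers end up locking "inventory" before "orders", neither can hold one lock while waiting forever on the other. Drop the `sorted()` call and let each worker lock in its own order, and you have the classic deadlock setup.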
> DB2 does not expose many statistics for tuning in an easy-to-use format like Oracle does (V$ tables). Sure, you
> can set event monitors, but they are cumbersome, and they don't provide enough information. And where are the
> timed event (or wait) statistics?
yeah, db2 could use better runtime stats. Personally, I never bother with the event monitors, especially since they're only good for finding the needle in a haystack. Have you tried the snapshot functions & commands? I find them much easier to work with and more valuable.
> o DB2's Java-based tools are slow and bulky.
Yep like oracle, you want a client with 512 mbytes of memory to use them. Even then they're occasionally a pita. There are a few other tools you can get tho - toad (expensive), quest(very expensive), aqt (very cheap), etc. Also, note that you really don't want the 8.1.0 version of the admin client - that was very slow.
> o DB2 doesn't expose space usage statistics for objects such as tables and indexes. The documentation literally
> says to _calculate_ the object size!
don't know about that - by performing runstats you'll have the number of rows, the average row size, the number of pages physically allocated to each table. Now, you could calculate size logically based on row num * row size, or get the exact physical allocation through total pages * page size for any table or index. That isn't bad.
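The arithmetic is about as simple as it gets. A quick sketch in python, with made-up numbers standing in for what runstats would give you (the 4k page size is just an assumption for the example - use whatever page size your tablespace really has):

```python
def logical_size_bytes(row_count, avg_row_size):
    """Logical size: number of rows times average row size in bytes."""
    return row_count * avg_row_size

def physical_size_bytes(total_pages, page_size=4096):
    """Physical allocation: pages allocated times page size in bytes."""
    return total_pages * page_size

# Hypothetical statistics for one table
rows, avg_row, pages = 10_000_000, 120, 350_000
print(logical_size_bytes(rows, avg_row))   # 1200000000
print(physical_size_bytes(pages))          # 1433600000
```

The gap between the two numbers is page overhead and free space, which is why the physical figure is the one to use for capacity planning.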
DB2 definitely has a few warts. Many of them are the legacy of supporting various communication protocols or fashionable technologies over the years (typed tables, appc, etc). Others are just the occasional failure to achieve consistency in the interface (db2batch vs db2 cli arguments, etc).
Still, the improvements in the last four years have been great:
- mdc
- mqt
- most admin tasks can now be done online
- etc
administration is now quite easy:
- add a storage device to a tablespace and db2 automatically rebalances all your data across all devices, online
- various wizards can recommend configuration & tuning settings
- client tools, though bulky, can generate almost any command you want
- I've found it easy to train new dbas - without any formal training - just through OTJ and a few months these guys have become fully proficient.
Right now I'm running a mission-critical data warehouse and set of marts on db2. I could fully license the warehouse for $15k total list. The cost for oracle would be $80k. This project is now supporting hundreds of customers, well over a tb of data, 100,000 queries a day on mostly very old hardware. And it was a complete snap to set up - without even a db2 dba at the time. Caveat - I did have plenty of dba & warehousing experience on other databases, not saying no dba required.
But if I used oracle for this project I would have had to spend a lot more time trying to convince people to fund it because of the greater licensing cost, and it never would have flown without a dedicated oracle dba from the beginning.
> So for more agility in your database designs, you endorse LESS normalization? I can't imagine a less normalized
> database ever being more agile than a properly normalized one. Either I'm missing what you mean by dynamic
> model, or you don't understand the benefits of normalization.
Right - I'm not talking about 'denormalization' - in the way that you would denormalize a model to simplify sql and improve performance on a reporting application. I'm talking about not applying that set of database modeling rules at all.
> You do know one of the main goals of the relational model was to allow agility right?
Yep, and it has done that well: relational databases are far more agile than the hierarchical ones that preceded them. But - they aren't agile enough for some problems.
For example, let's say that you have a bicycle-shop-management application that you sell to small shops. You sell it for, what? $5,000 plus 18% annual maintenance. It handles bicycle inventory, sales, some light marketing, etc. Well, one day one of your customers decides to sell books about bicycles. Well, perhaps you've got a generic inventory table that he can describe things in - but if you've got a 3-5NF model - it isn't that generic. There are no columns specific to books in it. And he really can't afford to spend $10-50k on an update to support that.
So, ideally you've got a model in which some attributes of items are kept in key-value pair tables. This isn't wonderful for a lot of reasons - but it does give the application owner the ability to define new kinds of attributes that were unforeseen by the dba. And, if done well, he can even define (in the database) rules for when some of these attributes are required, what their domain is, what their type is, what their default is, etc. These "dynamic attributes" would give the user the ability to create whatever new columns they want to describe the entity "book".
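To make the "dynamic attributes" idea concrete, here's a minimal sketch in Python with sqlite3 (my choice for illustration, any relational database would do). The table names (item, attr_def, item_attr) and the helper functions are all hypothetical; the point is only that the rules for an attribute (type, required, default) are rows in the database rather than columns in the schema.

```python
import sqlite3

# Hypothetical schema: attribute rules are rows in attr_def, not columns.
SCHEMA = """
CREATE TABLE item (item_id INTEGER PRIMARY KEY, kind TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE attr_def (
    kind TEXT NOT NULL, attr TEXT NOT NULL,
    type TEXT NOT NULL,               -- 'text', 'int' or 'real'
    required INTEGER NOT NULL DEFAULT 0,
    default_value TEXT,
    PRIMARY KEY (kind, attr));
CREATE TABLE item_attr (
    item_id INTEGER NOT NULL REFERENCES item(item_id),
    attr TEXT NOT NULL, value TEXT,
    PRIMARY KEY (item_id, attr));
"""
CASTS = {"text": str, "int": int, "real": float}

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def define_attr(db, kind, attr, type_, required=False, default=None):
    """The shop owner, not the dba, declares a new attribute for a kind of item."""
    if type_ not in CASTS:
        raise ValueError(f"unknown type {type_!r}")
    db.execute("INSERT INTO attr_def VALUES (?, ?, ?, ?, ?)",
               (kind, attr, type_, int(required), default))

def add_item(db, kind, name, **attrs):
    """Insert an item, enforcing the rules stored in attr_def."""
    defs = {row[0]: row[1:] for row in db.execute(
        "SELECT attr, type, required, default_value FROM attr_def WHERE kind = ?",
        (kind,))}
    unknown = set(attrs) - set(defs)
    if unknown:
        raise ValueError(f"undefined attributes for {kind}: {sorted(unknown)}")
    values = []
    for attr, (type_, required, default) in defs.items():
        raw = attrs.get(attr, default)
        if raw is None:
            if required:
                raise ValueError(f"{kind} requires {attr!r}")
            continue
        # The cast raises ValueError if the value does not fit the declared type.
        values.append((attr, str(CASTS[type_](raw))))
    item_id = db.execute("INSERT INTO item (kind, name) VALUES (?, ?)",
                         (kind, name)).lastrowid
    db.executemany("INSERT INTO item_attr VALUES (?, ?, ?)",
                   [(item_id, attr, value) for attr, value in values])
    return item_id

def get_attrs(db, item_id):
    """Read the key-value rows back as a dict of typed values."""
    rows = db.execute(
        "SELECT a.attr, a.value, d.type FROM item_attr a "
        "JOIN item i ON i.item_id = a.item_id "
        "JOIN attr_def d ON d.kind = i.kind AND d.attr = a.attr "
        "WHERE a.item_id = ?", (item_id,))
    return {attr: CASTS[type_](value) for attr, value, type_ in rows}
```

Usage would look something like define_attr(db, "book", "author", "text", required=True) followed by add_item(db, "book", "Winter Riding", author="A. Rider"). Note that all of the checking lives in application code here, which is exactly the trade-off: the database itself no longer knows what a "book" is.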
Additionally, you could design the model to support the concept of "dynamic entities": in which concepts such as book, bike, helmet, wrench, tire can be logical subtypes of inventory item. Not just identified through a single simple tag - these concepts can be related through many-to-many relationships to one another, to multiple stores, to customers, etc. The relationships between these entities can be dated, prioritized, weighted, and the entities can inherit from multiple parents in this case. Now when the store owner wants to add the concept of book they can *easily* also create overlapping sub-categories below it (mountain biking, road biking, family biking, competitive biking, history, etc) - and then relate these items to other inventory items that share that category. End result - you click on the bike shop's web site and look at a heading called "winter biking" - and see everything remotely related to this concept. And - it was easy to set up, and there's nothing specific to "winter biking" in the structure of the data.
Sort of similar to what the topic maps community is trying to do with XML:
http://www.topicmaps.org/
Though in my opinion they are only shooting for a subset of what we should be trying to do at this time, and what we can do via relational databases or whatever. Still, with strong db2 support for topic maps that may be the easiest way to go for now.
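For what it's worth, the "dynamic entities" idea above can be sketched the same way. This is a rough illustration in Python with sqlite3, not a finished design: the entity and relation tables and the helper names are hypothetical, and the weight and valid_from columns are only placeholders for the "weighted" and "dated" relationships. An entity can have several parents, and a recursive query collects everything below a concept.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE entity (entity_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        -- Many-to-many: an entity may sit under any number of parents.
        CREATE TABLE relation (
            parent_id INTEGER NOT NULL REFERENCES entity(entity_id),
            child_id  INTEGER NOT NULL REFERENCES entity(entity_id),
            weight REAL NOT NULL DEFAULT 1.0,
            valid_from TEXT,
            PRIMARY KEY (parent_id, child_id));
    """)
    return db

def add_entity(db, name, parents=()):
    """Create a concept or item and link it under each named parent."""
    child_id = db.execute("INSERT INTO entity (name) VALUES (?)", (name,)).lastrowid
    for parent in parents:
        db.execute(
            "INSERT INTO relation (parent_id, child_id) "
            "SELECT entity_id, ? FROM entity WHERE name = ?", (child_id, parent))
    return child_id

def related_to(db, name):
    """Names of everything directly or indirectly below the given concept."""
    rows = db.execute("""
        WITH RECURSIVE below(entity_id) AS (
            SELECT entity_id FROM entity WHERE name = ?
            UNION
            SELECT r.child_id FROM relation r
            JOIN below b ON r.parent_id = b.entity_id
        )
        SELECT e.name FROM entity e JOIN below USING (entity_id)
        WHERE e.name <> ? ORDER BY e.name
    """, (name, name))
    return [row[0] for row in rows]

db = make_db()
add_entity(db, "inventory item")
add_entity(db, "book", parents=["inventory item"])
add_entity(db, "winter biking")          # a new concept, no schema change needed
add_entity(db, "studded tire", parents=["inventory item", "winter biking"])
add_entity(db, "winter cycling guide", parents=["book", "winter biking"])
print(related_to(db, "winter biking"))
```

Nothing in the table structure mentions "winter biking" or "book", which is the whole point. The price is that the database can't tell you that a tire is not a book.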
> You are not quite correct. There will be one database engine, and one optimizer. There will be
;-)
> two query languages. XML will be stored on the disk in a different format than the tabular storage already.
> The parser, optimizer, and database engine have been enhanced to understand XQuery. As a matter of fact,
> you can combine query languages, for example using XQuery in a subquery of a SQL query.
hmmm, i wonder how this will interact with everything else?
It's difficult sometimes to explain to people why they might not want to implement fringe features (thinking OORDBMS stuff like typed tables, etc) - various wonderful things just don't support them (online reorgs, whatever). Of course, no vendor ever likes to discuss these issues in a marketing campaign.
Any thoughts on the usefulness of XQuery compared to SQL?
> I don't think that this functionality is a category killer. But I can imagine why some people love the idea.
;-)
> Lots of people would like to be able to define records in their RDBMS that have arbitrary fields that the
> designer of the schema did not know about when the database was built. SQL does not cope with this scenario at all.
Well, relational databases can handle this situation - you just have to avoid relational *modeling* within the database. And the challenge you get into at that point is that you lose some valuable features such as foreign key support, etc. But it is doable, just performs slowly and is labor-intensive to create. Still, it has its place.
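A small sketch of what "doable, but slow and labor-intensive" means in practice, again in Python with sqlite3 and a hypothetical item_attr key-value table. Getting an ordinary-looking record back takes one CASE expression per attribute, and nothing in the database stops a bad type or a dangling reference from getting in.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE item_attr (
            item_id INTEGER, attr TEXT, value TEXT,
            PRIMARY KEY (item_id, attr));
        -- 'lots' is not a number and publisher 999 exists nowhere,
        -- but the database has no way to know either of those things.
        INSERT INTO item_attr VALUES
            (1, 'title', 'Winter Riding'), (1, 'pages', '200'),
            (1, 'publisher_id', '999'),
            (2, 'title', 'Road Basics'), (2, 'pages', 'lots');
    """)
    return db

def pivot_sql(attrs):
    """Build a query that turns key-value rows back into one row per item."""
    for attr in attrs:
        if not attr.isidentifier():
            raise ValueError(f"unsafe attribute name: {attr!r}")
    cols = ", ".join(
        f"MAX(CASE WHEN attr = '{attr}' THEN value END) AS {attr}" for attr in attrs)
    return f"SELECT item_id, {cols} FROM item_attr GROUP BY item_id ORDER BY item_id"

def fetch_records(db, attrs):
    return db.execute(pivot_sql(attrs)).fetchall()

print(fetch_records(make_db(), ["title", "pages", "publisher_id"]))
```

Every value comes back as text, missing attributes come back as NULL, and the query has to be regenerated whenever someone defines a new attribute.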
> However in my view correct normalisation solves most of these issues and makes
> the need for native XML unnecessary.
Hmmm, not sure about that. Dynamic models weren't really an issue thirty years ago when Codd was coming up with these ideas: business changed much more slowly. Today we're changing business rules so quickly - and expect to modify major systems in the blink of an eye. As mentioned above, we can support this using relational systems, but we end up heading away from normalization, not towards it.
> The complexity of such an implementation would be high, particularly within the context of a
> database that still has good indexing, table management and performance. Foreign keys would
> be an intriguing challenge. There is nothing about the problem that is inherently unsolvable but performance would be a real challenge.
Until people determine what the 'best practices' are for such a database they can get into trouble with it: how do you convert the data when the application changes? Is it as easy as it is today with relational databases? Or do you have to write entire data conversion apps (like you did with hierarchical databases twenty years ago)? How do you design & tune for performance? How do you handle data quality? Well, it will probably be best to start with some very small projects.
> With Oracle's purchase of Inno-DB and their recent release of a free version of their database software,
> it looks like a war will be shaping up over the low end of the database market.
I think you're completely right there - the big vendors know that the little databases generate cash too - and mindshare. They can't afford to lose it. This is a market-protection plan for them.
> Besides for being open-source, what advantages do PostgreSQL and MySQL have over Oracles' 10g Express,
> Microsoft's SQL Server 2005 Express, and IBM's proposed DB2 Express?
Well, MySQL is tarnished for a few reasons now:
- future is uncertain due to innodb buy-out
- history of inexplicable data quality and exception handling issues
- dual-licensing complexity
Postgresql is looking much better now:
- they had some performance problems 3-4 years ago, but are now well beyond that
- is completely free
- starting to get picked up within very large commercial applications
In comparison SQL Server express and Oracle express offer:
- a free database for very small applications
- the opportunity to deploy a tiny database, then replace it with a larger one without any application code changes
- opportunity for vendors and shops to reduce the number of supported databases
DB2 Express offers:
- a low-cost database (last I looked it was around $750/server)
- with much more scalability than sqlserver/oracle express versions:
  - no storage limitations
  - partitioning is included (via mdc)
  - just two cpus (don't know if they can be multi-core or not)
  - I think 64-bit memory is supported, but is still limited to 4GB
So, Oracle & SQL Server have one strategy (offer an extremely-limited product for free), while DB2 has another (offer a slightly-limited product for less than or about the cost of MySQL). IBM might change the DB2 strategy, but I hope they just add an extremely-limited free version, and keep the existing express version.
And this strategy works: I've got oracle, sql server, db2, postgresql, and mysql in our organization, and am standardizing on db2. When we get a small database it uses a cheap db2 license. This keeps my labor costs down (which are far more than the software costs). If it wasn't for the cheaper licensed versions I'd probably be putting all of the small databases on postgresql - and growing that skillbase within the organization.
> Both now have native XML support with XQuery, both do stored procedures
.NET and that db2 does
> although SQL Server has for some time.
So has DB2 - though SQL Server stored procedures (inherited from Sybase) are the easiest to work with in the industry. They also have almost no exception handling, though perhaps (I doubt it) they've improved this in 2005.
> What's interesting is that db2 can have the Zend core bolted on as the equivalent of
> very nice document store handling but that's always been a selling point for a while now. I really like the notion
> of using it for a document store.
DB2 is very focused on this, with entire sets of add-on products for content management. Never used them, no idea how good they are or how expensive though.
> I wonder what the price point for Viper is going to be in comparison. I already know what it is for the
> various versions of SQL Server 2005. Ouch!
Note that db2 will now have two database engines: the xml engine, and the relational engine. Both are getting upgraded in Viper. The relational engine is picking up:
- Oracle-style partitioning (to add to its own three other types)
- Label-based access controls (to handle more complex security requirements)
- adaptive self-tuning memory (to automatically configure itself)
I've got no idea how they plan to price the XML engine within Viper. But I'm assuming that the existing database engine will be priced similarly to how it is priced today.
For example, the top-end db2 product costs $32k/CPU, with some potential add-ons if you want spatial analysis or whatever. Oracle is $50k/CPU and the add-ons tend to be much more critical ($10k/CPU for partitioning, etc). SQL Server's top end product is now $25k/CPU. These are all list prices - and subject to huge discounts.
But of the three, DB2 includes a lot more in its base product - so you can actually use the lowest-end products at about $1000/*server* and still get a partitioning solution that can handle a terabyte data warehouse. Or $7.5k/CPU and out-perform the SQL Server database at $25k/CPU on warehousing & reporting apps.
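To put those list prices side by side, here's a quick back-of-the-envelope calculation in Python. The two-CPU server is purely an assumption for illustration, the product labels are mine, and the numbers are the list prices quoted above, before the (huge) discounts.

```python
def list_cost(cpus, per_cpu=0, addons_per_cpu=0, per_server=0):
    """List price in dollars for one server, before any discount."""
    return per_server + cpus * (per_cpu + addons_per_cpu)

CPUS = 2  # assumed server size, for illustration only

costs = {
    "db2 top-end": list_cost(CPUS, per_cpu=32_000),
    "oracle + partitioning": list_cost(CPUS, per_cpu=50_000, addons_per_cpu=10_000),
    "sql server top-end": list_cost(CPUS, per_cpu=25_000),
    "db2 at $7.5k/CPU": list_cost(CPUS, per_cpu=7_500),
    "db2 lowest-end": list_cost(CPUS, per_server=1_000),
}

# Print cheapest first.
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name:24s} ${cost:>9,}")
```

On those assumptions the Oracle configuration lists at $120,000 against $15,000 for the mid-range DB2 product, which is the gap I'm talking about.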
SQL Server is getting a lot of press on their analysis services and new reporting services features. DB2 partners with essbase for an olap engine equivalent to analysis services. I think this would also be an equivalent to reporting services, but I haven't yet seen a good detailed description of exactly what that is. It looks like little more than crystal reports or brio from what I've seen so far. If that's the case then there's little there of real value. Still, it's obvious that db2 & oracle both need to shore up this area of functionality.
> It still sucks for real time applications. DB2 is a good warehouse DB, good for batch processes and such.
The differences between oracle & db2 for transactional apps are mostly:
- db2 is about 1/3rd the cost of oracle
- db2 is faster
- db2 includes some warehousing features (range-partitioning via MDC) for free which are often also useful in these applications
- db2 is simpler to administer
- oracle has a locking interface that's easier to use (MVCC instead of row-locks)
- db2 likes to use static sql that requires binds (pita, but optional)
> I must admit those IBM guys know how to butter the sales to the management with all those golf subscriptions,
> hockey tickets what have you.
Hmmm, i've worked with sales staff from quite a few different companies. But I've never worked with people as nasty as at oracle. They go *way* beyond mere buttering up of management all the way to stabbing the technical staff in the back when they want their professional services team to get their work, or when the oracle product fails to deliver the labor savings that sales promised. Oh, and then there's the famous oracle trick of leaving vital pieces of the product out of the discounted original deal, and slaying the customer when they discover that these are required...
> Now, Sql Server 2005 and Oracle have excellent Xml support right now, not next year. Which means IBM, you are late.
That's incorrect - DB2 has supported XML for probably two years now. What they're rolling out is a database engine that has much improved support for XML. Prior to Viper the existing database engine would convert the xml to/from tabular format within the database.
Now, what's the value? Well, this should allow more functionality and flexibility for XML queries, and should also allow for queries against XML data to also include non-XML data.
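To show what "convert the xml to/from tabular format" means, here's a toy sketch in Python using sqlite3 and xml.etree.ElementTree. This is only an illustration of the general shredding approach, not how DB2 actually does it, and the catalog/book document layout and the function names are made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, price REAL)")
    return db

def shred(db, xml_text):
    """XML -> rows: pull each <book> apart into ordinary columns."""
    root = ET.fromstring(xml_text)
    for book in root.iter("book"):
        db.execute("INSERT INTO book VALUES (?, ?, ?)",
                   (int(book.get("id")),
                    book.findtext("title"),
                    float(book.findtext("price"))))

def publish(db):
    """Rows -> XML: rebuild a document from the tabular data."""
    root = ET.Element("catalog")
    for id_, title, price in db.execute("SELECT id, title, price FROM book ORDER BY id"):
        book = ET.SubElement(root, "book", id=str(id_))
        ET.SubElement(book, "title").text = title
        ET.SubElement(book, "price").text = f"{price:.2f}"
    return ET.tostring(root, encoding="unicode")
```

Once the data is shredded it can be joined to any other table with plain SQL, but anything in the document that the tables don't model is lost on the way in, which is presumably why a native XML engine is interesting.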
A few points:
1. The client was a defense contractor. defense contractors are some of the most absolutely incompetent companies I've ever worked with. Just as bad as telecom (old at&t) and government.
2. The client apparently went with a waterfall project plan, in which there were few if any milestones. And surprise, they discover at the very end that there are problems. Duh.
3. According to the poster, the client wasn't capable of simple math: didn't know that the contracting run rate would consume their budget before the project was complete. Again, duh.
4. According to the poster, IBM was charging $325 for everyone. That doesn't sound accurate in my experience with IBM (and other large consulting companies) - in which a couple of top people would be at $325, and the shock troops anywhere from $150-$225.
5. Also, the customer hired programmers for a small project from a large system integrator. That's never a good way to save money, it's a good way to assemble a team overnight.
6. The poster doesn't really understand knowledge management, business intelligence, or customer relationship marketing. By simply dismissing these domains as over-hyped, he's just revealing ignorance. This isn't to say that everyone needs everything that all vendors claim they can deliver, but these are huge domains full of history and detail. And can deliver a lot *if* you understand them and their best practices. If you don't, then you're probably buying/building the wrong solution anyway.
On the flip side, I do agree that IBM has a hard time holding onto top talent. They don't pay enough, and their bureaucracy can be a pain in the butt. When you get a team you should absolutely interview every member, and put milestones in the project where you can jettison the team if they suck. But, this isn't an IBM-thing, it's something you should do for whatever team you work with.
> But the RPG hobby has become seriously consumerrhoidic.
yeah, i think that transition occurred around 1990 when TSR started to spew books for every stupid combination: the left-handed gnome handbook, etc.
The review above mentioned page after page of prestige classes. Same tactic: force everyone to buy hundreds of books.
> Playing the game should be the point of the hobby . . . not collecting books.
ideally, unless of course the books are just great reads. Like the GURPS worldbooks - even if you don't play gurps, they are often well-researched and useful on topics from steampunk to ancient egypt to voodoo to cliffhangers:
http://www.sjgames.com/gurps/books/
Plus, the Goblins campaign book is one of the funniest things I've ever read in RPG. Just absolutely fabulous.
I suspect that they dealt with mysql hoping for a cheaper alternative to oracle. Right now, IBM's the only real game in town for them: both other major commercial database owners are competitors to SAP. MySQL was probably the best option - since a few years ago anyway they had the best performance, and they're really the least open source of the open source options.
And this desire for independence from Oracle is probably how they ended up with MaxDB as well - which was SAPDB, which was Adabas - an early 1980s pre-relational database. It would be about rock bottom in any kind of database ranking.
But in the end both of these efforts have failed: Adabas was antiquated and probably not very useful, and MySQL fatally stumbled by letting Oracle buy their best part out from under them (Innodb). I'd think that at this point SAP is looking for a new strategy - Oracle appears to be getting stronger, not weaker, and their opensource partner is nearly useless. Maybe they'll reconsider their Microsoft merger?
> The reason we worry about fan traps is that we know our end-users would get themselves into trouble
> with them if they had to report off of the relational schema directly and this in itself would stop us
> from rolling out tools to let them make their own reports. With a semantic layer in between the
> relational schema and the user and with neat graphical tools for them to use, allowing them direct
> access to the data becomes feasible. We hope.
cool - anything that can help you get there iteratively is probably a good thing.
good luck,
ken