In the past I've found that sql server is faster than oracle straight out of the box. Don't know if this applies to Yukon or 10g.
However, SQL Server doesn't (or didn't) have an equiv to oracle's table partitioning. So, any time you're doing table scans (say you need to access more than 3% of the table) you've got to scan everything. With oracle (or db2, informix, etc) you can partition the table so that you only scan the data you care about (maybe 10%).
This alone gives oracle a vast performance edge over sql server. The downside is that partitioning costs extra, sometimes it isn't purchased, and some applications don't perform tablesscans. Reporting & search engines do tho, big time.
> What technology was being used for this and what kinds of errors? One of our hopes for using Business Universes or Data Models is that > they can be used to shield users from common mistakes that would lead to incorrect reports.
good question - this was financial analysts using sas. which would lead to more errors than users with metadata-driven reports.
but even when using microstrategy, business objects, etc I've seen some of the same thing. Since warehousing is usually an iterative effort, the data architecture changes over time. These changes can require users to make changes to old reports - such as adding new filters to old reports to continue to get the same meaning that they had before.
Regarding fan traps - that's great if the tool can prevent it. But why are you seeing it in the first place? Is this a measure in a fact table that is repeated across many rows and requires complete grouping to be additive? Is it a measure in a snowflake dimension(rare)? Or is it a column in a relational (not dimensional) model that is also supporting reporting? If the latter case then, I wonder if a better solution is a reporting model that would potentially solve this as well as dozens of different performance problems at the same time?
You need to re-read my post. I didn't say Oracle, I said DB2. DB2 is typically half the price of oracle, and when it comes to reporting/warehousing systems it is even cheaper since one of its partitioning solutions is included in the lowest-end products. I suspect that oracle could also offer a cheaper reporting solution than sql server, but i'm not as familiar with their current licensing.
Anyhow, for db2 the low-end db2 version isn't free, it costs $750. It is limited to two cpus, 4 gbytes memory, and unlimited storage. It supports multi-dimensional clustering for reporting queries (MDC), materialized views (MQT), and a wide variety of other highly useful data warehousing features. On this kind of server, depending on the amount of summarization being performed, you could easily keep a terabyte of raw data.
Once it runs out of steam, you could then up it to the workstation edition - which supports 64-bit installation (no practical memory limits), and four cpus. I think the cost of this edition is $1500/cpu.
Now, using MDC on four cpus this warehouse can easily handle a huge amount of work. But if you run out here then it's time to go with db2's beowulf-style partitioning. In this scenario you can cluster across n-number of separate servers. Could be 10,250,1000 separate servers. But you will pay dearly - the cost is about $32k/cpu, which is almost the cost of high-end four-way servers and their dedicated storage. Discounts again apply, and this cost can be worth it. The important thing is that you can start at the low end and work up.
> That's not a data warehouse, that's a data convenience store.
No, that is a warehouse. You're probably thinking of a data mart with hundreds of users. That is more expensive. But not what I said. The data mart will cost you more, especially if it's on the web. A four-way data mart on the internet supporting your customers with, say 16 gbytes of memory, will run you $7500 / cpu (list price).
But the important thing is that the most important database in the architecture (the warehouse) will cost you less than $1500. That's nothing.
So, back to sql server. Like I said, it is sometimes the cheapest option, but often not. Certainly not in this configuration:
- db2 data warehouse:
- small: $1500
- medium: $7500 / cpu
- large: $32000 / cpu
- sql server warehouse:
- small/medium/large: $25000 / cpu
- db2 data mart:
- medium: $7500 / cpu
- large: $32000 / cpu
- sql server data mart: $25000 / cpu
SQL Server does have other editions besides enterprise that are considerably cheaper. But they lack partitioning - which is a key feature for reporting. So, in order to get reasonable performance you will have to use their top of the line product.
In this scenario, db2 is vastly cheaper than sql server. The $7500/cpu edition of DB2 is faster than the $25000/cpu version of sql server given a single SMP server. And the $32000/cpu edition of db2 is far faster than the $25000/cpu version of sql server across a cluster. With that kind of price and scalability sql server isn't even remotely competitive in warehousing, reporting, etc.
> Err, I'm afraid I don't follow your analogy there. Now, if there were like 4 available custom web site tool sets, and you had to pay > $800 per dev seat up front, and a $300 yearly upgrade fee, and those tools all sucked. Then yes, I would agree with your analogy.
Those aren't the only options. What are reports? Well, they can be carefully formatted charts/graphs/tables that print nicely. But that sucks, most people don't care about print any more. What else are they? Tables with pagination, sorting, and navigation along with charts and graphs with similar navigation.
You can easily create very professional, very easy to use online olap-oriented reporting sites using something as simple as PHP & Chart Director. Just focus on a site-map that looks like a genuine OLAP cube, just flatten it out like you'd flatten a globe into multiple 2D maps and then link them together well. And it won't be crappy.
> a) I don't good
> b) Why not? Key-value pair tables can't be flattned out. You can't use them as dimensions to filter or sort your queries. Well, not without a ton of sql. They're very handy sometimes for applications, but just don't work for reporting.
d) Why not? oo databases pretty much died when it became clear that they couldn't handle the massive table scans involved in reporting. At first they brushed this off as completely unnecessary for an application, but eventually people got tired of spending a lot of cash on an OO database, and then spending another lot of cash on a relational database to support reporting.
g2) Entirely dependant on the database design. a single transactional database seldom has complete history - it's both a performance impact, and complete pain in the butt to maintain historical versions of every row in every table. Setting up foreign keys, writing sql, etc, becomes a nightmare. Further, a single transactional database seldom has much data from other related transactional databases (unless it's an integration-oriented one like a CRM database). But you need history for trending, and you need data from other transactions for more data points.
> You're a data warehouse guy, I get it. And I love being able to use warehouse designs in specific situations, but its not the end > all be all solution. Warehousing is awesome for reporting, but it is a pain in the ass for data management, security, integrity, and space.
that depends on the implementation: a data warehouse is just a consolidation & distribution point for data. It's great for archiving data efficiently, security usually isn't a problem, and it is an *amazing* tool for finding data quality problems in source systems that you'd never find otherwise.
Now a data mart on the other hand (what a warehouse publishes to) can be a horrific space hog. And is complex to tighten security down on. But there are approaches to these problems as well.
> That's why a lot of data warehouses are based off of summarized OO databases.
hmmm, i've never encountered that. Do you mean oo applications sitting on top of relational databases? If so, I see that all the time. But if you mean an actual oo database, hmm, haven't even run into one in four years now...
> But, we have another application. An internal/consultant developed Commission application. The thing is huge, tracks hundred of > sale members, service agreements, commissions, splits, charge backs, payouts, etc... And it was designed from the ground > up with and emphasis on historical data. They can pull up the exact and complete history on every item, sales person, accessory, > department, office, order, etc. The trade off however is performance. Like you mentioned, even on a beefcake server, it is > impossible to gather full real time data for reporting. So a series of data warehousing processes are run at night, or after > changes to a record that alter multiple entities.
> Yeah MySQL can kick Ingres's ass from here to next week!!! Yet another example of how open-source developers can compete with the > so-called 'enterprise class' software corporations on their own terms.
Laf, nobody calls Engris "Enterprise-class" except for CA. The last year it was competitive was probably something like 1988. So, mysql can compete with 20 year old technology? Good job, especially now that it has taken how long? Seven years to pick up views?
Please, you might be better off pointing to a more mature and less buggy product than mysql for an example of ass-kicking competance.
hmmm, I've found that particular form of reporting to get quickly unmanageable - with an explosion of hundreds of almost identical reports created, but not maintained by users. At one shop I visited they had so many that we could only test a sample of them - and found almost 50% to give incorrect results.
Since most reports of this type are created by users looking for answers to simple questions, i've found that creating dashboards with great navigation & drilling between graphs, charts, and tables to be a better option. This can then be managed by the warehouse or transactional database owners with its accuracy ensured.
Sigh, i've got one database that needs an upgrade right now and the only obstacle is the brio users with their reports that will break. I can't fix their reports, so I will have to give them some terribly-long transition time.
This was in 2001, and i don't remember the details very well. The price was from microsoft without any discounts. We had lost our partnership status since our certified employees left so we were forced to look at full price. Even if we could have gotten it at $20k - that would have cost a total of $80k which that company absolutely couldn't afford. And is hardly cheap compared to oracle or db2.
> Our shop supports MS, Oracle, DB2, and postgres. And you can move freely between any of those from one to the other, no vendor lock in.
The sql server database I mention was over 100 gbytes in size. Data transfers had been implemented with their terrible ETL tool (DTS), it used a queue manager product from microsoft, sent emails via exchange, etc, etc. Luckily, there was no.net in it. Unfortunately, at that time the stored procedures were too complex to easily migrate to postgresql.
> Vendor lock in is when you use specific functionality inherent to MSSQl(or db2, oracle, post) so > that if you need to move from one to the other, that is the guy who built the databases fault, not > the company that supplies it.
Some product differences are innocent obstacles to portability - such as different locking or partitioning features. This even includes non-ansi sql extensions that you often have to use to meet performance requirements on large databases.
Others are part of a deliberate vendor lock in strategy. Most of the vendor's add-on products fall into this category - all the etl tools, reporting tools, data mining tools, search tools, content management tools, etc. Sometimes these can be useful, but usually they aren't even that good.
Microsoft is heavily pushing the integration of sql server with the rest of the microsoft platform. It is not a bad database, but I wouldn't really recommend it to any shop that doesn't intend to stay a microsoft.
> Do you have any idea how much Oracle actually costs, compared to a Windows Server licence?
Three years ago I faced a $120k charge to upgrade a four-cpu SQL Server7 server to SQL Server 2000.
Was this cheaper than oracle?
Eh, not really - first off, I could have implemented oracle on a two-way server instead of a four-way and gotten equiv performance (search engine queries). Secondly, I could probably have gotten the oracle licenses down to $20k a cpu.
Oracle now has a *free* low-end database.
These days I'm running a multi-terabyte data warehouse for hundreds of customers on db2 - and it's *far* cheaper than SQL Server. Since db2 bundles most of their top-end capabilities into even the smallest version, I could run my warehouse on the db2 express version for $1500 (total).
So yeah, sometimes sql server is the cheapest solution. But it often isn't.
And then comes the topic of vendor lock-in. With db2 or oracle if you want to move to another database later on it isn't such a big deal. I can move most of my apps between oracle, db2, postgresql, sybase, etc fairly easily. Non-standard sql behavior in mysql are a pain in the butt. And any microsoft *platform* code in mssqlserver are a pain in the butt.
> Oh my god do I feel sorry for you. Custom reporting solutions have often been descriped as a pile of crap. Business Objects, the current owners of > Crystal Reports (Formerly of Seagate, Crystal Decision, and lord knows what other companies) has long been considered the best of these options. In > otherwords, they were/are the best piece of crap solution available. Microsoft (insert booing and hissing here) has finally decided to create > their own reporting solution (SQL Reporting) which from what I've heard is a significant improvement over CR XI (yes, 11 versions and there are > still limitations on Cut and Paste)
A few points:
1. business objects has a meta-driven reporting platform, and is one of the top three vendors in the OLAP marketplace.
2. it didn't get there by buying crystal reports - crystal reports is the lowest-end product in their suite
3. the microsoft reporting solution is merely another low-end reporting solution, competes with Crystal Reports, but not the rest of the Business Object suite.
4. the business objects BI (business intelligence) platform is very powerful, and is a reasonable tool to use if you are supporting or performing a vast amount of reporting (lets say you're a financial analyst for a large company). otherwise it's often overkill. Note: there are no cheap/free equivilents to this product - the other commercial ones are sold by Cognos and Microstrategy. These are products that are often sold for hundreds of thousands of dollars.
5. describing custom reporting solutions as piles of crap is about as reasonable as describing custom web sites as piles of crap. Some are better than others. The hard thing for many programmers to get their head around is that when you need to analyze a business process in order to get a good idea of what's going on:
a. you really don't want to have to marshall a few million java objects
b. you really don't want to have key-value pair tables in your model
c. you really don't want to think of data as merely persisted objects
d. you really don't want to have to use an oo database
e. you are much more likely to drive mysql into the ground due to its lack of parallelism
f. if you run the reports on a typical transactional data model you'll do millions of joins and the sql will suck
g. back to f, and the performance will suck, and your server will die
g. back to f, and the functionality of your analysis will suck due to lack of historical data and lack of integrated data from other systems
Reporting is a discipline with its own best practices, patterns and anti-patterns. Unfortunately, most non-reporting people still think that painting reports via Crystal Reports is a neat idea. Which it hasn't been since around 1995.
> The "cost" of licensing linux is one you chose to pay, there are many linux distributions you can get without paying a license fee.
Right, but many software products only support a few distributions. While you can run them on debian, etc - you're up a creek if you start having odd problems with the product. Since the cost of diagnosing problems can quickly exceed the os licensing savings most servers in a commercial setting just go with suse or redhat.
> Linux will run on the POWER5 platforms, IBM actively support it on these platforms too.
Yep, i'm assuming that eventually i'll be running linux on power5. But since we'll probably have to re-compile most source ourselves on that platform I'm dragging my feet on that.
> And as for patches, this is also distribution specific, and it`s also easier to remove what you dont use from linux (consequently you don`t need to install a patch for > something you don`t have installed)
Again, it costs too much in labor to customize our distributions too much: I'm standardizing on a few packages and leaving it at that. So, for example, if one server doesn't need python, perl, php it'll get them anyway - because it so much simpler to keep everything as consistent as possible. Of course, this means extra patching tho. But again even here - I don't want to have to spend hours researching a patch to determine whether or not it applies. Too many servers too little time...
> The ad was targeting commercial Unix, and asserting that it's expensive and inflexible compared to Wintel. In fact, the ad was right.
i've got servers on both aix and redhat (3 & 4). A couple of observations: 1. the aix boxes require fewer patches - this dramatically reduces maintenance cost 2. the aix boxes cost less to license the os 3. the aix boxes are more flexible for managing storage dynamically 4. the power5 (aix) platforms are more expensive, but not necessarily much more 5. the power5 platforms support extremely dynamic lpars which in turn allow me to fit multiple logical servers onto a single physical server with relatively little overhead 6. the power5 platform has all hardware quickly reporting any problems via notification services. The number of servers that have failed or crashed is minimal.
So, i'm actually seeing cost and flexibility benefits to aix over linux. These benefits would be magnified when compared to windows - with its even more frequent patch schedule and reliability limitations (when compared to aix).
Note that I'm not saying this as an aix zealot - I generally prefer command-line productivity on linux to aix/solaris/etc. Nor did I expect to find aix and power5 as competitive as we have. But I guess it's a cool surprise to see that there's still considerable life left in the old unixes. And it's good news (to me) for linux - it means that over the next five years I should expect to see many of these same benefits in the linux world.
> Netezza is a completely parallel architecture. The appliance is essentially a PC (the host) connected to a large number of disk drives that each have > their own FPGA computer attached. The host runs a heavily modified version of PostgreSQL. Data is partitioned (hashed) across all disk drives > (with mirroring for redundancy). There are essentially no indexes, at least how they work with traditional databases, though they do have > some automatic indexing built into the hashing scheme. Queries are executed by programming each disk drive to only return the data that the > host needs. This saves a tremendous amount of bandwidth and processing time. They take the approach since many / most queries require a full > table scan, make full table scans as cheap as possible.
yep, that's an approach commonly used on db2 & oracle: i normally rely on partitioning for most index-like functionality. And spreading the data across multiple share-nothing nodes has been standard for informix and db2 since around 1995. I haven't used that architecture for a while though - it's something I head towards when a single node isn't sufficient: since it takes more labor to administer (when changes to data distribution or nodes occur).
> I don't know what you mean by memory management or storage management. There is essentially zero administration with the Netezza appliance. > You buy an appliance of a certain size. There are no tuning parameters (as far as I know). The only tuning is that you decide how to partition > the data (you can optionally hash on a specific column).
Hmmm, that's cool in a way - if you don't have labor on staff to help with tuning. I've found that manual tuning is always far more effective than automated tuning - it can save you a bundle. On the other hand, you've got to get a good dba.
> Very true, though I haven't seen a database as easy as Netezza. It's standard SQL (based on PostgreSQL), so there isn't much to learn. > The tools are based on the PostgreSQL tools, so those are easy, especially if you've used PostgreSQL.
Yep, very nice little database. But doesn't support query parallelism, and some of the memory management discussed. This means that you'd generally want to parallelize with 1-cpu boxes. Which is ok...except managing a few four-ways is much cheaper than managing 16 1-way cpus. Tuning is easier, patching the boxes is far easier, etc.
> Your IBM solution sounds very interesting and I wish I had the chance to learn about such things. I'd love to hear more. I'm not saying that Netezza > is the best solution out there. Coming from experience with open source databases and Oracle at a small company on PC hardware, Netezza was very > impressive.
ya, sounds like a cool solution. I'm fond of postgresql, so maybe I'll check out one of those web marketing sessions they invite everyone to.
Thanks for the info - it's always better to hear from a user than just another marketing person!:-)
> Netezza. From my brief experience with the product, I can say that it really does live up to their claim > of 10-50 times the performance at half the cost. It is absolutely unsuitable for OLTP, but that's no > surprise as it is designed for data warehousing with adhoc queries. Nothing else on the market comes > close for the price or ease of use.
> Bizgres looks like a clone of their architecture and it might be a competitor at some point. Though, it > won't be nearly as easy to use. Having a single machine delivered ready to go is a lot easier than > setting up a cluster yourself.
the benefits of getting a warehouse appliance are that it's already tuned & sized. The drawbacks include:
- inability to repurpose existing hardware for the data warehouse. This is a serious drawback, since some of the best projects start out very iteratively on borrowed hardware, colocated on existing servers, etc until they prove themselves. Once they've proved the design, then it's much easier to get funding to buy a larger server. If you have to get the funding before the pilot is in production you can spend 6-12 months wasting your time (depending on the organization) documenting, meeting, etc.
- it doesn't sound like netezza or bizgres have all memory management, storage management, or parallel capabilities of db2, oracle, informix, or teradata. From just an MPP perspective I haven't seen anything offered in these products that looks beyond where db2/informix were in 1995.
- it's sometimes very useful to have a more general-purpose solution. you can save quite a bit of money sometimes. This month I'm rolling out a warehouse and two marts on a P570 with separate lpars - in which the marts can fail over to one another. This allows me to reboot a mart without a loss in availability - and since they share computing resources - without having to buy double the machine.
- another set of technology to master - it's generally easier to be an expert at one database manager than two or three.
Maybe these are best positioned for shops that lack the skills to handle oracle or db2, and don't yet have those products in-house?
> Those production DBA's that do not have their head in the sand after the release of 10g know the score. > Production DBAs are a dying breed. Oracle databases will eventually be self-managing and self-tuning in > the very near future.
Yep, they're trying to get there. It's a long haul from where they started, but the production dba role will be 95% automated in the near future.
It's kind of like IT operations staff. In the old days we always had operators to manage the daily manual tasks of IT. Now this is viewed as a crutch - things should be *so* automated that operators aren't necessary any more.
> What ticks me off is the "know-it-all" Oracle production DBA that tries to tell everyone what he/she needs, rather than the other way around.
I think this is a common symptom of administrators who are too far away from the business. Whether system, network, database, or applications - they guys often start to think of themselves at the center of the universe - instead of serving some business need.
> All the things you mention are normal everyday DBA issues, not-specific to Oracle. right
> The issues you're referring to are mostly problems for the development DBA, and that's where all the smarts are necessary.
This might be part of the confusion: don't separate production vs development dba roles. You're right - that a production dba role can be very simple. So why even bother with it? Unless you've got complex security requirements for it, you are far better off without that role.
So then, back to the dba role. Yep, you need to be able to write complex sql, design a data model, tune the database, select the hardware, and plan and administer a data recovery procedure. And yes, it is a hard job.
> Actually, I'm interested, to tell the truth I haven't had the misfortune of dealing with badly designed databases and I would like to know what it takes for people to make them unable to > provide answers while the information is there.
here's a list of major screw-ups: 1. objects are stored as binaries instead of separate columns 2. data is organized in massive and simple key-value pair columns rather than as separate and dedicated columsn 3. data is kept as strings that must be parsed with application code instead of with sql
lesser screw-ups would include: 4. overly normalized model 5. missing data 6. lack of versioning or history 7. data versioning inconsistently applied 8. data quality inconsistent due to all validation being performed at application layer 9. etc
Many of these issues don't prevent users from getting their data back out the application that wrote it. The problem is that when they have a simple question like, well, how many transactions per day have we been getting for the past year, and is the rate increasing? That there's no way to answer that question through SQL, but more expensive application logic has to be built. Then you next discover that the application that may have performed fine for simple transactions *completely dies* when you try to have it scan a year's worth of data.
The root cause of all this is that many people are still building applications like it was 1985 - and just being able to CRUD a few objects is good enough. Well, it isn't. People generally expect to be able to get insights about a business process in addition to simply creating operational transactions within it. And if you don't know that up front, and then design the database in accordance with the latest fads in J2EE, etc then you *will* suffer once that application starts getting used.
> Try this: host 100 clients on a single server (+backups). 100*128MB vs 100*10MB... *bzzzt* Your solution simply doesn't scale.
Sorry, I really don't think I know of 100 trivial databases that will fit on your 486. I'm sure there are probably tons of little team bowling score databases, grandma recipe databases, etc. But I just can't put my finger on more than a couple.
I don't work on oracle on daily basis, but I do work on db2 which is similar (at least when compared to mysql). And I can tell you that if you did want to put 100 tiny databases on a single db2 server with 1 gbyte of memory you could make it fit. I suspect you could do it almost as easily in Oracle.
Also, in the case of DB2 I suspect that you could probably put *more* databases on db2 for the same amount of memory than mysql. I don't know this to be a fact, but I do know that db2 offers you far more memory & database controls than mysql does.
>> Try this: >> - assuming that your company firewall is getting 5,000,000 rows/day >> - query: "is the current accept & deny traffic for any given protocol and port 5-minute interval more than 3 standard deviations from the mean for its time of day and day of week over >>last 90 days?" >Ugh, and you really do that in SQL? You're using an abysmal tool for this job. By using the right data > structure, you can answer these queries in real time on any hardware, with a minimal memory footprint > and O(1) access time. And while you can say that you have a modern server doing this task, your > solution doesn't scale, while mine does.
The difference is that I'm not targeting discarded computers, but I am targeting low-labor. Hardware is cheap compared to labor. And that query query can be written and promoted into production in 1 hour against an existing database. Meanwhile, you'll spend another $10k in labor to create a real-time monitor that'll probably have to be reset every time your box is rebooted - all to save $5k on hardware.
Databases have been used for these purposes for ten years now. Unfortunately, your understanding of their capabilities may have been limited by the use of a limited tool incapable of this kind of (increasingly common) analytical processing.
> Oracle works better for clusters. > MySQL works better for a single machine.
sorry, no
> MySQL is a lot faster. Oracle takes distributed processing a lot better.
again, no
- oracle has parallel capabilities that on a single box (2-4 way) allow it to completely kill mysql in performance comparisons
- oracle has an optimizer that can handle joins of more than 5 tables without collapsing
- oracle has the ability to rewrite your query to hit an automatically-maintained summary table
The only validity to your claim is that mysql might be a better choice if you're running 1995-era hardware, or are buying new hardware but want less than 128 mbytes of memory (for some unfathomable reason). In this case, ya, mysql can run on a lower memory footprint than oracle.
> on the other hand, on my firewall (486, 32MB ram) MySQL can handle Apache logs (only about 200k hits, though) taking a split second for any reasonable query.
ah, yeah, 1995-era equipment.
ok, now, how about a real-world situation. Try this:
- assuming that your company firewall is getting 5,000,000 rows/day
- query: "is the current accept & deny traffic for any given protocol and port 5-minute interval more than 3 standard deviations from the mean for its time of day and day of week over last 90 days?"
I write queries like that all the time on db2, and it handles them easily - even spanning hundreds of millions of rows. And this is on a single server, not a distributed one. And I could do the same on oracle. But mysql? Not a chance.
> If MySQL will do what you want, then you don't need Oracle.
Not necessarily true, the idea of 'just use the best tool for the job' is very myopic. For exammple, in a 500-person organization with say, twenty database servers, has a need for consistency. Why? Because otherwise they waste a lot of time:
- training dbas and developers on sql extentions and limitations
- training dbas on multiple database backup and restore methods, issues, and management
- managing multiple license types
- training dbas on performance tuning on multiple databases
- etc, etc
In fact, you can often save money this way while paying surprisingly high database licensing costs - through reduced administrative and development labor costs. Don't get me wrong, I still like site-licensed or free databases since it makes architecting solutions so much simpler: you can create a new database without waiting months for funding & procurement. And I'm not really a fan of Oracle.
While it's true that keeping a tiny oracle vanilla database running can be very simple - especially if you don't care much about the data, and aren't developing on it (canned app).
It is not simple to admin if:
- you've got development trying a lot of different ideas - and you're spending all your time researching issues
- you've got a ton of data
- you've got business critical data and need to configure for maximum reliability
- you get into a slightly off-the-path data restoration scenario
And the work of a dba isn't trivial if you are trying to do more than just keep the server running. In that case you should be:
- handling data quality issues
- handling data concurrency issues
- handling multiple apps using the same data elements
- handling replication issues
- designing reporting models
- often implementing the ETL process between operational and reporting models
- performance-tuning databases that are constantly changing over time, often selecting the new hardware, then migrating them to new hardware.
Figuring out where to spend your money to get performance you need: more memory? more cpus? more/faster disk?
- writing queries for developers that don't know how to write simple sql
- writing queries for advanced developers that aren't sure how to get the best performance out of complex queries
- creating automatically-maintained summary tables
- researching fixpacks and upgrades to figure out which you need, and what they'll break; then doing
emergency un-installs or work-arounds when you're wrong.
If your dba job is simple you're probably doing about 1% of what dbas do at our location
> But I still think the deliberately crippled nature of the product makes it unattractive relative to the > open source contenders, in terms that even PHB's will understand: "Boss, if we go with 'free' Oracle, > we're going to run into that disk space limit pretty fast, and then we'll have to pay $$$."
Actually, it looks to me like it will be extremely attractive to quite a few people:
- can easily handle small or departmental databases
- very cheap (free)
- completely consistent
- same codebase as oracle 10g
- should enable easy portability for oracle-enabled applications
- should be able to easily leverage existing oracle & developer skills
While it is true that this version has extreme limitations (1 CPU, 1 gbyte memory, 4 gbytes data), and if this is your *only* database server it may not be a good choice. But on a 3ghz server with 15k disk and good disk cache it could be extremely fast for small projects. Meanwhile, if you've got or need a dozen databases in-house, ranging from small to large, this solution could keep the small database costs down - while also eliminating the labor costs associated with managing a variety of products. Further, if the occasional project needs to go up from 1 CPU to 2 - the licensing costs are fairly cheap. Going from 1 CPU to 64 is a completely different situation (could cost a ton), but then again think of the massive labor savings.
I'm currently involved in leveraging this exact same strategy with db2 (their cheap express product came out about two years ago): we try to migrate all databases to db2 to use consistency to keep labor costs down. Then we choose the database license that'll be the cheapest to purchase and manage. Conversions from mysql are sometimes a pain in the butt due to weirdness of that product - then again, its data quality problems usually make it worthwhile anyway. Postgresql is more complex - the quality of this product is fine, the only benefit to conversion is administrative consistency.
Anyhow, is this product a mysql-killer? No, but could definitely cut into its market:
- ISVs with small databases could use this instead of mysql with no problems
- it is freer than mysql
- ISVs that need more power in their product beyond this can easily pick a more powerful version of Oracle that can completely outpace mysql anyway
(assuming you use parallel capabilities). This will cut down on hardware and development costs.
- mysql's future is now completely up in the air since Oracle has bought out the best part of mysql (innodb)
Of course, postgresql and firebird are still out there. These products are healthy and steadily improving, just lack the market share to compete well with oracle right now.
sorry about the 'exclusive salvage' - I was tired. I meant 'exclusive salvation' - the concept that you can only be saved through the belief of a single god. This concept is primarily pushed by the big-three religions, I suppose this is probably a religious innovation that can be chaulked up to judaism. Anyhow, the net effect is that it enables & encourages people to see others as inferior to themselves, to discard the value of their lives, etc, etc. Contrast this to the religons of the greeks - in which they were happy to accept anyone else's gods.
Thanks for a civil discourse on this subject. I think however that we're unlikely to see eye-to-eye on this subject. It seems to me that you base a lot of faith on the words of one book. And there have been thousands of gods with hundreds of books. Of them all, this may be the most significant book, but it offers no proof, just a plea to believe. Nothing really distinctive that the others don't offer as well. Meanwhile, I feel that the concept of god is a matter of obsolete and primitive philosophy. And I will not discard simple logic for hearsay or propaganda - whether it is passed to me by word of mouth or via a two thousand year old book.
I once wrote a list of reasons for my children why I don't believe in god - one of them touched on your point:
- Why would an all-knowing and all-powerful got bother to create us? After all, he already knows everything we're going to do. Why bother testing us? Assuming there are such a thing as "souls" he could just create them with an artificial past history and put them in his little heaven or hell as appropriate. Why would he do that? Well, yeah - the idea of an all-knowing & all-powerful god contradicts the concept of the christian/jewish/islamic creation.
I don't think god is evil. I think that the concept of god is a primitive idea useful to people who are reaching out for explanations of the world around them. We can do *far* better today than to invent invisible and imaginary creatures to explain the world.
>> 1. so many "followers of the book" are quite eager to kill one another over differrences in their interpretation of it > The proportion of people claiming to be Christians and willing to kill each other over matters of doctrine is fairly small.
The killing of jews by christians is hardly a small matter, the killing of moslems and christians by one another is hardly a small matter, the killings of the hugeunots by the catholics, the persecution of catholics by luther and his followers, the fighting in irelands, etc, etc, etc. I wouldn't call it a small matter.
side note: how many religious wars can you think of between two sets of polytheistic people? The romans and greeks never fought "holy wars", I can't remember any religious wars between greeks and egyptions. This concept of killing people because they believe in a different god seems to be largely a result of the concept of "exclusive salvage" found in the bible.
>> 3. there is any value in graduate studies in christian theology - if the book was "understandable" >> and not subject to so much interpretation then it would read like a printer repair manual.
> False dichotomy. Just because it is easy to understand what is necessary for salvation and a lot of > other stuff as well doesn't mean that there aren't bits requiring more thought and training and deeper > understanding and insight available through further study.
If this was true you wouldn't have hundreds of conflicting formulas for salvage: whether or not you can dance, drink, work on the sabbath, use religious icons and symbols, divorce, etc, etc, etc. And of course, you can say that those that disagree with you are misguided, or aren't christians BUT they call themselves christians are are following the same book. The problem is that the book is SO subject to interpretation. Again, this cuts away at the entire concept of a small god - you'd think he would write something people could agree upon, rather than something that causes wars.
>> Beyond the question of whether or not the thousands of christian, jewish, and islamic sects would >>agree with you in your interpretation...please, if there is a god that is omnipotent and omniscient I >>would sincerely hope that he would communicate a little more clearly than via one person's dreams from >>2,000 years ago. Heck, it sounds like something out of a Diskworld no
> You're being a little silly there, reducing the Bible to 'one person's dreams.' It contains thousands > of years of God clearly intervening in history and making himself known, sometimes in dramatic, > obvious ways, sometimes in more subtle ways, but most significantly in the person of Jesus. To claim > that God has not communicated clearly when we have a document such as the Bible with a historicity far > surpassing that of other documents from antiquity, an event such as the crucifixion which divides time > in two and a life such as Jesus' which few have not heard of, is ludicrous.
Rather than determine the validity of this book through the eyes of a biased convert, take a look at it through the eyes of an objective observer: so, we're supposed to be swayed because the events in the book are documented...in the book itself. That's circular reasoning, that gives it no validity over Homer's Illiad, or any number of religious texts that have survived to today.
Your concept of the significance of events such as the crucifixion is derrived from the book itself, this event isn't documented elsewhere. Plus, much of the bible was written many years after the original events occured, and is a combination of various documents that the editors felt would be handy to read together. Documents from the same period that didn't lend themselves to the editorial vision were discarded - but have been found since. What a *mess*.
Again, if god is a smart guy I think he could communicate the basic message in about 50 pages. Hmmm,
In the past I've found that sql server is faster than oracle straight out of the box. Don't know if this applies to Yukon or 10g.
However, SQL Server doesn't (or didn't) have an equiv to oracle's table partitioning. So, any time you're doing table scans (say you need to access more than 3% of the table) you've got to scan everything. With oracle (or db2, informix, etc) you can partition the table so that you only scan the data you care about (maybe 10%).
This alone gives oracle a vast performance edge over sql server. The downside is that partitioning costs extra, sometimes it isn't purchased, and some applications don't perform tablesscans. Reporting & search engines do tho, big time.
> What technology was being used for this and what kinds of errors? One of our hopes for using Business Universes or Data Models is that
> they can be used to shield users from common mistakes that would lead to incorrect reports.
good question - this was financial analysts using sas. which would lead to more errors than users with metadata-driven reports.
but even when using microstrategy, business objects, etc I've seen some of the same thing. Since warehousing is usually an iterative effort, the data architecture changes over time. These changes can require users to make changes to old reports - such as adding new filters to old reports to continue to get the same meaning that they had before.
Regarding fan traps - that's great if the tool can prevent it. But why are you seeing it in the first place? Is this a measure in a fact table that is repeated across many rows and requires complete grouping to be additive? Is it a measure in a snowflake dimension(rare)? Or is it a column in a relational (not dimensional) model that is also supporting reporting? If the latter case then, I wonder if a better solution is a reporting model that would potentially solve this as well as dozens of different performance problems at the same time?
You need to re-read my post. I didn't say Oracle, I said DB2. DB2 is typically half the price of oracle, and when it comes to reporting/warehousing systems it is even cheaper since one of its partitioning solutions is included in the lowest-end products. I suspect that oracle could also offer a cheaper reporting solution than sql server, but i'm not as familiar with their current licensing.
Anyhow, for db2 the low-end db2 version isn't free, it costs $750. It is limited to two cpus, 4 gbytes memory, and unlimited storage. It supports multi-dimensional clustering for reporting queries (MDC), materialized views (MQT), and a wide variety of other highly useful data warehousing features. On this kind of server, depending on the amount of summarization being performed, you could easily keep a terabyte of raw data.
Once it runs out of steam, you could then up it to the workstation edition - which supports 64-bit installation (no practical memory limits), and four cpus. I think the cost of this edition is $1500/cpu.
Now, using MDC on four cpus this warehouse can easily handle a huge amount of work. But if you run out here then it's time to go with db2's beowulf-style partitioning. In this scenario you can cluster across n-number of separate servers. Could be 10,250,1000 separate servers. But you will pay dearly - the cost is about $32k/cpu, which is almost the cost of high-end four-way servers and their dedicated storage. Discounts again apply, and this cost can be worth it. The important thing is that you can start at the low end and work up.
> That's not a data warehouse, that's a data convenience store.
No, that is a warehouse. You're probably thinking of a data mart with hundreds of users. That is more expensive. But not what I said. The data mart will cost you more, especially if it's on the web. A four-way data mart on the internet supporting your customers with, say 16 gbytes of memory, will run you $7500 / cpu (list price).
But the important thing is that the most important database in the architecture (the warehouse) will cost you less than $1500. That's nothing.
So, back to sql server. Like I said, it is sometimes the cheapest option, but often not. Certainly not in this configuration:
- db2 data warehouse:
- small: $1500
- medium: $7500 / cpu
- large: $32000 / cpu
- sql server warehouse:
- small/medium/large: $25000 / cpu
- db2 data mart:
- medium: $7500 / cpu
- large: $32000 / cpu
- sql server data mart: $25000 / cpu
SQL Server does have other editions besides enterprise that are considerably cheaper. But they lack partitioning - which is a key feature for reporting. So, in order to get reasonable performance you will have to use their top of the line product.
In this scenario, db2 is vastly cheaper than sql server. The $7500/cpu edition of DB2 is faster than the $25000/cpu version of sql server given a single SMP server. And the $32000/cpu edition of db2 is far faster than the $25000/cpu version of sql server across a cluster. With that kind of price and scalability sql server isn't even remotely competitive in warehousing, reporting, etc.
> Err, I'm afraid I don't follow your analogy there. Now, if there were like 4 available custom web site tool sets, and you had to pay
> $800 per dev seat up front, and a $300 yearly upgrade fee, and those tools all sucked. Then yes, I would agree with your analogy.
Those aren't the only options. What are reports? Well, they can be carefully formatted charts/graphs/tables that print nicely. But that sucks, most people don't care about print any more. What else are they? Tables with pagination, sorting, and navigation along with charts and graphs with similar navigation.
You can easily create very professional, very easy to use online olap-oriented reporting sites using something as simple as PHP & Chart Director. Just focus on a site-map that looks like a genuine OLAP cube, just flatten it out like you'd flatten a globe into multiple 2D maps and then link them together well. And it won't be crappy.
> a) I don't
good
> b) Why not?
Key-value pair tables can't be flattned out. You can't use them as dimensions to filter or sort your queries. Well, not without a ton of sql. They're very handy sometimes for applications, but just don't work for reporting.
d) Why not?
oo databases pretty much died when it became clear that they couldn't handle the massive table scans involved in reporting. At first they brushed this off as completely unnecessary for an application, but eventually people got tired of spending a lot of cash on an OO database, and then spending another lot of cash on a relational database to support reporting.
g2) Entirely dependant on the database design.
a single transactional database seldom has complete history - it's both a performance impact, and complete pain in the butt to maintain historical versions of every row in every table. Setting up foreign keys, writing sql, etc, becomes a nightmare. Further, a single transactional database seldom has much data from other related transactional databases (unless it's an integration-oriented one like a CRM database). But you need history for trending, and you need data from other transactions for more data points.
> You're a data warehouse guy, I get it. And I love being able to use warehouse designs in specific situations, but its not the end
> all be all solution. Warehousing is awesome for reporting, but it is a pain in the ass for data management, security, integrity, and space.
that depends on the implementation: a data warehouse is just a consolidation & distribution point for data. It's great for archiving data efficiently, security usually isn't a problem, and it is an *amazing* tool for finding data quality problems in source systems that you'd never find otherwise.
Now a data mart on the other hand (what a warehouse publishes to) can be a horrific space hog. And is complex to tighten security down on. But there are approaches to these problems as well.
> That's why a lot of data warehouses are based off of summarized OO databases.
hmmm, i've never encountered that. Do you mean oo applications sitting on top of relational databases? If so, I see that all the time. But if you mean an actual oo database, hmm, haven't even run into one in four years now...
> But, we have another application. An internal/consultant developed Commission application. The thing is huge, tracks hundred of
> sale members, service agreements, commissions, splits, charge backs, payouts, etc... And it was designed from the ground
> up with and emphasis on historical data. They can pull up the exact and complete history on every item, sales person, accessory,
> department, office, order, etc. The trade off however is performance. Like you mentioned, even on a beefcake server, it is
> impossible to gather full real time data for reporting. So a series of data warehousing processes are run at night, or after
> changes to a record that alter multiple entities.
Right - I didn't mean to imply that you sh
> Yeah MySQL can kick Ingres's ass from here to next week!!! Yet another example of how open-source developers can compete with the
> so-called 'enterprise class' software corporations on their own terms.
Laf, nobody calls Engris "Enterprise-class" except for CA. The last year it was competitive was probably something like 1988. So, mysql can compete with 20 year old technology? Good job, especially now that it has taken how long? Seven years to pick up views?
Please, you might be better off pointing to a more mature and less buggy product than mysql for an example of ass-kicking competance.
hmmm, I've found that particular form of reporting to get quickly unmanageable - with an explosion of hundreds of almost identical reports created, but not maintained by users. At one shop I visited they had so many that we could only test a sample of them - and found almost 50% to give incorrect results.
Since most reports of this type are created by users looking for answers to simple questions, i've found that creating dashboards with great navigation & drilling between graphs, charts, and tables to be a better option. This can then be managed by the warehouse or transactional database owners with its accuracy ensured.
Sigh, i've got one database that needs an upgrade right now and the only obstacle is the brio users with their reports that will break. I can't fix their reports, so I will have to give them some terribly-long transition time.
This was in 2001, and i don't remember the details very well. The price was from microsoft without any discounts. We had lost our partnership status since our certified employees left so we were forced to look at full price. Even if we could have gotten it at $20k - that would have cost a total of $80k which that company absolutely couldn't afford. And is hardly cheap compared to oracle or db2.
.net in it. Unfortunately, at that time the stored procedures were too complex to easily migrate to postgresql.
> Our shop supports MS, Oracle, DB2, and postgres. And you can move freely between any of those from one to the other, no vendor lock in.
The sql server database I mention was over 100 gbytes in size. Data transfers had been implemented with their terrible ETL tool (DTS), it used a queue manager product from microsoft, sent emails via exchange, etc, etc. Luckily, there was no
> Vendor lock in is when you use specific functionality inherent to MSSQl(or db2, oracle, post) so
> that if you need to move from one to the other, that is the guy who built the databases fault, not
> the company that supplies it.
Some product differences are innocent obstacles to portability - such as different locking or partitioning features. This even includes non-ansi sql extensions that you often have to use to meet performance requirements on large databases.
Others are part of a deliberate vendor lock in strategy. Most of the vendor's add-on products fall into this category - all the etl tools, reporting tools, data mining tools, search tools, content management tools, etc. Sometimes these can be useful, but usually they aren't even that good.
Microsoft is heavily pushing the integration of sql server with the rest of the microsoft platform. It is not a bad database, but I wouldn't really recommend it to any shop that doesn't intend to stay a microsoft.
> Do you have any idea how much Oracle actually costs, compared to a Windows Server licence?
Three years ago I faced a $120k charge to upgrade a four-cpu SQL Server7 server to SQL Server 2000.
Was this cheaper than oracle?
Eh, not really - first off, I could have implemented oracle on a two-way server instead of a four-way and gotten equiv performance (search engine queries). Secondly, I could probably have gotten the oracle licenses down to $20k a cpu.
Oracle now has a *free* low-end database.
These days I'm running a multi-terabyte data warehouse for hundreds of customers on db2 - and it's *far* cheaper than SQL Server. Since db2 bundles most of their top-end capabilities into even the smallest version, I could run my warehouse on the db2 express version for $1500 (total).
So yeah, sometimes sql server is the cheapest solution. But it often isn't.
And then comes the topic of vendor lock-in. With db2 or oracle if you want to move to another database later on it isn't such a big deal. I can move most of my apps between oracle, db2, postgresql, sybase, etc fairly easily. Non-standard sql behavior in mysql are a pain in the butt. And any microsoft *platform* code in mssqlserver are a pain in the butt.
> Oh my god do I feel sorry for you. Custom reporting solutions have often been descriped as a pile of crap. Business Objects, the current owners of
> Crystal Reports (Formerly of Seagate, Crystal Decision, and lord knows what other companies) has long been considered the best of these options. In
> otherwords, they were/are the best piece of crap solution available. Microsoft (insert booing and hissing here) has finally decided to create
> their own reporting solution (SQL Reporting) which from what I've heard is a significant improvement over CR XI (yes, 11 versions and there are
> still limitations on Cut and Paste)
A few points:
1. business objects has a meta-driven reporting platform, and is one of the top three vendors in the OLAP marketplace.
2. it didn't get there by buying crystal reports - crystal reports is the lowest-end product in their suite
3. the microsoft reporting solution is merely another low-end reporting solution, competes with Crystal Reports, but not the rest of the Business Object suite.
4. the business objects BI (business intelligence) platform is very powerful, and is a reasonable tool to use if you are supporting or performing a vast amount of reporting (lets say you're a financial analyst for a large company). otherwise it's often overkill. Note: there are no cheap/free equivilents to this product - the other commercial ones are sold by Cognos and Microstrategy. These are products that are often sold for hundreds of thousands of dollars.
5. describing custom reporting solutions as piles of crap is about as reasonable as describing custom web sites as piles of crap. Some are better than others. The hard thing for many programmers to get their head around is that when you need to analyze a business process in order to get a good idea of what's going on:
a. you really don't want to have to marshall a few million java objects
b. you really don't want to have key-value pair tables in your model
c. you really don't want to think of data as merely persisted objects
d. you really don't want to have to use an oo database
e. you are much more likely to drive mysql into the ground due to its lack of parallelism
f. if you run the reports on a typical transactional data model you'll do millions of joins and the sql will suck
g. back to f, and the performance will suck, and your server will die
g. back to f, and the functionality of your analysis will suck due to lack of historical data and lack of integrated data from other systems
Reporting is a discipline with its own best practices, patterns and anti-patterns. Unfortunately, most non-reporting people still think that painting reports via Crystal Reports is a neat idea. Which it hasn't been since around 1995.
> The "cost" of licensing linux is one you chose to pay, there are many linux distributions you can get without paying a license fee.
Right, but many software products only support a few distributions. While you can run them on debian, etc - you're up a creek if you start having odd problems with the product. Since the cost of diagnosing problems can quickly exceed the os licensing savings most servers in a commercial setting just go with suse or redhat.
> Linux will run on the POWER5 platforms, IBM actively support it on these platforms too.
Yep, i'm assuming that eventually i'll be running linux on power5. But since we'll probably have to re-compile most source ourselves on that platform I'm dragging my feet on that.
> And as for patches, this is also distribution specific, and it`s also easier to remove what you dont use from linux (consequently you don`t need to install a patch for
> something you don`t have installed)
Again, it costs too much in labor to customize our distributions too much: I'm standardizing on a few packages and leaving it at that. So, for example, if one server doesn't need python, perl, php it'll get them anyway - because it so much simpler to keep everything as consistent as possible. Of course, this means extra patching tho. But again even here - I don't want to have to spend hours researching a patch to determine whether or not it applies. Too many servers too little time...
> The ad was targeting commercial Unix, and asserting that it's expensive and inflexible compared to Wintel. In fact, the ad was right.
i've got servers on both aix and redhat (3 & 4). A couple of observations:
1. the aix boxes require fewer patches - this dramatically reduces maintenance cost
2. the aix boxes cost less to license the os
3. the aix boxes are more flexible for managing storage dynamically
4. the power5 (aix) platforms are more expensive, but not necessarily much more
5. the power5 platforms support extremely dynamic lpars which in turn allow me to fit multiple logical servers onto a single physical server with relatively little overhead
6. the power5 platform has all hardware quickly reporting any problems via notification services. The number of servers that have failed or crashed is minimal.
So, i'm actually seeing cost and flexibility benefits to aix over linux. These benefits would be magnified when compared to windows - with its even more frequent patch schedule and reliability limitations (when compared to aix).
Note that I'm not saying this as an aix zealot - I generally prefer command-line productivity on linux to aix/solaris/etc. Nor did I expect to find aix and power5 as competitive as we have. But I guess it's a cool surprise to see that there's still considerable life left in the old unixes. And it's good news (to me) for linux - it means that over the next five years I should expect to see many of these same benefits in the linux world.
> Netezza is a completely parallel architecture. The appliance is essentially a PC (the host) connected to a large number of disk drives that each have
:-)
> their own FPGA computer attached. The host runs a heavily modified version of PostgreSQL. Data is partitioned (hashed) across all disk drives
> (with mirroring for redundancy). There are essentially no indexes, at least how they work with traditional databases, though they do have
> some automatic indexing built into the hashing scheme. Queries are executed by programming each disk drive to only return the data that the
> host needs. This saves a tremendous amount of bandwidth and processing time. They take the approach since many / most queries require a full
> table scan, make full table scans as cheap as possible.
yep, that's an approach commonly used on db2 & oracle: i normally rely on partitioning for most index-like functionality. And spreading the data across multiple share-nothing nodes has been standard for informix and db2 since around 1995. I haven't used that architecture for a while though - it's something I head towards when a single node isn't sufficient: since it takes more labor to administer (when changes to data distribution or nodes occur).
> I don't know what you mean by memory management or storage management. There is essentially zero administration with the Netezza appliance.
> You buy an appliance of a certain size. There are no tuning parameters (as far as I know). The only tuning is that you decide how to partition
> the data (you can optionally hash on a specific column).
Hmmm, that's cool in a way - if you don't have labor on staff to help with tuning. I've found that manual tuning is always far more effective than automated tuning - it can save you a bundle. On the other hand, you've got to get a good dba.
> Very true, though I haven't seen a database as easy as Netezza. It's standard SQL (based on PostgreSQL), so there isn't much to learn.
> The tools are based on the PostgreSQL tools, so those are easy, especially if you've used PostgreSQL.
Yep, very nice little database. But doesn't support query parallelism, and some of the memory management discussed. This means that you'd generally want to parallelize with 1-cpu boxes. Which is ok...except managing a few four-ways is much cheaper than managing 16 1-way cpus. Tuning is easier, patching the boxes is far easier, etc.
> Your IBM solution sounds very interesting and I wish I had the chance to learn about such things. I'd love to hear more. I'm not saying that Netezza
> is the best solution out there. Coming from experience with open source databases and Oracle at a small company on PC hardware, Netezza was very
> impressive.
ya, sounds like a cool solution. I'm fond of postgresql, so maybe I'll check out one of those web marketing sessions they invite everyone to.
Thanks for the info - it's always better to hear from a user than just another marketing person!
> Netezza. From my brief experience with the product, I can say that it really does live up to their claim
> of 10-50 times the performance at half the cost. It is absolutely unsuitable for OLTP, but that's no
> surprise as it is designed for data warehousing with adhoc queries. Nothing else on the market comes
> close for the price or ease of use.
> Bizgres looks like a clone of their architecture and it might be a competitor at some point. Though, it
> won't be nearly as easy to use. Having a single machine delivered ready to go is a lot easier than
> setting up a cluster yourself.
the benefits of getting a warehouse appliance are that it's already tuned & sized. The drawbacks include:
- inability to repurpose existing hardware for the data warehouse. This is a serious drawback, since some of the best projects start out very iteratively on borrowed hardware, colocated on existing servers, etc until they prove themselves. Once they've proved the design, then it's much easier to get funding to buy a larger server. If you have to get the funding before the pilot is in production you can spend 6-12 months wasting your time (depending on the organization) documenting, meeting, etc.
- it doesn't sound like netezza or bizgres have all memory management, storage management, or parallel capabilities of db2, oracle, informix, or teradata. From just an MPP perspective I haven't seen anything offered in these products that looks beyond where db2/informix were in 1995.
- it's sometimes very useful to have a more general-purpose solution. you can save quite a bit of money sometimes. This month I'm rolling out a warehouse and two marts on a P570 with separate lpars - in which the marts can fail over to one another. This allows me to reboot a mart without a loss in availability - and since they share computing resources - without having to buy double the machine.
- another set of technology to master - it's generally easier to be an expert at one database manager than two or three.
Maybe these are best positioned for shops that lack the skills to handle oracle or db2, and don't yet have those products in-house?
> Those production DBA's that do not have their head in the sand after the release of 10g know the score.
> Production DBAs are a dying breed. Oracle databases will eventually be self-managing and self-tuning in
> the very near future.
Yep, they're trying to get there. It's a long haul from where they started, but the production dba role will be 95% automated in the near future.
It's kind of like IT operations staff. In the old days we always had operators to manage the daily manual tasks of IT. Now this is viewed as a crutch - things should be *so* automated that operators aren't necessary any more.
> What ticks me off is the "know-it-all" Oracle production DBA that tries to tell everyone what he/she needs, rather than the other way around.
I think this is a common symptom of administrators who are too far away from the business. Whether system, network, database, or applications - they guys often start to think of themselves at the center of the universe - instead of serving some business need.
> All the things you mention are normal everyday DBA issues, not-specific to Oracle.
right
> The issues you're referring to are mostly problems for the development DBA, and that's where all the smarts are necessary.
This might be part of the confusion: don't separate production vs development dba roles. You're right - that a production dba role can be very simple. So why even bother with it? Unless you've got complex security requirements for it, you are far better off without that role.
So then, back to the dba role. Yep, you need to be able to write complex sql, design a data model, tune the database, select the hardware, and plan and administer a data recovery procedure. And yes, it is a hard job.
> Actually, I'm interested, to tell the truth I haven't had the misfortune of dealing with badly designed databases and I would like to know what it takes for people to make them unable to
> provide answers while the information is there.
here's a list of major screw-ups:
1. objects are stored as binaries instead of separate columns
2. data is organized in massive and simple key-value pair columns rather than as separate and dedicated columsn
3. data is kept as strings that must be parsed with application code instead of with sql
lesser screw-ups would include:
4. overly normalized model
5. missing data
6. lack of versioning or history
7. data versioning inconsistently applied
8. data quality inconsistent due to all validation being performed at application layer
9. etc
Many of these issues don't prevent users from getting their data back out the application that wrote it. The problem is that when they have a simple question like, well, how many transactions per day have we been getting for the past year, and is the rate increasing? That there's no way to answer that question through SQL, but more expensive application logic has to be built. Then you next discover that the application that may have performed fine for simple transactions *completely dies* when you try to have it scan a year's worth of data.
The root cause of all this is that many people are still building applications like it was 1985 - and just being able to CRUD a few objects is good enough. Well, it isn't. People generally expect to be able to get insights about a business process in addition to simply creating operational transactions within it. And if you don't know that up front, and then design the database in accordance with the latest fads in J2EE, etc then you *will* suffer once that application starts getting used.
> Try this: host 100 clients on a single server (+backups). 100*128MB vs 100*10MB... *bzzzt* Your solution simply doesn't scale.
Sorry, I really don't think I know of 100 trivial databases that will fit on your 486. I'm sure there are probably tons of little team bowling score databases, grandma recipe databases, etc. But I just can't put my finger on more than a couple.
I don't work on oracle on daily basis, but I do work on db2 which is similar (at least when compared to mysql). And I can tell you that if you did want to put 100 tiny databases on a single db2 server with 1 gbyte of memory you could make it fit. I suspect you could do it almost as easily in Oracle.
Also, in the case of DB2 I suspect that you could probably put *more* databases on db2 for the same amount of memory than mysql. I don't know this to be a fact, but I do know that db2 offers you far more memory & database controls than mysql does.
>> Try this:
>> - assuming that your company firewall is getting 5,000,000 rows/day
>> - query: "is the current accept & deny traffic for any given protocol and port 5-minute interval more than 3 standard deviations from the mean for its time of day and day of week over
>>last 90 days?"
>Ugh, and you really do that in SQL? You're using an abysmal tool for this job. By using the right data
> structure, you can answer these queries in real time on any hardware, with a minimal memory footprint
> and O(1) access time. And while you can say that you have a modern server doing this task, your
> solution doesn't scale, while mine does.
The difference is that I'm not targeting discarded computers, but I am targeting low-labor. Hardware is cheap compared to labor. And that query query can be written and promoted into production in 1 hour against an existing database. Meanwhile, you'll spend another $10k in labor to create a real-time monitor that'll probably have to be reset every time your box is rebooted - all to save $5k on hardware.
Databases have been used for these purposes for ten years now. Unfortunately, your understanding of their capabilities may have been limited by the use of a limited tool incapable of this kind of (increasingly common) analytical processing.
> Oracle works better for clusters.
> MySQL works better for a single machine.
sorry, no
> MySQL is a lot faster. Oracle takes distributed processing a lot better.
again, no
- oracle has parallel capabilities that on a single box (2-4 way) allow it to completely kill mysql in performance comparisons
- oracle has an optimizer that can handle joins of more than 5 tables without collapsing
- oracle has the ability to rewrite your query to hit an automatically-maintained summary table
The only validity to your claim is that mysql might be a better choice if you're running 1995-era hardware, or are buying new hardware but want less than 128 mbytes of memory (for some unfathomable reason). In this case, ya, mysql can run on a lower memory footprint than oracle.
> on the other hand, on my firewall (486, 32MB ram) MySQL can handle Apache logs (only about 200k hits, though) taking a split second for any reasonable query.
ah, yeah, 1995-era equipment.
ok, now, how about a real-world situation. Try this:
- assuming that your company firewall is getting 5,000,000 rows/day
- query: "is the current accept & deny traffic for any given protocol and port 5-minute interval more than 3 standard deviations from the mean for its time of day and day of week over last 90 days?"
I write queries like that all the time on db2, and it handles them easily - even spanning hundreds of millions of rows. And this is on a single server, not a distributed one. And I could do the same on oracle. But mysql? Not a chance.
> If MySQL will do what you want, then you don't need Oracle.
Not necessarily true, the idea of 'just use the best tool for the job' is very myopic. For exammple, in a 500-person organization with say, twenty database servers, has a need for consistency. Why? Because otherwise they waste a lot of time:
- training dbas and developers on sql extentions and limitations
- training dbas on multiple database backup and restore methods, issues, and management
- managing multiple license types
- training dbas on performance tuning on multiple databases
- etc, etc
In fact, you can often save money this way while paying surprisingly high database licensing costs - through reduced administrative and development labor costs. Don't get me wrong, I still like site-licensed or free databases since it makes architecting solutions so much simpler: you can create a new database without waiting months for funding & procurement. And I'm not really a fan of Oracle.
While it's true that keeping a tiny oracle vanilla database running can be very simple - especially if you don't care much about the data, and aren't developing on it (canned app).
It is not simple to admin if:
- you've got development trying a lot of different ideas - and you're spending all your time researching issues
- you've got a ton of data
- you've got business critical data and need to configure for maximum reliability
- you get into a slightly off-the-path data restoration scenario
And the work of a dba isn't trivial if you are trying to do more than just keep the server running. In that case you should be:
- handling data quality issues
- handling data concurrency issues
- handling multiple apps using the same data elements
- handling replication issues
- designing reporting models
- often implementing the ETL process between operational and reporting models
- performance-tuning databases that are constantly changing over time, often selecting the new hardware, then migrating them to new hardware.
Figuring out where to spend your money to get performance you need: more memory? more cpus? more/faster disk?
- writing queries for developers that don't know how to write simple sql
- writing queries for advanced developers that aren't sure how to get the best performance out of complex queries
- creating automatically-maintained summary tables
- researching fixpacks and upgrades to figure out which you need, and what they'll break; then doing
emergency un-installs or work-arounds when you're wrong.
If your dba job is simple you're probably doing about 1% of what dbas do at our location
> But I still think the deliberately crippled nature of the product makes it unattractive relative to the
> open source contenders, in terms that even PHB's will understand: "Boss, if we go with 'free' Oracle,
> we're going to run into that disk space limit pretty fast, and then we'll have to pay $$$."
Actually, it looks to me like it will be extremely attractive to quite a few people:
- can easily handle small or departmental databases
- very cheap (free)
- completely consistent
- same codebase as oracle 10g
- should enable easy portability for oracle-enabled applications
- should be able to easily leverage existing oracle & developer skills
While it is true that this version has extreme limitations (1 CPU, 1 gbyte memory, 4 gbytes data), and if this is your *only* database server it may not be a good choice. But on a 3ghz server with 15k disk and good disk cache it could be extremely fast for small projects. Meanwhile, if you've got or need a dozen databases in-house, ranging from small to large, this solution could keep the small database costs down - while also eliminating the labor costs associated with managing a variety of products. Further, if the occasional project needs to go up from 1 CPU to 2 - the licensing costs are fairly cheap. Going from 1 CPU to 64 is a completely different situation (could cost a ton), but then again think of the massive labor savings.
I'm currently involved in leveraging this exact same strategy with db2 (their cheap express product came out about two years ago): we try to migrate all databases to db2 to use consistency to keep labor costs down. Then we choose the database license that'll be the cheapest to purchase and manage. Conversions from mysql are sometimes a pain in the butt due to weirdness of that product - then again, its data quality problems usually make it worthwhile anyway. Postgresql is more complex - the quality of this product is fine, the only benefit to conversion is administrative consistency.
Anyhow, is this product a mysql-killer? No, but could definitely cut into its market:
- ISVs with small databases could use this instead of mysql with no problems
- it is freer than mysql
- ISVs that need more power in their product beyond this can easily pick a more powerful version of Oracle that can completely outpace mysql anyway
(assuming you use parallel capabilities). This will cut down on hardware and development costs.
- mysql's future is now completely up in the air since Oracle has bought out the best part of mysql (innodb)
Of course, postgresql and firebird are still out there. These products are healthy and steadily improving, just lack the market share to compete well with oracle right now.
sorry about the 'exclusive salvage' - I was tired. I meant 'exclusive salvation' - the concept that you can only be saved through the belief of a single god. This concept is primarily pushed by the big-three religions, I suppose this is probably a religious innovation that can be chaulked up to judaism. Anyhow, the net effect is that it enables & encourages people to see others as inferior to themselves, to discard the value of their lives, etc, etc. Contrast this to the religons of the greeks - in which they were happy to accept anyone else's gods.
Thanks for a civil discourse on this subject. I think however that we're unlikely to see eye-to-eye on this subject. It seems to me that you base a lot of faith on the words of one book. And there have been thousands of gods with hundreds of books. Of them all, this may be the most significant book, but it offers no proof, just a plea to believe. Nothing really distinctive that the others don't offer as well. Meanwhile, I feel that the concept of god is a matter of obsolete and primitive philosophy. And I will not discard simple logic for hearsay or propaganda - whether it is passed to me by word of mouth or via a two thousand year old book.
I once wrote a list of reasons for my children why I don't believe in god - one of them touched on your point:
- Why would an all-knowing and all-powerful got bother to create us? After all, he already knows everything we're going to do. Why bother testing us? Assuming there are such a thing as "souls" he could just create them with an artificial past history and put them in his little heaven or hell as appropriate. Why would he do that? Well, yeah - the idea of an all-knowing & all-powerful god contradicts the concept of the christian/jewish/islamic creation.
I don't think god is evil. I think that the concept of god is a primitive idea useful to people who are reaching out for explanations of the world around them. We can do *far* better today than to invent invisible and imaginary creatures to explain the world.
>> 1. so many "followers of the book" are quite eager to kill one another over differrences in their interpretation of it
> The proportion of people claiming to be Christians and willing to kill each other over matters of doctrine is fairly small.
The killing of jews by christians is hardly a small matter, the killing of moslems and christians by one another is hardly a small matter, the killings of the hugeunots by the catholics, the persecution of catholics by luther and his followers, the fighting in irelands, etc, etc, etc. I wouldn't call it a small matter.
side note: how many religious wars can you think of between two sets of polytheistic people? The romans and greeks never fought "holy wars", I can't remember any religious wars between greeks and egyptions. This concept of killing people because they believe in a different god seems to be largely a result of the concept of "exclusive salvage" found in the bible.
>> 3. there is any value in graduate studies in christian theology - if the book was "understandable"
>> and not subject to so much interpretation then it would read like a printer repair manual.
> False dichotomy. Just because it is easy to understand what is necessary for salvation and a lot of
> other stuff as well doesn't mean that there aren't bits requiring more thought and training and deeper
> understanding and insight available through further study.
If this was true you wouldn't have hundreds of conflicting formulas for salvage: whether or not you can dance, drink, work on the sabbath, use religious icons and symbols, divorce, etc, etc, etc. And of course, you can say that those that disagree with you are misguided, or aren't christians BUT they call themselves christians are are following the same book. The problem is that the book is SO subject to interpretation. Again, this cuts away at the entire concept of a small god - you'd think he would write something people could agree upon, rather than something that causes wars.
>> Beyond the question of whether or not the thousands of christian, jewish, and islamic sects would >>agree with you in your interpretation...please, if there is a god that is omnipotent and omniscient I >>would sincerely hope that he would communicate a little more clearly than via one person's dreams from >>2,000 years ago. Heck, it sounds like something out of a Diskworld no
> You're being a little silly there, reducing the Bible to 'one person's dreams.' It contains thousands
> of years of God clearly intervening in history and making himself known, sometimes in dramatic,
> obvious ways, sometimes in more subtle ways, but most significantly in the person of Jesus. To claim
> that God has not communicated clearly when we have a document such as the Bible with a historicity far
> surpassing that of other documents from antiquity, an event such as the crucifixion which divides time
> in two and a life such as Jesus' which few have not heard of, is ludicrous.
Rather than determine the validity of this book through the eyes of a biased convert, take a look at it through the eyes of an objective observer: so, we're supposed to be swayed because the events in the book are documented...in the book itself. That's circular reasoning, that gives it no validity over Homer's Illiad, or any number of religious texts that have survived to today.
Your concept of the significance of events such as the crucifixion is derrived from the book itself, this event isn't documented elsewhere. Plus, much of the bible was written many years after the original events occured, and is a combination of various documents that the editors felt would be handy to read together. Documents from the same period that didn't lend themselves to the editorial vision were discarded - but have been found since. What a *mess*.
Again, if god is a smart guy I think he could communicate the basic message in about 50 pages. Hmmm,
> You read the Onion article on Intelligent Falling theory, didn't you?
a rthsociety.htm
:-)
yup:
flying spaghetti monster: http://www.venganza.org/
intelligent falling theory: http://www.theonion.com/content/node/39512
flat earth society: http://www.alaska.net/~clund/e_djublonskopf/Flate
sadly, no link for modern support of bible-based cosmology in which the sun circles the earth? suggestions?
oh, and btw, I do know how to spell ridicule