What Capacities Do Databases Have?
razor69 asks: "Having programmed PHP in conjunction with MySQL for some time now, I'm quite happy with the whole setup. However, I'm now working up to some more serious clients, which require larger and larger databases. Looking for database comparisons, I'm stuck with the stats MySQL give me and none of the database manufacturers tell me what the maximum amount of records their system can handle is. Now I know that Oracle is unlimited, but so is their price. Which database program is suitable for what price, how many records can they hold (practically as well as theoretically) and how do they all REALLY compare?"
I think any large DB project should be treated as an infrastructure project...
When you're dealing with clients who have (relatively) massive data requirements (ie., data requirements that outstrip the smaller DB services such as MySQL that can be hosted on cheap, commodity hardware), you've got to step back and take a second look at the requirements of their business.
Before you start looking at particular DB packages, and their relative merits, you should really start by identifying the business requirements for the database. For example, what are the uptime and recoverability requirements? If your client requires 24x365 uptime, and also wants a hot standby site, you've got to start spec'ing out the system at the hardware and network level.
Identify the infrastructure that will sustain the requirements of your client, and then begin the process of selecting a DB package that works well within the constraints of that environment. When dealing with larger clients, always keep in mind that you are fundamentally delivering a DB infrastructure, not simply a DB.
A couple things you might want to consider:
Availability - does the system have to be redundant? Does it have to support automatic failover between redundant server?
Scalability - does the DB need to grow beyond the constraints of a single server? Can you forsee this data existing on a large array (ala EMC), and do all nodes of the DB complex need simultaneous access at some level?
Middleware - If it's a large DB, possibly being accessed from multiple physical locations, does it look like a queued middleware (ala IBM MQ) is needed, or perhaps appropriate?
Disaster recovery - If the client is a 24x7 shop, or even if they aren't, are you going to have a sufficient window to back this beast up? Are they going to ask you to architect an off-site recovery strategy? If so, does it have to be a "hot" standby site? If so, does your hardware/network infrastructure support dynamic mirroring over a WAN?
Once you've hammered out these kind of questions (and there are lots more..), then you are in a position to start evaluating DB packages that will work well withing the infrastructure you've designed.
As with any project, getting the requirements defined out front is going to save you a lot of time and agony, and is going to save your client a lot of money and aggravation.