When is Database Muscle Too Much?
DBOrNotDB asks: "At some of the places I've worked in the past, there have been DBAs who generally insisted that given accurate specifications and enough hardware and software, you could stuff nearly anything you wanted to into a database, manipulate it, and pull it back out again in a reasonable time. The feeling at my current workplace seems to be that very few projects lend themselves to database usage and that a customized one-off data storage solution should be developed for each project. This seems like a violation of many major software engineering principals (e.g. reuse) to me. My question is, what kind of success or horror stories does the community have about trying put different projects into databases? Numbers (# of rows, tables, total data storage, cost, etc) would be nice, but even just anecdotes would be helpful."
I'm running a moderately large site that deals heavily with images and at first I thought it would be great store all the images in the database along with the incidentals. Happily I had pause for thought because the site went big and decided to go with file system storage for the images instead, and I'm glad I did:
* Size: I didn't anticipate the user would upload 5 gig of images
* Access: You need some sort of extraneous code to pull those images from the database
* Communication: your code must know how to fake being a proper image (right headers, creation/change times, etc)
* Size 2: A lot of databases can't store high-res images (read: large images) in a database without serious penalties (like chopping them up into little bits to fit into a MySQL bigblob)
Speed can be addressed in the perl world via FastCGI or Mod_perl and similar ways on other platforms, but you'll still have to do disk reads to get the data and you'll pay a price for having one more often used script in memory.
Storage size can be counteracted with clever tricks or "professional grade" (read: expensive) DB Engines like Oracle and DB2 which have binary data storage as one of their features, but you'll need to pay money and have a big muscly machine to run 'em.
For my money, 90% image serving can and should be done from filesystem because that's what web servers are made for. The other 10% are weird meta things like this that could use the file system but are designed to use database.
My big thing is using your head and ask yourself what will be easier in the long run? Sure, binary data slung around with Perl DBI sounds convenient, but how conevneint is it to run "ismcheck" on a 100 GB database, eh?
Hilary Rosen's speech was about her love of money and her desire to roll around naked in a pile of money.
Sure, there are times when writing an RDBMS-based solution seems like a big overhead. But there's a good reason for using RDBMS on projects that are likely to mutate and add new features over time, and/or have to interoperate with other programs and systems.
On the other hand, if you just want to stores a small array of data that fits in a 100 line text file, and the program is completely closed and self contained, there's no need for the flexibility of a RDBMS.
Imagine a business that has to "send and receive stuff"...
If you're moving two or three little packages to nearby local area businesses, only, you can get by with a small car.
But imagine your regularly ship objects large and small to locations local and international... Then you need an intermodal transportation system. Sure, your interface might be "the shipping guy", but the backbone of the transportation is heavy duty...
There is always a case for normalization or denormalization. I have developed many different databases of various sizes. ( 10 terabytes) and as a rule I try to be a s normalized as possible. I have worked on some data conversions that were rathy messy due to denormalization. However, I can't think of a single database where there wasn't some type of denormalization for various reasons. From speeding query results to the client's demands.
Yes, I could have recommended that my clients buy bigger hardware but when a summary table can be used instead of spending $X thousand dollars, what do you think you would do?
When you want speed and flexibility and scalability and reliability and extendability and particularly developer productivity you will undoubtedly end up shooting yourself in the foot later if you avoid some form of DBMS up front.
Where you have a particularly well-defined, narrow functionality, and performance in a small footprint is a requirement, an RDBMS may not be such a good choice, but DB libraries like berkeley db can still be very useful.
And with PostgreSQL, Firebird , MySQL and so many other free, open-source projects out there covering such a broad spectrum of needs for a database, why would you not use that expert work?