When is Database Muscle Too Much?

← Back to Stories (view on slashdot.org)

When is Database Muscle Too Much?

Posted by Cliff on Tuesday October 29, 2002 @09:35AM from the databases-vs-flat-files dept.

DBOrNotDB asks: "At some of the places I've worked in the past, there have been DBAs who generally insisted that given accurate specifications and enough hardware and software, you could stuff nearly anything you wanted to into a database, manipulate it, and pull it back out again in a reasonable time. The feeling at my current workplace seems to be that very few projects lend themselves to database usage and that a customized one-off data storage solution should be developed for each project. This seems like a violation of many major software engineering principals (e.g. reuse) to me. My question is, what kind of success or horror stories does the community have about trying put different projects into databases? Numbers (# of rows, tables, total data storage, cost, etc) would be nice, but even just anecdotes would be helpful."

12 of 93 comments (clear)

Min score:

Reason:

Sort:

More of a "dilbert" story by dacarr · 2002-10-29 09:41 · Score: 3, Funny

Slightly off topic as it were, but I've noticed that a lot of people seem to think that Excel works very nicely as a database. In some cases this might be true, but the bigger you get the more problems you have, and I just can't seem to convince those of a less-than-technical mind (read: management) otherwise.

--
This sig no verb.
1. Re:More of a "dilbert" story by leviramsey · 2002-10-29 10:02 · Score: 4, Funny
  
  Doesn't Google run off of a huge Excel spreadsheet?
afaik... by pizza_milkshake · 2002-10-29 09:52 · Score: 3, Interesting

i used to be a big php fan (still a fan, but not a big one) and i was always surprised/dismayed that everyone wanted to store images for web-based applications (catalog-style images, user-submitted icons/graphics, banners, etc) in a database.
i was always under the impression that the filesystem was a better place for this, assuming the directory structure was simple or fixed (i.e. you wouldn't be creating thousands of subdirs dynamically). why store all your banners in a table as BLOBs when you can simply have a web-accessible directory and store them there?
i never really found a clear reason on which was better performance-wise, though i suspect the filesystem-based way is. i also found it to be less of a hassle to implement. any intelligent thoughts?
1. Re:afaik... by Xunker · 2002-10-29 10:12 · Score: 4, Insightful
  
  I'm running a moderately large site that deals heavily with images and at first I thought it would be great store all the images in the database along with the incidentals. Happily I had pause for thought because the site went big and decided to go with file system storage for the images instead, and I'm glad I did:
  
  * Size: I didn't anticipate the user would upload 5 gig of images
  * Access: You need some sort of extraneous code to pull those images from the database
  * Communication: your code must know how to fake being a proper image (right headers, creation/change times, etc)
  * Size 2: A lot of databases can't store high-res images (read: large images) in a database without serious penalties (like chopping them up into little bits to fit into a MySQL bigblob)
  
  Speed can be addressed in the perl world via FastCGI or Mod_perl and similar ways on other platforms, but you'll still have to do disk reads to get the data and you'll pay a price for having one more often used script in memory.
  
  Storage size can be counteracted with clever tricks or "professional grade" (read: expensive) DB Engines like Oracle and DB2 which have binary data storage as one of their features, but you'll need to pay money and have a big muscly machine to run 'em.
  
  For my money, 90% image serving can and should be done from filesystem because that's what web servers are made for. The other 10% are weird meta things like this that could use the file system but are designed to use database.
  
  My big thing is using your head and ask yourself what will be easier in the long run? Sure, binary data slung around with Perl DBI sounds convenient, but how conevneint is it to run "ismcheck" on a 100 GB database, eh?
  
  --
  Hilary Rosen's speech was about her love of money and her desire to roll around naked in a pile of money.
2. Re:afaik... by Rick+the+Red · 2002-10-29 10:31 · Score: 4, Informative
  
  But please, for the love of God, don't listen to those "real world experience" morons who say denormalization is key. That's all well and good until you've got the same information replicated to hell and gone and records start to disagree: Is the billing address 123 North Main or 1313 Mockingbird Lane? Half the invoices for this customer show one, half the other.
  If your normalized tables take a performance hit, buy a bigger box. If you munge the data with replication, you're screwed.
  
  --
  If all this should have a reason, we would be the last to know.
3. Re:afaik... by GLHMarmot · 2002-10-29 11:30 · Score: 3, Insightful
  
  But please, for the love of INSERT DIETY, don't listen to those polarized-my-way-is-the-only-way INSERT INSULT's.
  There is always a case for normalization or denormalization. I have developed many different databases of various sizes. ( 10 terabytes) and as a rule I try to be a s normalized as possible. I have worked on some data conversions that were rathy messy due to denormalization. However, I can't think of a single database where there wasn't some type of denormalization for various reasons. From speeding query results to the client's demands.
  Yes, I could have recommended that my clients buy bigger hardware but when a summary table can be used instead of spending $X thousand dollars, what do you think you would do?
Understand your needs and then decide... by toybuilder · 2002-10-29 10:12 · Score: 4, Insightful

Sure, there are times when writing an RDBMS-based solution seems like a big overhead. But there's a good reason for using RDBMS on projects that are likely to mutate and add new features over time, and/or have to interoperate with other programs and systems.

On the other hand, if you just want to stores a small array of data that fits in a 100 line text file, and the program is completely closed and self contained, there's no need for the flexibility of a RDBMS.

Imagine a business that has to "send and receive stuff"...

If you're moving two or three little packages to nearby local area businesses, only, you can get by with a small car.

But imagine your regularly ship objects large and small to locations local and international... Then you need an intermodal transportation system. Sure, your interface might be "the shipping guy", but the backbone of the transportation is heavy duty...
We use the DB for nearly everything by ComputerSlicer23 · 2002-10-29 10:18 · Score: 5, Interesting

I write a screen scrapping application, that downloads lots (100K + web pages a week). We write absolutely everything page into the database. 5GB is enough to hold about two days worth of pages. So we can't keep it in our production database. Especially because 95% of the data doesn't need to be stored except for auditting, but we keep it around in case a mistake is found, or a new tread in the pages happens.
The reason we use the database instead of the filesystem is deathly simple. The database is god-like. I can do point it time recovery, and guarunttee that the database is completely consistant with the recovery point. If I had all that in a filesystem it'd be harder. It means I have hot-rollover capaibility from server to server, without having to duplicate my filesystem from machine to machine, I just let the hot archive logs deal with that. It means I have one backup system, and one failure recovery plan. It means all I have to monitor is the Oracle tablespace to ensure I don't run out of space. It means when I say "commit", I can hold Oracle accountable for ensuring the data is there, rather then having myself held accountable by the management.
If I was a good little boy and swallowed all the kool-aid, I'd use iFS (Oracle's Internet Filesystem) and it'd be all good. However, I don't I just use a huge array of blob's in my Oracle database.
Now that said, I have a remote filesystem that all of this data gets spooled to. Once spooled there it gets written to CD. Once the CD's are written, they are then used to find, compare, and if they match delete the blobs out of the Database. The CD then deletes the files out of the spool. Duplicate the CD, compare the two, send one offsite.
The other reason we use the database, that it's easier to deal with in our application, because writting a join against the filesystem is tricky...
Kirby
1. Re:We use the DB for nearly everything by ComputerSlicer23 · 2002-10-29 19:29 · Score: 5, Interesting
  
  What I thought was interesting, is that you say that you can hold Oracle accountable for missing data. What's to keep Oracle from claiming that you're a dumb ass and messed it up all by yourself? Doesn't the click through on the license basically keep them from being held liable? What do you think management is going to say when Oracle tells them the employee is the one who screwed up?
  I've submitted a request into the Oracle iTAR (Technical Assistance Request) system, at 3:00 AM, gotten a call back by 3:10AM, and a resolution by 3:45AM on a relatively esoteric bug. When I say, I get to hold Oracle accountable, I mean, I can hold Oracle accountable to get me up and running pronto with as much data as I'm going to get. It costs an arm and a leg, but the only people I've ever heard of who have lost their data using Oracle just didn't do backups properly. Oracle is pretty serious about keeping your data around. If you stay with stable tested versions, you'll be fine with Oracle.
  I've had Oracle help me on de-supportted platforms, using non-standard configurations, doing crazyness of my own making. They always help me when I contact them for support. For production machines, I stay on the tried and true, but Oracle has never let me down in weird situations.
  Oracle as a general rule, has *never* told me buzz off it's your fault. They stayed with me, and found it was my fault, and then showed me the doc's where it says I'm doing something that won't work. For all their faults, Oracle has *NEVER* failed me in any way when it comes to support. When it's my fault, I stand there and take it on the chin. However, as a backing store, Oracle will whoop anything I write eight ways to Sunday for speed, reliability, portability, quality, documentation and support. Hand's down. If you can afford it, there is no excuse for writting your own custom storage manager. If you can't afford it, try PostGreSQL. Oracle or PostGreSQL *WILL* be better for 99.9% of the cases out there. Google is one of the few examples of a situation where writting your own is probably a good idea.
  For the record, out of the ~200 million records I've processed, I've lost 1, count'em 1 record using Oracle that was Oracle's fault (I've lost any number of them when Linux crashed, but that's my fault, not Oracle's). Even then Oracle clearly identified which one it was, and when it happened, so it was easy to recover.
  Kirby
Re:performance v fexibility by joto · 2002-10-29 10:30 · Score: 4, Interesting

Experience tells me exactly the opposite. A custom DB will perform better, until you actually start to fill it with lots of data. When that happens, you will find that the many man-years spent developing the expensive proprietary DBMS systems actually resulted in something better than what you could quickly hack together over a few days.
That being said, there are still lot's of valid reasons not to use a real DBMS for every small project. The most important is simplicity. Bringing in hundreds of megabytes of third-party software to store a few kilobytes of data is not only overkill, it's also a maintenance nightmare!
Why not do the whole project as a one-off? by Samrobb · 2002-10-29 16:17 · Score: 3, Informative

The feeling at my current workplace seems to be that very few projects lend themselves to database usage and that a customized one-off data storage solution should be developed for each project.

Huh? Do you create a custom C library for every application as well? How about a custom UI toolkit? Custom preprocessor/compiler?

Sounds kind of silly, doesn't it.

So why do these folks think a "customized one-off data storage solution" sounds any better? It's the same problem - you can either use something that's already been debugged, tested, and tweaked for performance, or you can spend your own time and effort to create it yourself. That's time and effort that could go towards coding and testing the final product, but is instead spent elsewhere (probably because someone thinks that using a dabatbase for storage would make the application "bloated").

I think the problem is probably that when you mention using a "database", most people equate that term with "general purpose database server" (Oracle, SQL Server, Postgress, MySQL, etc.) There are libraries available that were specifically designed to offer programs lightweight database access without the pain of using a full-fledged RDBMS. Search Google for embedded database, xbase library, or open source database library to start... there are any number of toolktis that will allow you to create a very customized storage solution without having to create "one-off" code for each and every project.

--
"Great men are not always wise: neither do the aged understand judgement." Job 32:9
Embedded applications by Karora · 2002-10-29 19:49 · Score: 4, Insightful

I have worked as an application developer / designer with DBMS backed applications for the last 17 years. There are reasons for not choosing a database, but not usually very good ones.
When you want speed and flexibility and scalability and reliability and extendability and particularly developer productivity you will undoubtedly end up shooting yourself in the foot later if you avoid some form of DBMS up front.
Where you have a particularly well-defined, narrow functionality, and performance in a small footprint is a requirement, an RDBMS may not be such a good choice, but DB libraries like berkeley db can still be very useful.
And with PostgreSQL, Firebird , MySQL and so many other free, open-source projects out there covering such a broad spectrum of needs for a database, why would you not use that expert work?

--

...heellpppp! I've been captured by little green penguins!