When is Database Muscle Too Much?
DBOrNotDB asks: "At some of the places I've worked in the past, there have been DBAs who generally insisted that given accurate specifications and enough hardware and software, you could stuff nearly anything you wanted to into a database, manipulate it, and pull it back out again in a reasonable time. The feeling at my current workplace seems to be that very few projects lend themselves to database usage and that a customized one-off data storage solution should be developed for each project. This seems like a violation of many major software engineering principals (e.g. reuse) to me. My question is, what kind of success or horror stories does the community have about trying put different projects into databases? Numbers (# of rows, tables, total data storage, cost, etc) would be nice, but even just anecdotes would be helpful."
Slightly off topic as it were, but I've noticed that a lot of people seem to think that Excel works very nicely as a database. In some cases this might be true, but the bigger you get the more problems you have, and I just can't seem to convince those of a less-than-technical mind (read: management) otherwise.
This sig no verb.
One company I worked for contracted out an application we had to build to a provider who brought in this crazy, dishevled, brilliant Russian database engineer. I remember that with every issue that would come up he would say, "Is database application" and go off muttering to himself. Content management software needed? "Is database application." File system problems? "Is database application." new mouse drivers? "Is database application." What to order for lunch? "Is database application."
The moral of the story? Any computer application is built most quickly and easily using a database solution.
"Moderate drinking can help prevent amputated limbs" -- Abigail Zuger, NYTimes, 12/31/02
i was always under the impression that the filesystem was a better place for this, assuming the directory structure was simple or fixed (i.e. you wouldn't be creating thousands of subdirs dynamically). why store all your banners in a table as BLOBs when you can simply have a web-accessible directory and store them there?
i never really found a clear reason on which was better performance-wise, though i suspect the filesystem-based way is. i also found it to be less of a hassle to implement. any intelligent thoughts?
You obviously are working with morons. Very few data-oriented applications need to write their own data-stores. Almost anything you can imagine (complex relational data, object-oriented data, xml stuff, photos, video footage, 3/4D spatial data, etc, etc..), someone has written database software tuned for it. Use it and be happy.
11*43+456^2
A custom DB will be faster than a general purpose DB (by definition).
A relational DB offers great flexibility (pull any data from the database, add most any index to greatly improve the performance, etc).
I would base the decision mostly on the interfaces to the application. I've worked on applications where the company prefered to access the SQL database directly for reporting and I've worked on projects where the only interface to the application was via HTTP/XML. In the latter case, no one cared how we stored the data, so we dumped XML into a filesystem.
Joe
Joe Batt Solid Design
I have to agree that database reuse is among the most essential parts of running a profitible business. I've worked with all sorts of RDBMSes, from MS to Oracle to PostgresSQL, on everything from the lowliest hand-me-down Linux server to top-of-the-line Big Iron, and I can tell you that any modern database is going to be able to take whatever you can throw at it. I like being able to whip out whatever data we have, shove it in and pull it out again, repeatedly and at a moment's notice. It's this kind of flexibility that makes us keep coming to database systems in the first place.
Karma: Good (despite my invention of the Karma: sig)
Sure, there are times when writing an RDBMS-based solution seems like a big overhead. But there's a good reason for using RDBMS on projects that are likely to mutate and add new features over time, and/or have to interoperate with other programs and systems.
On the other hand, if you just want to stores a small array of data that fits in a 100 line text file, and the program is completely closed and self contained, there's no need for the flexibility of a RDBMS.
Imagine a business that has to "send and receive stuff"...
If you're moving two or three little packages to nearby local area businesses, only, you can get by with a small car.
But imagine your regularly ship objects large and small to locations local and international... Then you need an intermodal transportation system. Sure, your interface might be "the shipping guy", but the backbone of the transportation is heavy duty...
The reason we use the database instead of the filesystem is deathly simple. The database is god-like. I can do point it time recovery, and guarunttee that the database is completely consistant with the recovery point. If I had all that in a filesystem it'd be harder. It means I have hot-rollover capaibility from server to server, without having to duplicate my filesystem from machine to machine, I just let the hot archive logs deal with that. It means I have one backup system, and one failure recovery plan. It means all I have to monitor is the Oracle tablespace to ensure I don't run out of space. It means when I say "commit", I can hold Oracle accountable for ensuring the data is there, rather then having myself held accountable by the management.
If I was a good little boy and swallowed all the kool-aid, I'd use iFS (Oracle's Internet Filesystem) and it'd be all good. However, I don't I just use a huge array of blob's in my Oracle database.
Now that said, I have a remote filesystem that all of this data gets spooled to. Once spooled there it gets written to CD. Once the CD's are written, they are then used to find, compare, and if they match delete the blobs out of the Database. The CD then deletes the files out of the spool. Duplicate the CD, compare the two, send one offsite.
The other reason we use the database, that it's easier to deal with in our application, because writting a join against the filesystem is tricky...
Kirby
Perhaps it's just that I can't think "outside the square" or something, but I can't really think of ANY application I've worked on where you couldn't make a good case for storing the data in a database. In some cases I've used XML files (when there was only a very small amount of data to be stored) but anything bigger than that I've always stored in a database. Perhaps under some circumstances (that I can't think of right now) you _might_ want to roll your own storage system....but I think these sort of projects would be the exception (handled of course!) rather than the rule.
In one scenario, you maintain business logic, info storage, display, and all sorts of other crap. In the other, you only maintain logic and display. It's easier to force your customers into lockin with some proprietary mish mash, but there are others (like myself) who will turn you down flat.
Internal stuff is similar. Do you want to maintain EVERYTHING, or just half of it?
Jesus was all right but his disciples were thick and ordinary. -John Lennon
Pluses: the database takes care of synchronization, and nearly takes care of backup/recover. Very nice. Some DB rigor may rub off on your designers.
Minuses: the DBM is large (in MB, in install/config requirements, and in CPU usage) and your customer may not be running the DBM brand/version which you have tested your app with. Supporting multiple DB vendors is a pain. SQL is sort of standard, but the table definitions tend to vary. Ick.
If the data is not critical or can be easily re-created, then a filesystem will suffice.
But if the data is critical to the business, and/or not easily be re-created,
the data should go into a real DB that is Managed Properly(tm).
You are being MICROattacked, from various angles, in a SOFT manner.
But please, for the love of God, don't listen to those moron college profs who say normalization is key. That's all good and fun until you have a million records in each of four tables you need to join to provide a solution. That, or your data model charts take up 300 sq. ft of wall space.
Good DBMSes can break complex joins catching the criteria piece by piece. You can also create run time extracts which are used by real time / almost real time systems for read access. However your advice is simply terrible. Once you lose normal form you lose the associative law on your table algebra. That means join operations are not defined independently of order they are performed in and that is very bad. Rick mentions an example of this in terms of addresses but it can get far worse.
Sometimes, well-thought out denormalisation can make a huge, positive impact on an application. Yes, it can be difficult to make sure you don't have any anomalies in your data, but with a rigourous design and development methodology, these problems can be minimised.
Don't denormalise for the sake of denormalising - the trick is to know when to break the rules and to do so very carefully.
Denormalisation is only one tool that can be used to improve the performance of a system, and of course, other options like more memory, faster CPUs and better code should be addressed first.
But please, for the love of God, don't listen to those moron college profs who say normalization is key. That's all good and fun until you have a million records in each of four tables you need to join to provide a solution. That, or your data model charts take up 300 sq. ft of wall space.
Don't normalise your database at your own peril!
Learn how to properly normalise a database (3NF is usually good enough) and then learn how to write decent queries and tune your indexes appropriately. In some circumstances it may be worthwhile caching some data in extra tables which are refreshed periodically. In even rarer circumstances it may be necessary to denormalise the database, but always normalise it and only denormalise if you encounter some intractible performance issues.
We have some incredibly complex queries at my current client and we have managed to gain performance improvements by re-writing certain queries. Splitting complex queries up and using temp tables, derived tables, sub-queries, etc can help and you will have less of a problem with data anomalies which can creep into a denormalised structure.
It really depends on what data you are storing. How much data, how critical, what are you doing with it, etc...
Sometimes, Excel is good enough. Or XML. Or plain text files. Or a custom file format. etc...
If the people working on your project have only ever worked with databases, they will want to use databases for everything. Most stuff will fit into a database, but sometimes it is not apprpriate to do so - as other have mentioned here, storing images in a database is not always a good idea, but you would probably want to store the location of the images in a database.
If you need to be able to ship the data around to different machines/offices/clients/over the net/etc, then maybe an XML file will be best. Custom file formats may be appropriate in some cases too. (Though I'd lean towards a more open file format.)
Huh? Do you create a custom C library for every application as well? How about a custom UI toolkit? Custom preprocessor/compiler?
Sounds kind of silly, doesn't it.
So why do these folks think a "customized one-off data storage solution" sounds any better? It's the same problem - you can either use something that's already been debugged, tested, and tweaked for performance, or you can spend your own time and effort to create it yourself. That's time and effort that could go towards coding and testing the final product, but is instead spent elsewhere (probably because someone thinks that using a dabatbase for storage would make the application "bloated").
I think the problem is probably that when you mention using a "database", most people equate that term with "general purpose database server" (Oracle, SQL Server, Postgress, MySQL, etc.) There are libraries available that were specifically designed to offer programs lightweight database access without the pain of using a full-fledged RDBMS. Search Google for embedded database, xbase library, or open source database library to start... there are any number of toolktis that will allow you to create a very customized storage solution without having to create "one-off" code for each and every project.
"Great men are not always wise: neither do the aged understand judgement." Job 32:9
When you want speed and flexibility and scalability and reliability and extendability and particularly developer productivity you will undoubtedly end up shooting yourself in the foot later if you avoid some form of DBMS up front.
Where you have a particularly well-defined, narrow functionality, and performance in a small footprint is a requirement, an RDBMS may not be such a good choice, but DB libraries like berkeley db can still be very useful.
And with PostgreSQL, Firebird , MySQL and so many other free, open-source projects out there covering such a broad spectrum of needs for a database, why would you not use that expert work?
Definitely not Dallas, but in a way I think these places are more a state of mind than a physical place ;)
All things in moderation; including moderation
What's so wrong about having the GUI stored in the database? It's not like you can't download once to the local client... Heck, AOL's been doing that for years!
Just because there's a database in the center of everything, doesn't mean the clients can't cache data locally....
When you can imagine querying the data you are entering - you can't easily query images, or other binary data (although I guess there must be someone working on this problem somewhere...). If you can't query it, you should usually find a better place to store it - NAS is usually fine - and maintain a pointer to it (e.g. a filename). Yes, it's something that can get un-synced, but most databases suck when it comes to actually dealing with binary data, and you can use that capacity a lot more effectively elsewhere.
When the structure of the data is likely to remain stable. If your application deals with well-understood entities, whose properties are unlikely to change over time, a database is a great solution.
Databases are, however, relatively change-resistant - it's typically a pain in the backside to change the datatype of a column, remove columns etc. So, if you're working in a domain where you continuously learn new things about your core entities, or if your development processes are highly iterative, you might be better off using an alternative data storage mechanism.
When more than a single user is likely to access the data - yes, you can create locking mechanisms yourself. You can also take your own garbage to the local dump. It's usually not a good use of your time, and the cost of not dealing with the issues involved are expensive, both for garbage and concurrent access to shared data.
When you require consistency accross transactions - the good old ACID (atomicity, consistency, isolation, durability) principles which become important for many non-trivial applications.
If you care about enforcing rules of referential integrity - do you want to ensure that all the tracks in your record collection can be tied back to a recording ? Do all orders have to have a customer ? Those things are far simpler to implement with an RDBMS than in code.
There are instances where using an RDBMS is not appropriate. Ones that spring to mind are :
- your business domain is not well understood or liable to rapid change. In this case, the cost of change for database objects is likely to be a problem - consider storing data in a self-describing format like XML.
- the application domain doesn't lend itself to being described in relational terms - image manipulation tools, word processors etc. which deal with mainly binary information probably should not use a relational model for their core data structures.
Alternatives exist - Object Oriented databases are becoming more and more popular. I have way too little experience with these to comment on their use.
It's all very well in practice, but it will never work in theory.