When is Database Muscle Too Much?

More of a "dilbert" story by dacarr · 2002-10-29 09:41 · Score: 3, Funny

Slightly off topic as it were, but I've noticed that a lot of people seem to think that Excel works very nicely as a database. In some cases this might be true, but the bigger you get the more problems you have, and I just can't seem to convince those of a less-than-technical mind (read: management) otherwise.

--
This sig no verb.

Re:More of a "dilbert" story by leviramsey · 2002-10-29 10:02 · Score: 4, Funny

Doesn't Google run off of a huge Excel spreadsheet?
Re:More of a "dilbert" story by floydigus · 2002-10-29 11:21 · Score: 2

In my experience this occurs in outfits where the people who hold the purse strings are not also the people who understand computers. They may have no experience of what real systems can do for them.
I used to work at a place where mission critical data was stored in a massive Paradox database to which all these crazy jarheads had admin rights. Just stupid. You ended up with thousands of copies of tables and no-one knew which one was the right one to use. Then they decided to go SQL Server and what did they do? Remodel the data structure? No. They just copied the damn Paradox tables one for one.
They used to back up the c:\ drives of every machine in the office every lunchtime in that place.
Dickheads.

--
All things in moderation; including moderation
Re:More of a "dilbert" story by arb · 2002-10-29 14:52 · Score: 2

Doesn't Google run off of a huge Excel spreadsheet?

No, they use pigeons!
Re:More of a "dilbert" story by sien · 2002-10-29 16:11 · Score: 2

Well, actually the ability of Excel to act as a primitive database was one of the reasons it has wound up being so popular.
Joel Spolsky wored with MS on the Excel team and points out that in user studies they did the ability of excel to record data in such a way was important in it's adoptance. Check out the chapter from his excellent book User Interface Design for programmers and search for excel.
Re:More of a "dilbert" story by sharkey · 2002-10-30 04:08 · Score: 2

but I've noticed that a lot of people seem to think that Excel works very nicely as a database.

Just the opposite of what we have here. Folks here seem to think that Access is just a wonderful spreadsheet program. (4 databases, 1 table each, 50+ columns)

--

--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.

"Is database application" by RaboKrabekian · 2002-10-29 09:42 · Score: 2

One company I worked for contracted out an application we had to build to a provider who brought in this crazy, dishevled, brilliant Russian database engineer. I remember that with every issue that would come up he would say, "Is database application" and go off muttering to himself. Content management software needed? "Is database application." File system problems? "Is database application." new mouse drivers? "Is database application." What to order for lunch? "Is database application."

The moral of the story? Any computer application is built most quickly and easily using a database solution.

--
"Moderate drinking can help prevent amputated limbs" -- Abigail Zuger, NYTimes, 12/31/02

Re:"Is database application" by Rick+the+Red · 2002-10-29 10:23 · Score: 2

"When all you have is a hammer, everything looks like a nail"

--
If all this should have a reason, we would be the last to know.

afaik... by pizza_milkshake · 2002-10-29 09:52 · Score: 3, Interesting

i used to be a big php fan (still a fan, but not a big one) and i was always surprised/dismayed that everyone wanted to store images for web-based applications (catalog-style images, user-submitted icons/graphics, banners, etc) in a database.

i was always under the impression that the filesystem was a better place for this, assuming the directory structure was simple or fixed (i.e. you wouldn't be creating thousands of subdirs dynamically). why store all your banners in a table as BLOBs when you can simply have a web-accessible directory and store them there?

i never really found a clear reason on which was better performance-wise, though i suspect the filesystem-based way is. i also found it to be less of a hassle to implement. any intelligent thoughts?

Re:afaik... by ceejayoz · 2002-10-29 09:59 · Score: 2

I've heard anecdotal evidence saying that the filesystem is faster and less processor intensive than storing binary data in the database. That said, I usually store images in my databases. :-)
Re:afaik... by Xunker · 2002-10-29 10:12 · Score: 4, Insightful

I'm running a moderately large site that deals heavily with images and at first I thought it would be great store all the images in the database along with the incidentals. Happily I had pause for thought because the site went big and decided to go with file system storage for the images instead, and I'm glad I did:

* Size: I didn't anticipate the user would upload 5 gig of images
* Access: You need some sort of extraneous code to pull those images from the database
* Communication: your code must know how to fake being a proper image (right headers, creation/change times, etc)
* Size 2: A lot of databases can't store high-res images (read: large images) in a database without serious penalties (like chopping them up into little bits to fit into a MySQL bigblob)

Speed can be addressed in the perl world via FastCGI or Mod_perl and similar ways on other platforms, but you'll still have to do disk reads to get the data and you'll pay a price for having one more often used script in memory.

Storage size can be counteracted with clever tricks or "professional grade" (read: expensive) DB Engines like Oracle and DB2 which have binary data storage as one of their features, but you'll need to pay money and have a big muscly machine to run 'em.

For my money, 90% image serving can and should be done from filesystem because that's what web servers are made for. The other 10% are weird meta things like this that could use the file system but are designed to use database.

My big thing is using your head and ask yourself what will be easier in the long run? Sure, binary data slung around with Perl DBI sounds convenient, but how conevneint is it to run "ismcheck" on a 100 GB database, eh?

--
Hilary Rosen's speech was about her love of money and her desire to roll around naked in a pile of money.
Re:afaik... by Old+Uncle+Bill · 2002-10-29 10:16 · Score: 2, Interesting

Yes, this is definitely bad. Images are typically stored in the db as Binary Large OBjects (BLOBs) which no database system on the planet is good at retrieving quickly. Also, updating into the database can take longer.

The real killer in database performance comes in two places, large complex joins and full table scans. Eliminate these two things in your db and you should never have scaling problems. To do this, watch your long running queries and make sure they have the proper indexes on the tables. And make sure you keep your statistics updated. But please, for the love of God, don't listen to those moron college profs who say normalization is key. That's all good and fun until you have a million records in each of four tables you need to join to provide a solution. That, or your data model charts take up 300 sq. ft of wall space.

--
Yes, I am an agent of Satan, but my duties are largely ceremonial.
Re:afaik... by Rick+the+Red · 2002-10-29 10:31 · Score: 4, Informative

But please, for the love of God, don't listen to those "real world experience" morons who say denormalization is key. That's all well and good until you've got the same information replicated to hell and gone and records start to disagree: Is the billing address 123 North Main or 1313 Mockingbird Lane? Half the invoices for this customer show one, half the other.
If your normalized tables take a performance hit, buy a bigger box. If you munge the data with replication, you're screwed.

--
If all this should have a reason, we would be the last to know.
Re:afaik... by GLHMarmot · 2002-10-29 11:30 · Score: 3, Insightful

But please, for the love of INSERT DIETY, don't listen to those polarized-my-way-is-the-only-way INSERT INSULT's.
There is always a case for normalization or denormalization. I have developed many different databases of various sizes. ( 10 terabytes) and as a rule I try to be a s normalized as possible. I have worked on some data conversions that were rathy messy due to denormalization. However, I can't think of a single database where there wasn't some type of denormalization for various reasons. From speeding query results to the client's demands.
Yes, I could have recommended that my clients buy bigger hardware but when a summary table can be used instead of spending $X thousand dollars, what do you think you would do?
Re:afaik... by delus10n0 · 2002-10-29 12:51 · Score: 2

Huh? As far as I know, picpix stores the images on the filesystem (and not in the database). The database is used to track/manage the images, though..

Same goes for similar applications like Gallery, which newest version (v2.0, in progress) will use SQL to drive the backend, and will rely on your filesystem/webserver to serve the images.

--
Not All Who Wander Are Lost
Re:afaik... by sql*kitten · 2002-10-29 22:02 · Score: 2

Happily I had pause for thought because the site went big and decided to go with file system storage for the images instead, and I'm glad I did

Why not do both? Oracle has a datatype called BFILE, in which the actual data is stored in the filesystem, and the row in the table contains a pointer to it. You have the best of both worlds, filesystem access to the image if you want it, or database access, and you can very easily integrate the image with the rest of your relational data.
Re:afaik... by Xunker · 2002-10-30 05:29 · Score: 2, Funny

Ahem.. you assume I can AFFORD Oracle! I had to sell a kidney to buy winter tires for my car this year!

--
Hilary Rosen's speech was about her love of money and her desire to roll around naked in a pile of money.
Re:afaik... by eric2hill · 2002-10-30 08:54 · Score: 2

Yes, I could have recommended that my clients buy bigger hardware but when a summary table can be used instead of spending $X thousand dollars, what do you think you would do?

I would use an Oracle Materialized View and get the speed increase without the extra dollars and without needing the extra hardware. Work smarter, not harder.

--
LOAD "SIG",8,1
LOADING...
READY.
RUN
Re:afaik... by MattRog · 2002-10-31 10:06 · Score: 2

The only reason why I could see a RDBMS advantage to image storage is that you can (usually pretty easily) change the block/page size for storage/retrieval to improve fetching of large chunks of data. Usually you can also have finer-grained memory control to ensure they are forced in memory.

That said, I think it is far wiser and much more cost effective to store 'dumb' (non-relational) content like images on the web/app server.

--

Thanks,
--
Matt
Re:afaik... by rycamor · 2002-10-31 13:58 · Score: 2

i never really found a clear reason on which was better performance-wise, though i suspect the filesystem-based way is. i also found it to be less of a hassle to implement. any intelligent thoughts?
The only intelligent thought I can add to this is: sometimes performance isn't the only factor. Yes, of course--all things being equal-- the filesystem is faster, but filesystems can't enforce data integrity constraints. I agree that it is ludicrous to store non-critical images in the database, such as standard website graphics, photo albums, etc... BUT, if you are creating a document management system, and you want to make sure that scanned document X cannot be deleted unless user Y has viewed it, then a database is often the most sensible solution. Otherwise, your code will have to enforce constraints in two places. And what happens if the IT dept wants another application to play with the same data? If it's all in the database, and your constraints are in the database, then you sleep a little easier.
Re:afaik... by rodgerd · 2002-10-31 14:33 · Score: 2

That's why your database is an excellent place to store the metadata - such as captions, which page they appear on, what the thumbnail is, and so forth - with a pointer to a location on the file system where the image resides. Best of both worlds.

Get a new job by photon317 · 2002-10-29 09:57 · Score: 2

You obviously are working with morons. Very few data-oriented applications need to write their own data-stores. Almost anything you can imagine (complex relational data, object-oriented data, xml stuff, photos, video footage, 3/4D spatial data, etc, etc..), someone has written database software tuned for it. Use it and be happy.

--
11*43+456^2

performance v fexibility by battjt · 2002-10-29 09:57 · Score: 2

A custom DB will be faster than a general purpose DB (by definition).

A relational DB offers great flexibility (pull any data from the database, add most any index to greatly improve the performance, etc).

I would base the decision mostly on the interfaces to the application. I've worked on applications where the company prefered to access the SQL database directly for reporting and I've worked on projects where the only interface to the application was via HTTP/XML. In the latter case, no one cared how we stored the data, so we dumped XML into a filesystem.

Joe

--
Joe Batt Solid Design

Re:performance v fexibility by joto · 2002-10-29 10:30 · Score: 4, Interesting

Experience tells me exactly the opposite. A custom DB will perform better, until you actually start to fill it with lots of data. When that happens, you will find that the many man-years spent developing the expensive proprietary DBMS systems actually resulted in something better than what you could quickly hack together over a few days.
That being said, there are still lot's of valid reasons not to use a real DBMS for every small project. The most important is simplicity. Bringing in hundreds of megabytes of third-party software to store a few kilobytes of data is not only overkill, it's also a maintenance nightmare!
Re:performance v fexibility by larien · 2002-10-29 22:05 · Score: 2

The maintenance nightmare is the best point; why spend x man-hours writing a custom data store when there are thousands of man-hours of experience in Oracle/PostgreSQL/MySQL/MS SQL/whatever? You have access to tried & tested code under a huge variety of circumstance.
Take the time saved by using an out of the box solution (i.e. the DB) and buy a bigger box to handle any inefficiencies you get by using a DB.
Re:performance v fexibility by joto · 2002-10-29 23:55 · Score: 2

Because then you'd have to understand Oracle/PostgreSQL/MySQL/MS SQL/whatever to be able to fix it. Furthermore, you need it to be available to deploy it. And you rely on continued support from the company/individuals in question for it to last more than a few years. If it can be done with a simple text-file, or as a hierarchical database using the file-system, that will make the product simpler, smaller, and less reliant on third-party software. Sometimes reinventing the wheel is a good thing, if the wheel is sufficiently simple...
Re:performance v fexibility by rycamor · 2002-10-31 14:06 · Score: 2

Having played with each of the methods you describe above, I can't imagine how to make data storage simpler than SQL (except for a better relational language, perhaps). Heirarchical data storage is a disaster waiting to happen. (And I mean XML, too).

I mean, of course anything can be argued to the point of stupidiy. If you just want to store a few flat lists of items, then sure: use a text file, but once you go beyond that (and you always will), standardization is your friend.

And we're not talking about hundreds of megabytes of 3rd-party software. Really, the PostgreSQL install footprint is a few megabytes. Or try Interbase/Firebird, which is only about a 3 MB download these days. Both of these DBMS's use ANSI standard SQL, so if you just learn SQL once, you pretty much can use the basics anywhere.

Of course, different DBMS's have different advanced/complex capabilities, but by the time you start needing advanced capabilities, it is still quicker to learn a DBMS than create your own advanced data storage.
Re:performance v fexibility by rodgerd · 2002-10-31 15:11 · Score: 2

I'd rather rely on the availability of Oracle DBAs than the availability of a contractor who hacked together thier own DB substitute.

If the problem is sufficiently simple that a full RDBMS is overkill, some db variant is perfect and ships on every *ix.

i agree with the poster by tps12 · 2002-10-29 10:01 · Score: 2

I have to agree that database reuse is among the most essential parts of running a profitible business. I've worked with all sorts of RDBMSes, from MS to Oracle to PostgresSQL, on everything from the lowliest hand-me-down Linux server to top-of-the-line Big Iron, and I can tell you that any modern database is going to be able to take whatever you can throw at it. I like being able to whip out whatever data we have, shove it in and pull it out again, repeatedly and at a moment's notice. It's this kind of flexibility that makes us keep coming to database systems in the first place.

--

Karma: Good (despite my invention of the Karma: sig)

Understand your needs and then decide... by toybuilder · 2002-10-29 10:12 · Score: 4, Insightful

Sure, there are times when writing an RDBMS-based solution seems like a big overhead. But there's a good reason for using RDBMS on projects that are likely to mutate and add new features over time, and/or have to interoperate with other programs and systems.

On the other hand, if you just want to stores a small array of data that fits in a 100 line text file, and the program is completely closed and self contained, there's no need for the flexibility of a RDBMS.

Imagine a business that has to "send and receive stuff"...

If you're moving two or three little packages to nearby local area businesses, only, you can get by with a small car.

But imagine your regularly ship objects large and small to locations local and international... Then you need an intermodal transportation system. Sure, your interface might be "the shipping guy", but the backbone of the transportation is heavy duty...

We use the DB for nearly everything by ComputerSlicer23 · 2002-10-29 10:18 · Score: 5, Interesting

I write a screen scrapping application, that downloads lots (100K + web pages a week). We write absolutely everything page into the database. 5GB is enough to hold about two days worth of pages. So we can't keep it in our production database. Especially because 95% of the data doesn't need to be stored except for auditting, but we keep it around in case a mistake is found, or a new tread in the pages happens.

The reason we use the database instead of the filesystem is deathly simple. The database is god-like. I can do point it time recovery, and guarunttee that the database is completely consistant with the recovery point. If I had all that in a filesystem it'd be harder. It means I have hot-rollover capaibility from server to server, without having to duplicate my filesystem from machine to machine, I just let the hot archive logs deal with that. It means I have one backup system, and one failure recovery plan. It means all I have to monitor is the Oracle tablespace to ensure I don't run out of space. It means when I say "commit", I can hold Oracle accountable for ensuring the data is there, rather then having myself held accountable by the management.

If I was a good little boy and swallowed all the kool-aid, I'd use iFS (Oracle's Internet Filesystem) and it'd be all good. However, I don't I just use a huge array of blob's in my Oracle database.

Now that said, I have a remote filesystem that all of this data gets spooled to. Once spooled there it gets written to CD. Once the CD's are written, they are then used to find, compare, and if they match delete the blobs out of the Database. The CD then deletes the files out of the spool. Duplicate the CD, compare the two, send one offsite.

The other reason we use the database, that it's easier to deal with in our application, because writting a join against the filesystem is tricky...

Kirby

Re:We use the DB for nearly everything by pmineiro · 2002-10-29 15:23 · Score: 2

The other reason we use the database, that it's easier to deal with in our application, because writting a join against the filesystem is tricky.

I think this gets to the point.

Namely, the hierarchical tree model of filesystems pales in comparison to the relational model of modern databases (first generation databases, way back in the day, were hierarchical).

-- p
Re:We use the DB for nearly everything by ComputerSlicer23 · 2002-10-29 19:29 · Score: 5, Interesting

What I thought was interesting, is that you say that you can hold Oracle accountable for missing data. What's to keep Oracle from claiming that you're a dumb ass and messed it up all by yourself? Doesn't the click through on the license basically keep them from being held liable? What do you think management is going to say when Oracle tells them the employee is the one who screwed up?
I've submitted a request into the Oracle iTAR (Technical Assistance Request) system, at 3:00 AM, gotten a call back by 3:10AM, and a resolution by 3:45AM on a relatively esoteric bug. When I say, I get to hold Oracle accountable, I mean, I can hold Oracle accountable to get me up and running pronto with as much data as I'm going to get. It costs an arm and a leg, but the only people I've ever heard of who have lost their data using Oracle just didn't do backups properly. Oracle is pretty serious about keeping your data around. If you stay with stable tested versions, you'll be fine with Oracle.
I've had Oracle help me on de-supportted platforms, using non-standard configurations, doing crazyness of my own making. They always help me when I contact them for support. For production machines, I stay on the tried and true, but Oracle has never let me down in weird situations.
Oracle as a general rule, has *never* told me buzz off it's your fault. They stayed with me, and found it was my fault, and then showed me the doc's where it says I'm doing something that won't work. For all their faults, Oracle has *NEVER* failed me in any way when it comes to support. When it's my fault, I stand there and take it on the chin. However, as a backing store, Oracle will whoop anything I write eight ways to Sunday for speed, reliability, portability, quality, documentation and support. Hand's down. If you can afford it, there is no excuse for writting your own custom storage manager. If you can't afford it, try PostGreSQL. Oracle or PostGreSQL *WILL* be better for 99.9% of the cases out there. Google is one of the few examples of a situation where writting your own is probably a good idea.
For the record, out of the ~200 million records I've processed, I've lost 1, count'em 1 record using Oracle that was Oracle's fault (I've lost any number of them when Linux crashed, but that's my fault, not Oracle's). Even then Oracle clearly identified which one it was, and when it happened, so it was easy to recover.
Kirby
Re:We use the DB for nearly everything by eric2hill · 2002-10-30 08:44 · Score: 2

I have an Oracle contract in front of me. Name one other vendor that has the following clause in their software agreement *by default*...
"If Oracle cannot substantially correct a breach of Oracle's warranties in a commercially reasonable manner, you may end your program license, technical support, or other services and recover the license fees, technical support fees or other services fees paid to Oracle under this agreement..."
This no-nonsense agreement was a selling point for my company. If Oracle can't fix the fuck-up, they'll give you your money back. Period. In the contract. No verbal agreements. It's in writing.
I have never lost data using Oracle. Not once. I've had bad sectors on different hard drives crop up and Oracle has ALWAYS been recoverable. Yes, it's more expensive than a solid-gold custom-molded toilet seat with a free midget butt-wiper, but for data integrity you just can't beat it.

--
LOAD "SIG",8,1
LOADING...
READY.
RUN

RDBMSs Rock by MrBlack · 2002-10-29 11:38 · Score: 2

Perhaps it's just that I can't think "outside the square" or something, but I can't really think of ANY application I've worked on where you couldn't make a good case for storing the data in a database. In some cases I've used XML files (when there was only a very small amount of data to be stored) but anything bigger than that I've always stored in a database. Perhaps under some circumstances (that I can't think of right now) you _might_ want to roll your own storage system....but I think these sort of projects would be the exception (handled of course!) rather than the rule.

Maintenance by gmhowell · 2002-10-29 11:38 · Score: 2

In one scenario, you maintain business logic, info storage, display, and all sorts of other crap. In the other, you only maintain logic and display. It's easier to force your customers into lockin with some proprietary mish mash, but there are others (like myself) who will turn you down flat.

Internal stuff is similar. Do you want to maintain EVERYTHING, or just half of it?

--
Jesus was all right but his disciples were thick and ordinary. -John Lennon

some pluses, minuses by clem.dickey · 2002-10-29 11:43 · Score: 2

Pluses: the database takes care of synchronization, and nearly takes care of backup/recover. Very nice. Some DB rigor may rub off on your designers.

Minuses: the DBM is large (in MB, in install/config requirements, and in CPU usage) and your customer may not be running the DBM brand/version which you have tested your app with. Supporting multiple DB vendors is a pain. SQL is sort of standard, but the table definitions tend to vary. Ick.

Can you afford to lose the data? by SpaceLifeForm · 2002-10-29 11:59 · Score: 2

If the data is not critical or can be easily re-created, then a filesystem will suffice.
But if the data is critical to the business, and/or not easily be re-created,
the data should go into a real DB that is Managed Properly(tm).

--
You are being MICROattacked, from various angles, in a SOFT manner.

3rd normal form by jbolden · 2002-10-29 12:04 · Score: 2

But please, for the love of God, don't listen to those moron college profs who say normalization is key. That's all good and fun until you have a million records in each of four tables you need to join to provide a solution. That, or your data model charts take up 300 sq. ft of wall space.

Good DBMSes can break complex joins catching the criteria piece by piece. You can also create run time extracts which are used by real time / almost real time systems for read access. However your advice is simply terrible. Once you lose normal form you lose the associative law on your table algebra. That means join operations are not defined independently of order they are performed in and that is very bad. Rick mentions an example of this in terms of addresses but it can get far worse.

Denormalisation by arb · 2002-10-29 14:59 · Score: 2

Sometimes, well-thought out denormalisation can make a huge, positive impact on an application. Yes, it can be difficult to make sure you don't have any anomalies in your data, but with a rigourous design and development methodology, these problems can be minimised.

Don't denormalise for the sake of denormalising - the trick is to know when to break the rules and to do so very carefully.

Denormalisation is only one tool that can be used to improve the performance of a system, and of course, other options like more memory, faster CPUs and better code should be addressed first.

Normalisation by arb · 2002-10-29 15:06 · Score: 2

But please, for the love of God, don't listen to those moron college profs who say normalization is key. That's all good and fun until you have a million records in each of four tables you need to join to provide a solution. That, or your data model charts take up 300 sq. ft of wall space.

Don't normalise your database at your own peril!

Learn how to properly normalise a database (3NF is usually good enough) and then learn how to write decent queries and tune your indexes appropriately. In some circumstances it may be worthwhile caching some data in extra tables which are refreshed periodically. In even rarer circumstances it may be necessary to denormalise the database, but always normalise it and only denormalise if you encounter some intractible performance issues.

We have some incredibly complex queries at my current client and we have managed to gain performance improvements by re-writing certain queries. Splitting complex queries up and using temp tables, derived tables, sub-queries, etc can help and you will have less of a problem with data anomalies which can creep into a denormalised structure.

Re:Normalisation by SirSlud · 2002-10-29 18:05 · Score: 2

Amen. What the parent poster didnt realize is that when youre dealing with 3 tables, one of which has 52 million records in it, joins are the non-option. 4 day queries are not permitted. We get around it denormalizing to certain degrees (certainly not rigorously all the way to 3NF), using tmp tables, being as religious as possible about indexing .. all with mysql, baby.

> a million records

Thats chump change! Try 50 million and then we'll talk database ;)

--
"Old man yells at systemd"
Re:Normalisation by arb · 2002-10-29 18:36 · Score: 2

What the parent poster didnt realize is that when youre dealing with 3 tables, one of which has 52 million records in it, joins are the non-option. 4 day queries are not permitted.

Just a quick check of one our systems shows a table with 1.5 million records. Not a large table, but not trivial either. Due to the nature of the data and the structure of the database, it is necessary to execute queries containing 10-12 self-joins routinely. One report requires 16 joins in total. The slowest of these queries takes about 30 seconds, most are well under 2 seconds.

Sure, we could denormalise our data to make the queries simpler (multi-table joins scare DBAs for some reason) but we would lose a lot of the flexibility that our design affords us. We have no need to denormalise this database because we wrote more efficient queries and have paid careful attention to our indexes.

The largest table in our production database is under 10 million rows. We have simulated much larger sizes in our test environments though, and found our joins are not a problem. The performance with 10 times the data is still well within acceptable range for our requirements. (Most stored procedures execute under 5 seconds with this size.) If you have problems with a three table join, head back to school and learn how to do it properly! ;-)

You say you are as religious as possible about indexing, but what about the structure of your queries? Are you indexing the right columns? You are using temp tables? Try getting rid of them - in many cases they slow things down. Make sure your queries are sargable. Make sure you are limiting the intermediate result-sets to be as small as possible. Understand the query execution plan. Make sure your server has enough memory! Are your server's settings tuned for your particular application? There are many, meny things to pay attention to, but you should not be having any problems with a 52 million row table in a three-table join!

When you start doing real joins with 10 or more tables (each with >1 million rows) then we'll talk about how to denormalise the data to improve performance.

8-)
Re:Normalisation by MattRog · 2002-10-31 10:03 · Score: 2

We routinely perform joins on 150+ million row, 10 to 20GB tables. They generally perform very well... Then again we use Sybase ASE 12.5.

--

Thanks,
--
Matt
Re:Normalisation by rodgerd · 2002-10-31 14:53 · Score: 2

Been there, done that. Joins aren't a problem with the right indices.

Storing data... by arb · 2002-10-29 15:14 · Score: 2

It really depends on what data you are storing. How much data, how critical, what are you doing with it, etc...

Sometimes, Excel is good enough. Or XML. Or plain text files. Or a custom file format. etc...

If the people working on your project have only ever worked with databases, they will want to use databases for everything. Most stuff will fit into a database, but sometimes it is not apprpriate to do so - as other have mentioned here, storing images in a database is not always a good idea, but you would probably want to store the location of the images in a database.

If you need to be able to ship the data around to different machines/offices/clients/over the net/etc, then maybe an XML file will be best. Custom file formats may be appropriate in some cases too. (Though I'd lean towards a more open file format.)

Why not do the whole project as a one-off? by Samrobb · 2002-10-29 16:17 · Score: 3, Informative

The feeling at my current workplace seems to be that very few projects lend themselves to database usage and that a customized one-off data storage solution should be developed for each project.

Huh? Do you create a custom C library for every application as well? How about a custom UI toolkit? Custom preprocessor/compiler?

Sounds kind of silly, doesn't it.

So why do these folks think a "customized one-off data storage solution" sounds any better? It's the same problem - you can either use something that's already been debugged, tested, and tweaked for performance, or you can spend your own time and effort to create it yourself. That's time and effort that could go towards coding and testing the final product, but is instead spent elsewhere (probably because someone thinks that using a dabatbase for storage would make the application "bloated").

I think the problem is probably that when you mention using a "database", most people equate that term with "general purpose database server" (Oracle, SQL Server, Postgress, MySQL, etc.) There are libraries available that were specifically designed to offer programs lightweight database access without the pain of using a full-fledged RDBMS. Search Google for embedded database, xbase library, or open source database library to start... there are any number of toolktis that will allow you to create a very customized storage solution without having to create "one-off" code for each and every project.

--
"Great men are not always wise: neither do the aged understand judgement." Job 32:9

Re:Why not do the whole project as a one-off? by Carpathius · 2002-10-30 03:56 · Score: 2

It really depends on what data is being stored and how much. For many, many applications a simple flat file using random access methods works just fine. For managing more data there are libraries such as you mention. And for some projects a full fledged RDBMS is the right way to go.

The proper way to go about this is to analyse what needs to be stored and choose the solution that provides the best match based upon needed functionality, system use, and programming time.

Sean.

Embedded applications by Karora · 2002-10-29 19:49 · Score: 4, Insightful

I have worked as an application developer / designer with DBMS backed applications for the last 17 years. There are reasons for not choosing a database, but not usually very good ones.

When you want speed and flexibility and scalability and reliability and extendability and particularly developer productivity you will undoubtedly end up shooting yourself in the foot later if you avoid some form of DBMS up front.

Where you have a particularly well-defined, narrow functionality, and performance in a small footprint is a requirement, an RDBMS may not be such a good choice, but DB libraries like berkeley db can still be very useful.

And with PostgreSQL, Firebird , MySQL and so many other free, open-source projects out there covering such a broad spectrum of needs for a database, why would you not use that expert work?

--

...heellpppp! I've been captured by little green penguins!

Re:Lol, I think I worked in the same place! by floydigus · 2002-10-30 05:22 · Score: 2

Definitely not Dallas, but in a way I think these places are more a state of mind than a physical place ;)

--

All things in moderation; including moderation

Re:DB based GUI + Satellite = Horror by toybuilder · 2002-10-30 08:34 · Score: 2

What's so wrong about having the GUI stored in the database? It's not like you can't download once to the local client... Heck, AOL's been doing that for years!

Just because there's a database in the center of everything, doesn't mean the clients can't cache data locally....

When to use a relational database : by PinglePongle · 2002-10-31 06:17 · Score: 2

When you can imagine querying the data you are entering - you can't easily query images, or other binary data (although I guess there must be someone working on this problem somewhere...). If you can't query it, you should usually find a better place to store it - NAS is usually fine - and maintain a pointer to it (e.g. a filename). Yes, it's something that can get un-synced, but most databases suck when it comes to actually dealing with binary data, and you can use that capacity a lot more effectively elsewhere.

When the structure of the data is likely to remain stable. If your application deals with well-understood entities, whose properties are unlikely to change over time, a database is a great solution.
Databases are, however, relatively change-resistant - it's typically a pain in the backside to change the datatype of a column, remove columns etc. So, if you're working in a domain where you continuously learn new things about your core entities, or if your development processes are highly iterative, you might be better off using an alternative data storage mechanism.

When more than a single user is likely to access the data - yes, you can create locking mechanisms yourself. You can also take your own garbage to the local dump. It's usually not a good use of your time, and the cost of not dealing with the issues involved are expensive, both for garbage and concurrent access to shared data.

When you require consistency accross transactions - the good old ACID (atomicity, consistency, isolation, durability) principles which become important for many non-trivial applications.

If you care about enforcing rules of referential integrity - do you want to ensure that all the tracks in your record collection can be tied back to a recording ? Do all orders have to have a customer ? Those things are far simpler to implement with an RDBMS than in code.

There are instances where using an RDBMS is not appropriate. Ones that spring to mind are :
- your business domain is not well understood or liable to rapid change. In this case, the cost of change for database objects is likely to be a problem - consider storing data in a self-describing format like XML.
- the application domain doesn't lend itself to being described in relational terms - image manipulation tools, word processors etc. which deal with mainly binary information probably should not use a relational model for their core data structures.

Alternatives exist - Object Oriented databases are becoming more and more popular. I have way too little experience with these to comment on their use.

--
It's all very well in practice, but it will never work in theory.

Slashdot Mirror

When is Database Muscle Too Much?

52 of 93 comments (clear)