When is Database Muscle Too Much?

More of a "dilbert" story by dacarr · 2002-10-29 09:41 · Score: 3, Funny

Slightly off topic as it were, but I've noticed that a lot of people seem to think that Excel works very nicely as a database. In some cases this might be true, but the bigger you get the more problems you have, and I just can't seem to convince those of a less-than-technical mind (read: management) otherwise.

--
This sig no verb.

Re:More of a "dilbert" story by sydney · 2002-10-29 10:00 · Score: 1

My company is the same way! EVERYTHING is in an Excel spreadsheet, it's absolutely ridiculous. After showing them what Access was and how it wasn't too difficult to use, they thought I was brilliant. Of course, a few key people thought it was too "complicated" and back to spreadsheets we go.
Re:More of a "dilbert" story by leviramsey · 2002-10-29 10:02 · Score: 4, Funny

Doesn't Google run off of a huge Excel spreadsheet?
Re:More of a "dilbert" story by Methuseus · 2002-10-29 11:14 · Score: 1

I used to work at the pension co for a big name company. They had an Excel spreadsheet for each person. Each spreadsheet was 9 MB compressed with WinZip, and something like 15 MB or so uncompressed, I think. I thought they were sorta stupid for not using databases, too, and it seems like I was right.

--
Two things are infinite: the universe and human stupidity, though I'm not yet sure about the universe. - A Einstein
Re:More of a "dilbert" story by floydigus · 2002-10-29 11:21 · Score: 2

In my experience this occurs in outfits where the people who hold the purse strings are not also the people who understand computers. They may have no experience of what real systems can do for them.
I used to work at a place where mission critical data was stored in a massive Paradox database to which all these crazy jarheads had admin rights. Just stupid. You ended up with thousands of copies of tables and no-one knew which one was the right one to use. Then they decided to go SQL Server and what did they do? Remodel the data structure? No. They just copied the damn Paradox tables one for one.
They used to back up the c:\ drives of every machine in the office every lunchtime in that place.
Dickheads.

--
All things in moderation; including moderation
Re:More of a "dilbert" story by arb · 2002-10-29 14:52 · Score: 2

Doesn't Google run off of a huge Excel spreadsheet?

No, they use pigeons!
Re:More of a "dilbert" story by Anonymous Coward · 2002-10-29 15:22 · Score: 0

I too have worked for a number of companies where Excel is rife. It seems that's because it has such a low learning curve to get started.

Even the so-called technical types used Excel. I felt duty bound to show Access and MySQL to them. But NO! Excel is where they stand and they say it does all it needs to do.

My question is, "What do you do when your data/reporting needs change?" They say, "It won't change."

Sure enough, 6 months later when management wants a new type of report the shit hits the fan and, rather than fixing the problem with a db, they go and redesign the Excel sheet.

One sheet they had, no shit, took 45 mins to load, 'cause it had graphs on all 30-odd worksheets.

I resigned from the company.
Re:More of a "dilbert" story by sien · 2002-10-29 16:11 · Score: 2

Well, actually the ability of Excel to act as a primitive database was one of the reasons it has wound up being so popular.
Joel Spolsky worked with MS on the Excel team and points out that, in the user studies they did, the ability of Excel to record data in such a way was important in its adoption. Check out the chapter from his excellent book User Interface Design for Programmers and search for Excel.
Re:More of a "dilbert" story by sharkey · 2002-10-30 04:08 · Score: 2

but I've noticed that a lot of people seem to think that Excel works very nicely as a database.

Just the opposite of what we have here. Folks here seem to think that Access is just a wonderful spreadsheet program. (4 databases, 1 table each, 50+ columns)

--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
Re:More of a "dilbert" story by Anonymous Coward · 2002-10-30 08:08 · Score: 0

One good 'dilbert' story deserves another
Our warehouse people decided that the SAP warehouse module was either too much effort to learn or something. They went off the reservation and started tracking inventory in an Excel spreadsheet.
A month went by and they called me complaining that Excel was 'loading slow'. It sure was - d'ya know how LONG it takes to load an .xls file that is 21mb huge?

"Is database application" by RaboKrabekian · 2002-10-29 09:42 · Score: 2

One company I worked for contracted out an application we had to build to a provider who brought in this crazy, disheveled, brilliant Russian database engineer. I remember that with every issue that would come up he would say, "Is database application" and go off muttering to himself. Content management software needed? "Is database application." File system problems? "Is database application." New mouse drivers? "Is database application." What to order for lunch? "Is database application."

The moral of the story? Any computer application is built most quickly and easily using a database solution.

--
"Moderate drinking can help prevent amputated limbs" -- Abigail Zuger, NYTimes, 12/31/02

Re:"Is database application" by Rick+the+Red · 2002-10-29 10:23 · Score: 2

"When all you have is a hammer, everything looks like a nail"

--
If all this should have a reason, we would be the last to know.
Re:"Is database application" by Anonymous Coward · 2002-10-29 16:18 · Score: 0

"When all you've had for the past 30 years is a hammer, you become really fucking good at using a hammer for all problems"
Re:"Is database application" by chthon · 2002-10-30 03:16 · Score: 1

It has less to do with databases, than with tables. A database is just a very structured and very good way to handle tables.
If you can describe things in tables, put it in a database.
I think that the main reason that people use Excel is that you need no programmer in between. If you want to use a database, then you need to create new interfaces for every table and view that you add. This calls for someone who understands databases and interface design, and knows how to get information out of people, etc...
Btw, isn't the AS/400 filesystem a DB2 database?
Re:"Is database application" by RobertEdwards · 2002-10-30 05:10 · Score: 1

"Btw, isn't the AS/400 filesystem a DB2 database?"

Well, to oversimplify, yes. But only part of it.

AS400 / iSeries computers actually have multiple file systems, arranged hierarchically under what they call an "Integrated File System". Some newer file systems will hold any old blob of data in a file, whereas the original file system is basically inherited from the old S38 library system.

Basically, under ROOT there's a branch that follows OS/2 or NT file naming conventions, under QOPENSYS a branch that emulates the Unix file naming system, etc. These have directories and files, as you'd expect in most OS filesystems. But QSYS.LIB is special. It's a library, not a directory. On the AS400, libraries are quite a bit more structured. Among other things, a library can act as a database "collection".

Some objects in the libraries are DB2 tables, views, indexes, and transaction journals. Others are stored procedures, queries and application programs.

All other libraries live in QSYS.LIB, and no other library can contain another library.

Now, since the S38 predated the DB2 product, and for that matter SQL, there are many database tools on an AS400 that no other DB2 platform has. And these tools are useful. There used to be a couple of logical file tricks (indexed views) you could do with DDS that SQL didn't do, for example. But most tools from the mainframe and Unix versions have been ported to the iSeries.

Neat boxes, those AS400. Single source as all hell though.

Re:More of a "dilbert" story <OT> by Anonymous Coward · 2002-10-29 09:50 · Score: 0

Speaking of Excel ... I work at an engineering firm where a couple of the manager types seem to think that Excel is superior to Matlab (which they also have installed on their machines ... but never use). Sometimes I am amazed (although usually disgusted ;) by the things that the Excel spreadsheets they hand me do.

They will set up sheets to produce input data sets for decent sized Monte Carlo runs and then hand them to me to do the actual runs ... I then have to reboot into w2k in order to extract the data into a text file, reboot into Linux and then do the real work <sigh> they are management so I can't complain too loudly.

And don't get me started on the BUTT UGLY graphs that Excel produces ... Excel should NEVER be used to plot more than about 10 data points ... EVER!

afaik... by pizza_milkshake · 2002-10-29 09:52 · Score: 3, Interesting

i used to be a big php fan (still a fan, but not a big one) and i was always surprised/dismayed that everyone wanted to store images for web-based applications (catalog-style images, user-submitted icons/graphics, banners, etc) in a database.

i was always under the impression that the filesystem was a better place for this, assuming the directory structure was simple or fixed (i.e. you wouldn't be creating thousands of subdirs dynamically). why store all your banners in a table as BLOBs when you can simply have a web-accessible directory and store them there?

i never really found a clear reason on which was better performance-wise, though i suspect the filesystem-based way is. i also found it to be less of a hassle to implement. any intelligent thoughts?

Re:afaik... by ceejayoz · 2002-10-29 09:59 · Score: 2

I've heard anecdotal evidence saying that the filesystem is faster and less processor intensive than storing binary data in the database. That said, I usually store images in my databases. :-)
Re:afaik... by Anonymous Coward · 2002-10-29 10:00 · Score: 0

File system really is better for performance (really, static files almost always are).

The real advantage I could see in using a database for images is permissions. In a shared web environment, you're rather limited, and something like this might make management easier.
Re:afaik... by UberChuckie · 2002-10-29 10:01 · Score: 0

Most projects that I have been involved with stored the path to the images (the images on the filesystem) in the database.
Re:afaik... by inerte · 2002-10-29 10:04 · Score: 1

Filesystem is faster. MySQL's page about optimization also says this. I guess database makers don't want you to know this because they want their database used more.

--
Buy a Nintendo DS Lite
Re:afaik... by Xunker · 2002-10-29 10:12 · Score: 4, Insightful

I'm running a moderately large site that deals heavily with images and at first I thought it would be great to store all the images in the database along with the incidentals. Happily I had pause for thought because the site went big and decided to go with file system storage for the images instead, and I'm glad I did:

* Size: I didn't anticipate the user would upload 5 gig of images
* Access: You need some sort of extraneous code to pull those images from the database
* Communication: your code must know how to fake being a proper image (right headers, creation/change times, etc)
* Size 2: A lot of databases can't store high-res images (read: large images) without serious penalties (like chopping them up into little bits to fit into a MySQL BLOB column)

Speed can be addressed in the Perl world via FastCGI or mod_perl, and in similar ways on other platforms, but you'll still have to do disk reads to get the data and you'll pay a price for having one more often-used script in memory.

Storage size can be counteracted with clever tricks or "professional grade" (read: expensive) DB Engines like Oracle and DB2 which have binary data storage as one of their features, but you'll need to pay money and have a big muscly machine to run 'em.

For my money, 90% of image serving can and should be done from the filesystem because that's what web servers are made for. The other 10% are weird meta things like this that could use the file system but are designed to use a database.

My big thing is to use your head and ask yourself: what will be easier in the long run? Sure, binary data slung around with Perl DBI sounds convenient, but how convenient is it to run "isamchk" on a 100 GB database, eh?

--
Hilary Rosen's speech was about her love of money and her desire to roll around naked in a pile of money.
Re:afaik... by Old+Uncle+Bill · 2002-10-29 10:16 · Score: 2, Interesting

Yes, this is definitely bad. Images are typically stored in the db as Binary Large OBjects (BLOBs) which no database system on the planet is good at retrieving quickly. Also, updating into the database can take longer.

The real killer in database performance comes in two places, large complex joins and full table scans. Eliminate these two things in your db and you should never have scaling problems. To do this, watch your long running queries and make sure they have the proper indexes on the tables. And make sure you keep your statistics updated. But please, for the love of God, don't listen to those moron college profs who say normalization is key. That's all good and fun until you have a million records in each of four tables you need to join to provide a solution. That, or your data model charts take up 300 sq. ft of wall space.
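To make the "watch for full table scans and add the proper indexes" advice concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns and the index name are made up for illustration, and SQLite stands in for whatever engine you actually run; other databases have their own EXPLAIN output and statistics commands.

```python
import sqlite3

# Hypothetical schema for illustration; table, column and index names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return the human-readable 'detail' text of each EXPLAIN QUERY PLAN row."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


QUERY = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index on customer_id, the planner has to scan the whole table.
plan_before = query_plan(conn, QUERY, (42,))

# Add the index and refresh the planner's statistics.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.execute("ANALYZE")

# The same query can now be answered with an index search.
plan_after = query_plan(conn, QUERY, (42,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan text differs between SQLite versions, so treat the printed strings as a hint rather than a stable interface.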

--
Yes, I am an agent of Satan, but my duties are largely ceremonial.
Re:afaik... by Rick+the+Red · 2002-10-29 10:31 · Score: 4, Informative

But please, for the love of God, don't listen to those "real world experience" morons who say denormalization is key. That's all well and good until you've got the same information replicated to hell and gone and records start to disagree: Is the billing address 123 North Main or 1313 Mockingbird Lane? Half the invoices for this customer show one, half the other.
If your normalized tables take a performance hit, buy a bigger box. If you munge the data with replication, you're screwed.

--
If all this should have a reason, we would be the last to know.
Re:afaik... by GLHMarmot · 2002-10-29 11:30 · Score: 3, Insightful

But please, for the love of INSERT DEITY, don't listen to those polarized-my-way-is-the-only-way INSERT INSULT's.
There is always a case for normalization or denormalization. I have developed many different databases of various sizes. ( 10 terabytes) and as a rule I try to be as normalized as possible. I have worked on some data conversions that were rather messy due to denormalization. However, I can't think of a single database where there wasn't some type of denormalization for various reasons, from speeding up query results to the client's demands.
Yes, I could have recommended that my clients buy bigger hardware but when a summary table can be used instead of spending $X thousand dollars, what do you think you would do?
Re:afaik... by Anonymous Coward · 2002-10-29 12:10 · Score: 0

I agree with the general concept of what you are saying and on most systems you are 100% correct on all four of your points.

But I make an exception for Oracle. Oracle using RAW disk access can intelligently read/write images, store them intelligently for fast access, and if you use an extension like InterMedia it can use wavelet encoding of images that makes them take up a lot less space.

My company stores around 11-45 megs of images per minute... we are reading and writing from 15-75 gigs of image data every day and we got a large performance increase moving from Ext3 to Oracle with RAW, so the old rule that the filesystem will always be faster than the DB is only true when you put the DB on top of the filesystem rather than the raw hardware.

This doesn't make your points less valid; we still do have to do a lot of special handling because of how we work with the images, but in this case it was worth it.
Re:afaik... by delus10n0 · 2002-10-29 12:51 · Score: 2

Huh? As far as I know, picpix stores the images on the filesystem (and not in the database). The database is used to track/manage the images, though..

Same goes for similar applications like Gallery, whose newest version (v2.0, in progress) will use SQL to drive the backend, and will rely on your filesystem/webserver to serve the images.

--
Not All Who Wander Are Lost
Re:afaik... by Xunker · 2002-10-29 13:44 · Score: 1

Ah, I could've been wrong - I haven't followed fotobilder dev for quite a while and I know in the early early days a pure database version was on the table but it makes sense, with the amount of images it's designed to store, that it uses the file system for the actual binary data.

--
Hilary Rosen's speech was about her love of money and her desire to roll around naked in a pile of money.
Re:afaik... by JohnFluxx · 2002-10-29 13:53 · Score: 1

Why are there so many anti-university posts on Slashdot?
In our database course we were taught very clearly why and why not to normalise, and how much.

Are US universities really so bad as most /.'ers make out? Move to England if so :)

I think the reason is that you have
a) The person who is good at computers, so skips uni and goes to work
b) The person who isn't that good, goes to uni then goes to work
c) The person who is good at computers, goes to uni, and then goes to work.

It seems ppl in group A keep meeting ppl in group B. Perhaps this might be because ppl in group C work in jobs that ppl in group A don't, so they never meet.

Thoughts?
Re:afaik... by phagstrom · 2002-10-29 19:34 · Score: 1

any intelligent thoughts?

They are preparing for the new SQL based FS from MS....
Re:afaik... by vadim_t · 2002-10-29 20:52 · Score: 1

I work on a program with a quite denormalized database. The thing is simple. To retrieve some kinds of data you need to do some fairly long SELECTs, so that data is duplicated. This is things like the last vendor who shipped a product, and the last date of shipment, for example. Individually this isn't that slow, of course, but we need to generate this information for every product sometimes. All this data can be easily recalculated, and I have a program exactly for that. So really there's no consistency issue unless there's a bug, and a bug fix and a pass of the program should take care of that.
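A rough sketch of the "duplicate it, but keep a program that can recalculate it" approach described above, using Python's sqlite3 module. The `products` and `shipments` tables and the `rebuild_last_shipment` helper are invented for illustration and are not the poster's actual schema or program.

```python
import sqlite3

# Hypothetical schema: shipments is the source of truth, while the last_* columns
# on products are a denormalized copy kept only to make reporting fast.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (product_id INTEGER, vendor TEXT, shipped_on TEXT);
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT,
    last_vendor TEXT,
    last_shipped_on TEXT
);
INSERT INTO products (id, name) VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO shipments VALUES
    (1, 'Acme',   '2002-09-01'),
    (1, 'Globex', '2002-10-15'),
    (2, 'Acme',   '2002-08-20');
""")


def rebuild_last_shipment(connection):
    """Recompute the duplicated columns for every product from the shipments table."""
    connection.execute("""
        UPDATE products SET
            last_shipped_on = (
                SELECT MAX(shipped_on) FROM shipments s
                WHERE s.product_id = products.id
            ),
            last_vendor = (
                SELECT vendor FROM shipments s
                WHERE s.product_id = products.id
                ORDER BY shipped_on DESC LIMIT 1
            )
    """)
    connection.commit()


rebuild_last_shipment(conn)
rows = conn.execute(
    "SELECT id, last_vendor, last_shipped_on FROM products ORDER BY id"
).fetchall()
print(rows)
```

Because the duplicated columns can always be rebuilt this way, a bug that leaves them stale costs one rebuild pass rather than lost data.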
Re:afaik... by Anonymous Coward · 2002-10-29 21:59 · Score: 0

I think I'm stuck with group A even though I'm from group C and the group A's are defensive. I keep showing them up and they keep crapping on me to make up for it. Wait a minute... maybe I'm in group B after all... nah, I'm definitely group C material.
Re:afaik... by sql*kitten · 2002-10-29 22:02 · Score: 2

Happily I had pause for thought because the site went big and decided to go with file system storage for the images instead, and I'm glad I did

Why not do both? Oracle has a datatype called BFILE, in which the actual data is stored in the filesystem, and the row in the table contains a pointer to it. You have the best of both worlds, filesystem access to the image if you want it, or database access, and you can very easily integrate the image with the rest of your relational data.
Re:afaik... by chthon · 2002-10-30 03:18 · Score: 1

PostgreSQL supports BLOBs, but the organisation suggests storing the names in a table and the binary objects in the filesystem.
Re:afaik... by Xunker · 2002-10-30 05:29 · Score: 2, Funny

Ahem.. you assume I can AFFORD Oracle! I had to sell a kidney to buy winter tires for my car this year!

--
Hilary Rosen's speech was about her love of money and her desire to roll around naked in a pile of money.
Re:afaik... by eric2hill · 2002-10-30 08:54 · Score: 2

Yes, I could have recommended that my clients buy bigger hardware but when a summary table can be used instead of spending $X thousand dollars, what do you think you would do?

I would use an Oracle Materialized View and get the speed increase without the extra dollars and without needing the extra hardware. Work smarter, not harder.

--
LOAD "SIG",8,1
LOADING...
READY.
RUN
Re:afaik... by Anonymous Coward · 2002-10-31 06:01 · Score: 0

"those who can, do, those who can't, teach"
Re:afaik... by MattRog · 2002-10-31 10:06 · Score: 2

The only reason why I could see a RDBMS advantage to image storage is that you can (usually pretty easily) change the block/page size for storage/retrieval to improve fetching of large chunks of data. Usually you can also have finer-grained memory control to ensure they are forced in memory.

That said, I think it is far wiser and much more cost effective to store 'dumb' (non-relational) content like images on the web/app server.

Thanks,
--
Matt
Re:afaik... by rycamor · 2002-10-31 13:58 · Score: 2

i never really found a clear reason on which was better performance-wise, though i suspect the filesystem-based way is. i also found it to be less of a hassle to implement. any intelligent thoughts?
The only intelligent thought I can add to this is: sometimes performance isn't the only factor. Yes, of course--all things being equal-- the filesystem is faster, but filesystems can't enforce data integrity constraints. I agree that it is ludicrous to store non-critical images in the database, such as standard website graphics, photo albums, etc... BUT, if you are creating a document management system, and you want to make sure that scanned document X cannot be deleted unless user Y has viewed it, then a database is often the most sensible solution. Otherwise, your code will have to enforce constraints in two places. And what happens if the IT dept wants another application to play with the same data? If it's all in the database, and your constraints are in the database, then you sleep a little easier.
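As a sketch of what "constraints in the database" could look like for the scanned-document case, here is one way to do it with a trigger in SQLite via Python's sqlite3 module. The tables, the trigger name and the hard-coded 'user_y' are all assumptions for illustration; a real document management system would model the required viewers differently.

```python
import sqlite3

# Hypothetical schema: a document may only be deleted once user_y has viewed it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, scan BLOB);
CREATE TABLE views (document_id INTEGER REFERENCES documents(id), user_name TEXT);

CREATE TRIGGER no_delete_unviewed
BEFORE DELETE ON documents
WHEN NOT EXISTS (
    SELECT 1 FROM views
    WHERE document_id = OLD.id AND user_name = 'user_y'
)
BEGIN
    SELECT RAISE(ABORT, 'document has not been viewed by user_y');
END;

INSERT INTO documents (id, title, scan) VALUES (1, 'contract', x'00');
""")


def try_delete(connection, document_id):
    """Attempt the delete; return False if the database refuses it."""
    try:
        connection.execute("DELETE FROM documents WHERE id = ?", (document_id,))
        connection.commit()
        return True
    except sqlite3.DatabaseError:
        connection.rollback()
        return False


deleted_before_view = try_delete(conn, 1)   # refused by the trigger

conn.execute("INSERT INTO views VALUES (1, 'user_y')")
conn.commit()

deleted_after_view = try_delete(conn, 1)    # allowed now
print(deleted_before_view, deleted_after_view)
```

Any other application that talks to the same database hits the same trigger, which is the point: the rule lives in one place.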
Re:afaik... by rodgerd · 2002-10-31 14:33 · Score: 2

That's why your database is an excellent place to store the metadata - such as captions, which page they appear on, what the thumbnail is, and so forth - with a pointer to a location on the file system where the image resides. Best of both worlds.
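A minimal sketch of that split, with the bytes on disk and only the metadata plus a path in the database. It uses Python's sqlite3 module and a temporary directory as a stand-in for a web-accessible image directory; the `images` table and the `store_image` and `load_image` helpers are hypothetical names.

```python
import sqlite3
import tempfile
from pathlib import Path

# Stand-in for a directory the web server would serve directly.
image_dir = Path(tempfile.mkdtemp())

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE images (
        id      INTEGER PRIMARY KEY,
        caption TEXT,
        page    TEXT,
        path    TEXT NOT NULL UNIQUE   -- pointer to the file, not the file itself
    )
""")


def store_image(connection, directory, filename, data, caption, page):
    """Write the image bytes to disk and record only metadata in the database."""
    target = directory / filename
    target.write_bytes(data)
    connection.execute(
        "INSERT INTO images (caption, page, path) VALUES (?, ?, ?)",
        (caption, page, str(target)),
    )
    connection.commit()
    return target


def load_image(connection, caption):
    """Look up the path by caption, then read the bytes from the filesystem."""
    row = connection.execute(
        "SELECT path FROM images WHERE caption = ?", (caption,)
    ).fetchone()
    return Path(row[0]).read_bytes() if row else None


store_image(conn, image_dir, "banner.png", b"\x89PNG fake bytes",
            "Site banner", "/index.html")
banner = load_image(conn, "Site banner")
```

In a real site the web server would serve the file straight from the directory, and the database lookup would only be needed to decide which file belongs on which page.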
Re:afaik... by JohnFluxx · 2002-10-31 23:07 · Score: 1

:)

Actually our university got the highest points in terms of research. It means that our lecturers are (generalising) not that good at lecturing, but are good in knowledge. That suits me okay tho.

Get a new job by photon317 · 2002-10-29 09:57 · Score: 2

You obviously are working with morons. Very few data-oriented applications need to write their own data-stores. Almost anything you can imagine (complex relational data, object-oriented data, xml stuff, photos, video footage, 3/4D spatial data, etc, etc..), someone has written database software tuned for it. Use it and be happy.

--
11*43+456^2

performance v flexibility by battjt · 2002-10-29 09:57 · Score: 2

A custom DB will be faster than a general purpose DB (by definition).

A relational DB offers great flexibility (pull any data from the database, add most any index to greatly improve the performance, etc).

I would base the decision mostly on the interfaces to the application. I've worked on applications where the company preferred to access the SQL database directly for reporting and I've worked on projects where the only interface to the application was via HTTP/XML. In the latter case, no one cared how we stored the data, so we dumped XML into a filesystem.

Joe

--
Joe Batt Solid Design

Re:performance v flexibility by joto · 2002-10-29 10:30 · Score: 4, Interesting

Experience tells me exactly the opposite. A custom DB will perform better, until you actually start to fill it with lots of data. When that happens, you will find that the many man-years spent developing the expensive proprietary DBMS systems actually resulted in something better than what you could quickly hack together over a few days.
That being said, there are still lots of valid reasons not to use a real DBMS for every small project. The most important is simplicity. Bringing in hundreds of megabytes of third-party software to store a few kilobytes of data is not only overkill, it's also a maintenance nightmare!
Re:performance v flexibility by larien · 2002-10-29 22:05 · Score: 2

The maintenance nightmare is the best point; why spend x man-hours writing a custom data store when there are thousands of man-hours of experience in Oracle/PostgreSQL/MySQL/MS SQL/whatever? You have access to tried & tested code under a huge variety of circumstances.
Take the time saved by using an out of the box solution (i.e. the DB) and buy a bigger box to handle any inefficiencies you get by using a DB.
Re:performance v flexibility by joto · 2002-10-29 23:55 · Score: 2

Because then you'd have to understand Oracle/PostgreSQL/MySQL/MS SQL/whatever to be able to fix it. Furthermore, you need it to be available to deploy it. And you rely on continued support from the company/individuals in question for it to last more than a few years. If it can be done with a simple text-file, or as a hierarchical database using the file-system, that will make the product simpler, smaller, and less reliant on third-party software. Sometimes reinventing the wheel is a good thing, if the wheel is sufficiently simple...
Re:performance v flexibility by rycamor · 2002-10-31 14:06 · Score: 2

Having played with each of the methods you describe above, I can't imagine how to make data storage simpler than SQL (except for a better relational language, perhaps). Hierarchical data storage is a disaster waiting to happen. (And I mean XML, too).

I mean, of course anything can be argued to the point of stupidity. If you just want to store a few flat lists of items, then sure: use a text file, but once you go beyond that (and you always will), standardization is your friend.

And we're not talking about hundreds of megabytes of 3rd-party software. Really, the PostgreSQL install footprint is a few megabytes. Or try Interbase/Firebird, which is only about a 3 MB download these days. Both of these DBMS's use ANSI standard SQL, so if you just learn SQL once, you pretty much can use the basics anywhere.

Of course, different DBMS's have different advanced/complex capabilities, but by the time you start needing advanced capabilities, it is still quicker to learn a DBMS than create your own advanced data storage.
Re:performance v flexibility by rodgerd · 2002-10-31 15:11 · Score: 2

I'd rather rely on the availability of Oracle DBAs than the availability of a contractor who hacked together their own DB substitute.

If the problem is sufficiently simple that a full RDBMS is overkill, some db variant is perfect and ships on every *ix.
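For the "full RDBMS is overkill" case, a dbm-style key/value file is about as small as it gets. A minimal sketch with Python's standard `dbm` module follows; the file name and keys are made up, and which backend `dbm` picks (gdbm, ndbm, or a pure-Python fallback) depends on the platform.

```python
import dbm
import os
import tempfile

# Hypothetical location; a real program would use a fixed, known path.
db_path = os.path.join(tempfile.mkdtemp(), "settings")

# "c" opens the database for reading and writing, creating it if needed.
with dbm.open(db_path, "c") as db:
    db["theme"] = "dark"     # str keys and values are stored as bytes
    db["retries"] = "3"

# Reopen read-only and fetch the values back (they come back as bytes).
with dbm.open(db_path, "r") as db:
    theme = db["theme"].decode()
    retries = int(db["retries"])

print(theme, retries)
```

There are no joins, no schema and no query language here, which is exactly why it only suits problems that really are that simple.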

i agree with the poster by tps12 · 2002-10-29 10:01 · Score: 2

I have to agree that database reuse is among the most essential parts of running a profitable business. I've worked with all sorts of RDBMSes, from MS to Oracle to PostgreSQL, on everything from the lowliest hand-me-down Linux server to top-of-the-line Big Iron, and I can tell you that any modern database is going to be able to take whatever you can throw at it. I like being able to whip out whatever data we have, shove it in and pull it out again, repeatedly and at a moment's notice. It's this kind of flexibility that makes us keep coming to database systems in the first place.

--

Karma: Good (despite my invention of the Karma: sig)

Oracle versus Perl by mauryisland · 2002-10-29 10:03 · Score: 1

When you work in an environment where dog + uncle is either an Oracle DBA, MCSE or VB programmer, you wind up with projects like this: import random-length string data from an embedded device into an Oracle table, write some really convoluted stored procedures to parse the string data, write a nifty VB program to fetch the results for display to the users, who don't edit the data, just view it.

In short, you get some expensive licenses and a lot of work to manage what could be done with a short shell script.

Anybody ever hear about: grep, cut, sed, awk, sort, uniq... God forbid Perl?

When your only tool is a hammer...
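As a rough idea of how small the "parse the strings and show a summary" job can be, here is a sketch in Python standing in for the shell pipeline or Perl script the post has in mind. The log format, the sample lines and the field names are invented for illustration; the real device output is not described in the thread.

```python
from collections import Counter

# Made-up sample of variable-length "key=value" lines from a device.
sample_lines = [
    "2002-10-29 09:41:07 sensor=pump1 status=OK temp=41",
    "2002-10-29 09:41:12 sensor=pump2 status=FAIL",
    "garbage that does not parse",
    "2002-10-29 09:42:01 sensor=pump1 status=OK temp=43 note=serviced",
]


def parse_line(line):
    """Pull key=value pairs out of a line; return None if required fields are missing."""
    fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
    if "sensor" in fields and "status" in fields:
        return fields
    return None


records = [rec for rec in map(parse_line, sample_lines) if rec is not None]

# Roughly what `cut | sort | uniq -c` would give you: a count per (sensor, status).
status_counts = Counter((rec["sensor"], rec["status"]) for rec in records)

for (sensor, status), count in sorted(status_counts.items()):
    print(f"{sensor:8} {status:5} {count}")
```

Since the users only view the data, a read-only report like this may be all the "application" that is needed.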

Understand your needs and then decide... by toybuilder · 2002-10-29 10:12 · Score: 4, Insightful

Sure, there are times when writing an RDBMS-based solution seems like a big overhead. But there's a good reason for using RDBMS on projects that are likely to mutate and add new features over time, and/or have to interoperate with other programs and systems.

On the other hand, if you just want to store a small array of data that fits in a 100 line text file, and the program is completely closed and self contained, there's no need for the flexibility of a RDBMS.

Imagine a business that has to "send and receive stuff"...

If you're moving two or three little packages to nearby local area businesses, only, you can get by with a small car.

But imagine you regularly ship objects large and small to locations local and international... Then you need an intermodal transportation system. Sure, your interface might be "the shipping guy", but the backbone of the transportation is heavy duty...

We use the DB for nearly everything by ComputerSlicer23 · 2002-10-29 10:18 · Score: 5, Interesting

I write a screen scraping application that downloads lots of web pages (100K+ a week). We write absolutely every page into the database. 5GB is enough to hold about two days' worth of pages. So we can't keep it in our production database. Especially because 95% of the data doesn't need to be stored except for auditing, but we keep it around in case a mistake is found, or a new trend in the pages happens.

The reason we use the database instead of the filesystem is dead simple. The database is god-like. I can do point-in-time recovery, and guarantee that the database is completely consistent with the recovery point. If I had all that in a filesystem it'd be harder. It means I have hot-rollover capability from server to server, without having to duplicate my filesystem from machine to machine; I just let the hot archive logs deal with that. It means I have one backup system, and one failure recovery plan. It means all I have to monitor is the Oracle tablespace to ensure I don't run out of space. It means when I say "commit", I can hold Oracle accountable for ensuring the data is there, rather than having myself held accountable by the management.

If I was a good little boy and swallowed all the kool-aid, I'd use iFS (Oracle's Internet Filesystem) and it'd be all good. However, I don't; I just use a huge array of BLOBs in my Oracle database.

Now that said, I have a remote filesystem that all of this data gets spooled to. Once spooled there it gets written to CD. Once the CDs are written, they are then used to find, compare, and, if they match, delete the blobs out of the database. The CD then deletes the files out of the spool. Duplicate the CD, compare the two, send one offsite.

The other reason we use the database is that it's easier to deal with in our application, because writing a join against the filesystem is tricky...

Kirby

Re:We use the DB for nearly everything by pmineiro · 2002-10-29 15:23 · Score: 2

The other reason we use the database is that it's easier to deal with in our application, because writing a join against the filesystem is tricky.

I think this gets to the point.

Namely, the hierarchical tree model of filesystems pales in comparison to the relational model of modern databases (first generation databases, way back in the day, were hierarchical).

-- p
Re:We use the DB for nearly everything by Anonymous Coward · 2002-10-29 15:34 · Score: 0

That's pretty nifty thing that you're doing. I don't really take exception to that.

What I thought was interesting, is that you say that you can hold Oracle accountable for missing data. What's to keep Oracle from claiming that you're a dumb ass and messed it up all by yourself? Doesn't the click through on the license basically keep them from being held liable?

What do you think management is going to say when Oracle tells them the employee is the one who screwed up?
Re:We use the DB for nearly everything by ComputerSlicer23 · 2002-10-29 19:29 · Score: 5, Interesting

What I thought was interesting, is that you say that you can hold Oracle accountable for missing data. What's to keep Oracle from claiming that you're a dumb ass and messed it up all by yourself? Doesn't the click through on the license basically keep them from being held liable? What do you think management is going to say when Oracle tells them the employee is the one who screwed up?
I've submitted a request into the Oracle iTAR (Technical Assistance Request) system, at 3:00 AM, gotten a call back by 3:10AM, and a resolution by 3:45AM on a relatively esoteric bug. When I say, I get to hold Oracle accountable, I mean, I can hold Oracle accountable to get me up and running pronto with as much data as I'm going to get. It costs an arm and a leg, but the only people I've ever heard of who have lost their data using Oracle just didn't do backups properly. Oracle is pretty serious about keeping your data around. If you stay with stable tested versions, you'll be fine with Oracle.
I've had Oracle help me on de-supported platforms, using non-standard configurations, doing craziness of my own making. They always help me when I contact them for support. For production machines, I stay on the tried and true, but Oracle has never let me down in weird situations.
Oracle, as a general rule, has *never* told me "buzz off, it's your fault". They stayed with me, and found it was my fault, and then showed me the docs where it says I'm doing something that won't work. For all their faults, Oracle has *NEVER* failed me in any way when it comes to support. When it's my fault, I stand there and take it on the chin. However, as a backing store, Oracle will whoop anything I write eight ways to Sunday for speed, reliability, portability, quality, documentation and support. Hands down. If you can afford it, there is no excuse for writing your own custom storage manager. If you can't afford it, try PostgreSQL. Oracle or PostgreSQL *WILL* be better for 99.9% of the cases out there. Google is one of the few examples of a situation where writing your own is probably a good idea.
For the record, out of the ~200 million records I've processed, I've lost 1, count 'em, 1 record using Oracle that was Oracle's fault (I've lost any number of them when Linux crashed, but that's my fault, not Oracle's). Even then Oracle clearly identified which one it was, and when it happened, so it was easy to recover.
Kirby
Re:We use the DB for nearly everything by eric2hill · 2002-10-30 08:44 · Score: 2

I have an Oracle contract in front of me. Name one other vendor that has the following clause in their software agreement *by default*...
"If Oracle cannot substantially correct a breach of Oracle's warranties in a commercially reasonable manner, you may end your program license, technical support, or other services and recover the license fees, technical support fees or other services fees paid to Oracle under this agreement..."
This no-nonsense agreement was a selling point for my company. If Oracle can't fix the fuck-up, they'll give you your money back. Period. In the contract. No verbal agreements. It's in writing.
I have never lost data using Oracle. Not once. I've had bad sectors on different hard drives crop up and Oracle has ALWAYS been recoverable. Yes, it's more expensive than a solid-gold custom-molded toilet seat with a free midget butt-wiper, but for data integrity you just can't beat it.

--
LOAD "SIG",8,1
LOADING...
READY.
RUN

Can the data be lost? by PersonalizedSpam · 2002-10-29 11:13 · Score: 1

That's the first question that you should ask yourself when looking at going outside of the database for persistent storage. I would imagine that the data is quite important, and unless you want to write all the necessary functionality in your one-off data storage solution (which may or may not be slower than an RDBMS or ODBMS solution) you should have a pretty good argument for why a real database should be used. It sounds like you're working with a pretty uninformed technical staff, to be honest.

All-in-memory anti-databases by Anonymous Coward · 2002-10-29 11:28 · Score: 0

I'm curious about all-in-memory "anti-databases" like Prevayler (http://www.prevayler.org). Check it out; it sounds stupid at first, but I think the arguments for it are coherent and sensible.

Obviously, these won't be a good fit for massive terabytes of data, but for applications with small to medium data storage needs, hmmm? After all, you can always port to a database later....

RDBMSs Rock by MrBlack · 2002-10-29 11:38 · Score: 2

Perhaps it's just that I can't think "outside the square" or something, but I can't really think of ANY application I've worked on where you couldn't make a good case for storing the data in a database. In some cases I've used XML files (when there was only a very small amount of data to be stored) but anything bigger than that I've always stored in a database. Perhaps under some circumstances (that I can't think of right now) you _might_ want to roll your own storage system....but I think these sort of projects would be the exception (handled of course!) rather than the rule.

Re:RDBMSs Rock by Roadmaster · 2002-10-29 11:55 · Score: 1

There are some types of data that just can't be adequately stored in an RDBMS without serious kludging, and even then it won't function optimally. Bibliographical data is an example; some people I know wrote a book cataloguing app and from all the research they did they found out using an RDBMS wouldn't cut it. There's a data format specially created for this purpose and that's what they used, though I can't really remember what it was.

Maintenance by gmhowell · 2002-10-29 11:38 · Score: 2

In one scenario, you maintain business logic, info storage, display, and all sorts of other crap. In the other, you only maintain logic and display. It's easier to force your customers into lockin with some proprietary mish mash, but there are others (like myself) who will turn you down flat.

Internal stuff is similar. Do you want to maintain EVERYTHING, or just half of it?

--
Jesus was all right but his disciples were thick and ordinary. -John Lennon

some pluses, minuses by clem.dickey · 2002-10-29 11:43 · Score: 2

Pluses: the database takes care of synchronization, and nearly takes care of backup/recover. Very nice. Some DB rigor may rub off on your designers.

Minuses: the DBM is large (in MB, in install/config requirements, and in CPU usage) and your customer may not be running the DBM brand/version which you have tested your app with. Supporting multiple DB vendors is a pain. SQL is sort of standard, but the table definitions tend to vary. Ick.

My Experience by droyad · 2002-10-29 11:53 · Score: 1

I have found from the projects I worked on for my company that business applications almost always need some sort of database. Databases are just easy to work with and are fast.

For example, a recent project required storing 500,000 records (relating to cotton, btw) of around 15 columns each. Now think of that number and how long it would take to find a particular row or change a set of data. The MS SQL database we used on modest hardware was damn quick, quicker than could be imagined in fact.

Anyway, SQL is like so useful. With a properly structured SQL query you can do some amazing stuff, very quickly and with very little effort.

Can you afford to lose the data? by SpaceLifeForm · 2002-10-29 11:59 · Score: 2

If the data is not critical or can be easily re-created, then a filesystem will suffice.
But if the data is critical to the business, and/or cannot easily be re-created,
the data should go into a real DB that is Managed Properly(tm).

--
You are being MICROattacked, from various angles, in a SOFT manner.

3rd normal form by jbolden · 2002-10-29 12:04 · Score: 2

But please, for the love of God, don't listen to those moron college profs who say normalization is key. That's all good and fun until you have a million records in each of four tables you need to join to provide a solution. That, or your data model charts take up 300 sq. ft of wall space.

Good DBMSes can break complex joins catching the criteria piece by piece. You can also create run time extracts which are used by real time / almost real time systems for read access. However, your advice is simply terrible. Once you lose normal form you lose the associative law on your table algebra. That means join operations are not defined independently of the order they are performed in, and that is very bad. Rick mentions an example of this in terms of addresses but it can get far worse.

It isn't just storage by Chacham · 2002-10-29 13:27 · Score: 1

Databases are more than just storage. Disregarding storing and retrieval, a good database has good design.

If the database won't be designed properly (as in many just-get-the-job-done small businesses) then a specific application may be better. But, if someone will spend the time doing design, the database forces logic and structure onto the system. While this may be an annoyance to sloppy coders, this helps ease usage (because of strict guidelines) and understanding. Yes, understanding. There are times that data is to some extent known, but to a lesser extent understood. A decent database layout increases understanding as the objects and relations must be logical.

I speak this as a DBA. And, as a DBA, for good or for bad, there is hardly a project that wouldn't benefit from clear data definitions.

--
Have you read my journal today?

Denormalisation by arb · 2002-10-29 14:59 · Score: 2

Sometimes, well-thought-out denormalisation can make a huge, positive impact on an application. Yes, it can be difficult to make sure you don't have any anomalies in your data, but with a rigorous design and development methodology, these problems can be minimised.

Don't denormalise for the sake of denormalising - the trick is to know when to break the rules and to do so very carefully.

Denormalisation is only one tool that can be used to improve the performance of a system, and of course, other options like more memory, faster CPUs and better code should be addressed first.

Normalisation by arb · 2002-10-29 15:06 · Score: 2

But please, for the love of God, don't listen to those moron college profs who say normalization is key. That's all good and fun until you have a million records in each of four tables you need to join to provide a solution. That, or your data model charts take up 300 sq. ft of wall space.

Don't normalise your database at your own peril!

Learn how to properly normalise a database (3NF is usually good enough) and then learn how to write decent queries and tune your indexes appropriately. In some circumstances it may be worthwhile caching some data in extra tables which are refreshed periodically. In even rarer circumstances it may be necessary to denormalise the database, but always normalise it first and only denormalise if you encounter some intractable performance issues.

We have some incredibly complex queries at my current client and we have managed to gain performance improvements by re-writing certain queries. Splitting complex queries up and using temp tables, derived tables, sub-queries, etc can help and you will have less of a problem with data anomalies which can creep into a denormalised structure.

Re:Normalisation by SirSlud · 2002-10-29 18:05 · Score: 2

Amen. What the parent poster didn't realize is that when you're dealing with 3 tables, one of which has 52 million records in it, joins are a non-option. 4 day queries are not permitted. We get around it by denormalizing to certain degrees (certainly not rigorously all the way to 3NF), using tmp tables, being as religious as possible about indexing .. all with mysql, baby.

> a million records

That's chump change! Try 50 million and then we'll talk database ;)

--
"Old man yells at systemd"
Re:Normalisation by arb · 2002-10-29 18:36 · Score: 2

What the parent poster didn't realize is that when you're dealing with 3 tables, one of which has 52 million records in it, joins are a non-option. 4 day queries are not permitted.

Just a quick check of one of our systems shows a table with 1.5 million records. Not a large table, but not trivial either. Due to the nature of the data and the structure of the database, it is necessary to execute queries containing 10-12 self-joins routinely. One report requires 16 joins in total. The slowest of these queries takes about 30 seconds; most are well under 2 seconds.

Sure, we could denormalise our data to make the queries simpler (multi-table joins scare DBAs for some reason) but we would lose a lot of the flexibility that our design affords us. We have no need to denormalise this database because we wrote more efficient queries and have paid careful attention to our indexes.

The largest table in our production database is under 10 million rows. We have simulated much larger sizes in our test environments though, and found our joins are not a problem. The performance with 10 times the data is still well within acceptable range for our requirements. (Most stored procedures execute under 5 seconds with this size.) If you have problems with a three table join, head back to school and learn how to do it properly! ;-)

You say you are as religious as possible about indexing, but what about the structure of your queries? Are you indexing the right columns? You are using temp tables? Try getting rid of them - in many cases they slow things down. Make sure your queries are sargable. Make sure you are limiting the intermediate result-sets to be as small as possible. Understand the query execution plan. Make sure your server has enough memory! Are your server's settings tuned for your particular application? There are many, many things to pay attention to, but you should not be having any problems with a 52 million row table in a three-table join!
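To make the "sargable" and "understand the query execution plan" advice concrete, here is a small sketch using Python's built-in sqlite3 module (not the poster's database server; every engine has its own plan format). The orders table, its columns, and the index name are hypothetical. A predicate is sargable when the engine can answer it by searching an index; wrapping the indexed column in an expression usually forces a full scan instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
conn.execute("CREATE INDEX idx_orders_amount ON orders (amount)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("c%d" % i, i) for i in range(1000)],
)

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Sargable: the bare indexed column is compared to a constant,
# so the planner can search idx_orders_amount.
sargable_plan = plan("SELECT * FROM orders WHERE amount = 100")

# Not sargable: the column is buried in an expression, so the index on
# amount cannot be searched and the planner scans the table.
non_sargable_plan = plan("SELECT * FROM orders WHERE amount * 2 = 200")

print(sargable_plan)
print(non_sargable_plan)
```

The exact wording of the plan text varies between SQLite versions, but the first query should report a SEARCH using the index and the second a SCAN of the table. Both queries return the same row; only the amount of work differs.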

When you start doing real joins with 10 or more tables (each with >1 million rows) then we'll talk about how to denormalise the data to improve performance.

8-)
Re:Normalisation by MattRog · 2002-10-31 10:03 · Score: 2

We routinely perform joins on 150+ million row, 10 to 20GB tables. They generally perform very well... Then again we use Sybase ASE 12.5.

--

Thanks,
--
Matt
Re:Normalisation by rodgerd · 2002-10-31 14:53 · Score: 2

Been there, done that. Joins aren't a problem with the right indices.

Database AND filesystem combo by alfaiomega · 2002-10-29 15:09 · Score: 1

i never really found a clear reason on which was better performance-wise, though i suspect the filesystem-based way is. i also found it to be less of a hassle to implement. any intelligent thoughts?

Serving images from the filesystem is always faster than serving them from a database, since the database is also on the filesystem. (If you think that the DB could cache the results and therefore be faster, just serve your most frequently requested images from a filesystem in RAM (a RAM-disk) -- of course you can't cache more than what would fit in RAM, no matter if you do it yourself or with the DB.)

Using a DB for serving images can make sense when you serve different images in different cases, like banners, where you want to have simpler CGI scripts (it's because of convenience, not because of performance). But using a database doesn't have to mean storing images in a database.

If I found out that serving some of my images would be easier (and that means less error prone, better to maintain, etc.) if I used a database, I would do something like this: I'd store the digests of my images in the DB (it could be a binary MD5 for example -- just 16 bytes per image). Then, when my CGI script gets the digest, it would use it to find the image on the filesystem.

So, if it gets 9743a66f914cc249efca164485a19c5c it serves /images/97/43/a66f/914cc249efca164485a19c5c.png (how you split a digest into directory levels would depend on the filesystem and the number of images of course, this is just an example; one could also use fewer than 128 bits of the digest, say 96 or 64 depending on the number of images, to get shorter paths). The .png suffix could also be stored in the database to allow easy use of a few different image formats.
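The digest-to-path mapping described above can be sketched in a few lines of Python. The function names (digest_to_path, image_path), the /images root, and the 2/2/4 directory split are taken from or modelled on the example path in the post; they are illustrative choices, not a fixed scheme.

```python
import hashlib

def digest_to_path(digest, suffix=".png", root="/images", hex_digits=32):
    """Map a hex digest to a nested path, e.g. /images/97/43/a66f/<rest>.png.

    hex_digits lets the caller keep only a prefix of the digest
    (32 hex digits = 128 bits, 16 = 64 bits) for shorter paths.
    """
    d = digest[:hex_digits]
    return f"{root}/{d[0:2]}/{d[2:4]}/{d[4:8]}/{d[8:]}{suffix}"

def image_path(data, suffix=".png", root="/images"):
    """Compute the MD5 digest of the image bytes and map it to a path."""
    return digest_to_path(hashlib.md5(data).hexdigest(), suffix, root)

# The digest from the post maps to the path from the post.
print(digest_to_path("9743a66f914cc249efca164485a19c5c"))

# Keeping only the first 64 bits gives a shorter path.
print(digest_to_path("9743a66f914cc249efca164485a19c5c", hex_digits=16))
```

In this arrangement the database row would hold the digest (and the suffix), and the CGI script would only need to build the path and send a redirect to it.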

(The DB could store more human-friendly paths instead of message digests of images, but would need more human interaction -- it's probably a matter of taste).

This way you can still have a cluster of very simple static image servers in the future, while having the benefits of a database. Also the database traffic is much, much lower. The only difference with your scripts is that you send a redirect instead of actual content, which is even easier.

The most important benefit, however, is that you can have static images, database and CGI frontends split into three independent machines or even groups of machines, when your traffic becomes too high for an all-in-one solution. Because with DB-only BLOB images you'd better have lots of money for a database cluster (and DB-CGI bandwidth for redundant internal traffic). Using database-stored images when all you need is easier searching with SQL queries is in my opinion just using a database as an expensive filesystem.

--

root@aio:~# nmap -sX -iR -p1- # Ho, ho, ho! Merry Xmas, everyone!

Re:Database AND filesystem combo by Anonymous Coward · 2002-11-01 05:58 · Score: 0

Hey, this is a great idea.
I am designing an automated banner service, where people will be
able to upload multiple banners and choose when they are to be shown,
which one to show on which page, after which one, etc.
I was going to serve them from the database, but I think now I'll
use your idea. I already have a light server for serving static
graphics with thttpd which won't work with the rest of the mod_perl
scripts, which need heavy apache, but now the logic can be on the
heavy server and the actual content on the light one, I actually
haven't thought about it at all.
And best of all it will be better for our already very busy Postgres
server.
You said that not all of the 128-bit checksum can be used. Wouldn't
that mean that some files would have the same checksum, that is the
checksum would not be unique now? I mean, when two files have checksum
which have first 64 bits the same and the second 64 bits different
and I use only the first 64 bits, then I have two files with the
same conflicting checksum, am I right? Or am I missing something.
Other than that, thanks for your comment! I'm going to use
your ideas in my project. Greetings.
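One way to put numbers on the truncation question is the standard birthday-bound approximation: with n files and a b-bit digest, the chance that at least two files share a digest is roughly 1 - exp(-n(n-1) / 2^(b+1)). The sketch below assumes digests behave like uniformly random values and that nobody is deliberately crafting colliding files; the function name and the file counts are made up for illustration.

```python
import math

def collision_probability(n_files, bits):
    """Approximate probability that at least two of n_files share a
    bits-wide digest, assuming uniformly random digests (birthday bound)."""
    exponent = -n_files * (n_files - 1) / (2.0 * 2.0 ** bits)
    # -expm1(x) equals 1 - exp(x) but stays accurate when x is tiny.
    return -math.expm1(exponent)

for bits in (32, 64, 96, 128):
    p = collision_probability(1_000_000, bits)
    print(f"{bits:3d} bits, 1,000,000 files: p = {p:.3g}")
```

So a shorter prefix does make collisions more likely, but how much that matters depends heavily on how many bits are kept and how many files there are. Whatever width is chosen, the upload path can still check for an existing file with the same name and different contents before writing.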

Re:afaik... [And that's not very far...] by Hank+Reardon · 2002-10-29 15:10 · Score: 1

But please, for the love of God, don't listen to those moron college profs who say normalization is key.

This is a horrible approach to analyzing any problem. Normalized and denormalized data both have their place in today's RDBMS-driven world. Take any one of the large ERP packages available today: Oracle Financials, PeopleSoft, SAP, etc.

When working with specific data, for example Accounts Payable data, you really don't want to duplicate all of that customer data again and again and again for each row in the database, hence you normalize it. Yes, you pay a bit of a speed penalty when joining against the CUSTOMERS, CUSTOMER_ADDRESSES, INVOICES and INVOICE_LINES tables. In reality, that difference is never larger than a few seconds for large tables (read: 10 million+ records) when using a properly optimized (read: good index scheme) set of normalized tables.

RDBMSs work very well when finding the 10 rows out of those 10 million that fit your search request. Where they puke is trying to manipulate 50% or more of the data contained in multiple tables: the Data Warehouse/Data Mart.

Those same Accounts Payable tables make reporting a real pain when they're normalized, so you go through a denormalization (or summarization) procedure to fill out your reporting infrastructure. Pre-summarize your data into a single row with multiple 'buckets' for every strange query procedure you want to view the numbers by. Duplicate data on every row and get those 10+ million records per month down to a few hundred thousand at most.
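A toy version of that summarization step, sketched with Python's built-in sqlite3 module. The invoice_lines and monthly_summary tables, their columns, and the sample rows are invented for the example (and the detail table is simplified to carry the customer name directly); a real ERP schema would be far larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Detail table: one row per invoice line (simplified for the sketch).
    CREATE TABLE invoice_lines (
        invoice_id INTEGER NOT NULL,
        customer   TEXT    NOT NULL,
        month      TEXT    NOT NULL,
        amount     INTEGER NOT NULL
    );
    -- Reporting table: one pre-summarized row per customer per month.
    CREATE TABLE monthly_summary (
        customer     TEXT    NOT NULL,
        month        TEXT    NOT NULL,
        line_count   INTEGER NOT NULL,
        total_amount INTEGER NOT NULL,
        PRIMARY KEY (customer, month)
    );
""")
conn.executemany("INSERT INTO invoice_lines VALUES (?, ?, ?, ?)", [
    (1, "Acme",   "2002-09", 100),
    (1, "Acme",   "2002-09", 250),
    (2, "Acme",   "2002-10", 75),
    (3, "Globex", "2002-10", 40),
    (3, "Globex", "2002-10", 60),
])

def refresh_summary(conn):
    """Rebuild the reporting table from the detail rows in one transaction."""
    with conn:
        conn.execute("DELETE FROM monthly_summary")
        conn.execute("""
            INSERT INTO monthly_summary (customer, month, line_count, total_amount)
            SELECT customer, month, COUNT(*), SUM(amount)
            FROM invoice_lines
            GROUP BY customer, month
        """)

refresh_summary(conn)
summary = conn.execute(
    "SELECT customer, month, line_count, total_amount "
    "FROM monthly_summary ORDER BY customer, month"
).fetchall()
print(summary)
```

Reports then read the small summary table, while data entry keeps writing to the detail table; the refresh would run on whatever schedule the reporting needs tolerate.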

If you have to write custom programs in C, Java, Perl or your language of choice in order to operate on an exported version of the data, so much the better; the database won't perform as well when acting on every row in the database as a program optimized to summarize it. Once that's done, load it back in.

By using this approach, you're able to use the best of both the normalized and denormalized approaches and satisfy both the data entry clerks -- because the data is entered quickly -- and the managers -- because they can get virtually any report in under 30 seconds.

Sweeping comments that suggest that one method or another is bad in all cases just scream: "Look at me! I don't know what I'm doing but I'm going to tell you that what you're doing is wrong, anyway!"

--
There's so little difference between politics and jihad lately...

Storing data... by arb · 2002-10-29 15:14 · Score: 2

It really depends on what data you are storing. How much data, how critical, what are you doing with it, etc...

Sometimes, Excel is good enough. Or XML. Or plain text files. Or a custom file format. etc...

If the people working on your project have only ever worked with databases, they will want to use databases for everything. Most stuff will fit into a database, but sometimes it is not appropriate to do so - as others have mentioned here, storing images in a database is not always a good idea, but you would probably want to store the location of the images in a database.

If you need to be able to ship the data around to different machines/offices/clients/over the net/etc, then maybe an XML file will be best. Custom file formats may be appropriate in some cases too. (Though I'd lean towards a more open file format.)

Why not do the whole project as a one-off? by Samrobb · 2002-10-29 16:17 · Score: 3, Informative

The feeling at my current workplace seems to be that very few projects lend themselves to database usage and that a customized one-off data storage solution should be developed for each project.

Huh? Do you create a custom C library for every application as well? How about a custom UI toolkit? Custom preprocessor/compiler?

Sounds kind of silly, doesn't it.

So why do these folks think a "customized one-off data storage solution" sounds any better? It's the same problem - you can either use something that's already been debugged, tested, and tweaked for performance, or you can spend your own time and effort to create it yourself. That's time and effort that could go towards coding and testing the final product, but is instead spent elsewhere (probably because someone thinks that using a database for storage would make the application "bloated").

I think the problem is probably that when you mention using a "database", most people equate that term with "general purpose database server" (Oracle, SQL Server, PostgreSQL, MySQL, etc.) There are libraries available that were specifically designed to offer programs lightweight database access without the pain of using a full-fledged RDBMS. Search Google for embedded database, xbase library, or open source database library to start... there are any number of toolkits that will allow you to create a very customized storage solution without having to create "one-off" code for each and every project.

--
"Great men are not always wise: neither do the aged understand judgement." Job 32:9

Re:Why not do the whole project as a one-off? by Carpathius · 2002-10-30 03:56 · Score: 2

It really depends on what data is being stored and how much. For many, many applications a simple flat file using random access methods works just fine. For managing more data there are libraries such as you mention. And for some projects a full fledged RDBMS is the right way to go.

The proper way to go about this is to analyse what needs to be stored and choose the solution that provides the best match based upon needed functionality, system use, and programming time.

Sean.
Re:Why not do the whole project as a one-off? by Samrobb · 2002-10-30 15:32 · Score: 1

The proper way to go about this is to analyse what needs to be stored and choose the solution that provides the best match based upon needed functionality, system use, and programming time.

Absolutely. The way the question was worded, though, makes me think that their typical case is probably beyond the scope of doing anything easily with just flat files... otherwise, why the debate? You'd have to be a truly hardcore DB fanatic to argue that a DB is the ideal solution in all circumstances.

--
"Great men are not always wise: neither do the aged understand judgement." Job 32:9

filesystem is a database by wotevah · 2002-10-29 16:30 · Score: 1

I would just like to point out that the filesystem is a particular case of specialized database so this whole debate fs versus db does not make much sense. Some of its implementations may suck, but it's still a database that has a well-known, familiar interface (access mechanism). One can build a file store using a filesystem and replace it with a database-backed "file system" later if need be.

And we should also keep in mind that while some databases do give killer performance, you have to pay a lot for it, while the ol' filesystem comes with the OS already.

Now, for any other use than filesystem-related stuff, it does not make sense to try to invent your own small-scale storage mechanism when there are so many good, cheap/free database servers out there well-suited for the job.

Go for the extra procs. by millisa · 2002-10-29 16:59 · Score: 1

I didn't see this mentioned anywhere, but just because you have a multi proc system, it doesn't mean you have to run MSSQL on all the procs. You can purchase a single processor license of sql server, but be running it on a quad proc box. To be compliant, you just need to be sure you are setting the processors the service can use (er, it's on the 'Processor' tab under the sql server properties).

This isn't necessarily a bad thing to do either. When you are having to be conservative with your cash, a lot of the time these boxes have to serve multiple purposes. Having the sql server running on only procs 3 & 4 would leave 1 & 2 available to do 'other stuff' (web services? perl scripts?).

With SQL2k you can even have the development and the production sql server be the same system and generally not affect each other performance-wise when you are thrashing the procs. You just need to set up multiple instances of the service and assign each to separate processors (of course, they won't be completely autonomous since they *are* on the same box, but at least you won't get competition for the processor)

In any case, I'd go with the quad proc box. Only get one CPU if you want. You can always add to it later and purchase further licenses *if* you need them.

Argh. Posted to the wrong DB Thread! by millisa · 2002-10-29 17:08 · Score: 1

(-7 Offtopic, Posted to wrong ask slashdot database topic for the day. Hoohah)

Embedded applications by Karora · 2002-10-29 19:49 · Score: 4, Insightful

I have worked as an application developer / designer with DBMS backed applications for the last 17 years. There are reasons for not choosing a database, but not usually very good ones.

When you want speed and flexibility and scalability and reliability and extendability and particularly developer productivity you will undoubtedly end up shooting yourself in the foot later if you avoid some form of DBMS up front.

Where you have a particularly well-defined, narrow functionality, and performance in a small footprint is a requirement, an RDBMS may not be such a good choice, but DB libraries like Berkeley DB can still be very useful.

And with PostgreSQL, Firebird, MySQL and so many other free, open-source projects out there covering such a broad spectrum of needs for a database, why would you not use that expert work?
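For a feel of what such a small-footprint library looks like in use, here is a minimal sketch with Python's standard dbm module, which exposes a simple on-disk key/value store in the same spirit. It is not necessarily Berkeley DB itself (the backend depends on the platform), and the file name and keys are invented for the example.

```python
import dbm
import os
import tempfile

# Work in a throwaway directory so the sketch cleans up after itself.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "settings")

    # "c" opens the store for read/write, creating it if needed.
    with dbm.open(path, "c") as db:
        db[b"last_run"] = b"2002-10-29"
        db[b"retries"] = b"3"

    # Reopen read-only: the data survived the close, with no server
    # process, schema, or SQL involved.
    with dbm.open(path, "r") as db:
        last_run = db[b"last_run"]
        retries = int(db[b"retries"])

print(last_run, retries)
```

The trade-off is the one the post describes: lookups by key are cheap and the footprint is tiny, but there are no joins, ad hoc queries, or multi-user guarantees of the kind a full RDBMS provides.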

--

...heellpppp! I've been captured by little green penguins!

DB based GUI + Satellite = Horror by Anonymous Coward · 2002-10-29 23:41 · Score: 0

A customer of ours is currently considering installing an ERP system at all locations, which are connected via satellite link with a main location, where the main database server is running.

The ERP software even has its GUI in the database, which is simply plain stupid when it comes to bandwidth (although it's very flexible on the other hand). The ERP people want to have 100MBit bandwidth, but the satellite has only 1.5MBit :-) Which means one has to go :-) But it won't be the satellite ;-)

Re:DB based GUI + Satellite = Horror by toybuilder · 2002-10-30 08:34 · Score: 2

What's so wrong about having the GUI stored in the database? It's not like you can't download once to the local client... Heck, AOL's been doing that for years!

Just because there's a database in the center of everything, doesn't mean the clients can't cache data locally....

Lol, I think I worked in the same place! by BoomerSooner · 2002-10-30 02:34 · Score: 1

Was it in Dallas?

Lol, probably not but it's amazing what you see in the "real world" of corporate IT departments.

Re:Lol, I think I worked in the same place! by floydigus · 2002-10-30 05:22 · Score: 2

Definitely not Dallas, but in a way I think these places are more a state of mind than a physical place ;)

--
All things in moderation; including moderation

Things you don't want to learn about your local .. by Anonymous Coward · 2002-10-30 02:43 · Score: 0

Things you don't want to learn about your local prison:

1. They use excel for tracking inmates
2. They have no less than 20 people in the same copy of said spreadsheet

huh? by Anonymous Coward · 2002-10-30 06:17 · Score: 0

What am I missing here? A catalogue of bibliographical info seems to be the perfect application for a database. Like most applications, the data seems quite simple on the surface. Each entry will have:

an author,
book title,
publisher,
published date,
and more.

You could easily roll your own file format to store this data. Maybe some sort of xml.

But what happens if your users decide they need more functionality? The bibliography has grown from 50 entries to 50,000 and now they want to search by author's last name, publisher, and publishing date. You could write your own search methods, and create indexes for your data, but why bother? The kind developers working on Firebird, Postgres, and MySQL have already worked through some sleepless nights to solve this problem for you, and are giving their solution away for free.

I always look for interesting, elegant ways to build a project so that the program is as small and fast as possible. Writing a filesystem that is perfect for the task at hand often looks like a good idea at the start of a project, but when additional features are added and we start talking about delivery times, using a database looks more attractive.

When to use a relational database : by PinglePongle · 2002-10-31 06:17 · Score: 2

When you can imagine querying the data you are entering - you can't easily query images, or other binary data (although I guess there must be someone working on this problem somewhere...). If you can't query it, you should usually find a better place to store it - NAS is usually fine - and maintain a pointer to it (e.g. a filename). Yes, it's something that can get un-synced, but most databases suck when it comes to actually dealing with binary data, and you can use that capacity a lot more effectively elsewhere.

When the structure of the data is likely to remain stable. If your application deals with well-understood entities, whose properties are unlikely to change over time, a database is a great solution.
Databases are, however, relatively change-resistant - it's typically a pain in the backside to change the datatype of a column, remove columns etc. So, if you're working in a domain where you continuously learn new things about your core entities, or if your development processes are highly iterative, you might be better off using an alternative data storage mechanism.

When more than a single user is likely to access the data - yes, you can create locking mechanisms yourself. You can also take your own garbage to the local dump. It's usually not a good use of your time, and the cost of not dealing with the issues involved is high, both for garbage and for concurrent access to shared data.

When you require consistency across transactions - the good old ACID (atomicity, consistency, isolation, durability) principles, which become important for many non-trivial applications.
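A small sketch of the atomicity part, assuming SQLite via Python's sqlite3 module. The accounts table and the transfer function are invented for the example. Either both updates land or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst atomically; False means it was rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this violates the CHECK constraint, the credit above is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

The credit is deliberately done first, so a failed debit shows the rollback actually undoing work that had already been applied.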

If you care about enforcing rules of referential integrity - do you want to ensure that all the tracks in your record collection can be tied back to a recording ? Do all orders have to have a customer ? Those things are far simpler to implement with an RDBMS than in code.
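Here is roughly what "all orders have to have a customer" looks like when the database enforces it. The sketch assumes SQLite via Python's sqlite3 module, and the table and column names are made up. Note that SQLite only enforces foreign keys after the PRAGMA shown below; other RDBMSs enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers (id),"
    " item TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Example Customer')")
conn.commit()


def place_order(customer_id, item):
    """Insert an order; the database rejects it if the customer does not exist."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
                (customer_id, item),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def delete_customer(customer_id):
    """Delete a customer; rejected while orders still reference them."""
    try:
        with conn:
            conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

The rule lives in one line of the schema, so every program that touches the tables gets it, not just the one that remembered to check.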

There are instances where using an RDBMS is not appropriate. Ones that spring to mind are :
- your business domain is not well understood or liable to rapid change. In this case, the cost of change for database objects is likely to be a problem - consider storing data in a self-describing format like XML.
- the application domain doesn't lend itself to being described in relational terms - image manipulation tools, word processors etc. which deal with mainly binary information probably should not use a relational model for their core data structures.

Alternatives exist - Object Oriented databases are becoming more and more popular. I have way too little experience with these to comment on their use.

--
It's all very well in practice, but it will never work in theory.

Digest collisions (Re:Database AND filesystem...) by alfaiomega · 2002-11-02 00:36 · Score: 1

You said that not all of the 128-bit checksum needs to be used. Wouldn't that mean that some files would have the same checksum, that is, the checksum would no longer be unique? I mean, when two files have checksums whose first 64 bits are the same and whose second 64 bits are different, and I use only the first 64 bits, then I have two files with the same conflicting checksum, am I right?

Well, yes and no. Yes, you're right that the 64-bit parts of MD5 digests are not unique, but neither are the full 128-bit digests. Any n-bit digest, provided it's randomly distributed, can take only 2^n different values, so any two given files will share a digest with probability 1 in 2^n. Now it's up to you whether you think 281474976710656 (48 bits) different digests is enough for you, or you need 590295810358705651712 (69 bits) or the full MD5 340282366920938463463374607431768211456 (128 bits).

You have to use enough bits to make sure (well, you never can be sure, like you can't be sure that you won't win a lottery 1000 times in a row -- you get the idea) that two files having the same digest is practically impossible (because it's always theoretically possible, however unlikely). It depends on the number of files you have. For an n-bit digest and m files there are 2^(nm) possible results (each of the m files can get any of the 2^n digests) and (2^n)!/(2^n - m)! good results (i.e. those results without collisions).

So, the probability of not having any collisions is (2^n)!/(2^(nm) (2^n - m)!), but calculating (2^128)! is not what you want to do (trust me -- a 1000 teraflop supercomputer would need half a million times more time than the age of our universe, provided it had that much RAM and could handle numbers that long, which I don't even dare to estimate). You'd better run this from the command line instead; it's a little Perl one-liner I just hacked out of boredom -- yes, I know, I should take my medicine and get some sleep:

perl -le'($n,$m)=@ARGV; for($w=$z=2**$n,++$_;$m;--$m,--$z){$_*=$z/$w} print' n m

It will compute (2^n)!/(2^(nm) (2^n - m)!) (rounded to your floating point resolution), i.e. it will give you the probability of not having any collisions using n-bit digests with m files (-0 means it's impossible and 1 means it's certain, or so likely as to be almost certain). If anyone asks how it works -- it's magic. Copyright © 2002 alfaiomega. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. There is NO warranty; not even for MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE or READABILITY.
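For anyone who wants the magic spelled out, here is a Python sketch of the same product, (2^n - 0)/2^n * (2^n - 1)/2^n * ... * (2^n - (m-1))/2^n, which is the factorial formula with the terms cancelled. The function names are made up for this example. The second function is not part of the one-liner: it is the usual birthday approximation exp(-m(m-1)/2^(n+1)), added only as a cross-check that avoids looping when m is large.

```python
import math


def no_collision_probability(n_bits, m_files):
    """Probability that m_files random n_bits-bit digests are all distinct.

    Same loop as the Perl one-liner: multiply (2^n - i) / 2^n for i = 0..m-1.
    """
    space = 2.0 ** n_bits
    p = 1.0
    for i in range(m_files):
        p *= (space - i) / space
        if p <= 0.0:  # more files than possible digests: a collision is certain
            return 0.0
    return p


def birthday_approximation(n_bits, m_files):
    """Approximate the same probability; reasonable when m is far below 2^n."""
    return math.exp(-m_files * (m_files - 1) / 2.0 ** (n_bits + 1))


if __name__ == "__main__":
    # Compare a few digest widths for one million files.
    for bits in (48, 69, 128):
        print(bits, birthday_approximation(bits, 1_000_000))
```

As with the Perl version, the result is limited by floating point: for a 128-bit digest and any realistic number of files, the exact loop just returns 1.0.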

Other than that, thanks for your comment! I'm going to use your ideas in my project. Greetings.

Great, I thought no one would read my comment with Score:1. That's good to hear that someone liked it more than the moderators.

--

root@aio:~# nmap -sX -iR -p1- # Ho, ho, ho! Merry Xmas, everyone!

Re:Digest collisions (Re:Database AND filesystem.. by Anonymous Coward · 2002-11-02 13:17 · Score: 0

perl -le'($n,$m)=@ARGV; for($w=$z=2**$n,++$_;$m;--$m,--$z){$_*=$z/$w} print' n m

It will compute (2^n)!/(2^(nm) (2^n - m)!) (rounded to your floating point resolution) i.e. it will give you the probability of not having any collisions using n-bit digests with m files

you are a god, dude. i've no idea why, but it works. i'm starting to learn perl today...
