Good Database Design Books?
OneC0de writes "I am the Director of IT for a small/medium sized marketing company, where I personally write the code that runs our applications. We use a variety of technology at our office, the majority of which rely on MS-SQL and MySQL databases. I am familiar with tables, SQL queries, and have a general understanding of how the SQL databases work. What I'm looking for is a good book, particularly a newer book, to explain general database design techniques, and maybe explain some relational tables. We have some tables that have million of rows, and I'd like to know the best method of designing these tables."
I'm sure I'm not the only code monkey who shudders at the implications of this statement.
Some of us coders are well-nigh insufferable already, because they think they're Superprogrammer and can leap tall mainframes in a single bounds check. The problem comes when those types find themselves in management -- but won't let go of the programming reins. Now, the IT guy with a new idea doesn't just have to worry about it getting shot down by the hotshot who knows everything -- he has to worry about whether his *career* will get shot down by Mr. Hotshot Coder-Manager.
There's a great chance that the article poster isn't like that. But I'm worried, because Ask Slashdot isn't who he should be asking... he should be asking the coders he manages how to design and/or restructure the database. The fact that he's asking Slashdot tells me that he's not comfortable letting someone else do the work, possibly because he's Superprogrammer and always knows what's best.
But in case that's not the situation -- maybe he's in a company that simply grew faster than its staffing could handle -- I'll answer the question. He wants a good book to explain general database design techniques. My answer: buy a good book on management techniques, because *that* is your job. Let the people you manage come up with a database design, because that is *their* job.
Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
I'm a bit unclear about what you want to achieve:
- easier end-user interface
- more reliability (backups, journalling, redundancy...)
- more speed
- more security
- more complicated data massaging (multi tables, statistics...)
- better vizualization (reports, graphs...)
I'm not sure a single book can cover all that.
The Cloud - because you don't care if your apps and data are up in the air.
"I am the Director of IT for a small/medium sized marketing company, where I personally write the code that runs our applications. We use a variety of technology at our office, the majority of which rely on MS-SQL and MySQL databases. I am familiar with tables, SQL queries, and have a general understanding of how the SQL databases work. What I'm looking for is a good book, particularly a newer book, to explain general database design techniques, and maybe explain some relational tables. We have some tables that have million of rows, and I'd like to know the best method of designing these tables."
There is more to RDBMS than tables and SQL. Your developers should understand data normalization first and foremost, at least 1NF, 2NF and 3NF.
http://en.wikipedia.org/wiki/Database_normalization
http://en.wikipedia.org/wiki/First_normal_form
http://en.wikipedia.org/wiki/Second_normal_form
http://en.wikipedia.org/wiki/Third_normal_form
The examples in the URLs above should suffice for getting a general understanding on how to start with a relational model. As for books, I'd suggest these:
http://www.amazon.com/Relational-Database-Design-Implementation-Third/dp/0123747309/ref=sr_1_4?ie=UTF8&s=books&qid=1278630155&sr=8-4
http://www.amazon.com/Information-Modeling-Relational-Databases-Management/dp/0123735688/ref=sr_1_3?ie=UTF8&s=books&qid=1278630306&sr=1-3
I would also suggest C.J. Date's "Database in Depth: Relational Theory for Practitioners", but I can imagine the local penny arcade l33t-hax0r-wannabe crowd going batshit crazy about studying relational algebra and relational database theory in depth. To each his own. Most problems that arise in poorly designed relational database models arise from not understanding data normalization
:
Back when those books were written, disk was expensive and not cached, RAM was very expensive, and machines had terrible I/O bottlenecks. Normalization is critical under these circumstances for maximum performance.
Normalization has _nothing_ to do with performance. In relational DB design, performance is usually considered only after you have a normalized model at which point you it's common to denormalize for performance and other implementation-specific reasons.
The parent's first link gives a good description of the purposes of normalization.
Today, these normalization techniques will increase performance but not as much as you might think. Really it is best to concentrate efforts elsewhere, especially for a one-person shop.
As the submitter looks to be using RDBMSs, a knowledge of normalization and relational database design should be required I 'd have thought. However, if their systems were designed around ORDBMS (your posts hint that this is your background) the DB design issues would be different, but the summary doesn't suggest this is the case.
Since you're Director of IT, I'd recommend you to start from http://en.wikipedia.org/wiki/Database_normalization
I'd like to buy homeland for our 10 million people. http://twitter.com/mahadiga
The one that I find still surprises people is actually 1NF. 2NF and 3NF are pretty easy to recognize. But think about CUSTOMER(CUST_ID, NAME, STREET_LINE_1, STREET_LINE_2, CITY, STATE, ZIP). Is it really in 1NF? Sure, if you're printing on envelopes. But maybe you need the customer's first name for a personalized letter. Maybe you need the house number and street separated for a GPS application. Or maybe you need the ZIP and ZIP+4 broken out separately for your postage software.
Any time you have the external application parsing your data fields into component values, you've failed at 1NF. But where do you stop? First and last names? Middle initial? Title? Honorific? Nickname? Pronunciation key? Phonetic spelling? You can go ape over-analyzing the data, and still easily miss something your users encounter the first week you deploy; effectively denormalizing the table simply by adopting a convention such as sticking "MR/MS/MRS" as the rightmost characters of the last name field.
I don't have an answer, it's just not fair. :-(
John
"The best method" is probably a really vague concept.
I disagree completely. The "best" method varies widely, because it is specific to the RDBMS you're using.
OP says he understands how databases work, but it seems to be limited to how to put data in & get data out. A database is (or can be) more than a bit bucket. If it's taking too long to fetch records from a 10 million record table, there are some serious performance issues here. It could be any combination of bad data modeling, improper indexing, underpowered hardware, poorly configured concurrency/consistency controls, inefficient I/O subsystem, OLTP workload on a system configured for OLAP etc...
Understanding concepts like transactions, locking mechanisms, disaster recovery, table structure (heap vs clustered), index structure, temp tables, stored procedures/packages/functions, data types, etc. is important, but it's equally important to understand how your chosen RDBMS implements these so that you can fully exploit them. Armed with this knowledge and the needs of your business, you'll be able to refactor your data and/or applications appropriately
I think you can get at the concepts through practice - particularly since you're working with more than 1 RDBMS, but "getting" databases in general won't get you what you're looking for. Depending on the urgency here, you may find yourself hiring a gun to help sort out your situation, as the amount of research & testing required to get the knowledge you need to do this may not be practicable for an IT Director.
I hope this helps!
Take off every 'sig' for great justice.