Good Database Design Books?
OneC0de writes "I am the Director of IT for a small/medium sized marketing company, where I personally write the code that runs our applications. We use a variety of technology at our office, the majority of which rely on MS-SQL and MySQL databases. I am familiar with tables, SQL queries, and have a general understanding of how the SQL databases work. What I'm looking for is a good book, particularly a newer book, to explain general database design techniques, and maybe explain some relational tables. We have some tables that have millions of rows, and I'd like to know the best method of designing these tables."
We have some tables that have millions of rows, and I'd like to know the best method of designing these tables.
I'm a developer, not a database expert. But it seems that every now and then I have to get my hands dirty with data modeling. "The best method" is probably a really vague concept. If you have serious hardware constraints then the best method changes from an easily maintainable system to something more complex. There's give and take in database design and I guess a million rows is really something that a traditional relational database should be able to handle. So I'd suggest any book that teaches data modeling will suit you here. I happened to learn on Data Modeling Essentials which was decent but not great. I have heard good things about Len Silverston's growing series that concentrates more on patterns. But really what you're going to want is a book on data modeling or analysis that teaches you the orders of normal form, when to use cross reference tables, etc., so you can get a better idea of good modeling standards. At a million rows, you might not find the need to refactor if you read about the new best practices but perhaps you could make a business case to eventually migrate.
Now there are other topics that require entirely separate books because they are such a diverging path from relational databases. It's not common but your database can be based on something other than an object or table. If you consider the internals of Google, perhaps BigTable is the most prolific database implementation out there and while interesting, it is sort of a very specific proprietary database implementation. You could take this approach to tailor your company's database to be precisely what you need but this would clearly be overkill in your case. You don't talk about any bottlenecks or impending loads that need to be carefully considered so instead of treading down this path, I suggest you first take a course on MySQL or get the de facto book on whatever database you use and play around with fine tuning on a test system. A lot of DBs out there allow you to tune them through a configuration file so that your particular needs are met more closely. If you're looking for this sort of continuing education just out of curiosity, pick up a book on database design and start to tinker. But it requires a lot of knowledge and effort to start a database technology from scratch and compete with vanilla out of the box technologies like MySQL and PostgreSQL.
From what information you provide in your question, I'd suggest this book to help you understand database designs more via industry proven patterns. That assumes you have all the basic database design practices covered.
My work here is dung.
Database in Depth: Relational Theory for Practitioners
Publisher: O'Reilly Media; 1 edition (May 1, 2005)
Language: English
ISBN-10: 0596100124
ISBN-13: 978-0596100124
Best DB book I have ever owned/read/seen!
Well, it depends on the size of the company does it not? Perhaps they employ fifteen to twenty staff with an IT department of 2 or 3, mostly focused on hardware and user support. Then it would be much more reasonable for the Director of IT to be a coder who is also taking management responsibility.
You're right that if the company grows, management should be the focus and a decent DBA employed, but until then like many small companies the poster may have to be a jack of all trades. At least they're showing initiative in seeking to master at least one of their areas of responsibility.
O'Reilly books are your friend. The "... in a Nutshell" books are a good place to start, and then proceed into the more advanced books. They have 25 titles related to MySQL and 53 titles related to Microsoft SQL. There are usually a few to browse through at the large chain book stores.
Serious? Seriousness is well above my pay grade.
.. either that or he's the only programmer in the company and can thus effectively call himself whatever he wants.
I'm the article poster. Our company is relatively small, with an IT staff of less than 5, and total company size less than 50. I write all the code, simply because none of our other IT pros are comfortable enough writing it. If there were "coders" under me to ask, please believe I would use them as a resource first.
Why assume that there are any coders that OP manages? He's "Director of IT" for a "small/medium" company that isn't a software (or even technology) company. It's quite possible that OP manages, if anyone, a handful of desktop support technicians that aren't programmers.
In fact, I would hope that something like that is the case, as that's really the only explanation for a Director of IT that, as OP describes, personally "writes the code" (note: not "writes some of the code") for a company's applications, since otherwise he is managing coders that don't actually write any code, which would be unimaginably wasteful.
Certainly, I've known of small companies in non-computing fields where the "Director of IT" was also the whole IT department.
I'm a bit unclear about what you want to achieve:
- easier end-user interface
- more reliability (backups, journalling, redundancy...)
- more speed
- more security
- more complicated data massaging (multi tables, statistics...)
- better visualization (reports, graphs...)
I'm not sure a single book can cover all that.
The Cloud - because you don't care if your apps and data are up in the air.
I would say you are being a little paranoid. There is such a thing as a good boss, you know. I find that these are the guys who are still heavily involved in some sort of 'research'. Which is probably what he/she is doing. Probably a smart cookie, does some coding but by no means all of it. Knows enough to recognise a good text to buy for his group so they can all learn together.
I put it to you that I'd prefer to work with this guy than with your paranoid self. Do you have meetings of the secret type?
Does and doesn't. Shouldn't be making up titles that don't fit an IT department size of 2 or 3. How about "I run the IT department". That's like me in my one-person company calling myself CEO, COO, CIO, Chairman, etc. It's BS. Someone asking for help should leave out the fake title crap and avoid these type of responses.
He was asking for a book, not your stupid criticism.
I'm not sure I'd trust a book to teach this subject as comprehensively as a good university course on the subject. Frequently, you can sit a class quite inexpensively if you're not going for credit.
For that matter, isn't MIT or someone allowing free not-for-credit access to their eLearning materials?
IMHO: Joe Celko's SQL for Smarties (http://www.amazon.com/Joe-Celkos-SQL-Smarties-Programming/dp/0123693799/ref=sr_1_2?ie=UTF8&s=books) has proven to be a very nice book when you need to go beyond the basics to a deeper understanding of SQL.
There are many other books on the subject, all the way to source material from Date and Codd, but Celko seems to be well informed and writes fairly well, I think.
I see no evidence that he made up a title. Typically in business, Director has a meaning - that the holder is on the board responsible for running the company. In a small business they will often be one of the shareholders too.
Titles like CIO would indeed be superfluous in a small company, but Director has a specific meaning and its use could be entirely appropriate.
"I am the Director of IT for a small/medium sized marketing company, where I personally write the code that runs our applications. We use a variety of technology at our office, the majority of which rely on MS-SQL and MySQL databases. I am familiar with tables, SQL queries, and have a general understanding of how the SQL databases work. What I'm looking for is a good book, particularly a newer book, to explain general database design techniques, and maybe explain some relational tables. We have some tables that have million of rows, and I'd like to know the best method of designing these tables."
There is more to RDBMS than tables and SQL. Your developers should understand data normalization first and foremost, at least 1NF, 2NF and 3NF.
http://en.wikipedia.org/wiki/Database_normalization
http://en.wikipedia.org/wiki/First_normal_form
http://en.wikipedia.org/wiki/Second_normal_form
http://en.wikipedia.org/wiki/Third_normal_form
The examples in the URLs above should suffice for getting a general understanding on how to start with a relational model. As for books, I'd suggest these:
http://www.amazon.com/Relational-Database-Design-Implementation-Third/dp/0123747309/ref=sr_1_4?ie=UTF8&s=books&qid=1278630155&sr=8-4
http://www.amazon.com/Information-Modeling-Relational-Databases-Management/dp/0123735688/ref=sr_1_3?ie=UTF8&s=books&qid=1278630306&sr=1-3
I would also suggest C.J. Date's "Database in Depth: Relational Theory for Practitioners", but I can imagine the local penny arcade l33t-hax0r-wannabe crowd going batshit crazy about studying relational algebra and relational database theory in depth. To each his own. Most problems that arise in poorly designed relational database models arise from not understanding data normalization.
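To make the normalization point concrete, here is a minimal sketch of the classic update anomaly. It uses Python's built-in sqlite3 module as a stand-in for MS-SQL or MySQL, and the table and column names (orders_flat, customers, orders) are invented purely for illustration. It is one way to picture the problem, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalized (hypothetical schema): the customer's city is repeated
# on every order row, so the same fact is stored more than once.
cur.execute(
    "CREATE TABLE orders_flat ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT, item TEXT)"
)
cur.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [(1, "Acme", "Boston", "flyers"), (2, "Acme", "Boston", "banners")],
)

# An update that touches only one row leaves the customer with two cities.
cur.execute("UPDATE orders_flat SET city = 'Denver' WHERE order_id = 1")
flat_cities = cur.execute(
    "SELECT COUNT(DISTINCT city) FROM orders_flat WHERE customer = 'Acme'"
).fetchone()[0]

# Normalized (hypothetical schema): the city is stored once, in customers.
cur.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE, city TEXT)"
)
cur.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id), item TEXT)"
)
cur.execute("INSERT INTO customers VALUES (1, 'Acme', 'Boston')")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, "flyers"), (2, 1, "banners")],
)

# One update changes the single stored fact; every order sees the same city.
cur.execute("UPDATE customers SET city = 'Denver' WHERE customer_id = 1")
normalized_cities = cur.execute(
    "SELECT COUNT(DISTINCT c.city) FROM orders o "
    "JOIN customers c ON c.customer_id = o.customer_id"
).fetchone()[0]

print(flat_cities, normalized_cities)
```

The flat table ends up disagreeing with itself about where the customer lives, while the normalized pair cannot, because the city exists in only one row.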
If you are designing anything bigger than a couple of gigabytes, you are in for some fun (or your users are). ;-)
To be a good designer, there is no substitute for a thorough understanding of the subject matter. And you are a self-confessed n00b. Get an expert. Or study. Hard.
Database in Depth: Relational Theory for Practitioners.
HTH.
I did an exam on SQL and database design recently and used The Manga Guide to Databases as part of my studies. If you don't want something too rigorous it's very good indeed - I found it a lot better at making stuff sink in than a dry, stuffy book. It gives a reasonably good idea of things like the first, second and third normal forms. Don't be put off by the fact that it looks a bit childish - the storytelling idea really works well. It probably won't work for everyone, but it did work well for me (I passed the exam with flying colours).
Do you know relational algebra? If you don't, then I highly recommend:
Codd, E.F. (1990). The Relational Model for Database Management (Version 2 ed.). Addison Wesley Publishing Company. ISBN 0-201-14192-2.
It's MUCH better to know the fundamentals of database systems and then try to figure out details than vice-versa.
My university course on databases used the text book A First Course in Database Systems by Jeff Ullman and Jennifer Widom. I rather enjoyed the book, and plan to have it above my desk in case any sort of database design or maintenance project comes up for me. The book's page is here; links to purchase are at the bottom.
"I am the Director of IT for a small/medium sized marketing company, where I personally write the code that runs our applications"
Translation - I'm a one man IT department
Bud, start with the truth. A "Director of IT" does not write code. You could have equally said you were CIO, just as truthful.
That's a stupid, novice assumption. I've seen smaller operations with an IT director having to get down and dirty because of downsizing or lack of resources. Coding is done on top of the functions of IT management.
So before you let your projecting ego go about correcting people's titles without knowing the specific circumstances, maybe you can try something more useful, like, oh I dunno, maybe answer the question and suggest a good relational modeling book. Crazy, I know!
...and improve your quality and maintainability?
Back in the 70's and early 80's we learned a methodology called, "Data Structured Systems Design" and the fundamental presupposition was that everything could be expressed logically and accurately by describing it as relationships in set theory. I have not seen anything since that surpasses the quality and maintainability of database applications and systems.
Someone already mentioned Joe Celko's book "SQL for Smarties" and I would recommend you first read his, "Thinking in Sets" before any of his other books.
I would also suggest some earlier books by Ken Orr and Jean Dominique Warnier. If you learn the Warnier-Orr approach to DESIGNING the system before doing any coding, you will reduce the time necessary for maintaining the system. I have seen hundreds of small IT shops like yours, and much of the time Systems Analysis and Design is neglected and performed "off-the-cuff" by programmers who can't wait to get to the coding. I didn't originally believe Ken Orr's assertion that spending twice as much time designing the system would result in a sharp time reduction for overall project completion, but through experience and observation I became a believer.
"The mind works quicker than you think!"
Huh? I'm employed by an S&P 500 and director is the title above manager and below VP. Looking at the definition of IT Director in the first dozen hits on Google seems to match that.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
These three lessons may not all be in any one book, but they can help in the real world:
1) Learn what SQL Injection is and how to defend against it. It will ruin your day and could severely damage your current employment situation.
2) Abstract your schema from your front-end applications. Stored procedures are easy to write, can provide security and, if well written, stop injection attacks. They will let you change your database design without breaking your deployed apps. Just update the internal code in the procedure. Middleware and objects can do this, too.
3) Bergstrom's law of sailing says: "You can get away with anything in less than 5 knots of wind." Similarly, any little box or blade with 2 to 4 GB of RAM can easily handle 5 to 10 million row tables. Dedicate the server to MySQL or MS SQL so they can cache and buffer efficiently and they will outperform much bigger boxes trying to run too many schemas and DBs concurrently. Learn to index. Don't be too puritanical about normalization. Returning a customer address should not require 6 joins. And remember that moving large recordsets across the LAN/WAN may take much more time than the server query.
You probably already know all this... but maybe someone else reading this doesn't.
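For anyone who has not seen lesson 1 in action, here is a small sketch of an injection and the usual defense. Note that it shows bound parameters rather than the stored procedures mentioned above, and it uses Python's built-in sqlite3 module as a stand-in for MySQL or MS SQL. The users table and the find_unsafe/find_safe helpers are hypothetical names made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_unsafe(name):
    # Vulnerable: user input is pasted straight into the SQL text,
    # so quotes in the input can change the meaning of the query.
    sql = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_safe(name):
    # Safer: the driver binds the value to the ? placeholder,
    # so the input is only ever treated as data, never as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


attack = "x' OR '1'='1"
unsafe_rows = find_unsafe(attack)  # the WHERE clause becomes always true
safe_rows = find_safe(attack)      # no user is literally named that string

print(len(unsafe_rows), len(safe_rows))
```

The same input dumps the whole table through the concatenated query and matches nothing through the parameterized one. Most database drivers offer some form of placeholder binding, though the placeholder syntax differs between them.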
"Knowing everything doesn't help..."
In a small company a Director is usually someone who sits on the board.
In a large company like an S&P500 one, a director is usually a management position with responsibility for a specific business area.
The titles are the same but the meaning different. I'm assuming from the size of business the poster described (50 employees) that he is in the former category of Director.
http://en.wikipedia.org/wiki/Corporate_title describes both types of director.
At the very least you need an understanding of what normalisation gives you, particularly in terms of ensuring data consistency, before you should consider denormalization. I see all too many database "designers" bad-mouthing normalisation when I suspect they simply can't be bothered to normalise, so they go on not to bother thinking about data consistency either. Yes, there are cases where denormalization is an advantage, but there are a lot more cases where denormalization is laziness that leads to database problems.
Quidnam Latine loqui modo coepi?
He's lame then.
If I had the flexibility to call myself whatever I wanted, I'd make damn sure that my business cards all said, "Batman" on them.
Non impediti ratione cogitationus.
Actually, that's rarely the case. Even as Director, or even VP, you usually can't just say "I want to hire someone", and then go do it. Does your budget allow for hiring another employee? Would another employee on staff change the company position for taxes, insurance, or regulatory concerns?
A decision such as that usually goes up to the COO or CEO (depending on the company structure). Upon tentative approval, it would go to accounting to ensure the budget is available to sustain the prospective employee, and then over to human resources.
It can be that a Director or VP already has the authorization to add employees, which simply means it's already gone through the other steps, and then he or she can hire as needed. It would be very reasonable to believe that a Director or VP would have authorization to hire X employees as needed.
Maybe your company works in such a loose manner that the brass can hire and fire at will, but a well run organization will actually plan for such changes.
Serious? Seriousness is well above my pay grade.
Back in the day when they were their own company they used to recommend
Designing Quality Databases with IDEF1X Information Models
I found the book VERY informative
-- 73 de KG2V For the Children - RKBA! "You are what you do when it counts" - the Masso
Check out Database Design for Mere Mortals... It's a pretty good book for beginning database design.
The fact that he's asking Slashdot tells me that he's not comfortable letting someone else do the work, possibly because he's Superprogrammer and always knows what's best.
The fact that he's asking Slashdot tells me he's willing to listen to the lunatic ravings of people on Slashdot (such as myself), indicating that he's aware that he doesn't know best. If he thought he knew best, he wouldn't ask.
Our CIO has a programming background and once fixed some database code we were having problems with. This is a 10,000 person organization with an IT staff of around 300. It's not hard to imagine a small company where the IT director takes on some programming tasks.
In the modern days of cheap disk, big disk caches, and large ram, proper modelling is more important than strict normalization.
Back when those books were written, disk was expensive and not cached, RAM was very expensive, and machines had terrible I/O bottlenecks. Normalization is critical under these circumstances for maximum performance.
Today, these normalization techniques will increase performance but not as much as you might think. Really it is best to concentrate efforts elsewhere, especially for a one-person shop.
All of that normalization work requires coding changes and it will undoubtedly make the code much less readable and maintainable.
<facepalm/> Performance? I wasn't even thinking of that as a reason to understand normalization. I'm thinking data integrity at least in the conceptual model.
The value of normalization is not so much in performance but in considering and planning what a decent data/information model should look like.
You normalize your *model*, your blueprint, and de-normalize as you see fit with the resources available. The actual tables and tablespaces might not (will not) look 1-to-1 to the model, but you still have a model of information.
And when you do deviate from your model, you do so consciously; you know exactly where you are deviating from the model; and you know why. Without a model, or worse, with a badly crapped model, you don't know what you have.
It is good to exploit the hardware capabilities we have now, BUT without at least having a conceptual understanding of normalization, this is almost always a sure way to get into a corner where the only option is to throw more hardware to the problem.
The distinction here now is that people deploy hardware not strategically, but because they have no choice: shit won't run without. Hardware is cheap. Operational costs are not. Understanding normalization is to (relational) data modeling and building what modularity and structure are to OO design (and software building in general).
Normalization is not about performance (even if its immediate effects in the past were performance related.) It is about reduction of unnecessary data redundancy that compromises data integrity.
Not performance. Data Integrity.
Even with the hardware that we have today, I have yet to see a well-designed model that does not in great part enforce the 2nd normal form and most attributes of 1st normal form (in particular about avoiding duplication of rows and maintaining regular columns).
The Data Modeling Handbook : http://www.amazon.com/Data-Modeling-Handbook-Best-Practice-Approach/dp/0471052906/ref=sr_1_58?s=books&ie=UTF8&qid=1278645029&sr=1-58 It is not new, but relational theory hasn't changed much in the last 20 years either. I have been designing, developing, implementing and fixing relational databases and data warehouses for the last 15 years. The book above was one of the most useful things I read early on in my career. In my opinion, data integrity is one of the most valuable functions that a database can provide, and a high quality data model is the most important first step in ensuring that. Understanding tuples, understanding relationships and understanding how to translate your business model and business requirements into a functional and correct data model is a very valuable process. Skipping this step, or attempting it with a limited understanding of the theory behind it is a major mistake.
Database Modeling and Design: Logical Design, 4th Edition. Its ISBN is 0126853525. It taught me a lot about how databases work "under the hood". If you want to know the performance implications of a b+ tree index vs. a b-tree, this book will help.
Back when those books were written, disk was expensive and not cached, RAM was very expensive, and machines had terrible I/O bottlenecks. Normalization is critical under these circumstances for maximum performance.
Normalization has _nothing_ to do with performance. In relational DB design, performance is usually considered only after you have a normalized model, at which point it's common to denormalize for performance and other implementation-specific reasons.
The parent's first link gives a good description of the purposes of normalization.
Today, these normalization techniques will increase performance but not as much as you might think. Really it is best to concentrate efforts elsewhere, especially for a one-person shop.
As the submitter looks to be using RDBMSs, a knowledge of normalization and relational database design should be required, I'd have thought. However, if their systems were designed around ORDBMS (your posts hint that this is your background) the DB design issues would be different, but the summary doesn't suggest this is the case.
Database fundamentals haven't changed much. I don't know how much you know so far but this guy is pretty smart:
http://philip.greenspun.com/sql/
http://philip.greenspun.com/panda/
http://philip.greenspun.com/wtr/
Lots of the core stuff about RDBMSs goes back decades and even old stuff like this is still very relevant. Try reading this page (just a dozen printed pages) and see what you think. He covers a lot of the fundamentals well and his style of writing is pretty entertaining.
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
Since you're Director of IT, I'd recommend you to start from http://en.wikipedia.org/wiki/Database_normalization
I'd like to buy homeland for our 10 million people. http://twitter.com/mahadiga
The one that I find still surprises people is actually 1NF. 2NF and 3NF are pretty easy to recognize. But think about CUSTOMER(CUST_ID, NAME, STREET_LINE_1, STREET_LINE_2, CITY, STATE, ZIP). Is it really in 1NF? Sure, if you're printing on envelopes. But maybe you need the customer's first name for a personalized letter. Maybe you need the house number and street separated for a GPS application. Or maybe you need the ZIP and ZIP+4 broken out separately for your postage software.
Any time you have the external application parsing your data fields into component values, you've failed at 1NF. But where do you stop? First and last names? Middle initial? Title? Honorific? Nickname? Pronunciation key? Phonetic spelling? You can go ape over-analyzing the data, and still easily miss something your users encounter the first week you deploy; effectively denormalizing the table simply by adopting a convention such as sticking "MR/MS/MRS" as the rightmost characters of the last name field.
I don't have an answer, it's just not fair. :-(
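A small sketch of the ZIP case described above, using Python's built-in sqlite3 module. The table names customer_packed and customer_split and the zip5/zip4 columns are assumptions made up for the example, not a recommended layout. It only illustrates the difference between a column that every consumer has to parse and one that is already broken out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Packed (hypothetical): ZIP and ZIP+4 share one column, so every
# query or application that wants the 5-digit ZIP must parse the string.
conn.execute("CREATE TABLE customer_packed (cust_id INTEGER PRIMARY KEY, zip TEXT)")
conn.executemany(
    "INSERT INTO customer_packed VALUES (?, ?)",
    [(1, "02139-4307"), (2, "02139"), (3, "80202-1234")],
)
packed = conn.execute(
    "SELECT cust_id FROM customer_packed "
    "WHERE substr(zip, 1, 5) = '02139' ORDER BY cust_id"
).fetchall()

# Split (hypothetical): each component is its own column, so the
# query compares whole values and no parsing convention is needed.
conn.execute(
    "CREATE TABLE customer_split ("
    "cust_id INTEGER PRIMARY KEY, zip5 TEXT NOT NULL, zip4 TEXT)"
)
conn.executemany(
    "INSERT INTO customer_split VALUES (?, ?, ?)",
    [(1, "02139", "4307"), (2, "02139", None), (3, "80202", "1234")],
)
split = conn.execute(
    "SELECT cust_id FROM customer_split WHERE zip5 = '02139' ORDER BY cust_id"
).fetchall()

print(packed, split)
```

Both queries find the same customers, but the packed version bakes a parsing rule into every caller. Whether the split is worth it depends on whether anything actually needs the parts separately, which is exactly the "where do you stop?" question.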
John
the 3rd member of the staff, hired by a friend who was the second member of the staff. Eventually we wound up with nearly 2 dozen people, many better than me or my friend.
But even when I was Application Development manager, I designed table structures and wrote custom queries to reply to FOIA requests for data.
I took some graduate school classes after getting my BSCS, so as to have access to a computer while looking for my first job, which tells you something about when this was. The best class was Relational Data Base using "An Introduction to Database Systems" by C. J. Date. ISBN 0-201-14471-9.
Mr. Date, along with Mr. Codd, invented relational calculus, including normal forms. In later classes at work we were strongly advised to use 3rd normal form, as even mainframes of the day couldn't really support 4th or 5th. That instructor had participated in a project to rebuild a 5th normal form system into 3rd for Westinghouse, whose mainframe choked on the small (low column count) tables and huge keys required by 5th normal form.
The book covers other styles of databases, network and hierarchical, but both are antique now. So I'd skip or at most skim those chapters. They show how Relational DB design grew out of experience with shortcomings of Multics and IMS, early network and hierarchical DBs, respectively.
Other commenters are correct: which DB software you use isn't terribly important for good table structure design. Learning how to select keys for uniqueness and design tables to be non-redundant are not database-specific solutions.
Do good backups, and practise restoring from them regularly. It doesn't matter how well-designed a DB is if the hardware fails and you can't recover the data.
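As a rough sketch of what a restore drill can look like, the snippet below backs up a small database and then opens the copy as if the original were gone. It uses Python's built-in sqlite3 module and its Connection.backup method (Python 3.7+) purely as a stand-in; MS-SQL and MySQL have their own backup tools, which are not shown here. The file names and the contacts table are made up for the example.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
live_path = os.path.join(workdir, "live.db")      # hypothetical file names
backup_path = os.path.join(workdir, "backup.db")

# Build a small "production" database.
live = sqlite3.connect(live_path)
live.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT)")
live.executemany(
    "INSERT INTO contacts (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)
live.commit()

# Take the backup with sqlite3's online backup API.
backup = sqlite3.connect(backup_path)
live.backup(backup)
backup.close()
live.close()

# Restore drill: open only the backup file and check that it is
# structurally sound and actually contains the expected rows.
restored = sqlite3.connect(backup_path)
integrity = restored.execute("PRAGMA integrity_check").fetchone()[0]
restored_rows = restored.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
restored.close()

print(integrity, restored_rows)
```

The point of the drill is the second half: a backup that has never been opened and queried is only a hope. A real drill would restore onto separate hardware and compare against known row counts or checksums.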
Think of the Irony!
Basically, none of your comment is right.
Are you adequate?
Disclaimer: I'm a developer on the SQL Server team at MS. You can get a lot more out of SQL Server and MySQL if you tailor your design to take advantage of the features that each has to offer. Properly tuning your database can help you avoid having to throw more hardware at the problem. I'm not sure what the best books are for MySQL, but for SQL Server be sure to check out the "Inside SQL Server 200?" series (there are editions for 2005 and 2008). If you read and understand those 5 books you will understand the best practices for designing SQL Server databases.
Yeah, until he's asked how many people he managed and the answer is "Well, it's really just me". He could lie, but without actual management experience, he'll fall flat quick.
Get books with lab materials. Element K has some great stuff, and Axzo press has good supporting materials, and publishes previews of their books online. Email either one of these publishers, tell them you are considering using their materials for in-house training. They'll send you an evaluation copy for free. The stuff we use in our classes works because after spacing out to what I say for an hour, the participants then have to build it using supporting activities. Reference books, or even college texts don't typically include that. Grab a lab manual on MS Access and skip all the access-specific stuff. Every training guide starts with the fundamentals of DB structure and normalization.
The Handbook of Relational Database Design, ISBN 0-201-11434-8, by Candace C. Fleming and Barbara von Halle is superb. A step by step handbook that tells you what to do and why.
Peace is easy to achieve, just surrender. Liberty is much harder get/keep.
Lots of good suggestions on how to learn what to do.
This is a good book to show you common ways you can get yourself in trouble and how to avoid them: http://pragprog.com/titles/bksqla/sql-antipatterns
J