Good Database Design Books?
OneC0de writes "I am the Director of IT for a small/medium sized marketing company, where I personally write the code that runs our applications. We use a variety of technology at our office, the majority of which rely on MS-SQL and MySQL databases. I am familiar with tables, SQL queries, and have a general understanding of how the SQL databases work. What I'm looking for is a good book, particularly a newer book, to explain general database design techniques, and maybe explain some relational tables. We have some tables that have million of rows, and I'd like to know the best method of designing these tables."
We have some tables that have million of rows, and I'd like to know the best method of designing these tables.
I'm a developer, not a database expert. But it seems that every now and then I have to get my hands dirty with data modeling. "The best method" is probably a really vague concept. If you have serious hardware constraints than the best method changes from an easily maintainable system to something more complex. There's give and take in database design and I guess a million rows is really something that a traditional relational database should be able to handle. So I'd suggest any book that teaches data modeling will suit you here. I happened to learn on Data Modeling Essentials which was decent but not great. I have heard good things about Len Silverston's growing series that concentrates more on patterns. But really what you're going to want is a book on data modeling or analysis that teaches you the orders of normal form, when to use cross reference tables, etc so you can get a better idea of good modeling standards. At a million rows, you might not find the need to refactor if you read about the new best practices but perhaps you could make a business case to eventually migrate.
Now there are other topics that require entirely separate books because they are such a diverging path from relational databases. It's not common but your database can be based on something other than an object or table. If you consider the internals of Google, perhaps BigTable is the most prolific database implementation out there and while interesting, it is sort of a very specific proprietary database implementation. You could take this approach to tailor your company's database to be precisely what you need but this would clearly be overkill in your case. You don't talk about any bottlenecks or impending loads that need to be carefully considered so instead of treading down this path, I suggest you first take a course on MySQL or get the de facto book on whatever database you use and play around with fine tuning on a test system. A lot of DBs out there allow you to tune them through a configuration file so that your particular needs are met more closely. If you're looking for this sort of continuing education just out of curiosity, pick up a book on database design and start to tinker. But it requires a lot of knowledge and effort to start a database technology from scratch and compete with vanilla out of the box technologies like MySQL and PostgreSQL.
From what information you provide in your question, I'd suggest this book to help you understand database designs more via industry proven patterns. That assumes you have all the basic database design practices covered.
My work here is dung.
I'm sure I'm not the only code monkey who shudders at the implications of this statement.
Some of us coders are well-nigh insufferable already, because they think they're Superprogrammer and can leap tall mainframes in a single bounds check. The problem comes when those types find themselves in management -- but won't let go of the programming reins. Now, the IT guy with a new idea doesn't just have to worry about it getting shot down by the hotshot who knows everything -- he has to worry about whether his *career* will get shot down by Mr. Hotshot Coder-Manager.
There's a great chance that the article poster isn't like that. But I'm worried, because Ask Slashdot isn't who he should be asking... he should be asking the coders he manages how to design and/or restructure the database. The fact that he's asking Slashdot tells me that he's not comfortable letting someone else do the work, possibly because he's Superprogrammer and always knows what's best.
But in case that's not the situation -- maybe he's in a company that simply grew faster than its staffing could handle -- I'll answer the question. He wants a good book to explain general database design techniques. My answer: buy a good book on management techniques, because *that* is your job. Let the people you manage come up with a database design, because that is *their* job.
Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
Don't be afraid of older books.. database theory really hasen't evolved that much.
For a good general overview you might check out "Database Design for Mere Mortals".
I'd also recommend you find a few books on specific areas
You'll probably want a book on normalization (preferably one not written by a normalization fanboy... there are times when de-normalized designs make sense) for sure as this speaks almost directly to the goal you described.
You'll probably want one on SQL tuning as well.. and one on modeling and documentation / diagramming.
Database in Depth: Relational Theory for Practitioners
Publisher: O'Reilly Media; 1 edition (May 1, 2005)
Language: English
ISBN-10: 0596100124
ISBN-13: 978-0596100124
Best DB book i have ever owned/read/seen!
A book about good designs or a good book about database design? Some meet one criteria but not the other.
This is my all time favorite book on database design.
http://www.amazon.com/Data-Model-Patterns-Conventions-Thought/dp/0932633293
O'Reilly books are your friend. The "... in a Nutshell" books are a good place to start, and then proceed into the more advanced books. They have 25 titles related to MySQL and 53 titles related to Microsoft SQL. There are usually a few to browse through at the large chain book stores.
Serious? Seriousness is well above my pay grade.
A director that still codes? What a novel concept. Good for you.
.
>"We have some tables that have million of rows, and I'd like to know the best method of designing these tables."
fantastic. If you ever decide to get into house-building, send me pictures.
I'm a bit unclear about what you want to achieve:
- easier end-user interface
- more reliability (backups, journalling, redundancy...)
- more speed
- more security
- more complicated data massaging (multi tables, statistics...)
- better vizualization (reports, graphs...)
I'm not sure a single book can cover all that.
The Cloud - because you don't care if your apps and data are up in the air.
I'm not sure I'd trust a book to teach this subject as comprehensively as a good university course on the subject. Frequently, you can sit a class quite inexpensively if you're not going for credit.
For that matter, isn't MIT or someone allowing free not-for-credit access to their eLearning materials?
IMHO: Joe Celko's SQL for Smarties (http://www.amazon.com/Joe-Celkos-SQL-Smarties-Programming/dp/0123693799/ref=sr_1_2?ie=UTF8&s=books) has shown itself to be very nice book when the need to go beyond the basics to a little deeper understanding of SQL is needed.
There are many other books on the subject all the way to source material from Date and Dodd but Celko seems to be well informed and writes fairly well, I think.
Basic relational database design is about logic and structure. When compared with other areas of computing, I would argue that the the original materials worked out by Codd and Date have not changed nearly as dramatically. There are certainly exceptional sub-areas where there have been major changes (e.g. the introduction of the object model and development of XML and RDF, to name but two prominent examples), but if I were you, I would suggest doing two things:
1. Do some research into existing relational database platforms (e.g. SQL Server, Oracle, PostGRES), figure out which model will ultimately work best for you, and get a book on that implementation of SQL and RDMS. Every intro to databases book has a chapter on relational database design. That will get you started. You can also skip the platform selection process if you are already stuck with a database.
2. Read this: Date on Database: Writings 2000-2006 by C. J. Date
Codd (now deceased) and Date are still the authorities, IMHO, and this latter book will give you a good overview of major conceptual issues in relational database design.
The first principle is that you must not fool yourself - and you are the easiest person to fool. -Richard Feynman
MS Access has design wizards BUILT IN! Why would you need to ask /.?
The best book I know of on the subject is SQL Server 2008 Relational Database Design and Implementation by Louis Davidson. It's a complete reference for how to approach database design, with practical exmaples.
What is this book you speak of? Is it a new development platform? If so it must be new I haven't heard of it before. Is there a online tutorial on how to develop for this "book" technology?
"I am the Director of IT for a small/medium sized marketing company, where I personally write the code that runs our applications. We use a variety of technology at our office, the majority of which rely on MS-SQL and MySQL databases. I am familiar with tables, SQL queries, and have a general understanding of how the SQL databases work. What I'm looking for is a good book, particularly a newer book, to explain general database design techniques, and maybe explain some relational tables. We have some tables that have million of rows, and I'd like to know the best method of designing these tables."
There is more to RDBMS than tables and SQL. Your developers should understand data normalization first and foremost, at least 1NF, 2NF and 3NF.
http://en.wikipedia.org/wiki/Database_normalization
http://en.wikipedia.org/wiki/First_normal_form
http://en.wikipedia.org/wiki/Second_normal_form
http://en.wikipedia.org/wiki/Third_normal_form
The examples in the URLs above should suffice for getting a general understanding on how to start with a relational model. As for books, I'd suggest these:
http://www.amazon.com/Relational-Database-Design-Implementation-Third/dp/0123747309/ref=sr_1_4?ie=UTF8&s=books&qid=1278630155&sr=8-4
http://www.amazon.com/Information-Modeling-Relational-Databases-Management/dp/0123735688/ref=sr_1_3?ie=UTF8&s=books&qid=1278630306&sr=1-3
I would also suggest C.J. Date's "Database in Depth: Relational Theory for Practitioners", but I can imagine the local penny arcade l33t-hax0r-wannabe crowd going batshit crazy about studying relational algebra and relational database theory in depth. To each his own. Most problems that arise in poorly designed relational database models arise from not understanding data normalization
:
6 easy steps.
1) normalize 2) normalize 3) normalize 4) denormalize 5) denormalize 6) denormalize
If you are designing anything bigger than a couple of gigabytes, you are in for some fun (or your users are). ;-)
To be a good designer, there is no substitute for a thorough understanding of the subject matter. And you are a self-confessed n00b. Get an expert. Or study. Hard.
Database in Depth: Relational Theory for Practitioners.
Apprentice & dismissing the "normalization fanboy"
-
As the head guy, it would probably be worthwhile for you to hire a skilled database person to sit down with you and pair-program over at least one design for a small project with you; this type of thing is the sort of thing you can pretty easily learn by doing/osmosis. Hiring a guy for a day is pretty cheap, as long as he understands he's there to help you get your sea legs, and that if you get really stuck, there might be a future day or so of work in it for him. In the long run, this will be better than a book.
-
When someone talks about normalization, they generally really mean "3rd normal form", i.e. without replication of non-key field data between records, or to put it in Bill Kent's words: "non-key fields in a record must provide a fact about the key, the whole key, and nothing but the key".
However, when you start talking about denormalization, you are actually probably talking about use of fourth and sometimes fifth normal form instead of third normal form. Unless you are data warehousing, you're not going to see any utility in sixth normal form.
This is still normalization, so you can't throw out the baby with the bathwater if third normal form doesn't apply to your problem.
Unless a database design book describes at least first through fifth normal forms, it's not a database book you want to buy. Scan the index of any book you might think you want to buy using Amazon or Google books, and if there isn't a section on 1NF-5NF normalization, then that book is not a candidate for you.
-- Terry
Make the data type as precise as possible.
Now, try to make it more precise.
Indexes are like friends, you can have too many.
Don't do * querys.
If user MYSQL do everything in a view.
--- I do not moderate.
It's not uncommon for the chief of IT at a very early stage startup to be very technical and do a significant amount of coding in addition to management. At my company, the first chief of IT was one of the founders, along with two full-time programmers (though an extra two were hired a few months later). He coded less than the full-time coders, but he still wrote some key pieces of the early codebase, plus took care of a bunch of tasks like getting the first automated build system going so that the full-time developers could concentrate on other stuff.
Are you adequate?
I did an exam on SQL and database design recently and used The Manga Guide to Databases as part of my studies. If you don't want something too rigorous it's very good indeed - I found it a lot better at making stuff sink in than a dry, stuffy book. It gives a reasonably good idea of things like the first, second and third normal forms. Don't be put off by the fact that it looks a bit childish - the storytelling idea really works well. It probably won't work for everyone, but it did work well for me (I passed the exam with flying colours).
I couldn't agree more. Database design isn't something you should learn from a book. You CAN of course, and it might work out, but you'd be much smarter to just take a class.
Let me put it this way. Programming is really mostly about data structures. Database structure tends to live for DECADES. Screw up the initial design, and you'll be hurting the business for decades to come. This isn't something to be taken lightly. A good DB design can pay off huge returns in the future when you have to add feature X. A bad design can be extremely expensive when you either can't implement whatever the new feature they want is, or to do so would be so difficult and compromise so much of the design and capabilities as to be useless.
AccountKiller
Do you know relational algebra? If you don't, then I highly recommend:
Codd, E.F. (1990). The Relational Model for Database Management (Version 2 ed.). Addison Wesley Publishing Company. ISBN 0-201-14192-2.
It's MUCH better to know the fundamentals of database systems and then try to figure out details than vice-versa.http://ask.slashdot.org/story/10/07/08/2142211/Good-Database-Design-Books?art_pos=1#
Depends on what you want to do with your database. You have two broad options:
Normalized database:
Application developers prefer this because you'll design your database where every discrete list has its own table. The main benefits are performance and maintainability. For example, if you are tracking a list of marketing promotions and they each have a status of "Started" or "Finished", the statuses "Started" and "Finished" would be in one table and the table holding the promotions would have foreign key relationships to it. Your developers could then, instead of using string matching for "Started", filter on the foreign key of 1 (or whatever the integer key is). There's a big performance boost there. Also, you'll be able to rename and/or add to those statuses without affecting the underlying data.
De-normalized database (data warehouse):
If you are talking about 1 million+ rows you're probably going to be interested in doing analytics and reporting. The data warehouse (look for Kimball: The datawarehouse toolkit) is designed for report writers and analysts in mind. For a marketing example, perhaps you have a Promotions "Dimension" where all the attributes about a certain promotion are described (region, name, type, current status, client, etc) and one or more "Fact" tables that describe the metrics you want to measure about Promotions. For example, length of promotion, units sold, etc. This type of database structure makes it easy for people who are not SQL experts to explore and analyze the data. Data warehouses are usually produced out of an ETL (extract, transfer, load) process that copies data for a normalized database (usually because that is what the application is using).
Hope that helps.
My university course on databases used the text book A First Course in Database Systems by Jeff Ullman and Jennifer Widom. I rather enjoyed the book, and plan to have it above my desk in case any sort of database design or maintenance project comes up for me. The book's page is here; links to purchase are at the bottom.
My experience shows that when you are dealing with very big tables, or very frequent requests, or whatever else extraordinary, then all the "standard" advises are simply NOT WORKING. The only solutions is to see how the other are doing it, or to hire some DB guru, even if it is for 3-6-12 months contract, and let him rearrange and restructure your DB. Just for example, if have very very big tables, the only working solutions that i am aware of is to forget about CONSTRAINS, FOREIGN KEYS, or whatever little else convenient extra that every db book could have advise you. In fact, you should use ONLY PRIMARY KEY. Also, you should use physical partitioning, based on some criteria (like Oracle is doing it), and finally, you should have some temporary tables, containing some useful pre-calculated data.
"I am the Director of IT for a small/medium sized marketing company, where I personally write the code that runs our applications"
Translation - I'm a one man IT department
Bud, start with the truth. A "Director of IT" does not write code. You could have equally said you were CIO, just as truthful.
That's an stupid, novice assumption. I've seen smaller operations with an IT director having to get down and dirty because of downsizing or lack of resources. Coding is done on top of the functions of IT managements.
So before you let your projecting ego go about correcting people's titles without knowing the specific circumstances, maybe you can try something more useful, like, oh I dunno, maybe answer the question and suggest a good relational modeling book. Crazy, I know!
For many aspects of coding, you are barking up the wrong tree. You should really concentrate your design thinking at the object persistence level instead of the database level. Use Hibernate or something similar to hide the database from your code.
Then you can take a critical look at your choices of database vendors, and you will be easily able to migrate your code the one whose product best fits your needs. You will be able to run your own benchmarks with your own data and see what will work best for you. You may even find that an object-based database will give you far better performance for the same coding effort.
...and improve your quality and maintainability?
Back in the 70's and early 80's we learned a methodology called, "Data Structured Systems Design" and the fundamental presupposition was that everything could be expressed logically and accurately by describing it as relationships in set theory. I have not seen anything since that surpasses the quality and maintainability of database applications and systems.
Someone already mentioned Joe Celko's book "SQL for Smarties" and I would recommend you first read his, "Thinking in Sets" before any of his other books.
I would also suggest some earlier books by Ken Orr and Jean Dominique Warnier. If you learn the Warnier-Orr approach to DESIGNING the system before doing any coding, you will reduce the time necessary for maintaining the system. I have seen hundreds of small IT shops like yours, and much of the time Systems Analysis and Design is neglected and performed "off-the-cuff" by programmers who can't wait to get to the coding. I didn't originally believe Ken Orr's assertion that spending twice as much time designing the system would result in a sharp time reduction for overall project completion, but through experience and observation I became a believer.
"The mind works quicker than you think!"
...and other like-minded groups. You're going to learn more from interactions with other DBA's than from any book. I'm a dev at a place that can measure db growth in TB/week, and have learned a tremendous amount just from working with DBA's in our organization.
Jesus told him, "I am the way, the truth, and the life. No one can come to the Father except through me. - John 14:6 NLT
Take a look at Professional Petroleum Data Management (PPDM). They are a consortium of over 100 companies who created data models used throughout the petroleum industry. Their documentation and standards are top-notch:
http://www.ppdm.org/standards/model/lite_v1/archive.html
http://www.ppdm.org/standards/model/lite_v1/documentation/html/index.html
I found "Database Design for Mere Mortals" (ISBN 0-201-69471-9) to be an good/easy entry point for good database design methodology.
Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
I work with databases a lot, and while I've read books on design, I don't think it's complex with a relational database since most of the design has already been done for you. The guiding principles I follow are simple:
1) Don't over-complicate your database. Only store data that you plan on using or you need to store, stay away from adding unnecessary tables or data. Don't try to build a fancy user interface with lots of code unless you expect the database to be used by a layperson, and only build such interfaces with a specific scope in mind. If the user wants to do something more complex, let them come to you. A knowledgeable user should be able to understand and use your database without an interface. If they can't you've probably done something wrong.
2) Don't abbreviate your field names. A modern relational database can handle spaces and long names. But a modern user still can not guess what your abbreviation meant. Yes, it makes the code a little longer, but you're better off in the end.
3) Don't use an arbitrary primary key unless you have a good reason to. If you are storing data that is meaningful, it will also be unique, so you may be able to find one or two fields that can serve as a primary key on their own. This makes it easier for people to navigate/maintain/update the database later on because it keeps the number of fields down.
4) Don't store data in more than one place, that way if you need to update it you only have to change it once.
5) Lock down your data. Have a good idea about all the possible data and potential field values and don't let people enter data that your database isn't designed to handle. If there's a field with a finite number of possible entries, store those possible values in a separate table and require the field to draw it's value from that table.
No, denormalization usually refers to the use of something that isn't 3NF (which rules out 4NF and higher normal forms, all of which are also 3NF), and often non-1NF schemas like star schemas.
4NF, 5NF, DKNF, 6NF are further normalizations beyond/within 3NF, not denormalizations.
In the modern days of cheap disk, big disk caches, and large ram, proper modelling is more important than strict normalization.
Back when those books were written, disk was expensive and not cached, RAM was very expensive, and machines had terrible I/O bottlenecks.. Normalization is critical under these circumstances for maximum performance.
Today, these normalization techniques will increase performance but not as much as you might think. Really it is best to concentrate efforts elsewhere, especially for a one-person shop.
All of that normalization work requires coding changes and it will undoubtedly make the code much less readable and maintainable.
As far as I'm concerned, if you wear the hat, you're entitled to the position. The entrepreneur who successfully pulls off multiple roles has hugely outperformed a company that requires a unique person in all of them. And s/he probably has a terrible headache, too, even if s/he loves the work, but that's a separate issue.
I've fallen off your lawn, and I can't get up.
These three lessons may not all be in any one book, but they can help in the real world:
1) Learn what SQL Injection is and how to defend against it. It will ruin your day and could severely damage your current employment situation.
2) Abstract your schema from your front-end applications. Stored procedures are easy to write and can provide security and if well written stop injection attacks. They will let you change your database design without breaking your deployed apps. Just update the internal code in the P. Middleware and objects can do this, too.
3) Bergstrom's law of sailing says: "You can get away with anything in less than 5 knots of wind." Similarly, any little box or blade with 2 to 4 gs of RAM can easily handle 5 to 10 million row tables. Dedicate the server to MySQL or MS SQL so they can cache and buffer efficiently and they will outperform much bigger boxes trying to run too many schemas and DBs concurrently. Learn to index. Don't be too puritanical about normalization. Returning a customer address should require 6 joins. And remember that moving that moving large recordsets across the LANWAN may take much more time than the server query.
You probably already know all this... but maybe someone else reading this doesn't.
"Knowing everything doesn't help..."
He's like the grandfather of relational database systems. Quel truly is the language of the Gods.
How to build a database?
1. Hire database guys.
hire a dba.
if you cant then you're not an it director, my friend.
Database design for Mere Mortals
http://www.amazon.com/Database-Design-Mere-Mortals-Hands/dp/0201752840/ref=sr_1_1?ie=UTF8&s=books&qid=1278637201&sr=8-1
and
Sql Queries for Mere Mortals
http://www.amazon.com/SQL-Queries-Mere-Mortals-Hands/dp/0321444434/ref=pd_bxgy_b_text_b
Bothe are really good books! They take you from start to finish on SQL databases and good design principles
-joe
-- Many men would appreciate a woman's mind more if they could fondle it
You have to make sure you aren't doing full table scans on any big table. Indexes are your friend.
Sql hasn't changed much in say, the last ten years. Newer books aren't going to help more than older ones.
The less joins the better without taking other things into consideration.
Duplicating data in tables is a bad idea. Not necessarily for performance reasons, but because every time you change that data you will have to remember to change it in more than one place.
There are other databases out there other than mssql and mysql that may improve your performance more than what you can change with schemas and sql querys.
Finally, if it is fast enough just leave it. There is a big difference between queries that get hit once a minute and a thousand times a second.
Joe Celko writes fabulous SQL/database books! They are a joy.
Sorry,I don't know the method of designing these tables. gucci bags http://www.gucciusaoutlet.net/index.php?main_page=index&cPath=77&zenid=aa7cf3c1c03ba16b9ac00c8cd2a23ef2 http://www.gucciusaoutlet.net/index.php?main_page=index&cPath=72&zenid=aa7cf3c1c03ba16b9ac00c8cd2a23ef2
Back in the day when they were their own company they used to recommend
Designing Quality Databases with IDEF1X Information Models
I found the book VERY informative
-- 73 de KG2V For the Children - RKBA! "You are what you do when it counts" - the Masso
Check out Database Design for Mere Mortals... It's a pretty good book for beginning database design.
One book that blew my mind was "The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling" by Ralph Kimball. It's pretty easy to read and offer some very interesting techniques for designing data warehouse schemas.
This is a great book that is platform agnostic: Database Design for Mere Mortals. It goes through each phase of developing a database using an evolving project from conception to final product. It focuses on design and less on SQL and specific database software. Highly recommend!
The best advice that I can offer is:
* Use the best tool for the job (don't use SQL for everything, key/value DBs are better for *many* tasks)
* Index smartly (don't index on the whole string if needed, try just the first few chars sometimes)
* Make sure your indexes fit in memory
* Other than that, just log long-running queries and optimize those.
- Steve
What books do you have in mind? I did a search for Ken Orr and Jean Dominique Warnier, and neither of them seemed to have a book that was 'best.'
Qxe4
Fundamentals of Database Systems by elmasri, navathe. Very detailed book on the theory and design of databases.
The Data Modeling Handbook : http://www.amazon.com/Data-Modeling-Handbook-Best-Practice-Approach/dp/0471052906/ref=sr_1_58?s=books&ie=UTF8&qid=1278645029&sr=1-58 It is not new, but relational theory hasn't changed much in the last 20 years either. I have been designing, developing, implementing and fixing relational databases and data warehouses for the last 15 years. The book above was one of the most useful things I read early on in my career. In my opinion, data integrity is one of the most valuable functions that a database can provide, and a high quality data model is the most important first step in ensuring that. Understanding tuples, understanding relationships and understanding how to translate your business model and business requirements into a functional and correct data model is a very valuable process. Skipping this step, or attempting it with a limited understanding of the theory behind it is a major mistake.
Database Modeling and Design: Logical Design, 4th Edition. Its ISBN is 0126853525. It taught me a lot about how databases work "under the hood". If you want to know the performance implications of a b+ tree index vs. a b-tree, this book will help.
Comment removed based on user account deletion
Open a database book's index. Look up "natural key". If the author presents natural keys as a viable alternative to surrogate primary keys, then the author doesn't know what he or she's talking about and you should move on to the next book. Especially if they pull the whole "meaning" schtick.
On the other hand, if the author gives you great real-world examples as to why you should use surrogate primary keys over natural keys in ever possible situation, then you've got a winner.
Database fundamentals haven't changed much. I don't know how much you know so far but this guy is pretty smart:
http://philip.greenspun.com/sql/
http://philip.greenspun.com/panda/
http://philip.greenspun.com/wtr/
Lots of the core stuff about RDBMSs goes back decades and even old stuff like this is still very relevant. Try reading this page (just a dozen printed pages) and see what you think. He covers a lot of the fundamentals well and his style of writing is pretty entertaining.
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
Skip the database part altogether. Just focus on the domain model. Why do the work twice.
http://www.amazon.com/dp/1934356557?tag=intltechventu-20&camp=14573&creative=327641&linkCode=as1&creativeASIN=1934356557&adid=1DHT9ZQAG96YQNE50MNV&
Since you're Director of IT, I'd recommend you to start from http://en.wikipedia.org/wiki/Database_normalization
I'd like to buy homeland for our 10 million people. http://twitter.com/mahadiga
the 3rd member of the staff, hired by a friend who was the second member of the staff. Eventually we wound up with nearly 2 dozen people, many better than me or my friend.
But even when I was Application Development manager, I designed table structures and wrote custom queries to reply to FOIA requests for data.
I took some graduate school classes after getting my BSCS, so as to have access to a computer while looking for my first job, which tells you something about when this was. The best class was Relational Data Base using "An Introduction to Database Systems" by C. J. Date. ISBN 0-201-14471-9.
Mr. Date, along with Mr. Codd, invented relational calculus, including normal forms. In later classes at work we were strongly advised to use 3rd normal form, as even mainframes of the day couldn't really support 4th or 5th. That instructor had participated in a project to rebuild a 5th normal form system into 3rd for Westinghouse, whose mainframe choked on the small (low column count) tables
and huge keys required by 5th normal form.
The book covers other styles of databases, network and hierarchical, but both are antique now. So I'd skip or at most skim those chapters. They show how Relational DB design grew out of experience with shortcomings of Multics and IMS, early network and hierarchical DBs, respectively.
Other commentors are correct, which DB software you use isn't terribly important for good table structure design. Learning how to select keys for uniqueness and design tables to be non-redundant are not database-specific solutions.
Do good backups, and practise restoring from them regularly, it doesn't matter how well-deswigned a DB is if the hardware fails and you can't recover the data.
Think of the Irony!
Basically, none of your comment is right.
Are you adequate?
I've read a lot of database books in my time, and been around some of the biggest rdbms instances on the net. Here's probably my top three:
1) If you don't know SQL, O'Reilly's _Learning SQL_ is the best intro I've seen. This doesn't sound like what you're looking for, though.
2) If you know SQL reasonably well, but you want to get much better, I can't recommend O'Reilly's Theory In Practice book _The Art of SQL_ highly enough. I don't have it in front of me to remember precisely why I liked it so much, but it's outstanding. If you're going to get one book, get this one.
3) To really get the theory of databases, one of CJ Date's books is good. Someone else recommended _Databases in Depth: Relational Theory for Practitioners_, and I recall that looking pretty good when I skimmed it once. I would also recommend _Foundation for Object / Relational Databases: The Third Manifesto_, which I found to be very educational. You need to be careful with Date, though, because he tends to advocate how things "should work", not how databases actually work, and so you may find him advising you to do things that are actually bad ideas on your database of choice, so balance this off against good books for your specific DB
Also,
4) If you need to build large data warehouses (doesn't sound like you, yet), then Ralph Kimball's _The Data Warehouse Toolkit_ is all you will need to understand the theory. Unfortunately, effective warehousing is especially tied to your database of choice, so you'll want to hit the product manuals even harder here.
5) If you need to do OLAP (also not likely), there's only one generic book that's particularly good at all for the concepts, _OLAP Solutions_ by Erik Thomsen, and even that is not that generic. Unfortunately most OLAP and BI stuff is highly vendor-specific.
Joe Celko's books are also pretty good, in my experience, but I wouldn't buy them before the above.
A lot of responses are referencing some good books and you should give them due consideration.
One post spoke of set theory. Definitely look into that.
Now for the real world. DB Theory is great, it is a base of knowledge that needs to be obtained and digested. You should understand 1st norrnal form all the way to 6th normal form and that will take a bit to wrap your head around
Now that you have accomplished that, it is time to start breaking those rules and get some work done. Databases do one thing and ONE thing only, they store data for later retrieval and analysis, ie: reports. Now you can normalize your brains out and that will end you up with a database that it is close to impossible to write reports against without using the most insane joins that you have never even thought about..
Start with your most common report and start looking at how the data breaks out. Does it fit in an ultra normalized model or a hybrid model. Understand the differences between OLTP and OLAP ( if you don't know what those acronyms mean, loook them up it's important.
Database modeling is not an exact science and sadly much of it is inherent in a certain type of mind, some gots it, some don't. It is a lot like a puzzle, you look at it and then you have that "aha!" moment and start re-arranging tables and columns and suddently everything works correctly, which is to say you get your data out with a reasonable amount of effort instead of having to have roll-up tables and so forth and so on.
Lastly, you as a programmer should know better then anyone else that you can't proof your own work as effectively as someone else can so I suggest you hire in a database guy or gal to look over your shoulder now and then to give you a reality check.
Best of luck!
Hey KID! Yeah you, get the fuck off my lawn!
Disclaimer: I'm a developer on the SQL Server team at MS. You can get a lot more out of SQL Server and MySQL if you tailor your design to take advantage of the features that each has to offer. Properly tuning your database can help you avoid having to throw more hardware at the problem. I'm not sure what the best books are for MySQL, but for SQL Server be sure to check out the "Inside SQL Server 200?" series (there are editions for 2005 and 2008). If you read and understand those 5 books you will understand the best practices for designing SQL Server databases.
This book doesn't teach you what to put in each table and how to put them together. But, when you have large tables it's very important to have good indexes, and to know how to evaluate that your queries are using them correctly. There is a good explanation of using the Profiler here, as well as a nice tutorial-style approach which explains how to improve your indexes based on the execution plans that the Query Compiler spits out.
Don't log anything so you don't run out of disk space and lose data.
From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
This one was reviewed by /. some time back ( http://news.slashdot.org/article.pl?sid=06/06/07/1458232 ), and I've found it to be an invaluable tool for tackling precisely the sort of problems you're describing. It's a book on how to think about SQL, with a lot of good insight into how the DB engine will be doing it (so you can plan accordingly for the shortcomings). After reading this book, I felt much better prepared to tackle schema and query design, including all the big questions that come with large tables and projects such as "Should I denormalize this?" or "Should I add more indexes to this already heavily indexed table?" I found the writing to be quite accessible, and recommend it highly to others.
That green slime had it coming.
i would suggest http://books.google.com/books?id=3MxQAAAAMAAJ&dq=ramakrishnan%20database%20managment&source=gbs_slider_thumb. I have read the second edition volumes A&B, but now i see there is a third edition too. Great book by great authors. cheers
//LIFE WOULD BE EASIER IF I HAD THE SOURCE CODE!
http://www.amazon.com/Database-System-Concepts-Abraham-Silberschatz/dp/0073523321/ref=sr_1_2?ie=UTF8&s=books&qid=1278670370&sr=1-2
"Agile Database Techniques" and "Refactoring Databases" by Scott Ambler (et. al) both discuss how to augment & maintain existing databases without breaking them and how to incorporate lessons learned as you go. This is good if you need to modify or improve what you already have, and also if you have a megalithic up front data model which turns out to be flawed these techniques can help you avoid having to start all over again.
Link to Ambler's book page: (http://www.ambysoft.com/books/). For the record I have never had dealings with Ambler, I'm merely a dev who found some of his works useful
You'll probably still need a good relational db fundamentals reference. I found Date's book (mentioned in posts above) useful but your mind may not agree with his style, which is mathematician-like.
Yandelvayasna grldenwi stravenka
Get books with lab materials. Element K has some great stuff, and Axzo press has good supporting materials, and publishes previews of their books online. Email either one of these publishers, tell them you are considering using their materials for in-house training. They'll send you an evaluation copy for free. The stuff we use in our classes works because after spacing out to what I say for an hour, the participants then have to build it using supporting activities. Reference books, or even college texts don't typically include that. Grab a lab manual on MS Access and skip all the access-specific stuff. Every training guide starts with the fundamentals of DB structure and normalization.
OT, but can you provide a citation for Bergstrom's Law? (I think it's clever, but a casual googling doesn't turn up a source. Is it yours?)
To extend the original question, how do the cubes fit into data modeling and database design? Any comments would be greatly appreciated. Thanks.
All big-iron database engines that most people are familiar with were written in an era when 210 MB was a large HD and 2 MB was a lot of RAM. As a result, they make many design decisions that are simply not useful any more -- the very idea of an index makes very little sense if the entire table fits in RAM.
In spite of five or six doublings of Moore's Law, every DB book I've seen is still written as if it were the 1990s. They'll tell you to normalize your tables, never use insert-time as an index, use multi-part indexes, etc.
All of it is wrong. Generally speaking, on any normal data problem you can simply insert every single object into a single large table in the order they are created and that will be the most efficient way to get the data back. That's because the DB designers have spent the last 20 years making their DBs work the way most DB programs use them.
I experiment with every design decision. With modern tools I can built a temp table, populate it with millions of rows, index it, benchmark it and drop it again in a few minutes. When that doesn't tell me what I want to know, I go to the forums.
Don't be afraid to experiment!
Maury
I use forums when I run into particular problems. I
To give a small, simple example, consider a sales database. You have customers, you have products, you have invoices, but mainly you have items sold (simplifying.) A relational db designer might, without thinking, use the product object's primary key as a foreign key in the sales item list. That is sparse but omits the consideration that you sell the same product at different prices at different times. So, for speed of reporting, without growing the tables too fast, you would store the invoice number,despatch note number,product object's primary key, the selling price, and the sales tax rate. When you warehouse the data, you add indexes for fast retrieval. But your fastest growing table - items sold - remains very compact because it has only 5 elements, all numeric.
I hope this isn't just a ramble and is of some use, but I've been in your position - in a fast growing company - and you have my sympathies.
From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
The smaller the data set the more Normalize the database can be structured without a noticeable loss in performance. Large data, (what large is depends on the DB software) lets say 10+ million, then the less normalization the better.
Come up with a naming convention and stick to it. Include a description of the table/fields use in the data dictionary, helps to keep stuff documented.
All tables should have a unique primary key.
Only use a single field as the primary key so table join are fast. Reverse order number PK indexes if the DB supports them.
Only add indexes based off real SELECT statements and not just for the sake of it.
Store the data and indexes on different drives, the same with the undo files or archive logs. This will help load balance IO performance.
Archiving is the key to keeping large data sets manageable.
Figure out the average size of a record in a table then configure the block storage to hold the most amount of them without creating fragmented records. Meaning a record stored in 2 db blocks is very cost to look up when it should be stored in just one. You also want to minimize the white space in a DB block so it doesn’t waist space.
Rebuild indexes regularly for tables that change a lot
Indexes on fields with low cardinality doesn't help much, unless you have bitmap indexes.
"Database Design for Mere Mortals" by Michael Hernandez and "SQL Queries for Mere Mortals" by Hernandez and John Viescas. These are excellently written books that really do a great job of covering the essentials, but are suitable even for non-technical folks. They are extremely approachable, and if you're starting to look at theory, I would highly recommend you put these books near the top of your start list.
I'm not a human, but I play one on T.V.
"Good Database Design Books?"
So you're certainly not looking for anything about MySQL.
Comment removed based on user account deletion
The Handbook of Relational Database Design, ISBN 0-201-11434-8, by Candace C. Fleming and Barbara von Halle is superb. A step by step handbook that tells you what to do and why.
Peace is easy to achieve, just surrender. Liberty is much harder get/keep.
If you already understand the essentials of data modeling, then all you need is the Data Model Resource Book by Silverston, Inmon and Graziano.It has the ERD models with explanations for all typical processes found in a company.
For basics, here is a page: http://moxcey.net/mike/pgm/dbs_design/01_intro.html
I learned dbms from:
Database Management Systems (Third Edition)
by Raghu Ramakrishnan and Johannes Gehrke
The book has a bunch of cows on the cover.
It definitely has its limitations, and will not teach you advanced topics in DBMS, but based on your post it sounds like this is where you should be starting. It will teach you relational algebra, how to make advanced SQL queries, about the physical properties of a hard disk, and how to organize the tables and use the most efficient sorting and search methods to optimize your database.
A large portion of the book is also dedicated to teaching you how to program everything you've learned, but for the sake of practicality you might want to ignore these sections.
Good luck.
-Keith
Database Systems: Design, Implementation and Management 9th Edition by Carlos Coronel
ISBN: 978-0-538-4698-5
The way that your data is modeled will have a great influence on systems performance, maintainability, growth potential, etc. Find someone who is versed in several database technologies and not just one database system. When your only tool is a hammer, everything begins to look like a nail.
Find someone who is versed in at least two RDBMSs, at least one XML database (I prefer eXist-db and Marklogic), LDAP, etc. That person has the experience to help you avoid problems and to develop a better solution.
In the long run, a good Data Architect will save you money!
or how the hell are you Director of IT without any background in DB design.
Anyone who has taken the first couple years of a Computer Science degree, should have a handful of expensive books on their bookshelf collecting dust.
Also if your looking for a "New" book, you are interested in particular implementation of modern software. Get My/MS SQL for dummies or something. The basic principles of how to design optimal relational databases hasn't changed all that much in the past decade that I am aware of.
About the only thing that has changed other than the software with some nice new tools, is scale in that Databases are continually getting larger over time, which may influence how you design things for optimal performance depending on what/how you use your DB for. Modern Databases also allow for more than just text and numbers now, and you can now store things like media and various other constructs, how these are organized may have more a function of the software you are using, and design may impinge upon application design which is something else entirely.
Also the type of book you need depends on the type of DB you plan on designing as well. My advice is to take some DB courses at your local university or online. If you don't have the prerequisites for those, I would argue you have no business trying to do this work yourself, and to hire a Computer Science Grad. They are a dime a dozen now, and will likely be happy just to have a job in his field when he graduates.
Lots of good suggestions on how to learn what to do.
This is a good book to show you common ways you can get yourself in trouble and how to avoid them: http://pragprog.com/titles/bksqla/sql-antipatterns
J
1. If you ask it to retrieve a million records, it will.
2. After you design a database, toss the whole thing and start over. Repeat until done.
3. (Borrowed from project management) You can have it fast, cheap, or good. Pick any two.
Check out High Performance MySQL by Schwartz, Zaitsev, Tkachenko, Zawodny, Lentz and Balling
http://oreilly.com/catalog/9780596101718
A couple of the authors are the guys that run http://www.mysqlperformanceblog.com/, which is itself a good resource for MySQL scaling.
I was going to compile a searchable DB containing a list of good DB books but I didn't know where to start.
I started my illustrious career in the IT world as a system administrator until shifting into programming, which I like much better and has remained my focus for the last 10 years. That's not to say I still don't get sucked into doing the sys admin work and even that of what should be a DBA.
Being as such, I've had to learn some things through the school of hard knocks, others by trial and error, and more through the vast abundance of resources and printed texts. While you can't go wrong by reading any O'Reilly publication, and I am a devout fan of always reaching for the O'Reilly book on any topic first (and they do have a book on virtually everything technical you could want to know), they still cannot cover everything. Any database is vast in it's complexity and configuration settings to optimize it's performance. Each has advantages over other DB's, like the fact that MySQL has virtually no overhead in LEFT JOIN statements, making it smarter to break tables down into chunks of data used for various purposes and split each into it's own table. Of course each DB also has it's handicaps, using MySQL again as an example, the fact that MyISam tables don't play well with InnoDB tables. MyISam tables are amazing for full text index and searching, but have no foreign key referencing ability like the InnoDB, so your left making a choice of data integrity or searchability.
I've found that the O'Reilly books are a great start, but for the neurotic programmer always reaching for improvement, I really think the best resource is in reading the full documentation provided by the DB. MySQL has a wealth of information in their on-line documentation, as well as PostgreSQL, including user feedback and comments. You will end up reading 100 pages for that one page that gives you the answer you wanted, and some pages may take you re-reading it 10 times to understand their way of explaining something, but they provide information I haven't seen in any other resource. While they don't have a "design" section, they do give tips throughout the entire content scattered. Like MySQL recommends tables to contain less than 15 columns, or how to define a column with the correct and most efficient data type and size, such as a column for a zip code. Would you use an integer? Valid only for US and wouldn't cover hyphenated additional 4-5 digit entries, but you could - and making it unsigned can reduce the size you need. If using a varchar column, it's no advantage limiting it to 10, as 16 takes up the exact same amount of memory by how MySQL allocates space. It's the little things like that which are handy, but more than a pain to have to read through the docs. Luckily that's where O'Reilly should come first and fills those gaps first, then torcher yourself with the documentation for the little improvements over time. 8-)
-Davey
Wow, Slashdotters, ask a simple question, get a hundred flames in response. Yes, I know he should have seen it coming, but hopefully he'll be able to locate some decent information in the midst of most of your garbage. Go flame about something actually controversial for Pete's sake.
Data Modeling and Database Design (amazon.com) by Umanath and Scamell - Excellent resource that covers several levels of data modelling, including conceptual schemas and logical schemas as well as normalization, relational algebra and quality/efficient SQL design.
If you have a table with lots of entries (millions of rows), then the best thing you can do is to use any tools that will tell you what are the most frequently requested entries, and then reorder the table on that basis, especially if the table is sequentially scanned - no one needs a book to figure this out. I would expect a table that big not to be ordered sequentially, but hashed in some way to lower the scan time.
I once worked for a local university's administrative computer group that used an IBM Database, and a brand new hired project leader came in and re-ordered one of our most frequently referenced (small) tables into alphabetical order. We informed him that he had unilaterally slowed down our system by at least an order of magnitude - and the table was put back from a restore operation.
Different types of applications use data (and hence databases) in very different ways. The above posts have done a good job of covering books about design for OLTP type databases and fundamental theories, but there's a whole other world of design when it comes to reporting and data warehousing. For DW type applications, the book you want is The Data Warehouse Toolkit by Kimball. There's a version of this book that is MSSQL specific which you probably want to go with. You'll also find that MSSQL does a pretty awesome job of supporting DW and OLAP. For OLTP, MSSQL is decent, but I'd strongly recomment checking out PostgreSQL if you're looking for a free db.
I'm surprised that nobody has mentioned Ramez Elmasri & Shamkant Navathe's "Fundamentals of Database Systems". It is a weighty academic tome but it is thorough and likely to rid you of many bad design habits. I have many gripes with it, but principally - as far as your issues are concerned - that it should have dealt with normalisation at an earlier stage and pulled the whole issue more clearly out from the wider discussion on functional dependencies.
It is a free (to develop, deploy and distribute) database.
There are some limitations; "Oracle Database XE can be installed on any size host machine with any number of CPUs (one database per machine), but XE will store up to 4GB of user data, use up to 1GB of memory, and use one CPU on the host machine."
http://www.oracle.com/technology/products/database/xe/index.html
IMHO, it is vastly superior to Access, and because I am familiar with pl/sql and DBA actions, it is much simpler to me than either MySQL or PostgresSQL
If you are really comfortable with Access's data modeling capability, you can also download (and 'evaluate') Oracle Designer, which can generate table create statements and appplications based on data you enter through BPM and ERD diagrams
Wherever You Go, There You Are
See subject-line above... says it all!
APK
I am the Director of IT for a small/medium sized marketing company, where I personally write the code that runs our applications. We use a variety of technology(sic.) at our office, the majority of which rely on MS-SQL and MySQL databases. I am familiar with a number of jargon words which convinced my supervisor to hire me, and have a general understanding of what someone might want me to do. What I'm looking for is a good book, particularly a newer book, to explain generally what I actually need to type into my computer to make it happen. We have some software that is broken, and screaming users, and I'd like to know the best method of disposing of them.
Much good advice here but before you go to far you need to understand your data.
Understand your data.
How you use it.
How it changes.
Why it changes.
Once you get a handle on the data then do a risk assessment.
Then you are ready to begin.
Ward
. Silence! Be thankful thy species is unpalatable! .
James Martin used to be considered an authority on this stuff back in the days of Big Iron.
He has written a couple of books on DB design, including relational DB design.
Having said that, there has been a huge advance in technology, and in the SQL language since his day.
The principles remain sound, especially for Relational DB's but the implementations may be different.
There is one other thing to consider. What type of queries will you be making against the million row tables?
OLTP queries where you expect to get one or two records back, or Decision Management queries where you will get several hundred or thousand rows back that
will then be summarized in some sort of report/spreadsheet/graph?
In the first case traditional relational DB design with SQL is probably still OK. In the second case there are new ways to approach this, with the table designs being based on columns
and not on rows.
softcoder.
Unfortunately you might have to read them all in order to get a complete picture. On the positive side, they are all easily read books. Look up books from Ken Orr and Associates; some of the best books were written by others in his company. It may look like the subject matter is systems and program design plus requirements definitions, but it is ALL database driven and data-oriented.
"The mind works quicker than you think!"