Ask Slashdot: Learning DB the Right Way; Books, Tutorials, or What?
An anonymous reader writes "I have deep experience programming in many languages, and I've some exposure to SQL through PostgreSQL. My math goes so far as trig and algebra, with a little statistics. So far, I've learned enough to be dangerous: mostly via other people's code, experimenting, the PostgreSQL docs, etc. I've been successful using the DB in various ways, but I know I am missing a great deal (and probably doing it wrong, at that.) When DB articles come up on Slashdot, I don't recognize a good deal of the terminology. What is the best way for a technical person to learn SQL/DB work using PostgreSQL? Books? Tutorials? I should mention I don't have local access to a university or people with DB knowledge; have to do this on my own, so books or the Internet are pretty much my options."
Oh well, goes along with the theme:
http://geekandpoke.typepad.com/geekandpoke/2011/01/nosql.html
Date, An Introduction to Database Systems
When Stanford first offered free online courses, I took a couple including Intro to DB. It's an online course and it was very informative and I learned a lot through it. I'm not sure when it starts next (and if you can just enroll whenever to see material), but here it is: https://class2go.stanford.edu/db/Winter2013/preview/
Keep in mind though: this is a full fledged college class, not some sort of YouTube tutorial or anything like that. If you want to follow it properly, be prepared to spend some time a week doing homework and following lectures.
Before you develop any bad habits it would be excellent to get a good handle on how to organize data. _Database Design for Mere Mortals_ by Michael Hernandez is an excellent source for this and you will be able to breeze through it with your programming knowledge. You already know data types, but this book, which does not contain a single line of code, is a good primer on data organization and techniques for making relational databases function efficiently.
A little of everything, with emphasis on hands on experimenting. Or a lot of everything, with big emphasis on hands on experimenting.
You won't learn all you need to know in a week, despite the pitches made by the books and boot camp courses.
There is no One True Way to learn a language, a piece of technology, etc. It depends on your learning style. One thing a lot of people who come into IT are shocked to discover is the sheer amount of stuff to learn, and the lack of tutorials, classes, etc., that effectively cover it. Many leave for just this reason. The first thing you need to learn in this field is how to teach yourself something, and that means knowing what works best for you. Some people need to write it down. Some people need to hear people talking about it. Some people can just absorb it by osmosis. Some people are global thinkers, others are detail-oriented. Personalities run the gamut in this field, but the one thing everyone who succeeds in this field has in common is that they can learn new information quickly, and on their own.
A lot of people will suggest books here, and that's fine. It may work well for them, and possibly for you. But you need to know what your own learning style is first, before you go much farther, especially if you're branching out into a new field or subfield. The time spent teaching yourself how to learn, and finding your own learning style, will pay for itself far, far more than any book suggested here -- your whole career will benefit.
#fuckbeta #iamslashdot #dicemustdie
On Coursera you can find the Stanford course "Introduction to Databases" by Jennifer Widom. https://www.coursera.org/course/db . It is free and covers a very broad range of database topics.
Are the databases preexisting, and you'll be "just" querying and modifying (inserts and deletes count as modifications)? In which case SQL is "all" (note the quotes) that you need to know.
Or do you also need to design systems? Then you need to know UML, data normalization, access strategies and a dozen other things.
"I don't know, therefore Aliens" Wafflebox1
While the rules that mostly determine a good design or query from a bad one are much more solid and accepted than in software development.. there is still a good amount of variability, and much like on software development, that gut feeling that tells you what's going to actually work in practice comes from just doing a lot of it.
Start up a hobby project that's heavily database oriented. Channel your inner OCD and make a program that keeps track of something you care about (and have a desire to keep track of). Could be something cliche (like your media collection) or something completely silly (like your epic collection of ponies stuff).
To the extent that set theory is important, I would recommend reading Applied Mathematics for Database Professionals (which I'm reading right now). It very carefully builds the ideas of set theory which seem necessary to work with relational DB's, one step at a time. Some of these same concepts also appear necessary if you ever want to work with graphs/lattices/big data, etc. If you've ever done a truth table, you're already mildly familiar with it.
From the book ...
It is a common misconception that the word relational in “the relational model of data” refers to “relationships” (many-to-one, many-to-many, and so on) that can exist between different “entity types,” as you too probably have designed at some point using the pervasive design technique known as entity relationship diagrams (ERDs). The word relational in “the relational model of data” has nothing to do with the R of ERD. The word refers to the mathematical concept of a (n-ary) relation, which is totally different from the “relationship” concept in ERD. The former is a mathematical—set-theory—concept; the latter eventually maps to certain cases of data integrity constraints (loosely speaking).
It's not that complicated.
In the SQL world, data is stored in "tables". Each table consists of "rows" of "records". Each record has "fields". Each field has a "field name" and a fixed "type", like TEXT, INTEGER, or DATE. Tables are created with the CREATE statement, where all the field names and types have to be specified. So that's what SQL data looks like. That part is fairly simple.
Tables start out empty. Data is added to a table using the INSERT statement. Existing records can be changed with the UPDATE statement. The SELECT statement is used for searching.
What makes SQL useful is that searching is very powerful. One SELECT statement can look things up in more than one table, find matching items, sort, summarize, and extract specific fields. The key to understanding SQL is learning what SELECT can do. On the other hand, if all you need to do is find one row of a table based on one key, the SELECT statement for that is very simple.
Tables have "indexes". If you use a SELECT statement with a search request for which there is no helpful index, the entire table will be linearly searched. This is slow. So you specify which fields need an INDEX to speed things up. This is usually done when the table is created with CREATE, but it can be done later. When looking things up with SELECT, you usually don't have to mention indexes; which index to use and how to use it is figured out by the database system.
SQL databases scale up well. Gigabyte-sized tables are normal. Terabyte-sized tables are not unusual. You can have many queries and updates running on the same table at the same time. The database system handles all the locking for you. Some database systems can be run on clusters of machines, and some support multiple redundant copies. You can do a lot of things while a database is running that you wouldn't think of as being possible. You can add a new index, or even a new field, to an existing table while the database is in use. There's a lot of heavy machinery behind the scenes to make all this work.
All the major databases try hard to maintain data integrity. A machine crash and restart will not damage any serious modern database. Program crashes are handled, too. A group of SELECT, INSERT, and UPDATE statements can be blocked together as a "transaction". The database doesn't change until a COMMIT statement is executed, and then all the changes take effect at the same time. If something goes wrong, like the program crashing or even the machine crashing before the COMMIT, the database is unchanged. If your program detects an error and needs to abort the transaction in progress, it does a ROLLBACK and the database is as it was before the transaction started. There's a lot of heavy machinery behind the scenes to make all this work.
There are security features. Access to tables can be restricted, in some cases down to the field level. Databases have user accounts, which are not necessarily tied to operating system login accounts. You can have accounts which can only read some tables, not update or delete them, or accounts which can't see some fields of some tables. This is valuable in web applications.
Database programs have libraries which allow them to be called from various programming languages. Programs in different programming languages can talk to the same database at the same time. So you're not locked to a specific programming language.
Those are the basics. Go install some SQL database on your desktop machine and play with it. MySQL, MariaDB, and Postgres are all free and will work on Linux or Windows desktops.
1) SQL is not a relational database, it is an interface to a relational database management systems, e.g. Postgresql. The "NoSQL" crowd lost me in the first 10 minutes when they showed me they did not know the difference. From that point forward I had a tough time taking them seriously.
2) Date is good but a bit hard to slog through sometimes. Realize that RDBMs are based on actual Math theory. But you don't have to derive the theory so don't be afraid.
3) Normalization is import. Honestly, people talk about the "Object/Relational impedance" and I have never seen it. I have found that if you define your objects properly up front you get your DB normalization almost for free. And if you understand your data properly and do a good job at normalization you get your objects almost for free.
4) Know your database engine. RTM and try various scenarios. Have fun but only on a test instance on a test machine.
5) Know your hardware/VM system. I have found many people blame the DB engine for poor performance when poor hardware configuration is the fault. Learn how to profile.
6) Learn how to profile software as well. Everybody blames the DB engine when performance is poor when most of the time it is their crappy code.
7) Some best practices: http://c2.com/cgi/wiki?DatabaseBestPractices
putting the 'B' in LGBTQ+
Practical PostgreSQL: http://www.commandprompt.com/ppbook/
Learn SQL The Hard Way: http://sql.learncodethehardway.org/
Use The Index, Luke!: A Guide To SQL Database Performance: http://use-the-index-luke.com/ (my own site)
Source: http://stackoverflow.com/questions/194812/list-of-freely-available-programming-books
I'm not sure it is possible, today. From about 1995-2005 forward a large amount of database work, and development of applications against databases, was done by people who didn't really have a good grasp of the fundamentals of the concept, the history of its implementation, or a very thorough knowledge of the available technology. The result is that various self-taught paradigms became embedded in an entire generation of developers, and those paradigms have now been picked up and embedded in technology by vendors. As a result it is very difficult to get any training or mentoring, or even find any books published after 1995, that go through the fundamentals and provide solid examples of the various paradigms and patterns and the strengths/weaknesses of each. IMHO that's in large part because the older patterns were stronger but more difficult to understand and use, so they got pushed aside by the self-taught patterns which are now dominate.
As a related example see this thread - I would personally say that Postgres and Oracle RDBMS each have their strengths and weaknesses, and each have places where they should be used and should not be used, but the rabid anti-Oracle posters dominate the discussion and shout down any reasoned discussion. Similarly with the "business logic in the RDBMS" paradigm - in my experience over 25 years that often works far better than the "tons of Java middleware" approach, but while each has its advantages the proponents of the latter generally shout down the former. So you only get one viewpoint, and that not always the strongest (again IMHO).
The state of business software development trends leaves me a big depressed, actually.
Date is of course very good in the earlier editions if you can understand his math. Later on he went down his own non-traditional path that is more than a bit off the mainstream, and of course there are many very capable database people who don't use or claim to be experts at relational calculus or symbolic logic (was a bit shocked to find out Tom Kyte was in that camp, although he interacts quite a bit with Cary Milsap who is on the other extreme). Kyte's books are very good although practical and not theory, and of course 99% Oracle. _Tales of the Oak Table_ is a good read with many cautionary tales and although written by Oracle professionals the case studies apply to any information technology.
sPh
http://sql.learncodethehardway.org/book/
I read the headline and thought,"I still have lots of dBase III books in a box somewhere.
Some mornings it's hardly worth chewing through the restraints to get out of bed.
Here's a stupid question.
Why not put all data into the ldap, next to all the login information etc.
Then you can learn to be a bad ass sysadmin who allows you to login from everywhere AND learn a database at the same time! Many apps like mail clients, server daemons can integrate with ldap! You can do cloud computing : sync your phone contacts to it.
If you're working in a company, tell the boss you now only need the Windows AD domain controller. It's awesome consolidation and cost savings. Also, it's a mature, market leading NoSQL implementation.
google.
The most useful way I have found of thinking in a relatively simple yet robust mathematical way about SQL and relational databases is in terms of set theory.
For example a standard join can be considered as finding the intersection of two sets.
This this level of abstraction should be usable by someone with your level of mathematical training.
Learn a bit about how databases are constructed, and you will learn why they do things they do. I nominally use mysql, but postgresql is ok too, I've also studied oracle (and used it professionally). Knowing how row locking works, what ACID means, First Normal Form, Second Normal Form, Third Normal Form, Backus-Naur Normal Form, etc. will help you (and it doesn't matter what the database is). True databases also need network connections and multiple users (in that regard, microsoft access was never a true database as there is only ever one person accessing it, so you don't need row locking, you don't need to worry about rollbacks or contention, etc). Again, all this is independent of the database. So buy a book (even a dummies book is a good start), and move on to more advanced topics from there. Another good one is "The Manga Guide to Databases" if you don't mind two Japanese Anime characters walking you through database design (and most of the theory I've yakked about so far), 213 pages to get you on your way.
then you'll be just as useless, but instead of saying
'oh my, i don't know whats going on'
you can say
'what a crappy useless design'
ANnd Jennifer Widom is the wife of Michael Stonebraker, who is a hard core database dude and also happened to be the one who started the Ingres project, on which PostgreSQL is based.
Chris date has the clearest understanding of databases out there and while most of what he's written is not vendor or even SQL specific (although some of it is too..) his thoroughness, precision and ability to go meta on every aspect of databases is without parallel. Sure, it's not "how to use postgres" but studying him will pay off many many times over during your career when you're "stuck" because a lot of being 'stuck" in this field is not understanding the actual relational model at the conceptual level and when I say "actual relational model" I don't mean SQL, which is one just so-so implementation of the relational model. It's the places where SQL departs from the relational model and the practices that have developed in the community to compensate for that departure - without people even being aware that that's what they're doing - which create the worst and hardest problems.
The Manga Guide to Databases is the best introduction to (relational) databases I've come across. It provides an entertaining and thorough overview of database terminology and concepts. I've used this book for years with junior engineers and interns who have all loved it, and the DBAs I've loaned it to were impressed by how much ground the book covers.
Excellent book that covers all the bases and leaves you well-prepared to build solid data models. From the theory (normal forms) up to the practical (when and why to index). 42 thumbs up.
http://www.amazon.com/Database-Design-Mere-Mortals-Hands-/dp/0321884493/ref=sr_1_1?s=books&ie=UTF8&qid=1373749472&sr=1-1&keywords=database+design+for+mere+mortals
Data is nothing but a collection of information. The information itself is unimportant. What is important is how that information will be used. The way information is used is the predicate for its organization.
After you grok that then you can start to learn how electronic systems store and retrieve information.
One of the things that has never let me down is the fundamental understanding of how computers work. Remember that no matter what whiz bang shite you read about, computers do exactly two things: They loop and they branch nothing more nothing less.
Once you have that in your head, then you can begin to think about how the results of your analysis of the uses of the data can best be organized to make the job of the machine smaller and faster.
None of this is intuitive. It is all based on understanding gained from reading books of theory and the practical application of those theories and by trial and error. Every data use case will present its own challenges, but if you want to do serious database work never ever let a coder dictate the data schema. Never let them short cut and make your data organization match the organization of the data structures in their code because 99 times out of 99 times they have organized their data structures to suite their coding style and 99 times out of 99 times it will be completely hair brained and not have any relevance to the best way to store and retrieve some data from the mountain of data you need to store.
Hey KID! Yeah you, get the fuck off my lawn!
A Lot of comments here about SQL and the obvious RTFM and some books answer to this question, but not one response has yet mentioned the importance of transactions (with commits and rollback points) and some other stuff like row locking / table locking and their use in avoiding nasties like phantom reads, etc.
Incidentally as a long term Oracle / DB2 user I find PostgreSQL an excellent database for non big enterprise level applications, and with some of the clustering / replication features that have appeared over the last decade or so, even that argument may be becoming a little old hat.
Read Yoshinori Matsunobu's blog:
http://yoshinorimatsunobu.blogspot.com/
At least, read his first post and view the slide deck:
http://yoshinorimatsunobu.blogspot.com/2009/04/mastering-art-of-indexing-slides.html
Normalization is good because otherwise when one of your users gets married and wants to change her surname you realise that every entry in every table that relates to the user has their surname recorded and you have to update millions of rows in twenty different tables and none of it is documented and if you miss updating it any of the tables you might break something important. In a normalised database you change it one place and you are done.
When I heard that PostgreSQL 9.2 (released last September) was touting its new feature of index only scans, I thought, wow these guys are way behind. Oracle and IBM had that in the mid-'90s and Sybase and Microsoft probably did too. To use a car analogy, that's like hearing an upstart manufacturer announce that electronic fuel injection will now be standard on all of its new cars, replacing carburetors. You think, gee what else are they missing?
You know this. I know this. I'm not confident the GP knows this. :-)
Let's see - proper RDBMS considerations:
Proper normalization of data
Efficient keying for relating data. Compound keys can be difficult to work with.
Use of simplest possible recordset type for working with data.
Understanding of boolean logic and sets. Needful for creating efficient SQL queries.
Understanding of a particular RDBMS's optimization techniques/order of operations.
Data validation to prevent GIGO.
Record locking schemes to prevent inconsistent updates - transactions should be used to minimize impact.
Learn those things which your chosen RDBMS does NOT do well and find a way to optimize or work around them.
Codifex Maximus ~ In search of... a shorter sig.
The SQL book on this course is good http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-171-software-engineering-for-web-applications-fall-2003/readings/
Subject says all. Not for me.
Are you trying to learn the concepts (normalization, cardinality, keys, constraints etc) or the hands-on specifics for a specific DBMS?
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
In the beginning there was only Codd, and then came the Date...