Slashdot Mirror


Ask Slashdot: Learning DB the Right Way; Books, Tutorials, or What?

An anonymous reader writes "I have deep experience programming in many languages, and I've some exposure to SQL through PostgreSQL. My math goes so far as trig and algebra, with a little statistics. So far, I've learned enough to be dangerous: mostly via other people's code, experimenting, the PostgreSQL docs, etc. I've been successful using the DB in various ways, but I know I am missing a great deal (and probably doing it wrong, at that.) When DB articles come up on Slashdot, I don't recognize a good deal of the terminology. What is the best way for a technical person to learn SQL/DB work using PostgreSQL? Books? Tutorials? I should mention I don't have local access to a university or people with DB knowledge; have to do this on my own, so books or the Internet are pretty much my options."

16 of 106 comments (clear)

  1. No Obligatory XKCD by rwa2 · · Score: 4, Funny
    1. Re:No Obligatory XKCD by gl4ss · · Score: 4, Funny

      once he has put on a db here's a handy reference http://howfuckedismydatabase.com/

      --
      world was created 5 seconds before this post as it is.
    2. Re:No Obligatory XKCD by ICLKennyG · · Score: 4, Insightful

      I don't see where he said same schema or same hardware.

      No business cares about which database performs how well per unit of clock cycle. They care about how well does it perform per dollar. The fact that you are paying about 3-10x the hardware price in licensing costs means that you can come in and throw a lot more hardware at a problem and solve it for less money simply because the database technology is 'free'. This is the dominant reason people go with FOSS. Not because it's better but because it's good-enough and they can spend a few extra integrating it and having consultants for support and not be strapped over a barrel when Balmer or Ellison decide they need to buy another sports team or yacht respectively.

      Example:
      A single dual socket, octo-core (16 total) core box to run SQL Server will cost you about $10k from dell depending on config. It will cost you another $1000 for Windows Server (at least) and then another $110,000 (6,874/core x 16 cores)for enterprise edition (plus support!). You can buy 12x the hardware for MySQL than you can for SQL Server. There are reasons you would go with Windows, but don't scoff at people making money by replacing MSSQL with MySQL, they do exist and it's a very real business model. It gets even more fun when you start talking about virtualization and other licensing gotchas that exist with the big proprietary DB vendors.

    3. Re:No Obligatory XKCD by Anonymous Coward · · Score: 3, Interesting

      As someone who has done a fair bit in mysql, mssql, and pgsql over the past 15 years, mysql is not anywhere close to the product mssql is. That being said, pgsql has become quite a bit better than mssql recently in many ways. So probably my current ordering, from best to worst, for my most common use case (web application data tier) is: pgsql, mssql, mysql

      Now, of course it depends on the application! MSSQL has some pretty nice strategies for replication and multi master stuff that you just can't do with mysql. BUT MariaDB has some pretty nice multi-master stuff, and now has synchronous replication with Galera. If you want mysql to be anywhere near MSSQL on what I would consider "enterprise" replication, you gotta go MariaDB, and it just drops in.

      Also, I don't think anyone would argue that the polish of the MSSQL administrative toolset is quite a bit beyond anything available free or commercial for pg or my. Plus you have Analysis Services (cube stuff) and integration services (which actually aren't too great), and lots of stuff built in to .NET framework for accessing and manipulating stuff. Plus all the built in authorization stuff, kerberos/AD, you name it. The backups work awesome. T-SQL is pretty good. Stored procs and functions are pretty solid. If you are a completely Microsoft shop, it's not a complete nightmare, except for the money. And the constant changes to product naming and licensing. And the support guys from India. Ok, it sucks, but it does have a place.

      pgsql is just amazing with the groundbreaking things they are doing, and the programability is second to none, probably not even Oracle. Oracle does manage to stay ahead on some things, namely reporting workloads, but pgsql is the nerd's database, the computer scientist's database, and yet is surprisingly easy to use, maintain and make perform.

      I'm leaving out Oracle because it's redic expensive and there are other good commercial DBs for Unix (like Ingres...maybe DB2) that should be mentioned more often than they are that are too much to go into.

      But really, Mysql is shit. We have about 10 mysql servers running and a bunch of them are going to move to MariaDB fairly soon to get those replication features. Also, check out Percona, which is also a mysql replacement. I think Oracle and the remaining Sun people that haven't jumped ship have been doing an ok job keeping mysql alive enough but it's just not making the leaps it needs to stay relevant in the next 10 years. I hope they prove me wrong, but right now I am leaning towards stuff that will last 10 years and I don't see that being mysql.

  2. The answer hasn't changed for decades by Tim+Ward · · Score: 4, Informative

    Date, An Introduction to Database Systems

    1. Re:The answer hasn't changed for decades by K.+S.+Kyosuke · · Score: 3, Informative

      Amen to that, brother. Although Date has recently published a few "less hardcore" books mostly for working professionals: O'Reilly's SQL and Relational Theory might be more palatable for some. And since most people will be forced to work with SQL anyway (as opposed to D)...

      --
      Ezekiel 23:20
    2. Re:The answer hasn't changed for decades by dbc · · Score: 4, Insightful

      Yup. Date is the One True Source.

      That said, when I found myself in a similar situation to the OP, here is what I did and found: 1) Read Date. 2) Make some toy databases. 3) Go back to 1 for a few iterations. At the point where I needed to deploy a database at work, I was able to do two things: a) convice my boss to hire a database expert, and b) have a strawman design for a database to go along with a spec when the expert came on board.

      I learned that database experts learn to be very good at diplomacy :) I learned a huge amount from her as she showed me how to work up a better design and show me why it was better. Of course, that is why I hired her.

      After reading Date and making toy databases I had the fundamental concepts, but lacked the practical experience to make good choices in how to normalize the database. How you normalize impacts performance, maintainability, and ease of maintaining data integrity. So bottom line is I found that self-teaching database design got me to something useable, but like many other things in software, the fastest way to learn is to make friends with a smart cube neighbor.

  3. Stanford Introduction to Databases by dejanc · · Score: 5, Informative

    When Stanford first offered free online courses, I took a couple including Intro to DB. It's an online course and it was very informative and I learned a lot through it. I'm not sure when it starts next (and if you can just enroll whenever to see material), but here it is: https://class2go.stanford.edu/db/Winter2013/preview/

    Keep in mind though: this is a full fledged college class, not some sort of YouTube tutorial or anything like that. If you want to follow it properly, be prepared to spend some time a week doing homework and following lectures.

    1. Re:Stanford Introduction to Databases by Anonymous Coward · · Score: 3, Informative

      The class isn't in session but you can still work through all of the coursework on Coursera. https://www.coursera.org/course/db

  4. Another starter resource by tbg58 · · Score: 4, Informative

    Before you develop any bad habits it would be excellent to get a good handle on how to organize data. _Database Design for Mere Mortals_ by Michael Hernandez is an excellent source for this and you will be able to breeze through it with your programming knowledge. You already know data types, but this book, which does not contain a single line of code, is a good primer on data organization and techniques for making relational databases function efficiently.

  5. Stanford "Introduction to Databases" on Coursera by parbot · · Score: 5, Informative

    On Coursera you can find the Stanford course "Introduction to Databases" by Jennifer Widom. https://www.coursera.org/course/db . It is free and covers a very broad range of database topics.

  6. It's not that complicated by Animats · · Score: 4, Informative

    It's not that complicated.

    In the SQL world, data is stored in "tables". Each table consists of "rows" of "records". Each record has "fields". Each field has a "field name" and a fixed "type", like TEXT, INTEGER, or DATE. Tables are created with the CREATE statement, where all the field names and types have to be specified. So that's what SQL data looks like. That part is fairly simple.

    Tables start out empty. Data is added to a table using the INSERT statement. Existing records can be changed with the UPDATE statement. The SELECT statement is used for searching.

    What makes SQL useful is that searching is very powerful. One SELECT statement can look things up in more than one table, find matching items, sort, summarize, and extract specific fields. The key to understanding SQL is learning what SELECT can do. On the other hand, if all you need to do is find one row of a table based on one key, the SELECT statement for that is very simple.

    Tables have "indexes". If you use a SELECT statement with a search request for which there is no helpful index, the entire table will be linearly searched. This is slow. So you specify which fields need an INDEX to speed things up. This is usually done when the table is created with CREATE, but it can be done later. When looking things up with SELECT, you usually don't have to mention indexes; which index to use and how to use it is figured out by the database system.

    SQL databases scale up well. Gigabyte-sized tables are normal. Terabyte-sized tables are not unusual. You can have many queries and updates running on the same table at the same time. The database system handles all the locking for you. Some database systems can be run on clusters of machines, and some support multiple redundant copies. You can do a lot of things while a database is running that you wouldn't think of as being possible. You can add a new index, or even a new field, to an existing table while the database is in use. There's a lot of heavy machinery behind the scenes to make all this work.

    All the major databases try hard to maintain data integrity. A machine crash and restart will not damage any serious modern database. Program crashes are handled, too. A group of SELECT, INSERT, and UPDATE statements can be blocked together as a "transaction". The database doesn't change until a COMMIT statement is executed, and then all the changes take effect at the same time. If something goes wrong, like the program crashing or even the machine crashing before the COMMIT, the database is unchanged. If your program detects an error and needs to abort the transaction in progress, it does a ROLLBACK and the database is as it was before the transaction started. There's a lot of heavy machinery behind the scenes to make all this work.

    There are security features. Access to tables can be restricted, in some cases down to the field level. Databases have user accounts, which are not necessarily tied to operating system login accounts. You can have accounts which can only read some tables, not update or delete them, or accounts which can't see some fields of some tables. This is valuable in web applications.

    Database programs have libraries which allow them to be called from various programming languages. Programs in different programming languages can talk to the same database at the same time. So you're not locked to a specific programming language.

    Those are the basics. Go install some SQL database on your desktop machine and play with it. MySQL, MariaDB, and Postgres are all free and will work on Linux or Windows desktops.

  7. A breif intro by plopez · · Score: 4, Informative

    1) SQL is not a relational database, it is an interface to a relational database management systems, e.g. Postgresql. The "NoSQL" crowd lost me in the first 10 minutes when they showed me they did not know the difference. From that point forward I had a tough time taking them seriously.

    2) Date is good but a bit hard to slog through sometimes. Realize that RDBMs are based on actual Math theory. But you don't have to derive the theory so don't be afraid.

    3) Normalization is import. Honestly, people talk about the "Object/Relational impedance" and I have never seen it. I have found that if you define your objects properly up front you get your DB normalization almost for free. And if you understand your data properly and do a good job at normalization you get your objects almost for free.

    4) Know your database engine. RTM and try various scenarios. Have fun but only on a test instance on a test machine.

    5) Know your hardware/VM system. I have found many people blame the DB engine for poor performance when poor hardware configuration is the fault. Learn how to profile.

    6) Learn how to profile software as well. Everybody blames the DB engine when performance is poor when most of the time it is their crappy code.

    7) Some best practices: http://c2.com/cgi/wiki?DatabaseBestPractices

    --
    putting the 'B' in LGBTQ+
    1. Re:A breif intro by plopez · · Score: 3, Informative

      normalization is mainly used for space reduction, in our days that is far less important than at the time when it was "invented".

      Wrong again. Normalization is about data consistency, space saving is secondary (and given the overhead of keys, constraints, indices etc. possibly not true). But the NoSQL crowd also lost me on this point as well. "Eventual consistency?" What does that mean? How about "you will eventually get paid" or "your medical records will eventually show your drug allergies" or "your credit rating will be eventually correct".

      But you're right, I was in a hurry I should have said: "SQL is a scripting language used to access an API of a RDBMS".

      --
      putting the 'B' in LGBTQ+
  8. LDAP? by Blaskowicz · · Score: 4, Funny

    Here's a stupid question.
    Why not put all data into the ldap, next to all the login information etc.
    Then you can learn to be a bad ass sysadmin who allows you to login from everywhere AND learn a database at the same time! Many apps like mail clients, server daemons can integrate with ldap! You can do cloud computing : sync your phone contacts to it.
    If you're working in a company, tell the boss you now only need the Windows AD domain controller. It's awesome consolidation and cost savings. Also, it's a mature, market leading NoSQL implementation.

  9. Re: You haven't told us what you want/need to do. by cc1984_ · · Score: 3, Interesting

    Someone once told me "Normalize until it hurts, then denormalize until it works."