Slashdot Mirror


Ask Slashdot: Learning DB the Right Way; Books, Tutorials, or What?

An anonymous reader writes "I have deep experience programming in many languages, and I've some exposure to SQL through PostgreSQL. My math goes so far as trig and algebra, with a little statistics. So far, I've learned enough to be dangerous: mostly via other people's code, experimenting, the PostgreSQL docs, etc. I've been successful using the DB in various ways, but I know I am missing a great deal (and probably doing it wrong, at that.) When DB articles come up on Slashdot, I don't recognize a good deal of the terminology. What is the best way for a technical person to learn SQL/DB work using PostgreSQL? Books? Tutorials? I should mention I don't have local access to a university or people with DB knowledge; have to do this on my own, so books or the Internet are pretty much my options."

106 comments

  1. No Obligatory XKCD by rwa2 · · Score: 4, Funny
    1. Re:No Obligatory XKCD by gl4ss · · Score: 4, Funny

      once he has put on a db here's a handy reference http://howfuckedismydatabase.com/

      --
      world was created 5 seconds before this post as it is.
    2. Re:No Obligatory XKCD by Anonymous Coward · · Score: 0

      Funny how that site says MySQL is fucked, but MSSQL is "probably not" when I make a substantial living replacing MS DB's with FOSS ones that literally triple the performance.

    3. Re:No Obligatory XKCD by K.+S.+Kyosuke · · Score: 1

      Non sequitur.

      --
      Ezekiel 23:20
    4. Re:No Obligatory XKCD by aztracker1 · · Score: 1

      I doubt you are getting triple the performance from a FOSS database as you do MS-SQL from the same hardware, with the same schema design. There's also feature set to look at.. not that I use any database as much more than dumb storage.

      --
      Michael J. Ryan - tracker1.info
    5. Re:No Obligatory XKCD by turp182 · · Score: 1

      Thanks for that site, awesome.

      --
      BlameBillCosby.com
    6. Re:No Obligatory XKCD by The+Snowman · · Score: 2

      I doubt you are getting triple the performance from a FOSS database as you do MS-SQL from the same hardware, with the same schema design. There's also feature set to look at.. not that I use any database as much more than dumb storage.

      I agree. I have increased performance for specific tables and queries more than ten times simply by looking at the MS SQL query analyzer. Often it will even give you the SQL you should run to create the index it thinks it needs. Sometimes it requires more thought than that, but MS's little SQL Server has come a long ways since its split with Sybase. Hint: Sybase is still a piece of shit but Microsoft actually improved SQL Server.

      --
      24 beers in a case, 24 hours in a day. Coincidence? I think not!
    7. Re:No Obligatory XKCD by ICLKennyG · · Score: 4, Insightful

      I don't see where he said same schema or same hardware.

      No business cares about which database performs how well per unit of clock cycle. They care about how well does it perform per dollar. The fact that you are paying about 3-10x the hardware price in licensing costs means that you can come in and throw a lot more hardware at a problem and solve it for less money simply because the database technology is 'free'. This is the dominant reason people go with FOSS. Not because it's better but because it's good-enough and they can spend a few extra integrating it and having consultants for support and not be strapped over a barrel when Balmer or Ellison decide they need to buy another sports team or yacht respectively.

      Example:
      A single dual socket, octo-core (16 total) core box to run SQL Server will cost you about $10k from dell depending on config. It will cost you another $1000 for Windows Server (at least) and then another $110,000 (6,874/core x 16 cores)for enterprise edition (plus support!). You can buy 12x the hardware for MySQL than you can for SQL Server. There are reasons you would go with Windows, but don't scoff at people making money by replacing MSSQL with MySQL, they do exist and it's a very real business model. It gets even more fun when you start talking about virtualization and other licensing gotchas that exist with the big proprietary DB vendors.

    8. Re: No Obligatory XKCD by Anonymous Coward · · Score: 0

      based on your "triple performance" metric, i suspect your "substantial income", isn't.
      Unless your the same guy who borked the original MS SQL installation.

    9. Re:No Obligatory XKCD by rk · · Score: 1

      Also, thanks for that. As an ex Slony user, I appreciate that reference!

    10. Re:No Obligatory XKCD by plopez · · Score: 1

      "not that I use any database as much more than dumb storage"

      Which is the right thing to do. Embedding business logic in your database is an anti-pattern. Basically the same as mingling your model with your controller.

      --
      putting the 'B' in LGBTQ+
    11. Re:No Obligatory XKCD by rev0lt · · Score: 1

      No business cares about which database performs how well per unit of clock cycle. They care about how well does it perform per dollar.

      They do, because time is money. If a given daily task that takes 30m can be done in 10m, it is a matter of cost. Also, throwing hardware at it increases operational costs - increased complexity to maintain, electricity/ HAVAC, etc.

      As for your example, it is flawed - if you truely believe that you can replace SQL Server Enterprise Edition with MySQL (when you actually need the enterprise features - or actual SQL features), well... go ahead - it would be fun to watch. Since most MSSQL users don't need the enterprise edition (even if they DO need a proper RDMS with advanced features and not MySQL), we're talking about ~$8500 for your server and ~$210 per user.

    12. Re:No Obligatory XKCD by Anonymous Coward · · Score: 3, Interesting

      As someone who has done a fair bit in mysql, mssql, and pgsql over the past 15 years, mysql is not anywhere close to the product mssql is. That being said, pgsql has become quite a bit better than mssql recently in many ways. So probably my current ordering, from best to worst, for my most common use case (web application data tier) is: pgsql, mssql, mysql

      Now, of course it depends on the application! MSSQL has some pretty nice strategies for replication and multi master stuff that you just can't do with mysql. BUT MariaDB has some pretty nice multi-master stuff, and now has synchronous replication with Galera. If you want mysql to be anywhere near MSSQL on what I would consider "enterprise" replication, you gotta go MariaDB, and it just drops in.

      Also, I don't think anyone would argue that the polish of the MSSQL administrative toolset is quite a bit beyond anything available free or commercial for pg or my. Plus you have Analysis Services (cube stuff) and integration services (which actually aren't too great), and lots of stuff built in to .NET framework for accessing and manipulating stuff. Plus all the built in authorization stuff, kerberos/AD, you name it. The backups work awesome. T-SQL is pretty good. Stored procs and functions are pretty solid. If you are a completely Microsoft shop, it's not a complete nightmare, except for the money. And the constant changes to product naming and licensing. And the support guys from India. Ok, it sucks, but it does have a place.

      pgsql is just amazing with the groundbreaking things they are doing, and the programability is second to none, probably not even Oracle. Oracle does manage to stay ahead on some things, namely reporting workloads, but pgsql is the nerd's database, the computer scientist's database, and yet is surprisingly easy to use, maintain and make perform.

      I'm leaving out Oracle because it's redic expensive and there are other good commercial DBs for Unix (like Ingres...maybe DB2) that should be mentioned more often than they are that are too much to go into.

      But really, Mysql is shit. We have about 10 mysql servers running and a bunch of them are going to move to MariaDB fairly soon to get those replication features. Also, check out Percona, which is also a mysql replacement. I think Oracle and the remaining Sun people that haven't jumped ship have been doing an ok job keeping mysql alive enough but it's just not making the leaps it needs to stay relevant in the next 10 years. I hope they prove me wrong, but right now I am leaning towards stuff that will last 10 years and I don't see that being mysql.

    13. Re:No Obligatory XKCD by Dog-Cow · · Score: 1

      Depends on what you mean by embedding. The stored-procedure engine of an RDBMS is really a separate system that happens to reside in the same software stack as the data access portion. There's nothing wrong with embedding business logic in SPs, especially if you have multiple systems accessing the DB. In that scenario, you don't have a single point of access other than the DB which can act as a controller.

      And, as with most design patterns, they work great until they don't. Being able to realise when they don't is an important skill.

    14. Re:No Obligatory XKCD by rycamor · · Score: 1

      I think modern developers have a very fuzzy idea of what "business logic" means. I model business logic all the time in my databases, completely without the use of stored procedures (or even custom functions for the most part). Just the fact that you use a foreign key constraint happens to be business logic, and it *absolutely* belongs in the database.

      This idea that a database is just an adjunct to the application only works for startup companies or lightweight web applications. Once you have a mature company doing anything of any real complexity, you will end up with such problems as

      a. databases being accessed by more than one application.
      b. databases that outlive application (or several applications)
      c. enough tragic "oops" moments from the application team that the company finally hires a real DBA and/or data architect to make sure that no matter what, the data has trustworthy characteristics and integrity on its own.

      Yes, data IS more important to a company than your application. Get over it. Just like the money in the bank (and who it belongs to) is more important to the bank than its tellers. It's just a fact of life.

  2. The answer hasn't changed for decades by Tim+Ward · · Score: 4, Informative

    Date, An Introduction to Database Systems

    1. Re:The answer hasn't changed for decades by K.+S.+Kyosuke · · Score: 3, Informative

      Amen to that, brother. Although Date has recently published a few "less hardcore" books mostly for working professionals: O'Reilly's SQL and Relational Theory might be more palatable for some. And since most people will be forced to work with SQL anyway (as opposed to D)...

      --
      Ezekiel 23:20
    2. Re:The answer hasn't changed for decades by Anonymous Coward · · Score: 1

      As a dba for over 40 years and having worked on almost all the major db systems around (and some that are no longer with us) I can totally recommend the Date/Codd books on db theory. They have it all. What they don't have about some specific system you can get from the vendor documentation or via online searching. I still have my original 1971 edition of Date & Codd's book on db systems.

    3. Re:The answer hasn't changed for decades by dbc · · Score: 4, Insightful

      Yup. Date is the One True Source.

      That said, when I found myself in a similar situation to the OP, here is what I did and found: 1) Read Date. 2) Make some toy databases. 3) Go back to 1 for a few iterations. At the point where I needed to deploy a database at work, I was able to do two things: a) convice my boss to hire a database expert, and b) have a strawman design for a database to go along with a spec when the expert came on board.

      I learned that database experts learn to be very good at diplomacy :) I learned a huge amount from her as she showed me how to work up a better design and show me why it was better. Of course, that is why I hired her.

      After reading Date and making toy databases I had the fundamental concepts, but lacked the practical experience to make good choices in how to normalize the database. How you normalize impacts performance, maintainability, and ease of maintaining data integrity. So bottom line is I found that self-teaching database design got me to something useable, but like many other things in software, the fastest way to learn is to make friends with a smart cube neighbor.

    4. Re:The answer hasn't changed for decades by computerchimp · · Score: 1

      $170! Holy Cow that is an expensive book

    5. Re:The answer hasn't changed for decades by Anonymous Coward · · Score: 1

      It's an upper-division textbook. On the other hand, if you want to know WHY you should probably be using an RDBMS for most of your application problems, this is the book to get.

      I have the 8th edition, and although the book could do without some of his more opinionated footnotes, the entire text from front to back is absolutely rock-solid relational database theory.

    6. Re:The answer hasn't changed for decades by rk · · Score: 1

      If your career will involve databases for any length of time, it will pay for itself many times over.

    7. Re:The answer hasn't changed for decades by plopez · · Score: 1

      And Gray, "Transaction Processing: Concepts and Techniques".

      --
      putting the 'B' in LGBTQ+
    8. Re:The answer hasn't changed for decades by BlackHawk-666 · · Score: 1

      That's the one I learnt from in 1987. Relational databases haven't changed that radically since then and the knowledge is applicable to any implementation of them.

      --
      All those moments will be lost in time, like tears in rain.
    9. Re:The answer hasn't changed for decades by Anonymous Coward · · Score: 0

      Following a reading of Date I would also suggest Applied Mathematics for Database Professionals by Lex de Haan and Toon Koppelaars.

  3. Stanford Introduction to Databases by dejanc · · Score: 5, Informative

    When Stanford first offered free online courses, I took a couple including Intro to DB. It's an online course and it was very informative and I learned a lot through it. I'm not sure when it starts next (and if you can just enroll whenever to see material), but here it is: https://class2go.stanford.edu/db/Winter2013/preview/

    Keep in mind though: this is a full fledged college class, not some sort of YouTube tutorial or anything like that. If you want to follow it properly, be prepared to spend some time a week doing homework and following lectures.

    1. Re:Stanford Introduction to Databases by Anonymous Coward · · Score: 0

      Yup, I was about to mention this (or rather "db-class.org" as it was called then; it now redirects to that longer link). As an introduction it will certainly get you started, if with a stiff workload.

      Importantly, it starts with relational algebra, the theory backing the relational database idea. This, along with exactly what ACID does and provides (and doesn't), is something you need to understand and work with.

      Curiously, plenty of users (and devs, admins, even implementers of *cough* certain databases *cough*) appear to lack that grasp of the foundations. It's not postgresql-specific, but getting your head around it is an easy head-start on a large chunk of the competition, and indeed a good foundation to build from.

      If you'd rather have a book (or in addition) the course has references to a number of suitable database theory books. Once you're through that there's the online documentation, admin cookbooks, performance books, and so on. Plenty of material. But if you want to start at the basics, this is a good starting point.

    2. Re:Stanford Introduction to Databases by Anonymous Coward · · Score: 3, Informative

      The class isn't in session but you can still work through all of the coursework on Coursera. https://www.coursera.org/course/db

    3. Re:Stanford Introduction to Databases by gnikhog · · Score: 2

      Great advice and its free and web-based. I also took this Stanford course when it was first offered in 2011. The professor has added additional course topics since the course was first developed. While it is a self contained course, I strongly suggest you obtain one of the 4 referenced textbooks - any of which can further expand on the topics covered and offer additional underlying theory. A side benefit will of the course (for me) was an exposure to some of the shortcomings/lack of capabilities of some rdbs vs the SQL standard.

  4. Another starter resource by tbg58 · · Score: 4, Informative

    Before you develop any bad habits it would be excellent to get a good handle on how to organize data. _Database Design for Mere Mortals_ by Michael Hernandez is an excellent source for this and you will be able to breeze through it with your programming knowledge. You already know data types, but this book, which does not contain a single line of code, is a good primer on data organization and techniques for making relational databases function efficiently.

    1. Re:Another starter resource by Jawnn · · Score: 2

      An enthusiastic second to this suggestion. This book is the place to start when trying to learn relational database design. Yes, it is far from a complete treatise on the subject, but what it does offer is eminently approachable and immediately useful; something that can not be said of most of the other texts covering the subject. Dear gawd, I wish that more "developers" had read this book before building anything that used a relational database to store and manipulate information. Once you get the normalization part down, all the other tools in the dba's quiver (triggers, stored procedures, functions, RI enforcement, etc.) start to look less like a hammer in search of a nail, and much more like a welcome aid to keeping things sane, regardless of what some application or user may through at your database.

  5. Same as with learning any big subject by Anonymous Coward · · Score: 1

    A little of everything, with emphasis on hands on experimenting. Or a lot of everything, with big emphasis on hands on experimenting.

    You won't learn all you need to know in a week, despite the pitches made by the books and boot camp courses.

  6. There is no One True Way by girlintraining · · Score: 2

    There is no One True Way to learn a language, a piece of technology, etc. It depends on your learning style. One thing a lot of people who come into IT are shocked to discover is the sheer amount of stuff to learn, and the lack of tutorials, classes, etc., that effectively cover it. Many leave for just this reason. The first thing you need to learn in this field is how to teach yourself something, and that means knowing what works best for you. Some people need to write it down. Some people need to hear people talking about it. Some people can just absorb it by osmosis. Some people are global thinkers, others are detail-oriented. Personalities run the gamut in this field, but the one thing everyone who succeeds in this field has in common is that they can learn new information quickly, and on their own.

    A lot of people will suggest books here, and that's fine. It may work well for them, and possibly for you. But you need to know what your own learning style is first, before you go much farther, especially if you're branching out into a new field or subfield. The time spent teaching yourself how to learn, and finding your own learning style, will pay for itself far, far more than any book suggested here -- your whole career will benefit.

    --
    #fuckbeta #iamslashdot #dicemustdie
    1. Re:There is no One True Way by Common+Joe · · Score: 2

      girlintraining is absolutely right. Heed her advice: know thyself. With that said, I'm going to give you what works for me and some gotchas I've found in the field.

      Some people think they need just a specific answer so they learn by googling for everything and never learn outside of a narrow box. Personally, I find they often don't make good programmers or DBAs. I think a disciplined approach to learning a language or database is a must. The PostgreSQL documentation is excellent in that regard. It starts off with a brief history of the database, then gives you basics and then moves on to more advanced topics. Yes, yes... of course you have to use your google-fu for specific answers sometimes, but I think you need a rock-solid core to base your studies off of. Not everyone works this way, but I think we'd have better programmers if they had good material to base a foundation from. When I learn a new language, I hunt down a good, comprehensive core to work from and work my way though.

      Next: Practice. Being a book worm won't get you anywhere in the real world. You have to be able to utilize that knowledge. I like to give myself a task or a goal and work towards completing it. I'm not talking about "make a table" then "add a row". I'm talking about working towards a complex goal. My favorite is the address book. It can be extremely simple or incredibly complex. Take something like names. You can just slap a common name into a field in a table, right? Are you sure? How many digits are in a phone number? I live in Germany and the number of digits vary greatly... sometimes within the same city. Who lives at what address? What if a person lives at multiple addresses? (North in the summer, south in the winter.) What if a person has more than one phone number? What if a phone is shared between people at home? Or a team of people in a work place? Another option is to form links between people: who got married? Divorced? Married again? When? Kids? From which spouse? Don't get me wrong... you can drop all this information into a single table or into thirty. It can as complex or simple as you want. You decide. The most important thing is to have something to practice against as you learn.

      Good luck.

    2. Re:There is no One True Way by Anonymous Coward · · Score: 1

      Regarding the name thing. Everyone has a name if they ever want to be called/referred to by someone else. If you live amongst people eventually someone will give you a name whether you like it or not ;).

      But as globally unique identifiers they are useless. Where possible I prefer to let people choose what name and title they prefer to use in correspondence with them.

  7. Stanford "Introduction to Databases" on Coursera by parbot · · Score: 5, Informative

    On Coursera you can find the Stanford course "Introduction to Databases" by Jennifer Widom. https://www.coursera.org/course/db . It is free and covers a very broad range of database topics.

  8. You haven't told us what you want/need to do. by Nutria · · Score: 2

    Are the databases preexisting, and you'll be "just" querying and modifying (inserts and deletes count as modifications)? In which case SQL is "all" (note the quotes) that you need to know.

    Or do you also need to design systems? Then you need to know UML, data normalization, access strategies and a dozen other things.

    --
    "I don't know, therefore Aliens" Wafflebox1
    1. Re:You haven't told us what you want/need to do. by nurb432 · · Score: 1

      No matter what you are doing with a DB i would say understanding normalization is the important thing.

      --
      ---- Booth was a patriot ----
    2. Re:You haven't told us what you want/need to do. by K.+S.+Kyosuke · · Score: 1

      Then you need to know UML

      No, he won't. He'll need a brain, though.

      --
      Ezekiel 23:20
    3. Re:You haven't told us what you want/need to do. by Anonymous Coward · · Score: 0

      If s/he's primarily interested in analytics and/or data mining, then normalization is not very important.

    4. Re:You haven't told us what you want/need to do. by idontusenumbers · · Score: 1

      I agree; normalized databases come naturally. You have to work at it to have an non-normalized database.

    5. Re:You haven't told us what you want/need to do. by HornWumpus · · Score: 1

      What? Keeping total on invoice is non-normalized (redundant data, violates 3rd). You supposed to total up line item every time you want invoice total. Fanatics will say lineitem shouldn't even have price, it should just join to pricing history.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
    6. Re:You haven't told us what you want/need to do. by Anonymous Coward · · Score: 2, Insightful

      normalized databases come naturally

      Not sure about that. I keep on thinking database design is easy (having worked with them for over ten years). Its just a case of working out if entities have a one to many or many to many relationship usually.

      Then I get asked to "take a look" at some PhD students design, and realise how unnatural it is to some people.

    7. Re:You haven't told us what you want/need to do. by phantomfive · · Score: 1

      This is a true point. If you have a good feel for organizing things in a way that makes sense, then your database will probably make sense. You'll have to learn things like query optimization, and how to add indexes to things, but overall it's a matter of organizational skill.

      On the other hand, some things that seem simple and obvious (version your database, make backups) seem to be like rocket science to some people. I'm not sure why.

      --
      "First they came for the slanderers and i said nothing."
    8. Re: You haven't told us what you want/need to do. by cc1984_ · · Score: 3, Interesting

      Someone once told me "Normalize until it hurts, then denormalize until it works."

    9. Re:You haven't told us what you want/need to do. by nabsltd · · Score: 1

      Fanatics will say lineitem shouldn't even have price, it should just join to pricing history.

      That's basically impossible once you hit the "Amazon" level of item count and algorithmic price adjustment.

      Even in simpler cases, it makes sense to store what you charged that customer at that time for that item. Normalizing that sort of data makes sense, though, so you have an "Invoice" table with a key (like the invoice number) and a "InvoiceLines" table with each line referencing an invoice. That kind of normalization makes it easy to answer questions like "how much did we charge for that item (on average) this month last year and display with groups by purchaser with a totals row?"

    10. Re:You haven't told us what you want/need to do. by plover · · Score: 1

      Understanding normalization would help in creating temporary tables used to hold intermediate results during analysis. If your temp tables lack integrity, you could be creating tables with the wrong rows, leading you to the wrong conclusions.

      --
      John
    11. Re:You haven't told us what you want/need to do. by Hognoxious · · Score: 1

      What should it link to for a manual override entered directly onto the invoice?

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    12. Re:You haven't told us what you want/need to do. by Bloke+down+the+pub · · Score: 1

      I've met more than one very good developer who didn't grok database design.

      One subcontracted me to teach his girlfriend who needed it for a course she was doing. Basically, it involved us all going to the pub and talking about movies and music while she scribbled furiously. It's a good teaching example because it involves natural/surrogate keys (how do you deal with movies/bands/artists with the same name, or movies with more than one title?), multiple relationships (an actor might also direct, a musician can also be a producer) etc.

      Another had gone through the induction training in half the normal time, so he'd caught up with my cohort and we were on a DB concepts course together. The DB instructor just couldn't make him understand the difference between "Gone With The Wind" in the abstract and a particular tape of it. In the end I think I convinced him that in the first case it always stars Clark Gable, but in the second we might have two in the shop (of which one is reserved) and the third is out and overdue. At least he shut up and let us get on with the class...

      --
      It's true I tell you, feller at work's next door neighbour read it in the paper.
    13. Re:You haven't told us what you want/need to do. by HornWumpus · · Score: 1

      A discount % is what the fanatics will say. I'm not one of them, but they are everywhere. I agree with the sib, lineitem should record price charged and discount given. Invoice should record price schedule used. By the magic of definitions it's not redundant data.

      This is just to be 3rd NF.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
  9. Experience helps by Anonymous Coward · · Score: 0

    While the rules that mostly determine a good design or query from a bad one are much more solid and accepted than in software development.. there is still a good amount of variability, and much like on software development, that gut feeling that tells you what's going to actually work in practice comes from just doing a lot of it.

    Start up a hobby project that's heavily database oriented. Channel your inner OCD and make a program that keeps track of something you care about (and have a desire to keep track of). Could be something cliche (like your media collection) or something completely silly (like your epic collection of ponies stuff).

  10. Set Theory by Anonymous Coward · · Score: 1

    To the extent that set theory is important, I would recommend reading Applied Mathematics for Database Professionals (which I'm reading right now). It very carefully builds the ideas of set theory which seem necessary to work with relational DB's, one step at a time. Some of these same concepts also appear necessary if you ever want to work with graphs/lattices/big data, etc. If you've ever done a truth table, you're already mildly familiar with it.

    From the book ...

    It is a common misconception that the word relational in “the relational model of data” refers to “relationships” (many-to-one, many-to-many, and so on) that can exist between different “entity types,” as you too probably have designed at some point using the pervasive design technique known as entity relationship diagrams (ERDs). The word relational in “the relational model of data” has nothing to do with the R of ERD. The word refers to the mathematical concept of a (n-ary) relation, which is totally different from the “relationship” concept in ERD. The former is a mathematical—set-theory—concept; the latter eventually maps to certain cases of data integrity constraints (loosely speaking).

    1. Re:Set Theory by plopez · · Score: 1

      But remember that SQL queries are bag operations rather than set operations. You have to hack the scripts to ensure set operations.

      --
      putting the 'B' in LGBTQ+
  11. It's not that complicated by Animats · · Score: 4, Informative

    It's not that complicated.

    In the SQL world, data is stored in "tables". Each table consists of "rows" of "records". Each record has "fields". Each field has a "field name" and a fixed "type", like TEXT, INTEGER, or DATE. Tables are created with the CREATE statement, where all the field names and types have to be specified. So that's what SQL data looks like. That part is fairly simple.

    Tables start out empty. Data is added to a table using the INSERT statement. Existing records can be changed with the UPDATE statement. The SELECT statement is used for searching.

    What makes SQL useful is that searching is very powerful. One SELECT statement can look things up in more than one table, find matching items, sort, summarize, and extract specific fields. The key to understanding SQL is learning what SELECT can do. On the other hand, if all you need to do is find one row of a table based on one key, the SELECT statement for that is very simple.

    Tables have "indexes". If you use a SELECT statement with a search request for which there is no helpful index, the entire table will be linearly searched. This is slow. So you specify which fields need an INDEX to speed things up. This is usually done when the table is created with CREATE, but it can be done later. When looking things up with SELECT, you usually don't have to mention indexes; which index to use and how to use it is figured out by the database system.

    SQL databases scale up well. Gigabyte-sized tables are normal. Terabyte-sized tables are not unusual. You can have many queries and updates running on the same table at the same time. The database system handles all the locking for you. Some database systems can be run on clusters of machines, and some support multiple redundant copies. You can do a lot of things while a database is running that you wouldn't think of as being possible. You can add a new index, or even a new field, to an existing table while the database is in use. There's a lot of heavy machinery behind the scenes to make all this work.

    All the major databases try hard to maintain data integrity. A machine crash and restart will not damage any serious modern database. Program crashes are handled, too. A group of SELECT, INSERT, and UPDATE statements can be blocked together as a "transaction". The database doesn't change until a COMMIT statement is executed, and then all the changes take effect at the same time. If something goes wrong, like the program crashing or even the machine crashing before the COMMIT, the database is unchanged. If your program detects an error and needs to abort the transaction in progress, it does a ROLLBACK and the database is as it was before the transaction started. There's a lot of heavy machinery behind the scenes to make all this work.

    There are security features. Access to tables can be restricted, in some cases down to the field level. Databases have user accounts, which are not necessarily tied to operating system login accounts. You can have accounts which can only read some tables, not update or delete them, or accounts which can't see some fields of some tables. This is valuable in web applications.

    Database programs have libraries which allow them to be called from various programming languages. Programs in different programming languages can talk to the same database at the same time. So you're not locked to a specific programming language.

    Those are the basics. Go install some SQL database on your desktop machine and play with it. MySQL, MariaDB, and Postgres are all free and will work on Linux or Windows desktops.

    1. Re:It's not that complicated by BenJaminus · · Score: 1

      That and normalisation based on keys to design out duplication of data and remove years of developer pain!

    2. Re:It's not that complicated by sphealey · · Score: 1

      - - - - - Tables have "indexes". If you use a SELECT statement with a search request for which there is no helpful index, the entire table will be linearly searched. This is slow. - - - - -

      Some tables have indexes. Some don't. Sometimes it is faster to access rows via an index, and sometimes a full table scan is faster (and in the cases where a full table scan is faster, it will typically be 100x faster than index-to-table access).

      I agree with you that getting started is not that hard (you can also download Oracle for free for training purposes as well), and that doing things from an early stage is key. However, a good solid understanding of the fundamentals is critical and it is not anywhere near as simple as your post would indicate. Very few powerful tools are simple, once you get past the "neat-o!" stage.

      sPh

    3. Re:It's not that complicated by computerchimp · · Score: 2

      Actually. It is complicated. You don't know enough to know that and that makes you dangerous if you were a DBA.
      Schemas, OLAP, speed, how to write queries so you actually get the results you were expecting and not some subset.......

      Anyone that voted you up either did not think about it or is equally as dangerous.

      computerChimp

    4. Re:It's not that complicated by DRichardHipp · · Score: 2

      Go install some SQL database on your desktop machine and play with it. MySQL, MariaDB, and Postgres are all free and will work on Linux or Windows desktops.

      Or, experiment with SQLite. You can download a self-contained standalone precompiled binary that you run as an ordinary command-line program. (ex: "sqlite3 mynewdatabase.db") In fact, sqlite3 is already installed by default on your Mac and probably also on your Linux desktop, so you might not need to install anything at all. There are no servers to set up and maintain and no access permissions and user accounts and passwords to configure. And the database you create is just an ordinary disk file that can delete once you finish experimenting.

      All of the databases on your Android and iPhone are SQLite databases, so if you want to look at some real-world data, just upload them and look at them using the sqlite3 command-line tool. You might find other SQLite databases to look at already on your workstation from programs like Firefox, Skype, iTunes, Dropbox, etc.

      MySQL, MariaDB, and PosgreSQL are all fine products. But if all you want to do is experiment with the SQL language, they are way, way more complication than you need.

    5. Re:It's not that complicated by gnikhog · · Score: 1

      Actually, it is more complicated to do it well. Creating a good database begins with the design, not a SQL statement. If terms like E-R model, normalization, decomposition, * normal form, etc are not familiar terms - they should be (or at least their concepts). If you just want to kick the tires and write some code, that's great. Just don't confuse a proficiency in SQL programming with proficiency in database design.

    6. Re:It's not that complicated by plopez · · Score: 1

      Denis is that you?

      --
      putting the 'B' in LGBTQ+
    7. Re:It's not that complicated by computerchimp · · Score: 1

      ha ha no. I suspect that Denis is a wise man.

  12. A breif intro by plopez · · Score: 4, Informative

    1) SQL is not a relational database, it is an interface to a relational database management systems, e.g. Postgresql. The "NoSQL" crowd lost me in the first 10 minutes when they showed me they did not know the difference. From that point forward I had a tough time taking them seriously.

    2) Date is good but a bit hard to slog through sometimes. Realize that RDBMs are based on actual Math theory. But you don't have to derive the theory so don't be afraid.

    3) Normalization is import. Honestly, people talk about the "Object/Relational impedance" and I have never seen it. I have found that if you define your objects properly up front you get your DB normalization almost for free. And if you understand your data properly and do a good job at normalization you get your objects almost for free.

    4) Know your database engine. RTM and try various scenarios. Have fun but only on a test instance on a test machine.

    5) Know your hardware/VM system. I have found many people blame the DB engine for poor performance when poor hardware configuration is the fault. Learn how to profile.

    6) Learn how to profile software as well. Everybody blames the DB engine when performance is poor when most of the time it is their crappy code.

    7) Some best practices: http://c2.com/cgi/wiki?DatabaseBestPractices

    --
    putting the 'B' in LGBTQ+
    1. Re:A breif intro by angel'o'sphere · · Score: 1

      1) SQL is not a relational database, it is an interface to a relational database management systems, e.g. Postgresql. The "NoSQL" crowd lost me in the first 10 minutes when they showed me they did not know the difference. From that point forward I had a tough time taking them seriously.
      If you want to nitpick then do it right. SQL is not an interface. It is a query language: Standard Query Language, well the standard also contains DDL ...

      I guess you simply are not smart enough to grasp what NoSQL is about or? Otherwise you would not "ditch" the "crowd" (what ever that should mean, the "NoSQL crowd")

      The rest of your comment makes sense, though. Albeit: normalization is mainly used for space reduction, in our days that is far less important than at the time when it was "invented".

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    2. Re:A breif intro by Anonymous Coward · · Score: 0

      Good job proving his point.

    3. Re:A breif intro by nabsltd · · Score: 2

      SQL is not an interface. It is a query language: Standard Query Language, well the standard also contains DDL

      Structured Query Language, actually.

      If you want to nitpick then do it right.

      Words to live by.

    4. Re:A breif intro by plopez · · Score: 3, Informative

      normalization is mainly used for space reduction, in our days that is far less important than at the time when it was "invented".

      Wrong again. Normalization is about data consistency, space saving is secondary (and given the overhead of keys, constraints, indices etc. possibly not true). But the NoSQL crowd also lost me on this point as well. "Eventual consistency?" What does that mean? How about "you will eventually get paid" or "your medical records will eventually show your drug allergies" or "your credit rating will be eventually correct".

      But you're right, I was in a hurry I should have said: "SQL is a scripting language used to access an API of a RDBMS".

      --
      putting the 'B' in LGBTQ+
    5. Re:A breif intro by rk · · Score: 1

      Normalization has almost nothing to do with space reduction. Normalization is there to help ensure correctness of your information.

    6. Re:A breif intro by Anonymous Coward · · Score: 0

      NoSql is a glossy web-scalable word for key:datum storage, with better glue and strings around all the pieces providing a better interface to it than doing it all yourself.
      But since it is web scalable, I know I'm ready to just punt and go shovel cow shit or shear sheep instead if it gets too prevalent.

    7. Re:A breif intro by angel'o'sphere · · Score: 1

      "Eventual consistency?" means that in the end the data will be consistent :D pretty simple.

      However it is not ment in the way we know it from traditional databases as we are not talking about schema consistency here.

      Consistency in huge data means, if you query, regardless which part of the cluster answers your query the answer will be the same.

      The background is: lets say you have 100 nodes. And you write data to your cluster. You define in the write "transaction" you accept it as written if 10 nodes have performed the write.

      Now you read data from the cluster, aiming for the same one you just wrote with a query like this: "retrieve last 10 posts by 'angel-o-sphere' ". Depending on which node the load balancer selects to work on the query you may get my latest post or not. Because the first 10 nodes on the example above might not have distributed their data to the node answering.
      However they work on that. So if you do the same query in 10 mins, the cluster might be consistent. Which mean all queries regarding this topic will yield the same answer.

      Perhaps you should really work on your attitude :D most stuff which is done our days with NoSQL DBs is simply impossible to do with relational DBs. And on top of that: here comes my point about attitude: one can be a Cassandra expert and deal with big data without any trouble whatsoever with out the need to be an expert in ACID/SQL, relational DBs or the history of DBs.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    8. Re:A breif intro by angel'o'sphere · · Score: 1

      True :D it is meanwhile "Structured" ... I think when it was first introduced it was still named "Standard" and I keep mixing this up, lucky this was not an exame ;D

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    9. Re:A breif intro by plopez · · Score: 1

      How do you know you are accessing the correct 10 nodes?

      --
      putting the 'B' in LGBTQ+
    10. Re:A breif intro by angel'o'sphere · · Score: 1

      You don't know. That is the point. You accept that for a while your data is "inconsistent".

      E.g. you look at my facebook site and your friend does at the same time. For quite a while, perhaps half an hour, your views might be different, one of you missing my latest status. But who cares? Tomorrow you will both see the same data.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
  13. A few free resources by mws · · Score: 2

    Practical PostgreSQL: http://www.commandprompt.com/ppbook/
    Learn SQL The Hard Way: http://sql.learncodethehardway.org/
    Use The Index, Luke!: A Guide To SQL Database Performance: http://use-the-index-luke.com/ (my own site)
    Source: http://stackoverflow.com/questions/194812/list-of-freely-available-programming-books

    1. Re:A few free resources by greg1104 · · Score: 1

      "Practical PostgreSQL" is from 2002. There are so many things in PostgreSQL that have changed since then, you'd be better off reading a generic database introduction instead of that one now.

  14. I don't know if it is possible by sphealey · · Score: 1

    I'm not sure it is possible, today. From about 1995-2005 forward a large amount of database work, and development of applications against databases, was done by people who didn't really have a good grasp of the fundamentals of the concept, the history of its implementation, or a very thorough knowledge of the available technology. The result is that various self-taught paradigms became embedded in an entire generation of developers, and those paradigms have now been picked up and embedded in technology by vendors. As a result it is very difficult to get any training or mentoring, or even find any books published after 1995, that go through the fundamentals and provide solid examples of the various paradigms and patterns and the strengths/weaknesses of each. IMHO that's in large part because the older patterns were stronger but more difficult to understand and use, so they got pushed aside by the self-taught patterns which are now dominate.

    As a related example see this thread - I would personally say that Postgres and Oracle RDBMS each have their strengths and weaknesses, and each have places where they should be used and should not be used, but the rabid anti-Oracle posters dominate the discussion and shout down any reasoned discussion. Similarly with the "business logic in the RDBMS" paradigm - in my experience over 25 years that often works far better than the "tons of Java middleware" approach, but while each has its advantages the proponents of the latter generally shout down the former. So you only get one viewpoint, and that not always the strongest (again IMHO).

    The state of business software development trends leaves me a big depressed, actually.

    Date is of course very good in the earlier editions if you can understand his math. Later on he went down his own non-traditional path that is more than a bit off the mainstream, and of course there are many very capable database people who don't use or claim to be experts at relational calculus or symbolic logic (was a bit shocked to find out Tom Kyte was in that camp, although he interacts quite a bit with Cary Milsap who is on the other extreme). Kyte's books are very good although practical and not theory, and of course 99% Oracle. _Tales of the Oak Table_ is a good read with many cautionary tales and although written by Oracle professionals the case studies apply to any information technology.

    sPh

  15. The best way is always the hard way by sweet · · Score: 1

    http://sql.learncodethehardway.org/book/

  16. dBase? by Kozar_The_Malignant · · Score: 1

    I read the headline and thought,"I still have lots of dBase III books in a box somewhere.

    --
    Some mornings it's hardly worth chewing through the restraints to get out of bed.
    1. Re:dBase? by HornWumpus · · Score: 1

      Better db engine then MySQL. Very similar issues. Kids have to learn on their own.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
  17. LDAP? by Blaskowicz · · Score: 4, Funny

    Here's a stupid question.
    Why not put all data into the ldap, next to all the login information etc.
    Then you can learn to be a bad ass sysadmin who allows you to login from everywhere AND learn a database at the same time! Many apps like mail clients, server daemons can integrate with ldap! You can do cloud computing : sync your phone contacts to it.
    If you're working in a company, tell the boss you now only need the Windows AD domain controller. It's awesome consolidation and cost savings. Also, it's a mature, market leading NoSQL implementation.

    1. Re:LDAP? by sphealey · · Score: 1

      I hope someone has some Funny points left to give your post.

      Then again, given what I've seen over the years I'm sure this has been done... More than once.

      sPh

    2. Re:LDAP? by Anonymous Coward · · Score: 1

      Not so much a stupid question as a criminally insane suggestion. Much like how LDAP retains its horror-inducing properties while actually proving true to its "lightweight" claim... compared to X.500.

      There really ought to be a middle way between LDAP and NIS. Something combining the simplicity of the latter with the "one question, one answer" architecture of the former.

      Anyway, this has exactly nothing to do with learning about postgresql, SQL or relational databases in general.

    3. Re:LDAP? by hardluck86 · · Score: 2

      I've always thought LDAP was great to work with for the type of people who also like to do their own dental work.

    4. Re:LDAP? by plover · · Score: 1

      One of the better Dilbert cartoons about the boss wanting to build an inventory system in email comes to mind: http://dilbert.com/strips/comic/1995-12-31/

      --
      John
  18. best answer.. by houbou · · Score: 1

    google.

  19. Set Theory by the+eric+conspiracy · · Score: 2

    The most useful way I have found of thinking in a relatively simple yet robust mathematical way about SQL and relational databases is in terms of set theory.

    For example a standard join can be considered as finding the intersection of two sets.

    This this level of abstraction should be usable by someone with your level of mathematical training.

  20. Learn some terminology by Anonymous Coward · · Score: 0

    Learn a bit about how databases are constructed, and you will learn why they do things they do. I nominally use mysql, but postgresql is ok too, I've also studied oracle (and used it professionally). Knowing how row locking works, what ACID means, First Normal Form, Second Normal Form, Third Normal Form, Backus-Naur Normal Form, etc. will help you (and it doesn't matter what the database is). True databases also need network connections and multiple users (in that regard, microsoft access was never a true database as there is only ever one person accessing it, so you don't need row locking, you don't need to worry about rollbacks or contention, etc). Again, all this is independent of the database. So buy a book (even a dummies book is a good start), and move on to more advanced topics from there. Another good one is "The Manga Guide to Databases" if you don't mind two Japanese Anime characters walking you through database design (and most of the theory I've yakked about so far), 213 pages to get you on your way.

    1. Re:Learn some terminology by sphealey · · Score: 1

      Isolation level and transaction are key concepts as well, ones that many people working with databases miss entirely.

      sPh

    2. Re:Learn some terminology by Codifex+Maximus · · Score: 1

      You are correct. Access in it's simplest form is NOT a true database in that it is not a first class server of data per se. It can be used in a multi-user fashion though. You have multiple users running Access and a client program which is linked to a common Access database file on the network. All the intelligence except for data validation is implemented in the client. Back in the old days (the 90's), Access did not have row-locking but locked chunks of the database. Concurrency was a major issue and required code to bulldoze it's way into a record pessimistic lock so it could store data without stomping on someone else's update. The trick was to pad out the chuck to be one record in size. Wasteful of space but good for performance. These days, the ability to lock an actual row means less space wasted in the database - potentially saving gigabytes in a table with millions of records.

      Access is more a Swiss Army Knife type of user database tool than a mission critical system. I wouldn't dream of using it for anything but data analysis of smaller sets of data, reporting and prototyping of client frontsides.

      --
      Codifex Maximus ~ In search of... a shorter sig.
  21. write a database by Anonymous Coward · · Score: 0

    then you'll be just as useless, but instead of saying
        'oh my, i don't know whats going on'
    you can say
        'what a crappy useless design'

  22. Re:Stanford "Introduction to Databases" on Courser by Anonymous Coward · · Score: 1

    ANnd Jennifer Widom is the wife of Michael Stonebraker, who is a hard core database dude and also happened to be the one who started the Ingres project, on which PostgreSQL is based.

  23. CHris Date by WOOFYGOOFY · · Score: 2

    Chris date has the clearest understanding of databases out there and while most of what he's written is not vendor or even SQL specific (although some of it is too..) his thoroughness, precision and ability to go meta on every aspect of databases is without parallel. Sure, it's not "how to use postgres" but studying him will pay off many many times over during your career when you're "stuck" because a lot of being 'stuck" in this field is not understanding the actual relational model at the conceptual level and when I say "actual relational model" I don't mean SQL, which is one just so-so implementation of the relational model. It's the places where SQL departs from the relational model and the practices that have developed in the community to compensate for that departure - without people even being aware that that's what they're doing - which create the worst and hardest problems.

  24. The Manga Guide to Databases by Call+Me+Black+Cloud · · Score: 1


    The Manga Guide to Databases is the best introduction to (relational) databases I've come across. It provides an entertaining and thorough overview of database terminology and concepts. I've used this book for years with junior engineers and interns who have all loved it, and the DBAs I've loaned it to were impressed by how much ground the book covers.

  25. Best book: database design for mere mortals by Anonymous Coward · · Score: 0

    Excellent book that covers all the bases and leaves you well-prepared to build solid data models. From the theory (normal forms) up to the practical (when and why to index). 42 thumbs up.

    http://www.amazon.com/Database-Design-Mere-Mortals-Hands-/dp/0321884493/ref=sr_1_1?s=books&ie=UTF8&qid=1373749472&sr=1-1&keywords=database+design+for+mere+mortals

  26. Learn how to organize data! by FlyingGuy · · Score: 1

    Data is nothing but a collection of information. The information itself is unimportant. What is important is how that information will be used. The way information is used is the predicate for its organization.

    After you grok that then you can start to learn how electronic systems store and retrieve information.

    One of the things that has never let me down is the fundamental understanding of how computers work. Remember that no matter what whiz bang shite you read about, computers do exactly two things: They loop and they branch nothing more nothing less.

    Once you have that in your head, then you can begin to think about how the results of your analysis of the uses of the data can best be organized to make the job of the machine smaller and faster.

    None of this is intuitive. It is all based on understanding gained from reading books of theory and the practical application of those theories and by trial and error. Every data use case will present its own challenges, but if you want to do serious database work never ever let a coder dictate the data schema. Never let them short cut and make your data organization match the organization of the data structures in their code because 99 times out of 99 times they have organized their data structures to suite their coding style and 99 times out of 99 times it will be completely hair brained and not have any relevance to the best way to store and retrieve some data from the mountain of data you need to store.

    --
    Hey KID! Yeah you, get the fuck off my lawn!
  27. WRONG! Also Transactions! by Anonymous Coward · · Score: 0

    A Lot of comments here about SQL and the obvious RTFM and some books answer to this question, but not one response has yet mentioned the importance of transactions (with commits and rollback points) and some other stuff like row locking / table locking and their use in avoiding nasties like phantom reads, etc.

    Incidentally as a long term Oracle / DB2 user I find PostgreSQL an excellent database for non big enterprise level applications, and with some of the clustering / replication features that have appeared over the last decade or so, even that argument may be becoming a little old hat.

  28. Read Yoshinori by ph43thon · · Score: 2

    Read Yoshinori Matsunobu's blog:

    http://yoshinorimatsunobu.blogspot.com/

    At least, read his first post and view the slide deck:

    http://yoshinorimatsunobu.blogspot.com/2009/04/mastering-art-of-indexing-slides.html

  29. Re:A brief intro by Gimric · · Score: 1

    Normalization is good because otherwise when one of your users gets married and wants to change her surname you realise that every entry in every table that relates to the user has their surname recorded and you have to update millions of rows in twenty different tables and none of it is documented and if you miss updating it any of the tables you might break something important. In a normalised database you change it one place and you are done.

  30. Re:WRONG! Also Transactions! by Anonymous Coward · · Score: 0

    When I heard that PostgreSQL 9.2 (released last September) was touting its new feature of index only scans, I thought, wow these guys are way behind. Oracle and IBM had that in the mid-'90s and Sybase and Microsoft probably did too. To use a car analogy, that's like hearing an upstart manufacturer announce that electronic fuel injection will now be standard on all of its new cars, replacing carburetors. You think, gee what else are they missing?

  31. Re:A brief intro by rk · · Score: 1

    You know this. I know this. I'm not confident the GP knows this. :-)

  32. Database efficiency considerations by Codifex+Maximus · · Score: 1

    Let's see - proper RDBMS considerations:
    Proper normalization of data
    Efficient keying for relating data. Compound keys can be difficult to work with.
    Use of simplest possible recordset type for working with data.
    Understanding of boolean logic and sets. Needful for creating efficient SQL queries.
    Understanding of a particular RDBMS's optimization techniques/order of operations.
    Data validation to prevent GIGO.
    Record locking schemes to prevent inconsistent updates - transactions should be used to minimize impact.
    Learn those things which your chosen RDBMS does NOT do well and find a way to optimize or work around them.

    --
    Codifex Maximus ~ In search of... a shorter sig.
  33. Fucking Javascript-only site by Anonymous Coward · · Score: 1

    Subject says all. Not for me.

  34. What do you mean by learning DB? by Hognoxious · · Score: 1

    Are you trying to learn the concepts (normalization, cardinality, keys, constraints etc) or the hands-on specifics for a specific DBMS?

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  35. Codd & Date by Anonymous Coward · · Score: 0

    In the beginning there was only Codd, and then came the Date...