SQL is an imperfect attempt at implementing (and demotic progression of) relational algebra. A table *is* a relation for relational math purposes. Relations can also be synthetic (result of various operations on one or more relations) and so forth. Also while some aspects of SQL allow you to break the relational model, SQL itself also implements everything you need to follow it.
Do you need to know relational algebra to design simple database queries? No. Do you need to understand relational theory to design robust databases? Absolutely. While SQL is not perfect in its attempts at implementing relational math,
Because ORM's suck and generally promote bad db design.
You may find my posts about SODA interesting. It is a different concept to fill the same need but allows you to build *good* relational databases without the complexity of maintaining a lot of SQL code in your object classes.
My grandmother knew all about microwaves
on
Head First SQL
·
· Score: 1
And x-rays, and radio waves, and element formation in stars. Her name was G. R. Caughlin and she worked with nobel laureates in areas relating to astrophysics.
And she could probably program Fortran better than you could:-)
Re:Don't get in over your head...
on
Head First SQL
·
· Score: 1
You might find my latest journal entry interesting. I have started building databases with an SOA layer built into the db so that programmers can write integration code once and not have to worry about relational layout (i.e. we map procedure to procedure rather than data structure to data structure).
I call this approach Service Oriented Database Architecture or SODA.
Why are you starting a transaction for a single update statement? Normally transactions are required for grouping multiple insert/update/deletes so that they succeed/fail as a unit or for grouping SELECT... FOR UPDATE; with the related update so you know your information is current.
Also, I would *highly* suggest getting a good book on db design. Chris Date's "An Introduction to Database Systems" may be too heavy for you, so I would start with some reading on relational algebra and the mathematical (rather than dictionary) definitions of the normal forms.
If you don't understand normalization, you can go near a database. You don't really need to understand the principle to use DML effectively (insert, select, update, delete). I would even suggest that things like VIEWS could be safely created by someone who doesn't understand good relational design provided that you know enough to know what a candidate key is and why they are important.
However, if you don't understand normalization, you shouldn't go anywhere near database design in terms of tables and so forth.
1NF only has two conditions and only one possibly causes issues in RDBMS design. The only two reqirements are that you have a fixed number of tuple elements *and* that you have candidate keys. (I generally suggest that people consider an additional requirement which states that every tuple element is semantically atomic but technically that is not part of 1NF.)
When you dont have candidate keys, you lose the ability to manage your data because you cannot refer to any row uniquely and this undermines any basic attempt at data management. This is not like a spread sheet (at least there you have row numbers).
LedgerSMB 1.2 has every table meeting at least 1NF, and eventually, I expect we will get everything into good shape in the next few major releases.
If the goal is to make people competent consumers of database applications, and users of predesigned databases, the book cannot do any harm and will certainly do some good.
"I read one book and now I can design databases for paying customers" is a very different issue.
Re:Make sure you study ALL the normal forms
on
Head First SQL
·
· Score: 1
I find 4NF to be largely useless for actual work but it is extremely important for understanding normalization.
I would argue that the 2 most important normal forms are BCNF and 5NF. Both of them are above 3NF.
Re:Don't get in over your head...
on
Head First SQL
·
· Score: 1
That is a fair point. So it requires a different way of looking at db design so that db experts can design databases that programmers can use. The current generation of solutions (including ORMs) tend to give a lot of the worst elements of both worlds.
My solution for OOP integration is to design databases on two levels: good, semantic and mathematical relational design with appropriate triggers, etc. on the bottom level, and discoverable stored procedures on the top level. "Discoverable" means that the stored procedures are stored and cataloged in such a way that applications can either discover appropriate stored procedures themselves or, once they know a name, provide a semantic and consistent matching to their own internal data structures without additional programmer interference. This is sort of like taking the ideas behind SOAP and applying them to stored procs. I call this approach Service Oriented Database Architecture, or SODA. SODA can be done fairly easily in PostgreSQL by using consistent naming conventions of both stored procedures and arguments. This provides a lot of other benefits to the organization using the software as well, including the fact that business logic is applied consistently across applications even if those applications are written in different languages. (SAP does something similar with web services in the middleware layer, but I am talking about putting the SOA piece in the db itself.)
Re:Don't get in over your head...
on
Head First SQL
·
· Score: 1
A better question is why one would design a database which arbitrarily excludes the possibility you mention.
Re:You've not done real database work
on
Head First SQL
·
· Score: 1
Light-weight CMS is not "real database work" regardless of traffic volume. Heavy-weight CMS could be (i.e. tracking legally mandated approval processes for engineering designs for, say, a hydroelectric dam) but it is hardly the same thing.
For "real database work" you need at least moderate volume and at least a moderate degree of semantic complexity. Lightweight CMS does not provide the latter.
Real database work is building an accounting database for a 150+ employee organization.
My business is based on the premise of helping people and businesses use FOSS. Why is that disgusting?
Note that all my code is released FOSS, and I actively promote projects where my business is not the only vendor.
I do find the principle of one entity controlling development for financial gain to be distasteful. Hence the protection is to ensure that one has a broader core team which represents a more diverse set of interests.
difficult at best. I am involved in the LedgerSMB project and in fact I helped to found the fork (we split off from SQL-Ledger because the author was not interested in fixing security holes). Since then we have found that the codebase is ACID compliant only in the Timothy Leary sense (i.e. the code reminds you of code one might write while on LSD)....
LedgerSMB is a rare fork because we are still around 1+ years later. This occurred only because the community was, as a whole, dissatisfied with the way SQL-Ledger was founded and we made sure we had enough resources to sustain development before we started. Now a year later we are finally taking it to the next level, have a new architecture/framework in place and are starting to move the code over to it.
Nearly all forks fail and most of those that succeed do so by being absorbed back into the main project. We actually encourage collaborative forking in LedgerSMB where the idea is to build different solutions on a common framework. THe only way a fork succeeds is if the core developer(s) don't meet the needs of the community and if a fork comes along that can (a number of dead forks of SQL-Ledger also exist). If you do this, why worry about forks?
IANAA either, but I have to know how these things are supposed to be done. Hence I have discussed this issue with CPA's in order to ensure I understand how adjustments to closed periods are supposed to work.
You have a point that one should be aware that people sometimes do things wrong and guard against that (for example, by using custom triggers/check constraints to ensure that inserts don't occur in closed periods, nor do updates/deletes *ever* occur outside of a few specific issues such as voucher approval/deletion).
However, here is how the correction process is supposed to work: The correction is supposed to be posted to the first open date in the books. Note that this does not affect income tax requirements (and you could have to go back and restate earnings) but for the purposes of financial accounting, unless a systematic error is made, adjustments are made to an open period. If a systematic error is made, it is going to be invasive to fix anyway so having to go in and rebuild the summary data when you are done is not a bad thing.
There is one other case that causes a problem. Suppose you have an open voucher (i.e. recorded transaction which has not been fully posted to the books) which remains open for several years (I have seen this happen). In these cases, one would also need a flag to show that a transaction had been used in the summary. Such vouchers pose other issues for financial accounting and I think that this is a bad practice but some businesses do have strange requirements of this sort.
So it is more complex that one would think, but it is still doable.
Re:Don't get in over your head...
on
Head First SQL
·
· Score: 1
I would think that inserts, updates, and deletes would qualify as the other queries one prejudices against.
In general, when I have to do this, I tend to use a hybrid approach (and only where inserts/updates/deletes on a set of records is forbidden) and that is to essentially store what I call "checkpoint summaries." For example "last time you closed your books, your AR account had a balance of $53,123.31" Then reports can start at the checkpoint summaries and build forward. Again, this is only possible in a subset of applications (mostly those where business rules are well established and forbid inserting data into the range specified). Even there, the complexity can be a problem, but it is a lot better than dealing with live summaries.
Re:I really hate these kind of books
on
Head First SQL
·
· Score: 1
Most of us who survived the 60s understand this. Timothy Leary taught us well. And a few of us have experience cleaning up db's which are ACID-compliant in this way.....
That is OK.... When we forked LedgerSMB, maybe half of the tables were in 3NF, and several tables weren't even in 1NF (as a few didn;t even have candidate keys).
Re:You've not done real database work
on
Head First SQL
·
· Score: 1
BTW, the one original bug I reported in PostgreSQL was back in 7.x, and involved ways of building tables so that they were write-only (i.e. the db server couldn't read them). Had to do with handling tuples as attributes in tables.
Re:You've not done real database work
on
Head First SQL
·
· Score: 1
You don't do real database work on MySQL. You haven't done real database work until you report an original bug in PostgreSQL.
Re:Depends on what the Db Is For
on
Head First SQL
·
· Score: 1
Maybe but not always. It is true that people often denormalize OLAP systems because they are trying to get specific answers out fast so summary tables/materialized views make sense. Ask a new question nobody has built summary tables for and expect to wait....
However, this is not always the case. If you look at a lot of OLAP setups on BizgressMPP or Terradata, there isn;t a lot of denormalization necessary because you can address the issues via parallelism rather than summary storage. This is important because it means your OLAP solution is able to answer a far wider range of questions quickly.
Re:I really hate these kind of books
on
Head First SQL
·
· Score: 1
I think that one of the problems is that you can't approach database design like programming in terms of logical structures. This is especially true when trying to do OO programming against RDBMS's. Hence if you don;t teach people theory, they build bad databases because they approach relations like classes (which they clearly are *not*).
Yes, theory is extremely important. If you don't have good theory both in the DML and DDL side, you are going to run into problems. But these are not extremely difficult concepts-- they are just not taught.
I used to provide tech support for MS Access at Microsoft. I could tell you some stories....
Re:Not the most imformative review
on
Head First SQL
·
· Score: 1
Take a look at the SQL-Ledger schema sometimes.
No normalization (some tables are not 1NF). No RI enforcement. No NOT NULL enforcement in critical areas (chart_id is NULL. Where did the money go? UNKNOWN!) Ambiguous foreign keys (as in a foreign key that references any of a number of possible tables) Dangerous abuse of data types (double precision floats for storing money) Using custom triggers where ON DELETE/ON UPDATE events and foreign keys would be better
I saw that in the ToC. However, I saw no reason to assume that there was actually good information about normalization in the book.
For example, what *is* third normal form? How is it defined? You can't reasonably approach this subject without assuming that you can discuss at least basic algebra (and preferably relational algebra) with your readers. I hence got the impression that this was the non-mathematic approach to normalization which doesn't work (it is like non-math-applied physics).
Yes, it advocates normalization (good) but I have yet to see many good beginners books that cover normalization well enough to make it understandable.
Re:Don't get in over your head...
on
Head First SQL
·
· Score: 1
M-M in customers/addresses can make a lot of sense in many cases. In general, I would M-M them unless I had good reason not to do so. However, I have also seen people do this sort of thing badly and hence one has to understand that this is is something which is more difficult than just duplicating the information (actually I would probably decompose addresses into several other tables-- city, country, etc but that is another matter). Also it usually isn;t as simple as a M-M mapping. Instead you have a customer, which maps to an address along with a bunch of metadata.
This isnt about saving space. It is about ensuring that as much data as possible is centrally managed so we know exactly what it means at any given time. And it also maximizes your flexibility later on. For example you can ask which customers have shipped orders to a specific address much more quickly and robustly than you an if the address information is duplicated.
SQL is an imperfect attempt at implementing (and demotic progression of) relational algebra. A table *is* a relation for relational math purposes. Relations can also be synthetic (result of various operations on one or more relations) and so forth. Also while some aspects of SQL allow you to break the relational model, SQL itself also implements everything you need to follow it.
Do you need to know relational algebra to design simple database queries? No. Do you need to understand relational theory to design robust databases? Absolutely. While SQL is not perfect in its attempts at implementing relational math,
Because ORM's suck and generally promote bad db design.
You may find my posts about SODA interesting. It is a different concept to fill the same need but allows you to build *good* relational databases without the complexity of maintaining a lot of SQL code in your object classes.
And x-rays, and radio waves, and element formation in stars. Her name was G. R. Caughlin and she worked with nobel laureates in areas relating to astrophysics.
:-)
And she could probably program Fortran better than you could
You might find my latest journal entry interesting. I have started building databases with an SOA layer built into the db so that programmers can write integration code once and not have to worry about relational layout (i.e. we map procedure to procedure rather than data structure to data structure).
I call this approach Service Oriented Database Architecture or SODA.
Why are you starting a transaction for a single update statement? Normally transactions are required for grouping multiple insert/update/deletes so that they succeed/fail as a unit or for grouping SELECT ... FOR UPDATE; with the related update so you know your information is current.
Also, I would *highly* suggest getting a good book on db design. Chris Date's "An Introduction to Database Systems" may be too heavy for you, so I would start with some reading on relational algebra and the mathematical (rather than dictionary) definitions of the normal forms.
I disagree.
If you don't understand normalization, you can go near a database. You don't really need to understand the principle to use DML effectively (insert, select, update, delete). I would even suggest that things like VIEWS could be safely created by someone who doesn't understand good relational design provided that you know enough to know what a candidate key is and why they are important.
However, if you don't understand normalization, you shouldn't go anywhere near database design in terms of tables and so forth.
1NF only has two conditions and only one possibly causes issues in RDBMS design. The only two reqirements are that you have a fixed number of tuple elements *and* that you have candidate keys. (I generally suggest that people consider an additional requirement which states that every tuple element is semantically atomic but technically that is not part of 1NF.)
When you dont have candidate keys, you lose the ability to manage your data because you cannot refer to any row uniquely and this undermines any basic attempt at data management. This is not like a spread sheet (at least there you have row numbers).
LedgerSMB 1.2 has every table meeting at least 1NF, and eventually, I expect we will get everything into good shape in the next few major releases.
If the goal is to make people competent consumers of database applications, and users of predesigned databases, the book cannot do any harm and will certainly do some good.
"I read one book and now I can design databases for paying customers" is a very different issue.
I find 4NF to be largely useless for actual work but it is extremely important for understanding normalization.
I would argue that the 2 most important normal forms are BCNF and 5NF. Both of them are above 3NF.
That is a fair point. So it requires a different way of looking at db design so that db experts can design databases that programmers can use. The current generation of solutions (including ORMs) tend to give a lot of the worst elements of both worlds.
My solution for OOP integration is to design databases on two levels: good, semantic and mathematical relational design with appropriate triggers, etc. on the bottom level, and discoverable stored procedures on the top level. "Discoverable" means that the stored procedures are stored and cataloged in such a way that applications can either discover appropriate stored procedures themselves or, once they know a name, provide a semantic and consistent matching to their own internal data structures without additional programmer interference. This is sort of like taking the ideas behind SOAP and applying them to stored procs. I call this approach Service Oriented Database Architecture, or SODA. SODA can be done fairly easily in PostgreSQL by using consistent naming conventions of both stored procedures and arguments. This provides a lot of other benefits to the organization using the software as well, including the fact that business logic is applied consistently across applications even if those applications are written in different languages. (SAP does something similar with web services in the middleware layer, but I am talking about putting the SOA piece in the db itself.)
A better question is why one would design a database which arbitrarily excludes the possibility you mention.
Light-weight CMS is not "real database work" regardless of traffic volume. Heavy-weight CMS could be (i.e. tracking legally mandated approval processes for engineering designs for, say, a hydroelectric dam) but it is hardly the same thing.
For "real database work" you need at least moderate volume and at least a moderate degree of semantic complexity. Lightweight CMS does not provide the latter.
Real database work is building an accounting database for a 150+ employee organization.
My business is based on the premise of helping people and businesses use FOSS. Why is that disgusting?
Note that all my code is released FOSS, and I actively promote projects where my business is not the only vendor.
I do find the principle of one entity controlling development for financial gain to be distasteful. Hence the protection is to ensure that one has a broader core team which represents a more diverse set of interests.
difficult at best. I am involved in the LedgerSMB project and in fact I helped to found the fork (we split off from SQL-Ledger because the author was not interested in fixing security holes). Since then we have found that the codebase is ACID compliant only in the Timothy Leary sense (i.e. the code reminds you of code one might write while on LSD)....
LedgerSMB is a rare fork because we are still around 1+ years later. This occurred only because the community was, as a whole, dissatisfied with the way SQL-Ledger was founded and we made sure we had enough resources to sustain development before we started. Now a year later we are finally taking it to the next level, have a new architecture/framework in place and are starting to move the code over to it.
Nearly all forks fail and most of those that succeed do so by being absorbed back into the main project. We actually encourage collaborative forking in LedgerSMB where the idea is to build different solutions on a common framework. THe only way a fork succeeds is if the core developer(s) don't meet the needs of the community and if a fork comes along that can (a number of dead forks of SQL-Ledger also exist). If you do this, why worry about forks?
IANAA either, but I have to know how these things are supposed to be done. Hence I have discussed this issue with CPA's in order to ensure I understand how adjustments to closed periods are supposed to work.
You have a point that one should be aware that people sometimes do things wrong and guard against that (for example, by using custom triggers/check constraints to ensure that inserts don't occur in closed periods, nor do updates/deletes *ever* occur outside of a few specific issues such as voucher approval/deletion).
However, here is how the correction process is supposed to work: The correction is supposed to be posted to the first open date in the books. Note that this does not affect income tax requirements (and you could have to go back and restate earnings) but for the purposes of financial accounting, unless a systematic error is made, adjustments are made to an open period. If a systematic error is made, it is going to be invasive to fix anyway so having to go in and rebuild the summary data when you are done is not a bad thing.
There is one other case that causes a problem. Suppose you have an open voucher (i.e. recorded transaction which has not been fully posted to the books) which remains open for several years (I have seen this happen). In these cases, one would also need a flag to show that a transaction had been used in the summary. Such vouchers pose other issues for financial accounting and I think that this is a bad practice but some businesses do have strange requirements of this sort.
So it is more complex that one would think, but it is still doable.
I would think that inserts, updates, and deletes would qualify as the other queries one prejudices against.
In general, when I have to do this, I tend to use a hybrid approach (and only where inserts/updates/deletes on a set of records is forbidden) and that is to essentially store what I call "checkpoint summaries." For example "last time you closed your books, your AR account had a balance of $53,123.31" Then reports can start at the checkpoint summaries and build forward. Again, this is only possible in a subset of applications (mostly those where business rules are well established and forbid inserting data into the range specified). Even there, the complexity can be a problem, but it is a lot better than dealing with live summaries.
That is OK....
When we forked LedgerSMB, maybe half of the tables were in 3NF, and several tables weren't even in 1NF (as a few didn;t even have candidate keys).
BTW, the one original bug I reported in PostgreSQL was back in 7.x, and involved ways of building tables so that they were write-only (i.e. the db server couldn't read them). Had to do with handling tuples as attributes in tables.
You don't do real database work on MySQL.
You haven't done real database work until you report an original bug in PostgreSQL.
Maybe but not always. It is true that people often denormalize OLAP systems because they are trying to get specific answers out fast so summary tables/materialized views make sense. Ask a new question nobody has built summary tables for and expect to wait....
However, this is not always the case. If you look at a lot of OLAP setups on BizgressMPP or Terradata, there isn;t a lot of denormalization necessary because you can address the issues via parallelism rather than summary storage. This is important because it means your OLAP solution is able to answer a far wider range of questions quickly.
I think that one of the problems is that you can't approach database design like programming in terms of logical structures. This is especially true when trying to do OO programming against RDBMS's. Hence if you don;t teach people theory, they build bad databases because they approach relations like classes (which they clearly are *not*).
Yes, theory is extremely important. If you don't have good theory both in the DML and DDL side, you are going to run into problems. But these are not extremely difficult concepts-- they are just not taught.
I used to provide tech support for MS Access at Microsoft. I could tell you some stories....
Take a look at the SQL-Ledger schema sometimes.
No normalization (some tables are not 1NF).
No RI enforcement.
No NOT NULL enforcement in critical areas (chart_id is NULL. Where did the money go? UNKNOWN!)
Ambiguous foreign keys (as in a foreign key that references any of a number of possible tables)
Dangerous abuse of data types (double precision floats for storing money)
Using custom triggers where ON DELETE/ON UPDATE events and foreign keys would be better
I saw that in the ToC. However, I saw no reason to assume that there was actually good information about normalization in the book.
For example, what *is* third normal form? How is it defined? You can't reasonably approach this subject without assuming that you can discuss at least basic algebra (and preferably relational algebra) with your readers. I hence got the impression that this was the non-mathematic approach to normalization which doesn't work (it is like non-math-applied physics).
Yes, it advocates normalization (good) but I have yet to see many good beginners books that cover normalization well enough to make it understandable.
M-M in customers/addresses can make a lot of sense in many cases. In general, I would M-M them unless I had good reason not to do so. However, I have also seen people do this sort of thing badly and hence one has to understand that this is is something which is more difficult than just duplicating the information (actually I would probably decompose addresses into several other tables-- city, country, etc but that is another matter). Also it usually isn;t as simple as a M-M mapping. Instead you have a customer, which maps to an address along with a bunch of metadata.
This isnt about saving space. It is about ensuring that as much data as possible is centrally managed so we know exactly what it means at any given time. And it also maximizes your flexibility later on. For example you can ask which customers have shipped orders to a specific address much more quickly and robustly than you an if the address information is duplicated.