Stored Procedures - Good or Bad?
superid asks: "I'd like to get opinions and real world experiences that people have had with database centric applications that rely extensively on stored procedures. I believe that most enterprise class databases such as Oracle, MS-SQL, PostgreSQL, DB2 and others implement stored procedures. MySQL has been criticized for not supporting stored procedures and will be adding them in MySQL 5. The ANSI-92 SQL Standard also requires implementing some form of stored procedure (section 4.17). So, I'm asking Slashdot readers: if you were architecting a highly data-centric web based application today from a clean slate, how much (if at all) would/should stored procedures factor into your design? Where are they indispensable and where do they get in the way?"
"The arguments for stored procedures are pretty straightforward: 1) Centralized code; 2) Compiled SQL is faster; 3) Enhanced security (as our application is over 15 years old, and consists of much legacy code, reimplementation and feature creep that now includes over 3000 stored procedures). At one time we had a client/server architecture so those three advantages were relevant. However, in the past 4 years we have moved everything to web front ends and I have argued that this is no longer true. Does it really matter if my business rules are centralized in stored procedures or in a set of php/asp scripts (ie, in the web tier)? Is it really important to shave compilation time when connection and execution times dominate? (and overall response is ok anyway?) Since the focal point is the webserver, shouldn't security be done there, rather than the DB?
In addition, you either have to have a dedicated T-SQL or PL/SQL coder who then is the weak link in your coding chain, or your pool of developers must become fluent in both your scripting language of choice as well as the SP language. I have experienced both of these approaches and found this to cause bottlenecks when 'the database guy' is unavailable and learning curve problems (bugs) with new coders getting familiar with the db language.
Finally, after staying with our DB engine choice for all these years we are acknowledging that they may not be around forever. Management has asked us to look into migrating our data and business logic to another DB choice. We'd sure love to just be able to point the web tier at a new data source but that is unattainable due to a convoluted tangle of db specific code."
In addition, you either have to have a dedicated T-SQL or PL/SQL coder who then is the weak link in your coding chain, or your pool of developers must become fluent in both your scripting language of choice as well as the SP language. I have experienced both of these approaches and found this to cause bottlenecks when 'the database guy' is unavailable and learning curve problems (bugs) with new coders getting familiar with the db language.
Finally, after staying with our DB engine choice for all these years we are acknowledging that they may not be around forever. Management has asked us to look into migrating our data and business logic to another DB choice. We'd sure love to just be able to point the web tier at a new data source but that is unattainable due to a convoluted tangle of db specific code."
I like to keep business logic in one place as much as possible. You are almost assured to have some in your app, so I try to keep all logic there.
Stored Procs and triggers make can make the code simpler and more efficient, but spread out the workings of the application and unless properly documented, more difficult to understand.
Just my $0.02 CDN
Bad, of course.
I only put triggers or constraints or whatever in the database for one reason: to make sure only valid data enters the database.
For instance if ColumnA must be between 5 and ColumnB+34, that should go in the database. The database itself should guarantee the data is clean. "Data logic" you could call it.
Business logic and everything else should go in the higher layers. There is some ambiguity about what is "data logic" and what is "Business logic" but it's usually pretty clear.
Why? Maintanence. The stored procedures tend to rust in place over time. If you're an "agile" developer you'll go nuts not being able to refactor business logic or have unit tests check your database procedures. If you're a "BDUF" (big design up front) shop, you might like it, but thankfully many are moving away from that.
Particularly for an application where you are returning large amounts of data, stored procedures hold a distinct advantage over dynamic SQL queries in that, if the SP is designed correctly, the database has pre-optimized the query plan at compile-time and runtime execution is therefore much faster. It also allows for underlying table structures to change without impacting your application logic.
Also, when it comes to long-term database maintainability, putting your database logic in stored procedures allows the db admins to get an accurate overview of what objects/tables are in use and which are no longer needed. At my company, where we have over 20 databases, this is an absolute must.
Generally speaking, I use dynamic SQL during initial development and move to stored procs for QA and production.
If I were a bit more of a tinfoil hat wearing man, I'd be Slashdot makes some of these "Ask Slashdot" topics up because the ensuing flamewar will cause more page hits than usual.
The idea of a database is to put the whole data-relation logic in the database, if only to insure atomicity of operations.
Because as soon as you rely on an external process to maintain data integrity, you're bound to fall prey to some sloppy programmer who does not understand the data relationships and will not properly maintain the data integrity.
At least, when you use stored procedures, you can concentrate the data integrity logic in only one place, which is easier to control and manage.
Most of the development I do at my job is Coldfusion+Fusebox with SQL Server on the backend (I don't care if you hate MS, don't bother knocking SQL Server) and stored procedures just make life easier. They're also handy in the instance that you may have multiple front-ends written in multiple languages accessing the same database in many cases. Making a change to the way data is returned is far easier to do in one stored procedure than in X number of front-ends. One of the main reasons we don't use mysql is because the stable versions don't have them.
If the applications are written in one of the various scripting languages, then this argument doesn't apply:
One major problem with enterprise applications is that when a problem is found in an adhoc query (poorly written, a bug with the DBMS, performance related, etc) then the application would normally have to be recompiled and pushed out to the entire enterprise (could be tens of thousands of computers to push to). This isn't desirable.
Moving the queries into stored procedures (where possible) allows you to correct the stored procedure at a central location and roll it back to the 'old' stored procedure if necessary with minimal effort.
A good rule of thumb: use stored procedures for compiled applications
Jason L. Froebe
No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil
Prepared statements and vendor-neutral SQL are the way to go for portability and controllability of the development process. Use SPs judiciously, if at all, and only when there's a highly compelling need to do so(e.g., order of magnitude speedup, etc).
Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
Besides the optimization the DB might do on SP's as opposed to dynamically created SQL statements, SP's are nice from a security point of view.
:).
You have to be extra careful with dynamic SQL due to SQL injection bugs that we all know about. This isn't really an issue when you're dealing with stored procedures that take defined data types as opposed to creating SQL on the fly based upon your data (which could have injected SQL).
Controlling which DB accounts can use what stored procedures is also handy mechanism for determining permissions. Stored procedures represent what all your application might do, so picking which DB connections (which have credentials) can access these is a nice place to control those permissions.
Granted, you can still do lots of stupid things to mess up security
Also, there are places where SP's are not really possible. Severely dynamic reports are a good example (assuming you allow that functionality in your application). There's definitely times when you HAVE to generate SQL on the fly. In any event, try to create a "data access layer" in your code and if you have to dynamically generate SQL, run all sorts of checks on it with regexp's etc to check for injection.
An RDMBS (as opposed to just a database) is actually for manipulating data, not just storing it. Otherwise you'd just use flatfile for everything and implement all the relational logic in your app code. The database can execute stored procs far, far faster than your app code can perform the same functions. Using database side stored procs gives you the exact same advantages as a class library with additional performance and security options. There's no loss. Why not use them?
I'm an Oracle DBA, and I like creating packages of stored procedures and functions (and especially table functions) that represent an API of sorts to the database. This means that the application code doesn't care how the data is stored, and the DBA is free to rearrange the data for tuning purposes, without requiring any app changes (assuming the API remains constant).
In the past I've supported keeping more of the business logic in the database, but I no longer believe this is an optimal design. Now I keep business logic out of the database as much as possible and limit the stored code to enforcing data integrity and making the database look like a "black box" to the apps.
I've found that, say, writing an app with a lot of code in Oracle PL/SQL, using cursors, etc, means your app will only and forever support Oracle, without a whole lot of re-write and likely re-design.
So unless you like vendor tie-in... stay away from db-specific stored procedures.
MORTAR COMBAT!
Stored procedures are a performance optimization, consider the following scenario:
Retrieve the 20th page using a page size of 50 records for all the SKUs under a catalog (potentially millions total) for a specific user which could or not have visibility permissions for each SKU. Assume the security provided by the database is too coarse to fulfill the business requirements, therefore some set of rules must be evaluated to determine SKU visibility for a particular user.
That query would normally be very fast if implemented as a stored procedure because:
1) Only one round-trip is needed.
2) You don't have to move all the data to a middle-tier and then filter out information.
3) RDBMSes are usually faster at filtering data out (by using indexes, denormalization, etc.) that what a developer could code in a middle-tier to filter out information.
4) Most RDBMSes are very good at scheduling tasks, caching, managing memory, etc. The more you move logic away from it, the more you would have to implement it yourself.
You could send all the SQL statements to the database and achieve the same effect, but it might make debugging harder and you still have all the SQL logic in some place, only a different one.
On the flip side:
1) It's harder to write stored procedures than it is to write code in a managed language like Java or C# (thirty-line SELECT statements are not very intuitive).
2) Generally speaking, the compiler of a managed language does a better job at catching errors than a compiler for stored procedures, where a lot of the errors will be caught at runtime.
3) Stored procedures are not portable.
My advice is, if you are only using the RDBMS as a persistence device and your data size is not huge, avoid stored procedures and create some sort of middle-tier object model. Only when performance is a impediment, use stored procedures. You might as well use a hybrid approach, try to model as much as you can in the middle-tier and implement stored procedures for those tasks which are performance intensive.
I work with people that worship UML and patterns as well with RDBMS Gods that can plow through pages and pages of stored procedures without blinking. As much as I love ULM and patterns there are some tasks that must be done in the RDBMS for performance reasons, and tasks that are simply more maintainable when done in the middle-tier. Both approaches have advantages and disadvantages, the trick is to use the best approach according to the situation.
On the other hand, you should never, ever put actual application logic in a stored procedure. The reasons are several. The most important is that stored procedure languages are all, to a greater or lesser extent, crap. This comment will cause me to be flamed to death by those who only know PL/SQL etc, but the fact is it's true. They are not general-purpose programming languages.
Sure, you might not RIGHT NOW want to fork off sendmail from your application, but some day you might. Or, horror of horrors, maybe you'd like to write directly to a system file? Or use a neat SNMP library you found? Although there are twisted, hacker-like ways to do these things in most DBMs they are hardly the model of reliability or professionalism. [1]
Secondly, they tie you in at a fundamental level to a particular database vendor. Database software is generally neither Free nor free. They want you to put your business logic in their stored procedure language because it will only run in their database products. Lock in is bad. OK, you'll be locked in whatever you do, but I'd rather be locked into Java or Python than PL/SQL.
Thirdly, you are losing control of your application's performance. You have very little control over how the code will be optimised or run.
Fourthly, you are breaking abstraction. It is very, very hard to write stored procedures which aren't entirely dominated by the structure of the underlying database.
Finally, assuming you probably will have to have a middle layer between the client and the database anyway, it's a bad idea from a maintainability point of view to bits of the same functionality among your layers.
[1] have you ever written a cron job to run a query to dump a table to a file to be parsed by a Perl script to send an email? You might be an Oracle Portal user.
DBA: "I think we should use more stored procedures"
Project Manager: "That would be consistent with our n-tiered strategy"
Programmer: "But all of our business rules are in object libraries"
Division Manager: "Could I get the ranch dressing? Thanks."
DBA: "The right way is to put business logic in stored procedures. It's faster"
Project Manager: "We're standardizing on faster programs across the enterprise"
Programmer: "But we'll have to re-write the entire system!"
Division Manager: "Pass the pepper. Thanks."
DBA: "Oh, that's alright. We were going to upgrade the servers too. We can finish both jobs at the same time"
Programmer: "We're upgrading the hardware too?! So basically we're building the entire company from nothing again, right?"
Project Manager: "You're not being much of a team player"
DBA: "The new system will be more robust and have new features"
Division Manager: "Lets get the system built as soon as possible. We're laying off the entire division in a few weeks"
Programmer: "WHAT?! I just signed a new car lease!!"
Division Manager: "Pass the croutons please"
Business isn't willing to pay for products, innovation and careers, so we get brands, mortgage commercials and layoffs.
I've always wondered about the real speed difference between MySQL and PostgreSQL. You said that you were able to take out a lot of code because PostgreSQL was to do more fo the checks for you, and the stored procedures handled the remaining high-level details. Back in the day, the MySQL developers publically trashed the idea of transactions, instead recommending people emulate similar functionality in their client-side code. And that they did.
Now, consider two similar DB-based applications. One is connecting to MySQL with all sorts of client-side code to emulate transactions and referential integrity (foreign keys, etc). The other is connecting to PostgreSQL where most of the work is done in the DB engine. Foreign keys are used, as are transactions and stored procedures. Thus the number of app->db requests will be far fewer than the client-heavy MySQL app. Now, even though the MySQL server may beat the PostgreSQL server in raw speed from simple selects, how would that change when you factor in the extra work that the MySQL app has to do? And what about more complex queries? I've found that PostgreSQL can handle complex queries (sub-selects, unions, aggregrates, etc) very well. It's much more efficient to do a single big multi-row query than lots of single-row queries. It probably takes longer for the programmer to write the query, but once that's done the optimizer gobbles it up and produces a plan that gives you all your data at once. Very nice.
I'd bet that all or most of the MySQL/PostgreSQL benchmarks don't take into account the very different ways these two DB's are used in real apps. MySQL is traditionally used as a simple data store with an SQL interface. Whereas PostgreSQL can become a more intergral part of the application, with stored procedures, triggers, foreign keys, etc.
Procedural code and SQL code are two separate programming language processes. The first directs the computer from a singular point of view. For instance "do this, then that, then go here and check this. etc.". And the other deals with groups of items. such as "everyone wearing blue shirts go to room 103", or "we don't need these anymore".
As far as intermixing these code bases, your procedural business logic and data business logic should be split when it makes sense. The database is optimized for merging and managing sets of data, and procedural code is good for binding this to a functional form. The business logic should be split into these two zones and implemented appropriately. It would be inappropriate to return a set from a database then loop through that set searching for some name or value. And at the same time it would be unwise to return two sets and join them in your code. With experience it seams cleaner to maintain these two zones of code. This doesn't mean that you need to use stored procedures though.
As far as stored procedures, they are a convenient way of separating these two types of languages, another way is to in place the Sql code into your procedural code, but it seams advisable to centralize this type of code in one place for visibility, and manageability. If stored procedures are not available or undesirable, then using classes or function that are located in some central, or locatable place, is recommended.
As far as for speed, implementing the data and logic in the appropriate place will speed your application, but stored procedures will not in there own right speed anything up. At least in MsSql server, stored procedures are not precompiled. They exist as plain text, just like issued queries. They do however get their own query cache, separate for the issued query cache, which could be of a little assistance.
Anyways. I am over talking about this. Take it, as u wants it.
-- Grimace1975
First, I think that stored procs are not only often good but lead to better, more powerful, secure, and flexible applications that would be feasible without them. But on the other hand, they lead to hard to maintain applications which are explicitly tied to one database. So they are necessary but often misused.
Triggers are more important and usually use stored proceedures to ensure that the information in the database is always meaningful or that some other automated activity happens on an insert or update.
Unless I absolutely have to, I try to avoid having my application call stored proceedures directly. A relational database manager *should* be able to hide the stored procedures behind a view allowing a nice standard interface for your data. This means that if you have to move to another RDBMS later, porting is much more simple and mostly confined to the backend.
BTW, I agree with the points about having your business logic in one place. Stored procedures allow you to move business logic which is inherent to the database into the database, thus making it available from all clients regardless of the language they are written in. For a single app/db pair this is not an issue but if you have a large DB with many different clients, it is a larger issue. Maintaining your application in one location is a LOT less work than reimplimenting it all in every one of your apps.
Triggers, BTW, as I mentioned before are very powerful mechanisms. They are not called by the app directly but are run every time a row is inserted, updated, or deleted (some RDBMS's also allow select triggers, though some have alternate ways of implimenting this). They can be used to impliment additional security restrictions, enforce referential integrity, or more advanced stuff such as send email when a new order is added to the database. Again, this is done regardless of whether the order is filed using a web app or a Windows app for the rep in the call center. Since the logic is unknown to the app, it isn't even aware of what happens after the order is placed. Talk about clean interfaces..... This requires stored procedures.
So, these are the great things about stored procedures. But when they are used badly, you end up with the stored procedures reducing the agility of the application because they tie it in too closely to the database. What do you do when your app is tied to Sybase and your customers want MS SQL Server? What if *all* your logic is in that database? Do you rewrite *all* of it for MS SQL Server? Probably Not. You are stuck and out of your investment.
In my opinion, it is important to maintain a clear line between what is the database's job and what is the application's job. If this line is blurred, bad things can result. Stored procedures are a very easy way to blur this line.
I design my apps with the following layers to them:
UI
Application-specific logic
Database Access Layer
Information Presentation (SQL API using views)
Information Management and Data Logic (usually stored procedures)
Information Storage (highly normalized, usually).
This allows the database to be a server app in its own right, and the client logic to run in the apps themselves. HERMES (see my sig) is mostly built this way, but there are a few things I need to change before I am happy with the interfaces. This is one reason it no longer supports MySQL.
LedgerSMB: Open source Accounting/ERP