Stored Procedures - Good or Bad?
superid asks: "I'd like to get opinions and real world experiences that people have had with database centric applications that rely extensively on stored procedures. I believe that most enterprise class databases such as Oracle, MS-SQL, PostgreSQL, DB2 and others implement stored procedures. MySQL has been criticized for not supporting stored procedures and will be adding them in MySQL 5. The ANSI-92 SQL Standard also requires implementing some form of stored procedure (section 4.17). So, I'm asking Slashdot readers: if you were architecting a highly data-centric web based application today from a clean slate, how much (if at all) would/should stored procedures factor into your design? Where are they indispensable and where do they get in the way?"
"The arguments for stored procedures are pretty straightforward: 1) Centralized code; 2) Compiled SQL is faster; 3) Enhanced security (as our application is over 15 years old, and consists of much legacy code, reimplementation and feature creep that now includes over 3000 stored procedures). At one time we had a client/server architecture so those three advantages were relevant. However, in the past 4 years we have moved everything to web front ends and I have argued that this is no longer true. Does it really matter if my business rules are centralized in stored procedures or in a set of php/asp scripts (ie, in the web tier)? Is it really important to shave compilation time when connection and execution times dominate? (and overall response is ok anyway?) Since the focal point is the webserver, shouldn't security be done there, rather than the DB?
In addition, you either have to have a dedicated T-SQL or PL/SQL coder who then is the weak link in your coding chain, or your pool of developers must become fluent in both your scripting language of choice as well as the SP language. I have experienced both of these approaches and found this to cause bottlenecks when 'the database guy' is unavailable and learning curve problems (bugs) with new coders getting familiar with the db language.
Finally, after staying with our DB engine choice for all these years we are acknowledging that they may not be around forever. Management has asked us to look into migrating our data and business logic to another DB choice. We'd sure love to just be able to point the web tier at a new data source but that is unattainable due to a convoluted tangle of db specific code."
In addition, you either have to have a dedicated T-SQL or PL/SQL coder who then is the weak link in your coding chain, or your pool of developers must become fluent in both your scripting language of choice as well as the SP language. I have experienced both of these approaches and found this to cause bottlenecks when 'the database guy' is unavailable and learning curve problems (bugs) with new coders getting familiar with the db language.
Finally, after staying with our DB engine choice for all these years we are acknowledging that they may not be around forever. Management has asked us to look into migrating our data and business logic to another DB choice. We'd sure love to just be able to point the web tier at a new data source but that is unattainable due to a convoluted tangle of db specific code."
I like to keep business logic in one place as much as possible. You are almost assured to have some in your app, so I try to keep all logic there.
Stored Procs and triggers make can make the code simpler and more efficient, but spread out the workings of the application and unless properly documented, more difficult to understand.
Just my $0.02 CDN
If I were a bit more of a tinfoil hat wearing man, I'd be Slashdot makes some of these "Ask Slashdot" topics up because the ensuing flamewar will cause more page hits than usual.
The idea of a database is to put the whole data-relation logic in the database, if only to insure atomicity of operations.
Because as soon as you rely on an external process to maintain data integrity, you're bound to fall prey to some sloppy programmer who does not understand the data relationships and will not properly maintain the data integrity.
At least, when you use stored procedures, you can concentrate the data integrity logic in only one place, which is easier to control and manage.
If the applications are written in one of the various scripting languages, then this argument doesn't apply:
One major problem with enterprise applications is that when a problem is found in an adhoc query (poorly written, a bug with the DBMS, performance related, etc) then the application would normally have to be recompiled and pushed out to the entire enterprise (could be tens of thousands of computers to push to). This isn't desirable.
Moving the queries into stored procedures (where possible) allows you to correct the stored procedure at a central location and roll it back to the 'old' stored procedure if necessary with minimal effort.
A good rule of thumb: use stored procedures for compiled applications
Jason L. Froebe
No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil
Prepared statements and vendor-neutral SQL are the way to go for portability and controllability of the development process. Use SPs judiciously, if at all, and only when there's a highly compelling need to do so(e.g., order of magnitude speedup, etc).
Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
An RDMBS (as opposed to just a database) is actually for manipulating data, not just storing it. Otherwise you'd just use flatfile for everything and implement all the relational logic in your app code. The database can execute stored procs far, far faster than your app code can perform the same functions. Using database side stored procs gives you the exact same advantages as a class library with additional performance and security options. There's no loss. Why not use them?
I'm an Oracle DBA, and I like creating packages of stored procedures and functions (and especially table functions) that represent an API of sorts to the database. This means that the application code doesn't care how the data is stored, and the DBA is free to rearrange the data for tuning purposes, without requiring any app changes (assuming the API remains constant).
In the past I've supported keeping more of the business logic in the database, but I no longer believe this is an optimal design. Now I keep business logic out of the database as much as possible and limit the stored code to enforcing data integrity and making the database look like a "black box" to the apps.
I've found that, say, writing an app with a lot of code in Oracle PL/SQL, using cursors, etc, means your app will only and forever support Oracle, without a whole lot of re-write and likely re-design.
So unless you like vendor tie-in... stay away from db-specific stored procedures.
MORTAR COMBAT!
Stored procedures are a performance optimization, consider the following scenario:
Retrieve the 20th page using a page size of 50 records for all the SKUs under a catalog (potentially millions total) for a specific user which could or not have visibility permissions for each SKU. Assume the security provided by the database is too coarse to fulfill the business requirements, therefore some set of rules must be evaluated to determine SKU visibility for a particular user.
That query would normally be very fast if implemented as a stored procedure because:
1) Only one round-trip is needed.
2) You don't have to move all the data to a middle-tier and then filter out information.
3) RDBMSes are usually faster at filtering data out (by using indexes, denormalization, etc.) that what a developer could code in a middle-tier to filter out information.
4) Most RDBMSes are very good at scheduling tasks, caching, managing memory, etc. The more you move logic away from it, the more you would have to implement it yourself.
You could send all the SQL statements to the database and achieve the same effect, but it might make debugging harder and you still have all the SQL logic in some place, only a different one.
On the flip side:
1) It's harder to write stored procedures than it is to write code in a managed language like Java or C# (thirty-line SELECT statements are not very intuitive).
2) Generally speaking, the compiler of a managed language does a better job at catching errors than a compiler for stored procedures, where a lot of the errors will be caught at runtime.
3) Stored procedures are not portable.
My advice is, if you are only using the RDBMS as a persistence device and your data size is not huge, avoid stored procedures and create some sort of middle-tier object model. Only when performance is a impediment, use stored procedures. You might as well use a hybrid approach, try to model as much as you can in the middle-tier and implement stored procedures for those tasks which are performance intensive.
I work with people that worship UML and patterns as well with RDBMS Gods that can plow through pages and pages of stored procedures without blinking. As much as I love ULM and patterns there are some tasks that must be done in the RDBMS for performance reasons, and tasks that are simply more maintainable when done in the middle-tier. Both approaches have advantages and disadvantages, the trick is to use the best approach according to the situation.
Don't be assinine. A modern production environment is running at least 100MBit Ethernet, if not 1GBit. A HUGE query, say 2500 characters, is 2.5KBytes. Gee, thats pretty rough.
It's not the query he is talking about, it's the transference of the data set from the database to the application server. This is where the data will be processed if the logic is in the app server.
On the other hand, you should never, ever put actual application logic in a stored procedure. The reasons are several. The most important is that stored procedure languages are all, to a greater or lesser extent, crap. This comment will cause me to be flamed to death by those who only know PL/SQL etc, but the fact is it's true. They are not general-purpose programming languages.
Sure, you might not RIGHT NOW want to fork off sendmail from your application, but some day you might. Or, horror of horrors, maybe you'd like to write directly to a system file? Or use a neat SNMP library you found? Although there are twisted, hacker-like ways to do these things in most DBMs they are hardly the model of reliability or professionalism. [1]
Secondly, they tie you in at a fundamental level to a particular database vendor. Database software is generally neither Free nor free. They want you to put your business logic in their stored procedure language because it will only run in their database products. Lock in is bad. OK, you'll be locked in whatever you do, but I'd rather be locked into Java or Python than PL/SQL.
Thirdly, you are losing control of your application's performance. You have very little control over how the code will be optimised or run.
Fourthly, you are breaking abstraction. It is very, very hard to write stored procedures which aren't entirely dominated by the structure of the underlying database.
Finally, assuming you probably will have to have a middle layer between the client and the database anyway, it's a bad idea from a maintainability point of view to bits of the same functionality among your layers.
[1] have you ever written a cron job to run a query to dump a table to a file to be parsed by a Perl script to send an email? You might be an Oracle Portal user.
I've always wondered about the real speed difference between MySQL and PostgreSQL. You said that you were able to take out a lot of code because PostgreSQL was to do more fo the checks for you, and the stored procedures handled the remaining high-level details. Back in the day, the MySQL developers publically trashed the idea of transactions, instead recommending people emulate similar functionality in their client-side code. And that they did.
Now, consider two similar DB-based applications. One is connecting to MySQL with all sorts of client-side code to emulate transactions and referential integrity (foreign keys, etc). The other is connecting to PostgreSQL where most of the work is done in the DB engine. Foreign keys are used, as are transactions and stored procedures. Thus the number of app->db requests will be far fewer than the client-heavy MySQL app. Now, even though the MySQL server may beat the PostgreSQL server in raw speed from simple selects, how would that change when you factor in the extra work that the MySQL app has to do? And what about more complex queries? I've found that PostgreSQL can handle complex queries (sub-selects, unions, aggregrates, etc) very well. It's much more efficient to do a single big multi-row query than lots of single-row queries. It probably takes longer for the programmer to write the query, but once that's done the optimizer gobbles it up and produces a plan that gives you all your data at once. Very nice.
I'd bet that all or most of the MySQL/PostgreSQL benchmarks don't take into account the very different ways these two DB's are used in real apps. MySQL is traditionally used as a simple data store with an SQL interface. Whereas PostgreSQL can become a more intergral part of the application, with stored procedures, triggers, foreign keys, etc.
Procedural code and SQL code are two separate programming language processes. The first directs the computer from a singular point of view. For instance "do this, then that, then go here and check this. etc.". And the other deals with groups of items. such as "everyone wearing blue shirts go to room 103", or "we don't need these anymore".
As far as intermixing these code bases, your procedural business logic and data business logic should be split when it makes sense. The database is optimized for merging and managing sets of data, and procedural code is good for binding this to a functional form. The business logic should be split into these two zones and implemented appropriately. It would be inappropriate to return a set from a database then loop through that set searching for some name or value. And at the same time it would be unwise to return two sets and join them in your code. With experience it seams cleaner to maintain these two zones of code. This doesn't mean that you need to use stored procedures though.
As far as stored procedures, they are a convenient way of separating these two types of languages, another way is to in place the Sql code into your procedural code, but it seams advisable to centralize this type of code in one place for visibility, and manageability. If stored procedures are not available or undesirable, then using classes or function that are located in some central, or locatable place, is recommended.
As far as for speed, implementing the data and logic in the appropriate place will speed your application, but stored procedures will not in there own right speed anything up. At least in MsSql server, stored procedures are not precompiled. They exist as plain text, just like issued queries. They do however get their own query cache, separate for the issued query cache, which could be of a little assistance.
Anyways. I am over talking about this. Take it, as u wants it.
-- Grimace1975