Actually I think they do [google.com]. Google is exactly the sort of app where databases make a great deal of sense.
They use servers which hold data, so sure, I guess you can call them databases. But I'm saying those aren't relational DB servers, and they don't speak SQL. If you have any evidence otherwise, by all means speak up.
Personally, I'd be strongly against it. Relational servers don't make much sense in their context, for a zillion reasons:
the data is poorly suited to the relational model,
they don't give a damn about generalized ad-hoc query capabilities,
they have no need for transactions,
SQL servers aren't particularly fast in the 99.9999%-read mode that Google would use (and if you don't believe me, compare speeds with, say, an LDAP server),
because their data is scattered across thousands of boxes they don't get the RDBMS's benefits that come from a single, tightly linked set of data,
to get decent response times, they have to keep all the data in RAM anyhow,
and on, and on.
As far as I know, no major search engine uses relational, SQL-driven databases. Certainly AltaVista and Inktomi didn't. If you know of one, I'd be intrigued to hear about it.
A modern RDBMS only has to write the transaction state to disk (redo or undo).
No, you have to write the changes to the tables out sometime. The log is just a way to speed up the ACK. Prevayler, on the other hand, need only write the data model out occasionally, like once a day.
For equivalent write speed, Prevayler should thus use less hardware.
Modern storage hardware is still not as fast as physical onboard RAM. "pretty zippy" still doesn't cut it. A fibre channel connection is going to top out at 400M/sec [...]
I'm afraid I don't see your point here. Prevayler is, practically speaking, only useful for datasets on the order of a couple of gigabytes. If you can write an update log at 400MB/s, then you'll be able to completely change your dataset every few seconds with Prevayler.
There are probably applications that have such a small data size relative to update volume, but I've sure never seen one. Certainly they are not very common. If they exist, they probably shouldn't use Prevayler.
That latency will translate into latency in your data storage engine.
If you use battery-backed RAM cache for the writes, then there's no real latency issue until you get close to your I/O bandwidth limits. This was a common high-end file server trick a decade ago, and I'm sure it's still available.
If the latency of storage devices isn't an issue for you, you simply aren't doing interesting transaction volume.
By that definition, I'd guess less than 1% of code out there must be interesting, at least to you. Which is fine; Prevayler's probably not right for you. For the other 99% of us, Prevayler's worth a look.
Indeed, this is the key assumption. If you can live with this, the database size requirements and give up ad hoc queries, then it seems very attractive.
Exactly! And the last one may not be too hard to fix; I have sketches in my notebooks for bolting an SQL engine onto an arbitrary object model, so that you can query the live data with Crystal Reports. And I have hopes that I can steal most of the SQL stuff from one of the open-source SQL engines.
I'm glad it's not as bad as I feared; I wish the promotional material was more clear on the technical details instead of being "why are you still using a database".
Agreed! Of course, if it weren't a controversially-phrased manifesto, then a lot fewer people would have heard about it. :-)
But Prevayler's main site is a Wiki, so you are welcome to edit it to make it less confusing. Or you could take your knowledge and write an article for somebody like IBM's DeveloperWorks: "Prevayler: Not As Crazy As It Sounds". They pay well!
The only way to guarantee that you've written what you want to replay is to do some sort of synchronous IO operation. THIS is why transaction logging is a problem.
If you want reliability, you have to do synchronized writes. There's no way around it, whether you use a DB or not. But Prevayler should be faster at writing than the equivalent DB on the same hardware, as the DB has to write to multiple places on a disk, whereas Prevayler just appends to a log. If you favor speed over reliability, then it's a one-line code change to turn syncing off in Prevayler.
But as I said, adding bytes onto the end of an open file is pretty zippy on modern hardware, so that won't be a bottleneck for most people. And if it is, then there are many reasonable hardware solutions to accelerate writes safely.
That's the whole point of something like Prevayler: high transaction volume.
No, the whole point of Prevayler is that if your objects are just all in memory, then it's much easier to code. That it's thousands of times faster than using a DB is a nice bonus.
How much does the software trust the underlying OS and hardware?
Prevayler does the proper paranoid thing, syncing after every write.
If you want less paranoia, you could easily do that; it's a one-line code change. Or you could just ask your operating system to do it.
If that ends up being a speed issue, then there are plenty of ways to improve IO. But really, putting bytes on the end of an open file is pretty fast these days, so I doubt many will hit this bottleneck in practice.
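To make the "sync after every write" trade-off concrete, here is a minimal sketch of an append-only journal with a flag for the paranoid mode. The class names (`AppendOnlyJournal`, `JournalDemo`) and the text record format are made up for illustration; this is not Prevayler's actual code, only the general technique using the standard `FileChannel.force` call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical journal: every record is appended to the end of one open file.
final class AppendOnlyJournal implements AutoCloseable {
    private final FileChannel channel;
    private final boolean syncEveryWrite;

    AppendOnlyJournal(Path path, boolean syncEveryWrite) {
        this.syncEveryWrite = syncEveryWrite;
        try {
            channel = FileChannel.open(path, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void append(String record) {
        ByteBuffer bytes = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
        try {
            while (bytes.hasRemaining()) {
                channel.write(bytes);
            }
            // The paranoid line: ask the OS to push the bytes to the device
            // before we return. Skipping it trades durability for speed.
            if (syncEveryWrite) {
                channel.force(true);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class JournalDemo {
    // Writes two records to a temp file and returns what ended up on disk.
    static List<String> run(boolean syncEveryWrite) {
        try {
            Path path = Files.createTempFile("journal", ".log");
            try (AppendOnlyJournal journal = new AppendOnlyJournal(path, syncEveryWrite)) {
                journal.append("deposit alice 100");
                journal.append("deposit alice 50");
            }
            List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
            Files.delete(path);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The file contents are the same either way; the flag only decides whether `append` waits for the device before returning.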
What if your command is slow? Safe would be to serialize all commands, but I doubt that's what they do based on the performance. It appears they just don't acknowledge the issue. This means that the log written to disk might generate different final snapshots depending on the relative execution time between two different reconstructions.
Prevayler applies commands in strict linear order. Thus it's completely deterministic, and therefore most of your worries aren't an issue.
I think the only remaining one is long transactions. Since Command objects are executed one at a time, a slow command will cause writes to block. The solution: Don't use Prevayler Commands for long transactions. If you need to do something that takes a while, then you should do that first, build up the Command, and then submit that for execution.
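A rough sketch of that advice, assuming a hypothetical `SerialSystem` class as a stand-in for the prevalent system (this is not Prevayler's real API). The slow calculation runs outside the serialized section, and the command that gets submitted only stores the precomputed result.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a prevalent system: commands run strictly one at a time.
final class SerialSystem<S> {
    private final S state;

    SerialSystem(S state) {
        this.state = state;
    }

    // Every command takes the same lock, so a slow command blocks all other writers.
    synchronized void execute(Consumer<S> command) {
        command.accept(state);
    }

    S state() {
        return state;
    }
}

// A quick command: it only stores a value that was computed beforehand.
final class SetPrice implements Consumer<Map<String, Long>> {
    private final String sku;
    private final long cents;

    SetPrice(String sku, long cents) {
        this.sku = sku;
        this.cents = cents;
    }

    @Override
    public void accept(Map<String, Long> prices) {
        prices.put(sku, cents);
    }
}

final class Repricer {
    // Pretend this is the slow part (a remote call, a heavy calculation, ...).
    static long slowQuote(String sku) {
        long cents = 0;
        for (char c : sku.toCharArray()) {
            cents += c;
        }
        return cents;
    }

    static void reprice(SerialSystem<Map<String, Long>> system, String sku) {
        long cents = slowQuote(sku);              // slow work first, outside the lock
        system.execute(new SetPrice(sku, cents)); // then a short command
    }
}
```

The catch is that the precomputed value is based on whatever the state looked like before the command ran, so a command built this way should re-check anything it depends on when it finally executes.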
So again, use the right tool for the job, and don't buy into their ~"you won't ever need a dbms again"~ marketing hype.
Until you know what this tool is, you'll have a hard time telling what it's good for. Perhaps you could start by reading the FAQ, where a lot of your issues are covered.
Basically what I'm saying is that instead of allowing the programmer to embed the statement insert Foo ( bar, baz ) values ( @x, @y ) in his code,
Ah, I think this is the source of the confusion.
I wouldn't think of doing a system with persistent data where programmers were just writing stuff to the persistent store willy-nilly. They should just be calling foo.save() and maybe transaction.commit(). The persistence layer should take care of everything else. Then I keep the data integrity code in Foo (where I should have it anyhow).
It's my experience that when you let applications programmers write SQL, you get crappy databases.
Agreed. The persistence layer should be written by somebody with clue, in such a way that the often tricky details of a particular RDBMS are safely hidden.
There's no one language that all programmers know. No one language is suited to all tasks.
Sorry, my original statement must have been poorly phrased. What I meant was that on a typical project, the ratio of application programmers to DBAs is circa 10:1. Thus, it makes sense that core business logic (which includes the kinds of data consistency stuff that often ends up in stored procedures) should be in the language the application programmers know, leaving the DBA to focus on the database, not the business logic.
Even a fairly trivial database-backed CGI web app [...] Perl for the CGI [...]
Oh, my comments should only be taken as applying to OO systems. (Prevayler comes from a very strong OO background, so I'd taken OO architectures as a given.) CGI is very procedural, and despite their best efforts, so is Perl. In procedural languages (or with developers who insist on writing procedural code in OO languages) then the database is really the only point to put a lot of the data integrity checks you mention, so I'm happy to use stored procedures there.
You (or at least *I*) don't let apps rummage around in raw data -- all database access (reads and modifications) goes through stored procedures. The user should never have permission to touch the tables directly - all he needs to have is execute permission on a specific set of SPs.
Yes, but that puts core business logic in a place that's hard to reach, hard to test, and hard to work with.
Wouldn't it be better to have it in a language that all your programmers know? In something that's not locked into a particular vendor? And in a full-fledged programming environment?
Extreme Programming seems to have lost its impetus, or am I just not up to date? :)
You're not up to date. New user groups are still popping up, conference attendance is increasing, and lots of people are adopting it. I'm helping convert another two teams to it right now.
Re-read what I wrote, paying attention to the highlighted words.
Ooops! My mistake.
whereas an RDBMS schema can be wrappered by multiple independent object models
True, but is that a bug or a feature?
One of OO's big wins is that if you wrap your data in code, you gain a lot in reliability and flexibility. By letting lots of apps rummage around in the raw data, you lose encapsulation entirely.
Even worse, once multiple code bases are massaging the same database, your schema is pretty much set in stone. Keeping a database in sync with one set of code is work; simultaneous synchronized refactoring to multiple codebases gets exponentially harder.
Sure, you can do all sorts of database tricks with stored procedures, views, and data warehouses to try to juggle these problems. But in essence this turns a database into an application server or an integration layer. But all the code is written in a non-standard, procedural, vendor-specific language that has poor tool support.
So personally, I prefer architectures where when you need an app server, you use (or write) an app server. If you need an integration layer, you use (or write) one of those. And when you need a database, by all means, use a database.
You're arrogant and/or naive to assume that a simple text search is what he's referring to. What if you want to perform the equivalent of a left outer join and a few inner joins combined with a UNION, INTERSECT, and NOT EXISTS clause here or there? Good fucking luck, coding master.
Patient: Doctor, doctor! It hurts when I go like this! Doctor: Well, don't do that then.
In medicine, that's a joke, but in programming, it's often the truth.
If somebody's got an SQL statement combining all of that, then they've just created write-only code. Nobody coming upon it will get it. A month later, even the author will have to puzzle it out. As Martin Fowler says, "Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
Now maybe it's required sometimes in an SQL system so that you don't want to pass too much data over the pipe. But in a Prevayler app, all the data is right there in memory. If you're writing good code, finding data in memory generally just isn't that hard. Just like everything else, you build your complex functionality up piece by piece from simple functionality. If you have code that looks like that SQL statement you mentioned, then it's not the sign of a clever programmer, it's the sign of a bad object model.
In the cases where it is truly a hard problem, SQL may not help anyhow. You can bet that the FPS games don't decide what objects to render by writing SQL queries. And I'm sure that Google isn't doing any "SELECT * FROM Documents WHERE..." when you do a query, either.
Notice that they don't appear to mention recovery time, or the size of the changelog.
Well, I could see why; this depends a great deal on your application. My guess is that this would be IO-bound on a typical system. Java's serialization is pretty zippy. The speed of the Command objects depends entirely on how you write them, of course, but if they're done right they'll be quick as well. Why don't you do some timings and put it up on their Wiki?
If recovery time is really an issue, it's also pretty easy to build hot clones on top of Prevayler. The command objects are serialized to be written to disk, yes? So as you write them to disk, you also write them over the wire to another server, which applies them as well.
But really, Prevayler systems should go down less. The more pieces your system requires to function, the more frequent your outages. Eliminating the database server means that you lose all of the failure cases where something happens to that box.
They also don't mention the level of assurance.
The commands are logged before execution. If the server dies, you load the last snapshot and replay the log. What other assurance would you like?
Saying "this kicks Oracle's butt" is a distortion though: it solves different problems than traditional databases.
Well, I've certainly seen a lot of people with the notion that it was completely impossible to build a system for 1 GB of data without using a database server, so I think there's some overlap in the market. But if you go beyond RAM-sized data sets, you're right that Prevayler is a bad choice.
Use the right tool for the job.
Amen! I like SQL databases a lot when I need them. But I'm sooooo tired of people who believe that the history of computing began (and ended) with Oracle.
These people have created a system for people who don't really give a f*ck about their data. However, they choose to measure the performance of their system against one that is specifically geared towards data integrity.
Once a night (or however often you want), Prevayler snapshots the whole object model. Then, every atomic change is encapsulated in a Command object. Before the command is executed, it's serialized and written to a command log.
If there is a crash, you just bring up the latest snapshot and replay the log.
Where do you see the potential for data loss?
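For anyone who wants to see the moving parts, here is a minimal sketch of the log-then-execute and replay cycle described above. Everything in it (`Command`, `Deposit`, `CommandLog`, `RecoveryDemo`) is a made-up illustration using plain Java serialization, not Prevayler's actual classes, and the "snapshot" is simply whatever starting state you hand to `replay` (an empty map here).

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical command interface: one atomic change to the system.
interface Command<S> extends Serializable {
    void executeOn(S system);
}

final class Deposit implements Command<Map<String, Long>> {
    private static final long serialVersionUID = 1L;
    private final String account;
    private final long cents;

    Deposit(String account, long cents) {
        this.account = account;
        this.cents = cents;
    }

    @Override
    public void executeOn(Map<String, Long> balances) {
        balances.merge(account, cents, Long::sum);
    }
}

final class CommandLog implements AutoCloseable {
    private final FileOutputStream file;
    private final ObjectOutputStream out;

    CommandLog(File path) {
        try {
            file = new FileOutputStream(path);
            out = new ObjectOutputStream(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Log first, sync, and only then execute. One command at a time.
    synchronized <S> void execute(Command<S> command, S system) {
        try {
            out.writeObject(command);
            out.flush();
            file.getFD().sync();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        command.executeOn(system);
    }

    // Re-applies every logged command, in order, to a state loaded from the snapshot.
    static <S> void replay(File path, S system) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(path))) {
            while (true) {
                @SuppressWarnings("unchecked")
                Command<S> command = (Command<S>) in.readObject();
                command.executeOn(system);
            }
        } catch (EOFException endOfLog) {
            // Normal termination: every complete command has been replayed.
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void close() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class RecoveryDemo {
    static Map<String, Long> run() {
        File path;
        try {
            path = File.createTempFile("commands", ".log");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        path.deleteOnExit();

        Map<String, Long> live = new HashMap<>();
        try (CommandLog log = new CommandLog(path)) {
            log.execute(new Deposit("alice", 100), live);
            log.execute(new Deposit("alice", 50), live);
        }
        // Simulated crash: throw the live state away and rebuild it from the log.
        Map<String, Long> recovered = new HashMap<>();
        CommandLog.replay(path, recovered);
        return recovered;
    }
}
```

Since a command only executes after its bytes have been synced, the worst case in this sketch is a command that was logged but not yet applied, and replay takes care of that.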
Sure there is. In order to get a "robust" system, you're going to have to touch disk. Your level of acceptable transaction loss will determine how often that is. If that is too often, one should expect the performance of Prevayler to go out the window.
Well, you're welcome to demonstrate that. Adding bytes onto the end of an open file is a pretty fast operation, so the upper performance bound should be pretty high even on moderate hardware.
If that becomes a limitation, then there are plenty of fancy hardware options to speed up writes in safe ways, everything from fancy RAID setups with battery-backed write cache to just keeping your command log on a battery-backed solid state disk.
But really, I doubt that would happen much in the real world. Remember that Prevayler is meant for systems with moderate amounts of data; since it all has to be in RAM, you're pretty much limited to a couple GB of data. If your writes are coming so fast and furious that you can't log the update commands on a regular disk, then pretty soon you'll have more data than what you can fit in RAM anyhow.
What about lost updates or ghost reads? Part of what a database has to do is prevent uncommitted transactions from being visible (this is why everyone argues MySQL isn't really a database).
That's a good question. Prevayler Command objects, which are units of work sort of like transactions, get executed in a continuous stream. Since all the data is in RAM, they go very quickly.
This gives you two basic options:
For a lot of applications, you just take the risk of showing slightly inconsistent data. Imagine a discussion system like slashdot. If the top of the page says 583 posts but you end up with 584 posts on the page because one was added during the microseconds you are rendering the page, then so what? Given the post-to-view ratio, odds are low that it will happen, and if it does, nobody will care.
But if you must have consistent data, then you take advantage of the fact that Prevayler executes commands one at a time. Just make sure your reads happen in between modify operations and you'll be fine.
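The second option can be sketched like this, assuming a hypothetical `SerializedSystem` wrapper rather than Prevayler's real API. Reads that need a consistent view take the same lock as the commands, so each one runs between two modify operations and never in the middle of one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for a prevalent system, not Prevayler's real API.
final class SerializedSystem<S> {
    private final S state;

    SerializedSystem(S state) {
        this.state = state;
    }

    // Writes run one at a time.
    synchronized void execute(Consumer<S> command) {
        command.accept(state);
    }

    // A consistent read takes the same lock, so it runs between two writes.
    synchronized <R> R query(Function<S, R> read) {
        return read.apply(state);
    }
}

final class Discussion {
    final List<String> posts = new ArrayList<>();
    int postCount;
}

final class ConsistentReadDemo {
    // Returns true if every read saw a post count that matched the post list.
    static boolean run() {
        SerializedSystem<Discussion> system = new SerializedSystem<>(new Discussion());
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                String post = "post " + i;
                // Two separate updates: an unsynchronized reader could land between them.
                system.execute(d -> {
                    d.postCount++;
                    d.posts.add(post);
                });
            }
        });
        writer.start();

        boolean consistent = true;
        for (int i = 0; i < 1_000; i++) {
            boolean ok = system.<Boolean>query(d -> d.postCount == d.posts.size());
            if (!ok) {
                consistent = false;
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return consistent;
    }
}
```

The price is that a slow consistent read holds up writers, so this is only attractive because in-memory reads are short.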
Ok, I'm going to assume that you're having a bad day or something, rather than leaping immediately to the conclusion that you're a flaming asshole who comes here because he likes insulting people and doing a lot of nay-saying.
I build real live working systems, and I've been doing it quite a spell. Prevayler is one good way to do that. It's not the only way, and it's only good in certain circumstances.
If you really think that finding and indexing your objects is the hard part of development, and that having somebody do that for you is what really matters, then Prevayler probably isn't for you. Personally, in the systems I build, it hasn't been a big deal.
If you're interested in learning about new ways of tackling things (which is part of why I come to news sites like this) then you may want to give Prevayler a try. Even if you don't ever use it in production, it's a very interesting viewpoint. And if you're happy with the tools you use now, then by all means, stick with 'em.
Unless I had a product that had an extremely specialized use case that matched OODB strengths, I would NEVER develop on this kind of platform again.
You did notice that this article has nothing to do with OODBMS products, right? Prevayler has no database. But I'll address your points directly:
It is very difficult to "see" an OO database. By nature, the data isn't tabular. It's a persistent object heap. There's no "SELECT * FROM USERS". So tracking down data-related problems involves exporting data to an XML file and sifting through it.
This hasn't been a problem for me. For system inspection, you can use something like BeanShell to wander the data. Or you could spend a day or two and write something that used introspection (like java.lang.reflect) to let you wander the data.
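As a rough idea of the introspection route, here is a small sketch built on `java.lang.reflect`. The `ObjectInspector` and `Account` classes are hypothetical examples; a real inspector would also recurse into nested objects and collections.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

final class ObjectInspector {
    // Renders the instance fields of any object, private ones included.
    static String describe(Object target) {
        Field[] fields = target.getClass().getDeclaredFields();
        // getDeclaredFields() promises no particular order, so sort for stable output.
        Arrays.sort(fields, Comparator.comparing(Field::getName));

        StringBuilder out = new StringBuilder(target.getClass().getSimpleName()).append(" {");
        for (Field field : fields) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            field.setAccessible(true);
            try {
                out.append(' ').append(field.getName())
                   .append('=').append(field.get(target)).append(';');
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return out.append(" }").toString();
    }
}

// Example business object with no getters at all.
final class Account {
    private final String owner;
    private final long cents;

    Account(String owner, long cents) {
        this.owner = owner;
        this.cents = cents;
    }
}
```

Calling `ObjectInspector.describe(new Account("alice", 150))` gives `Account { cents=150; owner=alice; }`.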
Reporting tools don't exist for OODB. Try hooking up Crystal or another reporting tool to this. You end up writing every report from scratch.
You would never let random yahoos run queries against your production database servers, I hope. Which means a data warehouse. In which case, you can export your OO data to an SQL data warehouse happily.
DB Performance when querying outside the normal object hierarchy
Common queries should be indexed. That's true no matter how you keep your data. It's not hard to do in Prevayler.
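A minimal sketch of what indexing a common query can look like when the data is plain objects in memory. The `User` and `UserDirectory` classes are invented for the example; the point is only that the index sits next to the data and is maintained by the same method that changes the data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class User {
    final String email;
    final String name;

    User(String email, String name) {
        this.email = email;
        this.name = name;
    }
}

final class UserDirectory {
    private final List<User> users = new ArrayList<>();
    // Index for the common query "find the user with this email".
    private final Map<String, User> byEmail = new HashMap<>();

    // The list and the index are updated together, so they cannot drift apart.
    void add(User user) {
        if (byEmail.containsKey(user.email)) {
            throw new IllegalArgumentException("duplicate email: " + user.email);
        }
        users.add(user);
        byEmail.put(user.email, user);
    }

    // Hash lookup instead of a scan over every user; null if there is no match.
    User findByEmail(String email) {
        return byEmail.get(email);
    }

    int size() {
        return users.size();
    }
}
```

Callers only see `findByEmail`, so swapping the `HashMap` for a different structure later does not touch any query code.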
32-bit memory limited our max customer size dramatically.
Yes. Prevayler should only be used on systems where all data fits in RAM. That's not all systems. Yet.
Object prevalence does nothing to change that. You still have to deal with serialization of all of your business objects, unless you're planning on reloading and re-executing all transactions since the beginning of time every time you restart the server. You can do it less frequently at runtime, but that doesn't save you any development time.
Oh, please. You really feel that making a Java object serializable is just as hard as writing SQL load and store code? I have written persistence frameworks that work both ways, and the SQL code is much, much more complicated.
If you haven't implemented locks for an object model, then you haven't lived. Seriously, I can see a lot of people screwing this up with deadlocks galore. Locking up concurrent systems can be a nightmare.
Then just wrap all of your lock-sensitive stuff in Prevayler command objects. They've got that working fine, and it guarantees isolation.
Goodbye Crystal Reports, Goodbye English Query, Goodbye ANY Ad Hoc query support, because if you need anything different, you're going to have to write a lot more code to enumerate throughout your objects. Have fun.
Oh, please. If you really need SQL compatibility, then dump the data occasionally to a data warehouse, which is where you should be doing unconstrained ad-hoc queries anyhow.
Or if it's so the programmers can peek at the live system, then put in something like BeanShell, which will let you see a lot more than just the persistent data.
Or you could drop an SQL interpreter into your system and present your objects as tables. Many of the pieces are already open sourced, so it would be pretty easy.
Indexing - I hope you have a good B-Tree library and are familiar with Indexing/Searching algorithms when implementing HARDCODED indexing. Oh yeah, have fun rewriting all of your query procedures when you decide to change your hardcoded indexing.
Can you really not think of ways to write these things in flexible ways? If that's the case, you could learn something about being a programmer. Pick up Martin Fowler's Patterns of Enterprise Application Architecture.
In all seriousness, this is a bad idea for 99% of projects out there. It's inflexible, unscalable, severely error prone, and timely to implement.
Perhaps you should try it before knocking it. As you are, in order, wrong, mostly wrong, wrong, and confused. It's no magic bullet, but it's a useful approach for some systems.
Prevalent may be "fast". However, it seems to be limited to very small and non-critical systems. That tends to limit its usefulness. What speed it may have now will very likely be quickly reduced if both of those issues are addressed.
If your data set doesn't fit in RAM, then Prevayler is a poor choice now. Wait a year for the 64-bit machines to become common, and a lot more systems will fit in Prevayler's envelope.
But there's no reason a Prevayler system can't be just as reliable as a system with a relational database. Indeed, it can be much more reliable: the more pieces a system has, the higher the odds of failure.
This was compared to an Oracle db running in RAM. Who would spend the money for an Oracle db (and an Oracle admin) for a database small enough to fit in RAM?
You must not have worked for a large corporation. I have seen people use a half-million dollars in hardware for data sets well under 1 GB.
1) What about Swapping? I know that you would be limited by physical RAM (which IS getting cheaper) but couldn't you also get a really large virtual memory space and utilize that?
I had this thought, too. Alas, it turns out that a swapping Prevayler system is slower than using an RDBMS.
I suspect there are two problems. One is that JVMs probably allocate objects any old place in RAM, rather than clustering related objects. The other is that the swapper isn't tuned for this kind of thing.
So for now, Prevayler's only a good choice if your data fits in RAM.
one of the first things I realized is that I would have to design a new method of querying
Yes! But once you get the hang of it, it turns out to be so much better.
Once the Prevayler site's pipe isn't slashdotted, you should check out the various indexing and search frameworks they mention there. They're a good way to get started kicking the SQL habit.
The smaller and less complex a database is, the less important it becomes to tweak the design for performance benefits. Is this a case of an optimization that only works when optimization is unnecessary?
Well, quite a lot of data fits in RAM these days. 1-2 GB of data can be a whole lot of data. And as the 64-bit systems become reasonably priced, then that number will look more like 10-20 GB.
But even if you don't have a lot of data, you still win big in reduced program complexity and reduced development time. It's also nicer from an operations perspective. No database server means one less process (or box) to babysit, and one less thing to patch and upgrade.
You'd still need to implement searches, and how do you search a collection of objects without placing them on the relational line?
If you don't know how to do that, you don't know how to program.
Amen, brother!
It seems like a lot of people can't imagine handling anything more than about two strings without an SQL server to help them out. Good thing Larry Ellison doesn't sell crutches to people with broken legs; he'd make 'em so comfortable that they'd never learn to walk again without 'em.
"How do I search?"
Well gosh, how about you iterate through them and find the ones that match?
"But that's slow!"
Have you tried it? As Knuth says, premature optimization is the root of all evil. There's no sense in reducing your search time from 100 microseconds to ten microseconds if your GUI widgets take 500,000 microseconds to render the result.
But if you've got numbers from your profiler that say that this is a problem, then crack open one of those unread CS textbooks. Back in ye olden dayes, they came up with some smart stuff.
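Both steps can be sketched in a few lines. The names (`Search`, `ScoreIndex`) are made up for the example. The first class is the plain "iterate and match" search; the second is one textbook answer for when a profiler says range lookups are too slow, using the sorted `TreeMap` from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Predicate;

final class Search {
    // The "just iterate" approach: one pass, keep whatever matches.
    static <T> List<T> findAll(Iterable<T> items, Predicate<? super T> matches) {
        List<T> found = new ArrayList<>();
        for (T item : items) {
            if (matches.test(item)) {
                found.add(item);
            }
        }
        return found;
    }
}

// A sorted index for range queries, for when the linear scan really is the bottleneck.
final class ScoreIndex {
    private final TreeMap<Integer, List<String>> byScore = new TreeMap<>();

    void add(String name, int score) {
        byScore.computeIfAbsent(score, k -> new ArrayList<>()).add(name);
    }

    // Names whose score lies in [low, high], in ascending score order.
    List<String> between(int low, int high) {
        List<String> found = new ArrayList<>();
        for (List<String> names : byScore.subMap(low, true, high, true).values()) {
            found.addAll(names);
        }
        return found;
    }
}
```

The scan costs one pass over the collection per query; the tree pays a little on every insert to make range queries cheap. Which one is right depends on the profiler numbers, not on taste.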
To license Oracle with similar features to Prevalent, you would only be looking at a 5 figure pricetag.
Don't forget the price tag for all the extra hardware; since a Prevayler system is thousands of times faster, you can get by with a lot less hardware. And add in all the programmer time spent dealing with SQL. Oh, what about the DBA's salary?
How well does Prevalent do on 30TB+ datasets?
One doesn't use Prevayler for systems like that. Prevayler makes sense if your data can fit in RAM. If it doesn't, you should do something else.
But note that "something else" doesn't have to mean some SQL thingy. Google has a metric shitload of data, and you can bet they don't keep it in an Oracle server.
You don't use an RDBMS because it's fast. You use it because it's reliable.
Prevayler can be just as reliable.
Does this new toy support record locking, transactional isolation and integrity, or any of the other key features that an enterprise RDBMS provides? If the answer is no, then it's not a replacement for an RDBMS.
Wrong.
The question isn't the checklist of features, it's whether you can build equivalently reliable systems with Prevayler. The answer: You can.
You'll recall that Prevayler uses the Command Pattern. Before data is changed, the Command object is serialized and written to disk, then executed. Naturally, this means the commands are run in strict order of arrival, yes?
That's all you need to get transactional integrity. All writes are isolated. If you need to isolate the reads, you can use the same mechanism.
The prevalent approach requires developers to do things a little differently, but you don't have to sacrifice reliability.
They use servers which hold data, so sure, I guess you can call them datatbases. But I'm saying those aren't relational DB servers, and the don't speak SQL. If you have any evidence otherwise, by all means speak up.
Personally, I'd be strongly against it. Relational servers don't make much sense in their context, for a zillion reasons:
As far as I know, no major search engine uses relational, SQL-driven databases. Certainly AltaVista and Inktomi didn't. If you know of one, I'd be intrigued to hear about it.
A modern RDBMS only has to write the transaction state to disk (redo or undo).
No, you have to write the changes to the tables out sometime. The log is just a way to speed up the ACK. Prevayler, on the other hand need only write the data model out occasionally, like once a day.
For equivalent write speed, Prevayler should thus use less hardware.
Modern storage hardware is still not as fast as physical onboard RAM. "pretty zippy" still doesn't cut it. A fibre channel connection is going to top out at 400M/sec [...]
I'm afraid I don't see your point here. Prevayler is, practically speaking, only useful for datasets on the order of a couple of gigabytes. If you can write an update log at 400MB/s, then you'll be able to completely change your dataset every few seconds with Prevayler.
There are probably applications that have such a small data size relative to update volume, but I've sure never seen one. Certainly they are not very common. If they exist, they probably shouldn't use Prevayler.
That latency will translate into latency in your data storage engine.
If you use battery-backed RAM cache for the writes, then there's no real latency issue until you get close to your I/O bandwidth limits. This was a common high-end file server trick a decade ago, and I'm sure it's still available.
If the latency of storage devices aren't an issue for you, you simply aren't doing interesting transaction volume.
By that definition, I'd guess less than 1% of code out there must be interesting, at least to you. Which is fine; Prevayler's probably not right for you. For the other 99% of us, Prevayler's worth a look.
Indeed, this is the key assumption. If you can live with this, the database size requirements and give up ad hoc queries, then it seems very attractive.
:-)
Exactly! And the last one may not be too hard to fix; I have sketches in my notebooks for bolting an SQL engine onto an arbitrary object model, so that you can query the live data with Crystal Reports. And I have hopes that I can steal most of the SQL stuff from one of the open-source SQL engines.
I'm glad it's not as bad as I feared; I wish the promotional material was more clear on the technical details instead of being "why are you still using a database".
Agreed! Of course, if it weren't a controversially-phrased manifesto, then a lot fewer people would have heard about it.
But Prevayler's main site is a Wiki, so you are welcome to edit it to make it less confusing. Or you could take your knowledge and write an article for somebody like IBM's DeveloperWorks: "Prevayler: Not As Crazy As It Sounds". They pay well!
The only way to gaurantee that you've written what you want to replay is to do some sort of synchronous IO operation. THIS is why transaction logging is a problem.
If you want reliability, you have to do syncronized writes. There's no way around it, whether you use a DB or not. But Prevayler should be faster at writing than the equivalent DB on the same hardware, as the DB has to write multiple places on a disk, whereas Prevayler just appends to a log. If you favor speed over reliability, then it's a one-line code change to turn syncing off in Prevayler.
But as I said, adding bytes onto the end of an open file is pretty zippy on modern hardware, so that won't be a bottleneck for most people. And if it is, then there are many reasonable hardware solutions to accelerate writes safely.
That's the whole point of something like Prevalayer: high transaction volume.
No, the whole point of Prevayler is that if your objects are just all in memory, then it's much easier to code. That it's thousands of times faster than using a DB is a nice bonus.
How much does the software trust the underlying OS and hardware?
Prevayler does the proper paranoid thing, syncing after every write.
If you want less paranoia, you could easily do that; it's a one-line code change. Or you could just ask your operating system to do it.
If that ends up being a speed issue, then there are plenty of ways to improve IO. But really, putting bytes on the end of an open file is pretty fast these days, so I doubt many will hit this bottleneck in practice.
What if your command is slow? Safe would be to serialize all commands, but I doubt that's what they do based on the performance. It appears they just don't acknowledge the issue. This means that the log written to disk might generate different final snapshots depending on the relative execution time between two different reconstructions.
Prevayler applies commands in strict linear order. Thus it's completely deterministic, and therefore most of your worries aren't an issue.
I think the only remaining one is long transactions. Since Command objects are executed one at a time, a slow command will cause writes to block. The solution: Don't use Prevayler Commands for long transactions. If you need to do something that takes a while, then you should do that first, build up the Command, and then submit that for execution.
So again, use the right tool for the job, and don't buy into their ~"you won't ever need a dbms again"~ marketing hype.
Until you know what this tool is, you'll have a hard time telling what it's good for. Perhaps you could start by reading the FAQ, where a lot of your issues are covered.
Basically what I'm saying is that instead of allowing the programmer to embed the statement insert Foo ( bar, baz ) values ( @x, @y ) in his code,
Ah, I think this is the source of the confusion.
I wouldn't think of doing a system with persistent data where programmers were just writing stuff to the persistent store willy-nilly. He should just be calling foo.save() and maybe transaction.commit(). The persistence layer should take care of everything else. Then I keep the data integrity code in Foo (where I should have it anyhow).
It's my experience that when you let applications programmers write SQL, you get crappy databases.
Agreed. The persistence layer should be written by somebody with clue, in such a way that the often tricky details of a particular RDBMS are safely hidden.
There's no one language that all programmers know. No one language is suited to all tasks.
Sorry, my original statement must have been poorly phrased. What I meant was that on a typical project, the ratio of application programmers to DBAs is circa 10:1. Thus, it makes sense that core business logic (which includes the kinds of data consistency stuff that often ends up in stored procedures) should be in the language the application programmers know, leaving the DBA to focus on the database, not the business logic.
Even a fairly trivial database-backed CGI web app [...] Perl for the CGI [...]
Oh, my comments should only be taken as applying to OO systems. (Prevayler comes from a very strong OO background, so I'd taken OO architectures as a given.) CGI is very procedural, and despite their best efforts, so is Perl. In procedural languages (or with developers who insist on writing procedural code in OO languages), the database is really the only place to put a lot of the data integrity checks you mention, so I'm happy to use stored procedures there.
You (or at least *I*) don't let apps rummage around in raw data -- all database access (reads and modifications) goes through stored procedures. The user should never have permission to touch the tables directly - all he needs to have is execute permission on a specific set of SPs.
Yes, but that puts core business logic in a place that's hard to reach, hard to test, and hard to work with.
Wouldn't it be better to have it in a language that all your programmers know? In something that's not locked into a particular vendor? And in a full-fledged programming environment?
Extreme Programming seems to have lost its impetus, or am I just not up to date? :)
You're not up to date. New user groups are still popping up, conference attendance is increasing, and lots of people are adopting it. I'm helping convert another two teams to it right now.
What's died down, thank god, is the hype.
Re-read what I wrote, paying attention to the highlighted words.
Ooops! My mistake.
whereas an RDBMS schema can be wrappered by multiple independent object models
True, but is that a bug or a feature?
One of OO's big wins is that if you wrap your data in code, you gain a lot in reliability and flexibility. By letting lots of apps rummage around in the raw data, you lose encapsulation entirely.
Even worse, once multiple code bases are massaging the same database, your schema is pretty much set in stone. Keeping a database in sync with one set of code is work; simultaneous synchronized refactoring to multiple codebases gets exponentially harder.
Sure, you can do all sorts of database tricks with stored procedures, views, and data warehouses to try to juggle these problems. But in essence this turns a database into an application server or an integration layer, with all the code written in a non-standard, procedural, vendor-specific language that has poor tool support.
So personally, I prefer architectures where when you need an app server, you use (or write) an app server. If you need an integration layer, you use (or write) one of those. And when you need a database, by all means, use a database.
In medicine, that's a joke, but in programming, it's often the truth.
If somebody's got an SQL statement combining all of that, then they've just created write-only code. Nobody coming upon it will get it. A month later, even the author will have to puzzle it out. As Martin Fowler says, "Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
Now maybe it's required sometimes in an SQL system because you don't want to pass too much data over the pipe. But in a Prevayler app, all the data is right there in memory. If you're writing good code, finding data in memory generally just isn't that hard. Just like everything else, you build your complex functionality up piece by piece from simple functionality. If you have code that looks like that SQL statement you mentioned, then it's not the sign of a clever programmer, it's the sign of a bad object model.
In the cases where it is truly a hard problem, SQL may not help anyhow. You can bet that the FPS games don't decide what objects to render by writing SQL queries. And I'm sure that Google isn't doing any "SELECT * FROM Documents WHERE
Notice that they don't appear to mention recovery time, or the size of the changelog.
Well, I could see why; this depends a great deal on your application. My guess is that this would be IO-bound on a typical system. Java's serialization is pretty zippy. The speed of the Command objects depends entirely on how you write them, of course, but if they're done right they'll be quick as well. Why don't you do some timings and put it up on their Wiki?
If recovery time is really an issue, it's also pretty easy to build hot clones on top of Prevayler. The command objects are serialized to be written to disk, yes? So as you write them to disk, you also write them over the wire to another server, which applies them as well.
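Here's a rough sketch of that hot-clone idea. TeeCommandLogger is a name I made up, not part of Prevayler; the two sinks are plain java.io.OutputStreams, so in practice one would be a FileOutputStream on the command log and the other a socket's output stream. Record framing, acknowledgements, and what to do when the replica is unreachable are all left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Hypothetical sketch: sends each serialized command to two sinks.
class TeeCommandLogger {
    private final OutputStream disk;    // e.g. a FileOutputStream on the log file
    private final OutputStream replica; // e.g. a Socket's output stream to the clone

    TeeCommandLogger(OutputStream disk, OutputStream replica) {
        this.disk = disk;
        this.replica = replica;
    }

    // Serializes the command once, then writes the same bytes to both sinks.
    void log(Serializable command) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(command);
        }
        byte[] bytes = buf.toByteArray();
        disk.write(bytes);
        disk.flush();
        replica.write(bytes);
        replica.flush();
    }
}
```

The clone deserializes each command and applies it in the same order, so it ends up holding the same object model as the primary.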
But really, Prevayler systems should go down less. The more pieces your system requires to function, the more frequent your outages. Eliminating the database server means that you lose all of the failure cases where something happens to that box.
They also don't mention the level of assurance.
The commands are logged before execution. If the server dies, you load the last snapshot and replay the log. What other assurance would you like?
Saying "this kicks Oracle's butt" is a distortion though: it solves different problems than traditional databases.
Well, I've certainly seen a lot of people with the notion that it was completely impossible to build a system for 1 GB of data without using a database server, so I think there's some overlap in the market. But if you go beyond RAM-sized data sets, you're right that Prevayler is a bad choice.
Use the right tool for the job.
Amen! I like SQL databases a lot when I need them. But I'm sooooo tired of people who believe that the history of computing began (and ended) with Oracle.
These people have created a system for people who don't really give a f*ck about their data. However, they choose to measure the performance of their system against one that is specifically geared towards data integrity.
Once a night (or however often you want), Prevayler snapshots the whole object model. Then, every atomic change is encapsulated in a Command object. Before the command is executed, it's serialized and written to a command log.
If there is a crash, you just bring up the latest snapshot and replay the log.
Where do you see the potential for data loss?
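The snapshot-plus-replay recovery can be sketched in a few lines. The Counter model and the CounterCommand and Add types are hypothetical stand-ins, and an in-memory list stands in for the on-disk command log; a real system would deserialize both the snapshot and the commands from files.

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical model and command types, for illustration only.
class Counter implements Serializable {
    long value;
}

interface CounterCommand extends Serializable {
    void applyTo(Counter c);
}

class Add implements CounterCommand {
    private final long amount;

    Add(long amount) { this.amount = amount; }

    @Override
    public void applyTo(Counter c) { c.value += amount; }
}

class Recovery {
    // Rebuilds state: start from a copy of the snapshot,
    // then replay the logged commands in their original order.
    static Counter recover(Counter snapshot, List<CounterCommand> logSinceSnapshot) {
        Counter c = new Counter();
        c.value = snapshot.value;
        for (CounterCommand cmd : logSinceSnapshot) {
            cmd.applyTo(c);
        }
        return c;
    }
}
```

Because each command was logged before it was executed, and replay uses the same order, the recovered state matches what the server had at the moment of the crash, provided the commands themselves are deterministic.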
Sure there is. In order to get a "robust" system, you're going to have to touch disk. Your level of acceptable transaction loss will determine how often that is. If that is too often, one should expect the performance of Prevayler to go out the window.
Well, you're welcome to demonstrate that. Adding bytes onto the end of an open file is a pretty fast operation, so the upper performance bound should be pretty high even on moderate hardware.
If that becomes a limitation, then there are plenty of fancy hardware options to speed up writes in safe ways, everything from fancy RAID setups with battery-backed write cache to just keeping your command log on a battery-backed solid state disk.
But really, I doubt that would happen much in the real world. Remember that Prevayler is meant for systems with moderate amounts of data; since it all has to be in RAM, you're pretty much limited to a couple GB of data. If your writes are coming so fast and furious that you can't log the update commands on a regular disk, then pretty soon you'll have more data than what you can fit in RAM anyhow.
What about lost updates or ghost reads? Part of what a database has to do is prevent uncommitted transactions from being visible (this is why everyone argues MySQL isn't really a database).
That's a good question. Prevayler Command objects, which are units of work sort of like transactions, get executed in a continuous stream. Since all the data is in RAM, they go very quickly.
This gives you two basic options:
For a lot of applications, you just take the risk of showing slightly inconsistent data. Imagine a discussion system like slashdot. If the top of the page says 583 posts but you end up with 584 posts on the page because one was added during the microseconds you are rendering the page, then so what? Given the post-to-view ratio, odds are low that it will happen, and if it does, nobody will care.
But if you must have consistent data, then you take advantage of the fact that Prevayler executes commands one at a time. Just make sure your reads happen in between modify operations and you'll be fine.
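One way to get the second option, sketched with made-up names (Forum, ForumSystem): route the consistency-sensitive reads through the same lock that serializes the writes. This is an assumption about how you might wire it up, not a description of Prevayler's internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical forum model: the count and the list must agree.
class Forum {
    int postCount;
    final List<String> posts = new ArrayList<>();
}

class ForumSystem {
    private final Forum forum = new Forum();

    // Writes run one at a time under the lock.
    synchronized void addPost(String text) {
        forum.posts.add(text);
        forum.postCount++;
    }

    // Reads that need consistency take the same lock,
    // so they always fall between two writes, never in the middle of one.
    synchronized <T> T read(Function<Forum, T> query) {
        return query.apply(forum);
    }
}
```

The query should copy out what it needs rather than return a reference to the live model, or the consistency guarantee ends when the lock is released.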
Ok, I'm going to assume that you're having a bad day or something, rather than leaping immediately to the conclusion that you're a flaming asshole who comes here because he likes insulting people and doing a lot of nay-saying.
I build real live working systems, and I've been doing it quite a spell. Prevayler is one good way to do that. It's not the only way, and it's only good in certain circumstances.
If you really think that finding and indexing your objects is the hard part of development, and that having somebody do that for you is what really matters, then Prevayler probably isn't for you. Personally, in the systems I build, it hasn't been a big deal.
If you're interested in learning about new ways of tackling things (which is part of why I come to news sites like this) then you may want to give Prevayler a try. Even if you don't ever use it in production, it's a very interesting viewpoint. And if you're happy with the tools you use now, then by all means, stick with 'em.
Unless I had a product that had an extremely specialized use case that matched OODB strengths, I would NEVER develop on this kind of platform again.
You did notice that this article has nothing to do with OODBMS products, right? Prevayler has no database. But I'll address your points directly:
It is very difficult to "see" an OO database. By nature, the data isn't tabular. It's a persistent object heap. There's no "SELECT * FROM USERS". So tracking down data-related problems involves exporting data to an XML file and sifting through it.
This hasn't been a problem for me. For system inspection, you can use something like BeanShell to wander the data. Or you could spend a day or two and write something that used introspection (like java.lang.reflect) to let you wander the data.
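As a taste of the reflection route, here's a small sketch that dumps the instance fields of any object. Inspector and User are hypothetical names; the reflection calls (getDeclaredFields, setAccessible, Field.get) are standard java.lang.reflect. A real browser would recurse into the values and walk superclasses too.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

class Inspector {
    // Returns the instance fields declared by the object's own class as name -> value.
    static Map<String, Object> fieldsOf(Object target) throws IllegalAccessException {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Field f : target.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue; // skip class-level fields
            f.setAccessible(true); // allow reading private fields
            result.put(f.getName(), f.get(target));
        }
        return result;
    }
}

// Hypothetical business object used only to exercise the inspector.
class User {
    private final String name;
    private final int karma;

    User(String name, int karma) {
        this.name = name;
        this.karma = karma;
    }
}
```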
Reporting tools don't exist for OODB. Try hooking up Crystal or another reporting tool to this. You end up writing every report from scratch.
You would never let random yahoos run queries against your production database servers, I hope. Which means a data warehouse. In which case, you can export your OO data to an SQL data warehouse happily.
DB Performance when querying outside the normal object hierarchy
Common queries should be indexed. That's true no matter how you keep your data. It's not hard to do in Prevayler.
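For the common case, an index is just a map you keep up to date in the same step that changes the data. A minimal sketch with a hypothetical PostStore class:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory store with a secondary index by author.
class PostStore {
    private final List<String[]> posts = new ArrayList<>(); // each entry is {author, text}
    private final Map<String, List<String>> byAuthor = new HashMap<>();

    // The index is updated in the same step as the primary data.
    void add(String author, String text) {
        posts.add(new String[] {author, text});
        byAuthor.computeIfAbsent(author, k -> new ArrayList<>()).add(text);
    }

    // Indexed lookup: one hash probe instead of a scan over all posts.
    List<String> textsBy(String author) {
        return Collections.unmodifiableList(
            byAuthor.getOrDefault(author, Collections.emptyList()));
    }
}
```

If callers only ever go through textsBy(), you can swap the HashMap for a TreeMap or anything else later without touching them.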
32-bit memory limited our max customer size dramatically.
Yes. Prevayler should only be used on systems where all data fits in RAM. That's not all systems. Yet.
Object prevalence does nothing to change that. You still have to deal with serialization of all of your business objects, unless you're planning on reloading and re-executing all transactions since the beginning of time every time you restart the server. You can do it less frequently at runtime, but that doesn't save you any development time.
Oh, please. You really feel that making a Java object serializable is just as hard as writing SQL load and store code? I have written persistence frameworks that work both ways, and the SQL code is much, much more complicated.
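For comparison, here is about all it takes on the serialization side. Customer and RoundTrip are hypothetical names; the only requirement on the business object is "implements Serializable", and the rest is standard java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical business object: the marker interface is the whole persistence mapping.
class Customer implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int orders;

    Customer(String name, int orders) {
        this.name = name;
        this.orders = orders;
    }
}

class RoundTrip {
    // Serializes any Serializable object graph to bytes.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        return buf.toByteArray();
    }

    // Rebuilds the object graph from bytes produced by toBytes.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```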
Why no Objective-C?
Why haven't you ported it yet? The Prevayler core is only a few hundred lines of code, even in Java.
If you haven't implemented locks for an object model, then you haven't lived. Seriously, I can see a lot of people screwing this up with deadlocks galore. Locking up concurrent systems can be a nightmare.
Then just wrap all of your lock-sensitive stuff in Prevayler command objects. They've got that working fine, and it guarantees isolation.
Goodbye Crystal Reports, Goodbye English Query, Goodbye ANY Ad Hoc query support, because if you need anything different, you're going to have to write a lot more code to enumerate throughout your objects. Have fun.
Oh, please. If you really need SQL compatibility, then dump the data occasionally to a data warehouse, which is where you should be doing unconstrained ad-hoc queries anyhow.
Or if it's so the programmers can peek at the live system, then put in something like BeanShell, which will let you see a lot more than just the persistent data.
Or you could drop an SQL interpreter into your system and present your objects as tables. Many of the pieces are already open sourced, so it would be pretty easy.
Indexing - I hope you have a good B-Tree library and are familiar with Indexing/Searching algorithms when implementing HARDCODED indexing. Oh yeah, have fun rewriting all of your query procedures when you decide to change your hardcoded indexing.
Can you really not think of ways to write these things in flexible ways? If that's the case, you could learn something about being a programmer. Pick up Martin Fowler's Patterns of Enterprise Application Architecture.
In all seriousness, this is a bad idea for 99% of projects out there. It's inflexible, unscalable, severely error prone, and timely to implement.
Perhaps you should try it before knocking it. As you are, in order, wrong, mostly wrong, wrong, and confused. It's no magic bullet, but it's a useful approach for some systems.
Prevalent may be "fast". However, it seems to be limited to very small and non-critical systems. That tends to limit its usefulness. What speed it may have now will very likely be quickly reduced if both of those issues are addressed.
If your data set doesn't fit in RAM, then Prevayler is a poor choice now. Wait a year for the 64-bit machines to become common, and a lot more systems will fit in Prevayler's envelope.
But there's no reason a Prevayler system can't be just as reliable as a system with a relational database. Indeed, it can be much more reliable: the more pieces a system has, the higher the odds of failure.
This was compared to an Oracle db running in RAM. Who would spend the money for an Oracle db (and an Oracle admin) for a database small enough to fit in RAM?
You must not have worked for a large corporation. I have seen people use a half-million dollars in hardware for data sets well under 1 GB.
1) What about Swapping? I know that you would be limited by physical RAM (which IS getting cheaper) but couldn't you also get a really large virtual memory space and utilize that?
I had this thought, too. Alas, it turns out that a swapping Prevayler system is slower than using an RDBMS.
I suspect there are two problems. One is that JVMs probably allocate objects any old place in RAM, rather than clustering related objects. The other is that the swapper isn't tuned for this kind of thing.
So for now, Prevayler's only a good choice if your data fits in RAM.
one of the first things I realized is that I would have to design a new method of querying
Yes! But once you get the hang of it, it turns out to be so much better.
Once the Prevayler site's pipe isn't slashdotted, you should check out the various indexing and search frameworks they mention there. They're a good way to get started kicking the SQL habit.
The smaller and less complex a database is, the less important it becomes to tweak the design for performance benefits. Is this a case of an optimization that only works when optimization is unnecessary?
Well, quite a lot of data fits in RAM these days. 1-2 GB of data can be a whole lot of data. And as the 64-bit systems become reasonably priced, then that number will look more like 10-20 GB.
But even if you don't have a lot of data, you still win big in reduced program complexity and reduced development time. It's also nicer from an operations perspective. No database server means one less process (or box) to babysit, and one less thing to patch and upgrade.
Amen, brother!
It seems like a lot of people can't imagine handling anything more than about two strings without an SQL server to help them out. Good thing Larry Ellison doesn't sell crutches to people with broken legs; he'd make 'em so comfortable that they'd never learn to walk again without 'em.
"How do I search?"
Well gosh, how about you iterate through them and find the ones that match?
"But that's slow!"
Have you tried it? As Knuth says, premature optimization is the root of all evil. There's no sense in reducing your search time from 100 microseconds to ten microseconds if your GUI widgets take 500,000 microseconds to render the result.
But if you've got numbers from your profiler that say that this is a problem, then crack open one of those unread CS textbooks. Back in ye olden dayes, they came up with some smart stuff.
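The "just iterate" version really is this small. Search.findAll is a made-up helper; nothing here is specific to Prevayler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class Search {
    // Plain linear scan: visit every item, keep the ones that match.
    static <T> List<T> findAll(Iterable<T> items, Predicate<T> matches) {
        List<T> hits = new ArrayList<>();
        for (T item : items) {
            if (matches.test(item)) {
                hits.add(item);
            }
        }
        return hits;
    }
}
```

Wrap a call in System.nanoTime() before and after with your real data set, and only reach for an index if the number you get actually matters next to everything else the request does.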
To license Oracle with similar features to Prevalent, you would only be looking at a 5 figure pricetag.
Don't forget the price tag for all the extra hardware; since a Prevayler system is thousands of times faster, you can get by with a lot less hardware. And add in all the programmer time spent dealing with SQL. Oh, what about the DBA's salary?
How well does Prevalent do on 30TB+ datasets?
One doesn't use Prevayler for systems like that. Prevayler makes sense if your data can fit in RAM. If it doesn't, you should do something else.
But note that "something else" doesn't have to mean some SQL thingy. Google has a metric shitload of data, and you can bet they don't keep it in an Oracle server.
You don't use an RDBMS because it's fast. You use it because it's reliable.
Prevayler can be just as reliable.
Does this new toy support record locking, transactional isolation and integrity, or any of the other key features that an enterprise RDBMS provides? If the answer is no, then it's not a replacement for an RDBMS.
Wrong.
The question isn't the checklist of features, it's whether you can build equivalently reliable systems with Prevayler. The answer: You can.
You'll recall that Prevayler uses the Command Pattern. Before data is changed, the Command object is serialized and written to disk, then executed. Naturally, this means the commands are run in strict order of arrival, yes?
That's all you need to get transactional integrity. All writes are isolated. If you need to isolate the reads, you can use the same mechanism.
The prevalent approach requires developers to do things a little differently, but you don't have to sacrifice reliability.
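To show what "isolated writes" buys you, here's a sketch of the classic transfer case. Bank is a hypothetical class, and a synchronized method stands in for Prevayler's one-at-a-time command execution; logging the command to disk before running it is omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bank model; names are illustrative, not Prevayler's API.
class Bank {
    private final Map<String, Long> balances = new HashMap<>();

    synchronized void deposit(String account, long cents) {
        balances.merge(account, cents, Long::sum);
    }

    // The check and both updates happen inside one serialized step,
    // so no other write can interleave with a half-done transfer.
    synchronized boolean transfer(String from, String to, long cents) {
        long available = balances.getOrDefault(from, 0L);
        if (cents <= 0 || available < cents) {
            return false; // rejected, nothing changed
        }
        balances.put(from, available - cents);
        balances.merge(to, cents, Long::sum);
        return true;
    }

    synchronized long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }
}
```

There's no rollback machinery because there's nothing to roll back: the command validates first and only then mutates.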