Prevayler Quietly Reaches 2.0 Alpha, Bye RDBMS?
"We've used relational databases for years despite incompatibilities in SQL implementation. Accessing them from an OOP paradigm has been so tedious that Object-Relational mapping technologies have sprouted all over the Open Source landscape. Some competing examples and models are Hibernate, OJB, TJDO, XORM, and Castor, which in turn have supporting frameworks such as Spring and SQLExecutor. Because SQL is the dominant form of interfacing with the data in an RDBMS, there's now a specification to offer it a friendlier OO face.
Most of the above, including the SQL-variants, arguably appear to add yet another layer of complexity (even if only at the integration level) where they should be taking complexity away. These solutions are put together by some very smart people, but it's hard to escape the feeling that someone is missing the forest (simple answer) because all the trees (incompatible models) are in the way. If there are so many after-the-fact solutions attempting to simplify relational database access and manipulation from OO, isn't it reasonable to think that there is something generally wrong with trying to cobble together two disparate concepts with what are essentially high-caliber hacks? Is Prevayler a better way?"
Is that really possible? How do you even benchmark that?
I'm all for trying to use Slashdot to promote your pet project, but don't couch your story in questions about people's use of your admittedly relatively unknown software.
taken! (by Davidleeroth) Thanks Bingo Foo!
This would be great for projects where interoperability isn't an issue or only occurs via edge connections like SOAP. However, I generally would be wary of a "database" which is only accessible in Java, via unique interface. What do you do with your Crystal Reports users? How do I get this into data cubes for analysis?
Frankly, this is simply a persistence layer with some nice properties. It *isn't* a database. A database stands at the center of your applications and makes itself available to as wide an audience as possible. It shouldn't limit your choice of tools in such an absurd manner.
Sig under construction since 1998.
A direct quote from your wiki:
Prevalence requires us to have enough RAM on our servers to contain all our business objects.
What do you do if you have a nice big data set that won't fit in memory? Businesses my company works with have millions of customers. Do they have to have all those gigabytes of data in memory to do anything?
Don't come at me with yet another memory resident thing that's supposed to be the greatest ever, when it doesn't come close to addressing the real needs of a database user.
Big fucking deal. This is all in-memory querying from the same process. What about querying from another process or across a network from another machine?
If you could cache an entire MySQL database in RAM, I'm sure your MySQL performance would improve dramatically.
If you could then optimize MySQL's search routines for working on memory instead of disk blocks, your MySQL performance would improve even more dramatically. As it is now, MySQL must go through all sorts of contortions, probably using B-Tree-like structures for indexing, and other fanciful datatypes I can't even conceive of without a PhD. The reason for all of this silliness is the fact that MySQL's backing store is disk, not memory.
In a prevalence system like Prevayler, one of the fundamental tenets of the system is that ALL of your objects are ALWAYS in memory, and are only serialized to disk when they change, for persistence.
So...yes: as always, a memory-based system will be three orders of magnitude (or more!) faster than a disk-based system. Prevayler vs. DBMS is no exception to this rule.
But when your website has grown popular, your prevalence database has swelled to 30 gigs and you find yourself hosting it on massive systems with 12 gigs of core memory and another 30 gigs of swap space -- when your memory access times are starting to look like disk access times because of swapping -- well, don't come crying to me.
Prevalence is a brilliant solution for small projects. But it only scales to the size of your physical memory, or slightly (50-100%) larger. You can't expect it to scale beyond that.
I'd be very interested in this, except for the single fact that it's Java.
I may be offending some people here, but I hate Java. After having worked with it for a couple of years I hate it even more.
I've often had the need to store objects in other languages (Perl, PHP) and I'd have to say I've had a bit of difficulty mapping the data into an RDBMS, but not enough trouble to make me want to switch languages.
Actually, I don't have a huge problem with the Java language itself, just that it has to run in a VM, it's slow, takes ages to compile, and has crappy string handling (compared to Perl/PHP, but better than C).
Yuck!
Anything is possible, except skiing through revolving doors.
Geez, this just seems stupid. They want you to store all your data in RAM and save it to disk once a night via 'plain object serialization'. If they really are using 'plain object serialization' in Java then all data-access functions are going to lock, which means you won't be able to update anything during the serialization stage. And if you have a system crash, you lose all your data for the whole day.
If they had any sense then they would have everything saved to disk (like a journaled file system) and the 'serialize once a day' thing wouldn't be an issue.
And let's not forget the fact that object serialization is slow. I once tried to build a website using Java collections and serialization, and it would take nearly half a second to save the whole site's data, with a 'database' of only a few thousand entries. I can't imagine how long it would take to save the hundreds of megs of data autopr0n uses.
Maybe they've solved some of these problems, but it still sounds stupid. There are "real" OODBMSs that let you keep your data in an OO system and let you keep that data on a hard drive, like a sane person.
autopr0n is like, down and stuff.
Please don't go around saying, "Could this be the end of RDBMSes?" That is just a crock of shit, and it really bugs the hell out of me. How stupid do you think we are to have a tagline like that?
Please watch "Bowling for Columbine" and it says that this type of exaggeration by the media, and most notably US media, is driving the Americans crazy paranoid with fear. It seems like the editors have fallen for the same thing, and it should stop now!
Let's have a modicum of dignity and avoid all the hyperbole, please. We are all somewhat tech-savvy, please treat us with the respect that we deserve!
I'm sure the technology is good, but for crying out loud, mainframes are still around 40 years later. RDBMSes are going nowhere for the next 20+ years until true AI comes around.
If they just benchmarked reads ... then the results don't tell much.
The Raven
Nice of you to mention Crystal Reports. I still am able to sell dBaseIII and ParadoxDOS programs, and I do quite well. They're great for implementing business rules.
I've never been able to make heads or tails of the Python content management systems. I read all that stuff Reuven Lerner publishes in Linux Journal; what the hell is he talking about?
I write programs to manage volunteers and clients and to keep track of both in-kind and dollar donations. MySQL and PHP work OK. I'll never figure out Zope and any of that other OO stuff. It's all Scientology to me.
I'm not about to give up PostgreSQL.
An RDBMS:
* Allows a wide variety of applications to operate on the data, in a wide variety of languages, at different times or simultaneously.
* Allows you to manipulate the data inside the database before you get it.
* Allows for a lot more storage, which is sometimes important when you need the memory for some other task.
What I like about an RDBMS (like postgres) is that my requirements constantly change. I try something for a while, then we have better ideas, but we need to work with our existing data. An RDBMS allows me a huge amount of flexibility with my data (also the reason I don't use MySQL...) and I've been able to drastically change the way my application works while still making use of the data that I have.
Maybe this is an OK database for some applications where you have the entire thing laid out in a perfect spec with no chance of a change. However, when I need to get at my data with another language, or it takes up more space than I thought, or I figure out that my application needs to change the way it works, I have no clue where to begin with a huge collection of "objects" that now happen to be obsolete. If I have relations, I have a solid representation of my data that an RDBMS can manipulate efficiently according to a fairly mature mathematical model. I'll take that over a collection of persistent objects any day.
That said, I think this has application in areas where you just need some persistent objects, and they need to be fast, and they don't take up much room. I don't encounter that very often, and when I do I usually just use postgres because it seems fast enough when the tables are cached. I suppose if the objects are really intricate this would be a nice system, because you wouldn't have to spend so much code on a mapping. It just seems so much more narrow as far as usefulness.
Social scientists are inspired by theories; scientists are humbled by facts.
Of course it will suck if your project doesn't fall in those categories. Everyone can see that.
Sure, everyone except the person that posted it and the editor that let the story through. I'll admit I got sucked into reading more because of the overly generalized claims. "Bye RDBMS?" in the headline? Gimme a break. Prevayler is an incredibly specialized tool that won't be able to solve many real-world problems currently solved by an RDBMS.
Prevayler seems like a great idea. It doesn't need the deceptive hype. How about "Bye RDBMS for projects with fit-in-RAM databases?"
A quick search on the wiki showed no hits for the word 'report'.
Note that the classic problem with object databases is that they focus on transactional queries, and that DSS or reporting queries are either too slow or too difficult to perform.
So, yeah, it sounds nice if you want *both* an object database and a relational one. Not a bad solution if you already have a data warehouse on the side. But if you don't, it's just a lot of extra work.
Next...
Because the MOP means CLOS is infinitely mutable, and therefore you can mould the properties of CLOS to fit better with relational theory, with accessor methods capable of arbitrary joins and whatnot?
Because CL has restartable conditions, not mere exceptions, thus making preservation of transactionality orders of magnitude easier?
Because CLOS is multiple-dispatch, so you don't have the conceptual problem of methods "in" classes, to confuse you when you try to operate on the results of an SQL join?
Because a few lisp macros let you merge an sql sublanguage into lisp?
Seriously, check out uncommonsql!
(sql:def-view-class employee ()
  ((emplid
    :db-kind :key
    :type integer
    :initarg :emplid)
   (first-name
    :accessor first-name
    :type (string 30)
    :initarg :first-name)
   (last-name
    :accessor last-name
    :type (string 30)
    :initarg :last-name)
   (email
    :accessor employee-email
    :type (string 100)
    :nulls-ok t
    :initarg :email)
   (companyid
    :type integer)
   (company
    :accessor employee-company
    :db-kind :join
    :db-info (:join-class company
              :home-key companyid
              :foreign-key companyid
              :set nil))
   (managerid
    :type integer
    :nulls-ok t)
   (manager
    :accessor employee-manager
    :db-kind :join
    :db-info (:join-class employee
              :home-key managerid
              :foreign-key emplid
              :set nil)))
  (:base-table employee))
;; Employees of Widget's Inc.
(sql:select 'employee
  :where [and [= [slot-value 'employee 'companyid]
                 [slot-value 'company 'companyid]]
              [= [slot-value 'company 'name]
                 "Widgets Inc."]])
These folks are either very naive, or very silly.
They claim there's no need for two-phase commit (2PC), as though the only systems they need to interact with are (or will be) Prevayler.
Umm, hello. How about that 50TB database with all our transaction history? You gonna put that in your RAM-based database? No? Well, what happens when you need to do an insert into it, but commit only if the insert and the local transaction succeeds?
Hell, forget the 50TB database, what about the little Oracle database the guys down the hall use? Or the asynchronous queue that you post into?
It's a much bigger world than just your little project, guys, and you have to fit into it. 2PC is not an option. It's a requirement.
The whole "let's keep it in RAM" is cute, and for a lot of projects is probably all you need, but for any kind of large data set you just can't buy enough RAM to hold it all. Once it goes to disk, there's a whole new set of problems.
Also, the fact that you're responsible for defining and managing your transaction boundaries is really lame. It's not that hard to build check-in/check-out logic that can be used.
Come back when you have a real system that can handle real load with real datasets. Until then, I'll keep my RDBMS. You may have performance beat on the tiny systems, but who cares? THEY'RE TINY SYSTEMS!
Oh dear, here I go OT again.
Not a flame, but... while I like K5's article moderation system and /.'s comment moderation system, why doesn't any site seem to combine the two?
k5 often has interesting articles but the comments are just pages full of trolls and flamebait, with the occasional interesting comment thrown in that I can't be arsed searching for in all the garbage.
OTOH I can browse /. at 4 or 5 and get mostly interesting discussions and reasonably thoughtful commentary (depending on the article), but I have to wade through the duplicate articles, blatant product promotions and stuff like this that smells strongly of astroturf (hmm, does Astroturf smell of anything?)...
I don't mean to say anything against their product, which as far as I know is the greatest object persistence scheme ever hatched.
But they clearly don't know what a database actually is; they're confusing the issue with services that an RDBMS happens to perform as part of its job. It has always been possible to write procedural code that is faster than database queries, because underneath, a query is turned into a sequence of operations. When building a system to answer a single question, the system will always be faster without the database layer. Building a hash table or B-Tree to do a simple lookup simply can't be beat.
People have lost sight of history. Years ago, we used to keep our data in indexed files, and guess what? They were faster than databases of the day at doing the tasks they were designed to do.
However while databases are slower, in many cases much slower, than procedural code, they have an important property: they can be used to answer unanticipated questions acceptably quickly. How quickly is acceptably quickly? Well, if the database can come back with an answer faster than it takes a skilled programmer to come up with a special purpose program to answer the question, it has done its job. Compare this to their answer to the querying problem: write a java program. And that's a fine answer if having to answer an unanticipated question is a relatively rare event, which no doubt it is for many applications.
This gets to what a database is: it is a collection of information that is organized to make reuse simple and efficient. This is different from business object re-use, which is about re-using logic; this is about re-using facts. Relational databases are unequaled at this task because they are based on sets and mathematical operations that are closed on these sets. This allows both the user and, more importantly, the system's optimizer to create sequences of operations that meet the user's requests.
These people may have a wonderful system for a lot of purposes, but they're really talking about a particular set of applications, for which their system might be better than storing object data in a database. Probably is, as far as I know. But really. No need for rollback because "transactions are instantaneous"? Well, they would be right if transactions really were instantaneous. However, while their test case may be so fast that transactions appear instantaneous, that's a long way from actually being instantaneous. In the real world bound by the laws of physics, transactions take finite time. Given enough objects to update, or high enough system load, or both, you will have to either (1) accept a possibility, albeit small, of inconsistent transactions being processed or (2) lock all the objects that might be affected by a transaction, with attendant possibilities of deadlocking.
In short, I wish them luck; it sounds like they are producing some interesting and useful stuff. However, it isn't a database or a replacement for a database.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Y'all may want to take a look at for some better (even better than the Prevayler website) information on the problem space that Prevayler is good for.
It seems to me (in agreement with every commenter who has mentioned that this parent (Slashdot) article's title is misleading and inappropriate) that Prevayler's strength is in making it easy to persist *business objects*, not in doing away with the RDBMS entirely. If you've a strategy that involves tabular data as well as objects, and doesn't absolutely require a closer-than-reference coupling of the two, then Prevayler might help you out. Particularly if the business objects mutate very often during runtime.
I can envision, for example, raw data stuck in normalized tables in an RDBMS, with the *roll-ups* stored in Prevayler objects persisted elsewise.
"The Devil does not know a lot because He's the Devil, He knows a lot because he's old." -- unknown
The "Prevayler Team" has written a persistent HashMap with a redo log, using the command pattern. This is exceptionally trivial and is in no way comparable to a database. A database has things like: 4GL query language, referential integrity constraints, data integrity, queryable metadata, separation of logical and physical layers, data independence, declarative rather than imperative querying, dynamically assembled queries, and gazillions of other things. These are the real features that we mean when we say "database." These features are absolutely necessary. Prevayler includes none of them. It is an extremely trivial persistent HashMap, that's all.
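For readers who have not seen the pattern, here is a minimal sketch of what "a persistent HashMap with a redo log, using the command pattern" could look like. It is not Prevayler's code: `Command`, `Put`, and `JournaledMap` are names invented for this illustration, the log format (a length prefix followed by a serialized command) is an assumption, and snapshots, flushing to stable storage, and torn-write handling are all left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** A state change expressed as a serializable object (the command pattern). */
interface Command extends Serializable {
    void executeOn(Map<String, String> state);
}

/** Hypothetical example command: store one key/value pair. */
final class Put implements Command {
    private static final long serialVersionUID = 1L;
    private final String key;
    private final String value;

    Put(String key, String value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public void executeOn(Map<String, String> state) {
        state.put(key, value);
    }
}

/** All state lives in RAM; each command is appended to a log file before it is applied. */
final class JournaledMap {
    private final Map<String, String> state = new HashMap<>();
    private final Path log;

    JournaledMap(Path log) {
        this.log = log;
        replay();
    }

    /** Log first, then apply. The lock makes commands run strictly one at a time. */
    synchronized void execute(Command command) {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(
                log, StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(command);
            }
            out.writeInt(bytes.size()); // length prefix marks the record boundary
            bytes.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        command.executeOn(state);
    }

    synchronized String get(String key) {
        return state.get(key);
    }

    /** Rebuild the in-memory state by re-running every logged command in order. */
    private void replay() {
        if (!Files.exists(log)) {
            return;
        }
        try (DataInputStream in = new DataInputStream(Files.newInputStream(log))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfLog) {
                    return;
                }
                byte[] record = new byte[length];
                in.readFully(record);
                try (ObjectInputStream ois =
                        new ObjectInputStream(new ByteArrayInputStream(record))) {
                    ((Command) ois.readObject()).executeOn(state);
                }
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not replay " + log, e);
        }
    }
}
```

Constructing a second `JournaledMap` on the same file replays the log and ends up with the same contents, which is the whole persistence story. Everything in the feature list above (query language, constraints, metadata) would have to be written by hand on top of it.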
Thus, when the prevayler team says "throw away your database," I must assume one of two things. 1) They're trolling for publicity by saying outrageous and purposefully stupid things. Or 2) They are shockingly, mind-numbingly naive, and they don't know what a database is or what it does.
The author of Prevayler wrote this about himself: "Carlos Eduardo Villela is a 19-year old Brazilian graduate in Information Systems... almost 8 years experience has made him a Java and Python enthusiast."
Thus, I have to assume that the authors are mind-numbingly naive. Don't get me wrong, I'm sure the authors are very bright, and I know that some good insights went into the implementation of prevayler. But let's not throw away our databases quite yet.
The somewhat naive authors of prevayler confidently announce the following on their website:
"No one has yet found a bug in Prevayler in a Production release. Who will be the first? [bold in original text]."
I already found a serious bug in the current production release. From the prevayler source:
ObjectOutputStream oos = logStream();
try {
    oos.writeObject(command);
    oos.reset();
    oos.flush();
} catch (IOException iox) {
ObjectOutputStream does not guarantee atomicity. If your command object is larger than the page size of your disk, the "transaction" will take at least two page writes. A software failure between those page writes will lead to "half a transaction" being committed and a subsequent corruption of data. Once data integrity is lost, it is often difficult or impossible to recover. Prevayler has nothing to handle this case. Thus, prevayler does not correctly implement ACID, because it doesn't guarantee atomicity (half a transaction can be committed), consistency (referential integrity would be destroyed in such a case), isolation (this failure wouldn't be isolated to a single transaction) or durability (the problem would only show up upon reloading).
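One common way to guard a log against this kind of torn write is to frame each record with its length and a checksum, and on recovery to discard everything from the first record that fails the check. The sketch below shows that technique on plain byte arrays; it is not taken from Prevayler, and `FramedLog`, `frame`, and `recover` are names made up for the example. A real log would also have to force each record to disk (for instance with `FileChannel.force`) before acknowledging the transaction.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical record framing: 4-byte length, 8-byte CRC32 of the payload, then the payload. */
final class FramedLog {
    private FramedLog() {}

    /** Wraps one serialized transaction so that a partial write can be detected later. */
    static byte[] frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(payload.length);
            out.writeLong(crc.getValue());
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the payloads of the intact leading records; stops at the first torn or corrupt one. */
    static List<byte[]> recover(byte[] log) {
        List<byte[]> intact = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        try {
            while (true) {
                int length = in.readInt();
                long expected = in.readLong();
                if (length < 0 || length > in.available()) {
                    break; // payload was cut short, or the length field is garbage
                }
                byte[] payload = new byte[length];
                in.readFully(payload);
                CRC32 crc = new CRC32();
                crc.update(payload, 0, payload.length);
                if (crc.getValue() != expected) {
                    break; // bytes are present but do not match what was written
                }
                intact.add(payload);
            }
        } catch (EOFException truncatedHeader) {
            // Clean end of log, or a header that was only partly written: stop here.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return intact;
    }
}
```

With this framing a crash between two page writes costs at most the last, uncommitted record; the half-written command is never replayed, so the earlier transactions stay consistent.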
Finding this bug took very little searching. I am apparently the first person ever to find a bug in prevayler. Do I get a prize?
Oh come on. You're making the same mistake as 90% of the Slashdotters: you judge something without even having looked at it.
Prevayler 1.0 contains something like 8 classes with 400 lines of code. That's a dimension where it's still possible to code without errors if you are really really careful and test a lot. That's one of the main ideas behind Prevayler: Get rid of the enormous complexity of a full-blown database system.
Most of the complaints about Prevayler are nonsense. It is neither vaporware, an "amateur" project (whatever that is) nor unreliable or unusable.
I can tell: I have written a large web application with Prevayler, and it has been serving thousands of users 24/7 for more than a year now.
Prevayler does have its own share of problems of course. In particular, there are three that actually hurt:
-There is no direct way to browse or manipulate the database. Everything must be done through Java code that is part of the application itself. You can also see this as an advantage though: Think about data integrity.
-Schema evolution is a pain. Since Prevayler relies on standard Java serialization, you have to live with Java's schema evolution mechanisms. Which really suck.
-Java lacks meta-programming facilities, so you end up writing the same code over and over again for simple things like a table with 10 different string fields (getter/setter methods etc). On the other hand, the complex parts that usually force you to write near-unreadable, slow SQL statements tend to become very easy and efficient.
Regards,
-Stefan Reich (www.drjava.de)
Having trotted out "Database Debunkings" before myself, I agree with you in part. Certainly a relational database can store facts about objects. This is, however, not the same as storing objects themselves. The difference is one of identity.
Chris Date's articles on "OODBMS" seem to deal mainly with the need not to throw away the relational model simply because OO users want more complex data types in order to model objects. This is of course eminently reasonable -- regardless of what data types you are working with, the relational model still makes sense and indeed is required to make your database make sense. It also is orthogonal to the point I was trying to make above.
The crux of my point is that tuples do not carry identity. Tuples are values, not objects -- they are more akin to the number 5 than to the variable x in the C statement int x = 5; The relational model precisely does not let us talk about their position, address, or pointers to them -- it lets us talk about them only as values, and relations as sets of values.
You can have tuples about things which have identity -- after all, customers have identity. To represent customers' identity, we may use unique keys. (Incidentally, because they are not unique, SSNs do not form a candidate key.) However, this is "representing facts about entities which have identity", which isn't what OO users want. They want to store data objects that have identity -- because their objects in core do have identity, and all they want from the database is a persistent object store.
Again: the OO person wants to store objects with identity. The RDBMS offers him the ability to represent facts about objects with identity.
What do I mean by "identity"? Good question. One simple way of putting it is that objects with identity are unique; references to them can be stored, and they can be addressed and updated uniquely by them. References to them (e.g. pointers) can be compared by comparing the references only. The C++ reference variables (or C pointers) first_customer and current_customer point to the same object iff they are themselves equal. We can store a reference to that object by copying a reference variable -- and when we indirect through it later, we know we will get the same object, not merely one with the same values. The relational model does not admit of such objects -- it has no pointers, only values in sets.
(Those familiar with Lisp are reminded of EQ vs. EQUAL.)
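In Java terms the same distinction is `==` versus `equals`. A small sketch, with a made-up `Customer` class and helper names (`same`, `alike`) chosen only for this illustration:

```java
import java.util.Objects;

/** Hypothetical business object with value-based equality. */
final class Customer {
    final String name;
    int balance;

    Customer(String name, int balance) {
        this.name = name;
        this.balance = balance;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Customer)) {
            return false;
        }
        Customer that = (Customer) other;
        return name.equals(that.name) && balance == that.balance;
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, balance);
    }
}

final class Identity {
    private Identity() {}

    /** Identity: true only when both references point at one and the same object. */
    static boolean same(Object a, Object b) {
        return a == b;
    }

    /** Value equality: true when the two objects carry the same values. */
    static boolean alike(Object a, Object b) {
        return Objects.equals(a, b);
    }
}
```

Two references to one `Customer` are `same`, and an update made through either is visible through the other. A second `Customer` built from identical values is only `alike`, and stops being so as soon as one of the two is updated. That second case is all a tuple can ever give you.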
Objects in OO can store references to other objects. Tuples in databases cannot; they can only store common values (like foreign keys). A foreign key is not a reference to another tuple; it is simply a value that is noted as being in common with another tuple, which tuple is in a set wherein that value is unique. So when you store facts about an object into a database, how do you store the fact "this object contains a reference to that other one"? Serial numbers as foreign keys? Then how many passes does it take to reconstitute the pointers in core from the serial numbers in the database?
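As a rough illustration of the reconstitution step, here is one way it could go when every row is loaded at once: a first pass creates the objects and indexes them by key, and a second pass turns each foreign key into a pointer. The `EmployeeRow`, `Employee`, and `Rehydrator` names are invented for the sketch, and it assumes the whole table fits in memory.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A row as the database holds it: the manager is only a key value, not a pointer. */
final class EmployeeRow {
    final int emplid;
    final String name;
    final Integer managerid; // null when the employee has no manager

    EmployeeRow(int emplid, String name, Integer managerid) {
        this.emplid = emplid;
        this.name = name;
        this.managerid = managerid;
    }
}

/** The in-core object: the manager is a real reference to another Employee. */
final class Employee {
    final int emplid;
    final String name;
    Employee manager;

    Employee(int emplid, String name) {
        this.emplid = emplid;
        this.name = name;
    }
}

final class Rehydrator {
    private Rehydrator() {}

    static Map<Integer, Employee> load(List<EmployeeRow> rows) {
        Map<Integer, Employee> byId = new HashMap<>();
        // Pass 1: create every object, so that each key has something to point at.
        for (EmployeeRow row : rows) {
            byId.put(row.emplid, new Employee(row.emplid, row.name));
        }
        // Pass 2: replace key values with references (a dangling key leaves manager null).
        for (EmployeeRow row : rows) {
            if (row.managerid != null) {
                byId.get(row.emplid).manager = byId.get(row.managerid);
            }
        }
        return byId;
    }
}
```

Two passes are enough here only because the index built in the first pass covers the whole table. Loading rows lazily, or across several tables, needs the same kind of identity map kept alive for as long as the objects are in use.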
The problem of discerning identity out of facts is not simply hard in computing. It is hard in the real world -- that is why it is a major subject of metaphysics in philosophy. (Which has nothing to do with "metaphysics" in the sense of "supernaturalism".) Those who sweep it under the table as "academic theory" or "irrelevant to business" will get bitten by it later in ambiguous results.
We use prevayler and it's pretty good, but you still have to distort your Java code to accommodate the Command pattern to use it. I think the real solution is a well-designed new O-O programming language that provides fully transparent persistence via some kind of underlying object-relational db implementation.
Where are we going and why are we in a handbasket?