Why Aren't You Using An OODBMS?
Why Aren't You Using An Object Oriented Database Management System?
In today's world, Client-Server applications that rely on a database on the server as a data store while servicing requests from multiple clients are quite commonplace. Most of these applications use a Relational Database Management System (RDBMS) as their data store while using an object oriented programming language for development. This causes a certain inefficiency, as objects must be mapped to tuples in the database and vice versa instead of the data being stored in a way that is consistent with the programming model. The "impedance mismatch" caused by having to map objects to tables and vice versa has long been accepted as a necessary performance penalty. This paper aims to seek out an alternative that avoids this penalty.
What follows is a condensed version of the following paper: An Exploration of Object Oriented Database Management Systems, which I wrote as part of my independent study project under Dr. Sham Navathe.
Introduction
The purpose of this paper is to provide answers to the following questions:
- What is an Object Oriented Database Management System (OODBMS)?
- Is an OODBMS a viable alternative to an RDBMS?
- What are the tradeoffs and benefits of using an OODBMS over an RDBMS?
- What does code that interacts with an OODBMS look like?
An OODBMS is the result of combining object oriented programming principles with database management principles. Object oriented programming concepts such as encapsulation, polymorphism and inheritance are enforced, as are database management concepts such as the ACID properties (Atomicity, Consistency, Isolation and Durability), which lead to system integrity. An OODBMS also supports an ad hoc query language and secondary storage management systems that allow for managing very large amounts of data. The Object Oriented Database Manifesto [Atk 89] specifically lists the following features as mandatory for a system to support before it can be called an OODBMS: Complex objects; Object identity; Encapsulation; Types and Classes; Class or Type Hierarchies; Overriding, overloading and late binding; Computational completeness; Extensibility; Persistence; Secondary storage management; Concurrency; Recovery; and an Ad Hoc Query Facility.
From the aforementioned description, an OODBMS should be able to store objects that are nearly indistinguishable from the kind of objects supported by the target programming language, with as few limitations as possible. Persistent objects should belong to a class and can have one or more atomic types or other objects as attributes. The normal rules of inheritance should apply with all their benefits, including polymorphism, overriding inherited methods and dynamic binding. Each object has an object identifier (OID), which is used to uniquely identify a particular object. OIDs are permanent, system generated and not based on any of the member data within the object. OIDs make storing references to other objects in the database simpler, but may cause referential integrity problems if an object is deleted while other objects still have references to its OID. An OODBMS is thus a full-scale object oriented development environment as well as a database management system. Features that are common in the RDBMS world, such as transactions, the ability to handle large amounts of data, indexes, deadlock detection, backup and restoration features and data recovery mechanisms, also exist in the OODBMS world.
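To make the OID discussion concrete, here is a minimal, hypothetical Java sketch that is not any vendor's API: a toy in-memory store hands out system-generated OIDs, stores a reference to another object as that object's OID, and shows how deleting the target leaves the reference dangling. The names OidSketch, Oid, Person, store, fetch and delete are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical toy object store; OIDs come from a counter, never from field values.
public class OidSketch {
    static final class Oid {
        final long value;
        Oid(long value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof Oid && ((Oid) o).value == value;
        }
        @Override public int hashCode() { return Long.hashCode(value); }
        @Override public String toString() { return "OID#" + value; }
    }

    static final class Person {
        String name;
        Oid friend; // a reference to another object, stored as that object's OID
        Person(String name) { this.name = name; }
    }

    private final AtomicLong nextOid = new AtomicLong(1);
    private final Map<Oid, Person> objects = new HashMap<>();

    // The store, not the caller, generates the identifier.
    Oid store(Person p) {
        Oid oid = new Oid(nextOid.getAndIncrement());
        objects.put(oid, p);
        return oid;
    }

    Person fetch(Oid oid) { return objects.get(oid); }

    // Deliberately does NOT fix up objects that still refer to this OID.
    void delete(Oid oid) { objects.remove(oid); }

    public static void main(String[] args) {
        OidSketch db = new OidSketch();
        // Two objects with identical field values still get distinct OIDs.
        Oid a = db.store(new Person("Alice"));
        Oid b = db.store(new Person("Alice"));
        db.fetch(a).friend = b;
        System.out.println(a.equals(b)); // prints false
        db.delete(b);
        // The reference held by the first object now dangles.
        System.out.println(db.fetch(db.fetch(a).friend)); // prints null
    }
}
```

A real OODBMS may handle this case differently, for example by refusing the delete or by maintaining inverse relationships; the sketch deliberately does neither so that the integrity problem stays visible.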
A primary feature of an OODBMS is that objects in the database are accessed transparently, so that interacting with persistent objects is no different from interacting with in-memory objects. This is very different from using an RDBMS, in that there is no need to interact via a query sub-language like SQL, nor is there a reason to use a Call Level Interface such as ODBC, ADO or JDBC. Database operations typically involve obtaining a database root from the OODBMS, which is usually a data structure like a graph, vector, hash table or set, and traversing it to obtain objects to create, update or delete from the database. When a client requests an object from the database, the object is transferred from the database into the application's cache. There it can be used either as a transient value that is disconnected from its representation in the database (updates to the cached object do not affect the object in the database), or as a mirror of the version in the database, in which case updates to the object are reflected in the database and changes to the object in the database require that the object be refetched from the OODBMS.
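The two ways of using a fetched object can be sketched in plain Java, assuming a toy in-memory "database" whose named root is a hash table of users. Everything here (RootAccessSketch, getRoot, fetchTransient, fetchMirrored) is hypothetical and is not ObjectStore's or any other product's API; a real OODBMS moves objects between server and client cache and writes changes back inside transactions, none of which is modeled. Returning a copy stands in for transient use, and returning the stored object itself stands in for mirrored use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an OODBMS; not any vendor's API.
public class RootAccessSketch {
    static final class User {
        String status;
        User(String status) { this.status = status; }
        // Produces a detached copy with the same state.
        User copy() { return new User(status); }
    }

    // Named database roots; the "IMusers" root is a hash table of users.
    private final Map<String, Map<String, User>> roots = new HashMap<>();

    RootAccessSketch() {
        Map<String, User> users = new HashMap<>();
        users.put("Carnage4Life", new User("offline"));
        roots.put("IMusers", users);
    }

    Map<String, User> getRoot(String name) { return roots.get(name); }

    // Transient use: the application works on a copy, so edits stay local.
    User fetchTransient(String root, String key) {
        return getRoot(root).get(key).copy();
    }

    // Mirrored use: the application works on the stored object itself,
    // so edits are visible in the "database" (no transactions modeled here).
    User fetchMirrored(String root, String key) {
        return getRoot(root).get(key);
    }

    public static void main(String[] args) {
        RootAccessSketch db = new RootAccessSketch();

        User detached = db.fetchTransient("IMusers", "Carnage4Life");
        detached.status = "online";
        System.out.println(db.getRoot("IMusers").get("Carnage4Life").status); // prints offline

        User mirror = db.fetchMirrored("IMusers", "Carnage4Life");
        mirror.status = "online";
        System.out.println(db.getRoot("IMusers").get("Carnage4Life").status); // prints online
    }
}
```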
Comparisons of OODBMSs to RDBMSs
There are concepts in the relational database model that are similar to those in the object database model. A relation or table in a relational database can be considered to be analogous to a class in an object database. A tuple is similar to an instance of a class but is different in that it has attributes but no behaviors. A column in a tuple is similar to a class attribute, except that a column can hold only primitive data types while a class attribute can hold data of any type. Finally, classes have methods which are computationally complete (meaning that general purpose control and computational structures are provided [McF 99]), while relational databases typically do not have computationally complete programming capabilities, although some stored procedure languages come close.
Below is a list of advantages and disadvantages of using an OODBMS over an RDBMS with an object oriented programming language.
Advantages- Composite Objects and Relationships: Objects in an OODBMS can store an arbitrary number of atomic types as well as other objects. It is thus possible to have a large class which holds many medium-sized classes which themselves hold many smaller classes, ad infinitum. In a relational database this has to be done either by having one huge table with lots of null fields or via a number of smaller, normalized tables which are linked via foreign keys. Having lots of smaller tables is still a problem, since a join has to be performed every time one wants to query data based on the "Has-a" relationship between the entities. Also, for complex objects, an object is a better model of the real world entity than a set of relational tuples. The fact that an OODBMS is better suited to handling complex, interrelated data than an RDBMS means that an OODBMS can outperform an RDBMS by a factor of ten to a thousand, depending on the complexity of the data being handled.
- Class Hierarchy: Data in the real world usually has hierarchical characteristics. The ever popular Employee example used in most RDBMS texts is easier to describe in an OODBMS than in an RDBMS. An Employee can be a Manager or not; this is usually done in an RDBMS by having a type identifier field or by creating another table which uses foreign keys to indicate the relationship between Managers and Employees. In an OODBMS, the Employee class is simply a parent class of the Manager class.
- Circumventing the Need for a Query Language: Unlike an RDBMS, an OODBMS does not require a query language for accessing data, since interaction with the database is done by transparently accessing objects. It is still possible to use queries in an OODBMS, however.
- No Impedance Mismatch: In a typical application that uses an object oriented programming language and an RDBMS, a significant amount of time is usually spent mapping tables to objects and back. There are also various problems that can occur when the atomic types in the database do not map cleanly to the atomic types in the programming language and vice versa. This "impedance mismatch" is completely avoided when using an OODBMS.
- No Primary Keys: The user of an RDBMS has to worry about uniquely identifying tuples by their values and making sure that no two tuples have the same primary key values, to avoid error conditions. In an OODBMS, the unique identification of objects is done behind the scenes via OIDs and is completely invisible to the user. Thus there is no limitation on the values that can be stored in an object.
- One Data Model: A data model typically should model entities and their relationships, constraints and operations that change the states of the data in the system. With an RDBMS it is not possible to model the dynamic operations or rules that change the state of the data in the system, because this is beyond the scope of the database. Thus applications that use RDBMS systems usually have an Entity Relationship diagram to model the static parts of the system and a separate model for the operations and behaviors of entities in the application. With an OODBMS there is no disconnect between the database model and the application model, because the entities are just other objects in the system. An entire application can thus be comprehensively modelled in one UML diagram.
Disadvantages- Schema Changes: In an RDBMS, modifying the database schema by creating, updating or deleting tables is typically independent of the actual application. In an OODBMS based application, modifying the schema by creating, updating or deleting a persistent class typically means that changes have to be made to the other classes in the application that interact with instances of that class. This typically means that all schema changes in an OODBMS will involve a system-wide recompile. Also, updating all the instance objects within the database can take an extended period of time, depending on the size of the database.
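The Class Hierarchy and Composite Objects points can be illustrated with a minimal plain-Java sketch of the Employee example. It is not tied to any OODBMS product, and the class and field names are hypothetical: Manager is a subclass of Employee, a manager's reports are held directly as object references rather than through foreign keys, and one collection holds both kinds of object, with the overridden describe() method chosen at run time. In an OODBMS such objects would be made persistent through the vendor's own API, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmployeeHierarchySketch {
    // Parent class: every Manager is also an Employee.
    static class Employee {
        final String name;
        Employee(String name) { this.name = name; }
        // Overridden in Manager; which version runs is decided at run time.
        String describe() { return name + " (employee)"; }
    }

    static class Manager extends Employee {
        // "Has-a" relationship held as direct object references, no join table.
        final List<Employee> reports = new ArrayList<>();
        Manager(String name) { super(name); }
        @Override
        String describe() { return name + " (manager of " + reports.size() + ")"; }
    }

    public static void main(String[] args) {
        Manager ann = new Manager("Ann");
        Employee bob = new Employee("Bob");
        ann.reports.add(bob);
        // One collection holds both types; no type-identifier column is needed.
        List<Employee> staff = Arrays.asList(ann, bob);
        for (Employee e : staff) {
            System.out.println(e.describe());
        }
        // prints:
        // Ann (manager of 1)
        // Bob (employee)
    }
}
```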
The following information was gleaned from the ODBMS Facts website.
- The Chicago Stock Exchange manages stock trades via a Versant ODBMS.
- Radio Computing Services is the world's largest radio software company. Its product, Selector, automates the needs of the entire radio station -- from
the music library, to the newsroom, to the sales department. RCS uses the POET ODBMS because it enabled RCS to integrate and organize various elements,
regardless of data types, in a single program environment.
- The Objectivity/DB ODBMS is used as a data repository for system component naming, satellite mission planning data, and orbital management data deployed by Motorola in The Iridium System.
- The ObjectStore ODBMS is used in Southwest Airlines' Home Gate to provide self-service to travelers through the Internet.
- Ajou University Medical Center in South Korea uses InterSystems' Caché ODBMS to support all hospital functions including mission-critical departments such as pathology, laboratory, blood bank, pharmacy, and X-ray.
- The Large Hadron Collider at CERN in Switzerland uses an Objectivity DB. The database is currently being tested in the hundreds of terabytes at data rates up to 35 MB/second.
- As of November, 2000, the Stanford Linear Accelerator Center (SLAC) stored 169 terabytes of production data using Objectivity/DB. The production data is distributed across several hundred processing nodes and over 30 on-line servers.
Below are Java code samples for accessing a relational database and accessing an object database. Compare the size of the code in both examples. The examples are for an instant messaging application.
- Validating a user.
Java code accessing an ObjectStore(TM) database
import COM.odi.*;
import COM.odi.util.query.*;
import COM.odi.util.*;
import java.util.*;
try {
    //start database session
    Session session = Session.create(null, null);
    session.join();
    //open database and start a read-only transaction
    Database db = Database.open("IMdatabase", ObjectStore.UPDATE);
    Transaction tr = Transaction.begin(ObjectStore.READONLY);
    //get hashtable of user objects from DB
    OSHashMap users = (OSHashMap) db.getRoot("IMusers");
    //get password and username from user
    String username = getUserNameFromUser();
    String passwd = getPasswordFromUser();
    //get user object from database and see if it exists and whether password is correct
    UserObject user = (UserObject) users.get(username);
    if (user == null)
        System.out.println("Non-existent user");
    else if (user.getPassword().equals(passwd))
        System.out.println("Successful login");
    else
        System.out.println("Invalid Password");
    //end transaction, close database and terminate session
    tr.commit();
    db.close();
    session.terminate();
} catch (Exception e) {
    //exception handling would go here ...
}
Java JDBC code accessing an IBM DB2 Database(TM)
import java.sql.*;
import java.util.*;
try {
    //load the DB2 JDBC driver so it registers itself with DriverManager
    Class.forName("COM.ibm.db2.jdbc.app.DB2Driver").newInstance();
    //create database connection
    Connection conn = DriverManager.getConnection("jdbc:db2:IMdatabase");
    //get password and username from user
    String username = getUserNameFromUser();
    String passwd = getPasswordFromUser();
    //perform SQL query; the bound parameter keeps user input out of the SQL text
    PreparedStatement sqlQry = conn.prepareStatement("SELECT password FROM user_table WHERE user_name = ?");
    sqlQry.setString(1, username);
    ResultSet rset = sqlQry.executeQuery();
    if (rset.next()) {
        if (rset.getString(1).equals(passwd))
            System.out.println("Successful login");
        else
            System.out.println("Invalid Password");
    } else {
        System.out.println("Non-existent user");
    }
    //close result set, statement and database connection
    rset.close();
    sqlQry.close();
    conn.close();
} catch (Exception e) {
    //exception handling would go here ...
}
There isn't much difference between the above examples, although it does seem a lot clearer to perform operations on a UserObject instead of a ResultSet when validating the user.
- Getting the user's contact list.
Java code accessing an ObjectStore(TM) database
import COM.odi.*;
import COM.odi.util.query.*;
import COM.odi.util.*;
import java.util.*;
try {
    /* start session and open DB, same as in the ObjectStore login example above */
    //get hashmap of users from the DB
    OSHashMap users = (OSHashMap) db.getRoot("IMusers");
    //get user object from database
    UserObject c4l = (UserObject) users.get("Carnage4Life");
    UserObject[] contactList = c4l.getContactList();
    System.out.println("These are the people on Carnage4Life's contact list");
    for (int i = 0; i < contactList.length; i++)
        System.out.println(contactList[i].toString()); //toString() prints full name, username, online status and webpage URL
    /* close session and close DB, same as in the ObjectStore login example above */
} catch (Exception e) {
    //exception handling code
}
Java JDBC code accessing an IBM DB2 Database(TM)
import java.sql.*;
import java.util.*;
try {
    /* open DB connection, same as in the JDBC login example above */
    //perform SQL query (note the space before WHERE in the concatenated string)
    Statement sqlQry = conn.createStatement();
    ResultSet rset = sqlQry.executeQuery("SELECT fname, lname, user_name, online_status, webpage FROM contact_list, user_table "
        + "WHERE contact_list.owner_name='Carnage4Life' AND contact_list.buddy_name=user_table.user_name");
    System.out.println("These are the people on Carnage4Life's contact list");
    while (rset.next())
        System.out.println("Full Name:" + rset.getString(1) + " " + rset.getString(2) + " User Name:" + rset.getString(3)
            + " OnlineStatus:" + rset.getString(4) + " HomePage URL:" + rset.getString(5));
    /* close DB connection, same as in the JDBC login example above */
} catch (Exception e) {
    //exception handling code
}
The benefits of using an OODBMS over an RDBMS in Java slowly become obvious. Consider also that if the data from the select needs to be returned to another method, then all the data from the result set has to be mapped to another object (UserObject).
- Get all the users that are online.
Java code accessing an ObjectStore(TM) database
import COM.odi.*;
import COM.odi.util.query.*;
import COM.odi.util.*;
import java.util.*;
try {
    /* same as above */
    //use an OODBMS query to locate all the users whose status is 'online'
    Query q = new Query(UserObject.class, "onlineStatus.equals(\"online\")");
    //the IMusers root is a hash map, so query over its values
    Collection users = ((OSHashMap) db.getRoot("IMusers")).values();
    Set onlineUsers = q.select(users);
    Iterator iter = onlineUsers.iterator();
    // iterate over the results
    while (iter.hasNext()) {
        UserObject user = (UserObject) iter.next();
        // send each person some announcement
        sendAnnouncement(user);
    }
    /* same as above */
} catch (Exception e) {
    //exception handling goes here
}
Java JDBC code accessing an IBM DB2 Database(TM)
import java.sql.*;
import java.util.*;
try {
    /* same as above */
    //perform SQL query
    Statement sqlQry = conn.createStatement();
    ResultSet rset = sqlQry.executeQuery("SELECT fname, lname, user_name, online_status, webpage "
        + "FROM user_table WHERE online_status='online'");
    //each row has to be mapped to a UserObject by hand before it can be used
    while (rset.next()) {
        UserObject user = new UserObject(rset.getString(1), rset.getString(2),
            rset.getString(3), rset.getString(4), rset.getString(5));
        sendAnnouncement(user);
    }
    /* same as above */
} catch (Exception e) {
    //exception handling goes here
}
Proprietary
- ObjectStore
- O2
- Gemstone
- Versant
- Ontos
- DB/Explorer ODBMS
- Poet
- Objectivity/DB
- EyeDB
Open Source - Ozone
- Zope
- FramerD
- XL2
The gains from using an OODBMS while developing an application using an OO programming language are many. The savings in development time from not having to worry about separate data models, as well as the fact that there is less code to write due to the lack of impedance mismatch, are very attractive. In my opinion, there is little reason to pick an RDBMS over an OODBMS for new application development unless there are legacy issues that have to be dealt with.
OK, only one disadvantage with this OODBMS: a system-wide recompile EVERY time you make a schema change. Umm, that's a pretty big disadvantage in my book.
Great job, Carnage4Life!
:)
I didn't think I'd see the day when someone got actual content posted on Slashdot.
Or, for that matter, that you'd post a Java article that I thought was somewhat interesting and useful...
Anyhow, wouldn't it be easier to integrate all this with C? Especially considering the huge body of existing code, and the well-known primitives involved.
And are there any less proprietary OODBMSes out there that anyone would recommend?
---
pb Reply or e-mail; don't vaguely moderate.
As I understand it, EROS is adding an RDBMS, but by its nature an OODBMS would be a more logical use of its properties.
I actually can't wait for an EROS OODBMS Network Data Storage system. I think they were meant for each other, but it will take 10 years for people to comprehend it. I wonder if, in 10 years when this idea is finally reaching Linux-like momentum, someone will think back and say "We could have had this 10 years ago".
~^~~^~^^~~^
I agree about the complexity and skill availability arguments; it is still easier (and cheaper) to get several COBOL and VB programmers than Java or C++ ones.
But then you can always get a consultant to help with the design. And as a matter of fact, it will be faster to develop that way than having a bunch of COBOL developers put together some kind of server side app while some VB coders put the client interface together...
Having done both, I can tell you what kind of system scales and which one does not.
Have you done some EJB programming? You would be surprised how much faster and easier it is to go the OODB route.
In my opinion, the biggest problem really is mainstream recognition. OODB vendors are vulnerable to FUD from RDBMS vendors, just as Linux suffered from Microsoft FUD two years ago. Note that for OODB systems (as for Linux 2 years ago) there are some good reasons to stick with the mainstream solution. Going the OODB route is far more risky (from a business decision making point of view).
Black holes occur when God divides by zero.
A lot depends here on what we mean by the OO in OODBMS. Even in programming, the meaning of object-orientation has changed through the years. And the problem of ambiguity is even greater with database management tools.
I am glad there are good and smart people working on a standard for what constitutes an OODBMS. I suspect it will be a few years before a definitive standard is completely figured out.
Consider, for example, some of the very different things people mean by an "Object-Oriented Database Management System":
Some people use it to mean "something which will give me persistence in the OO app I'm currently working on." For them a relational database management product with a few OO tools may be fine (assuming their objects are sufficiently simple).
Some people use it to mean "something that will give me the ability to tie behavior to persistent objects." For them good stored procedures (like Oracle with a third-party product for debugging stored procedures) may be exactly what they want.
Some people use it to mean "a DBMS which implements all the major features of current OO theory." An OODBMS which truly implements standards (as linked to in the original article) is what's needed for these people.
Some people use it to mean "something which will enable me to implement all ideas currently associated with advanced OO theory (including aspect-oriented programming) and anything which may be included in that theory in the future." A DBMS with a dynamic model of object-oriented-ness (along the lines of Perl's dynamic model of what OO is) would be required. I don't know if anyone's actually accomplished this, but I would be both impressed and interested if it's been done (especially if it's language-independent, assuming that's possible).
And some people use it to mean "a DBMS which is fundamentally object-oriented in its underlying structure enabling a variety of powerful table-creation tools." This can be accomplished with some of the better OODBMSs (depending, once again, on just what you mean by "fundamentally object-oriented").
Given all this, I suspect it will be a while before a clear definition is agreed upon. It may even come out of theoretical work in academia. Until that time, the practical reasons listed here will continue to be why many don't use OODBMSs. And the attractive features they offer will continue to be why some people will ignore those practical problems.
Oh, no! It looks like we're back to "it depends on the problem you're working on" just like so many of these debates.
Eternal vigilance only works if you look in every direction.
Actually, data loading/backend-type stuff is easier with an RDBMS, because data entry is almost always tabular anyway. However, once entered, it is usually easier to process it through an OO layer. The best way to accomplish this is to have a CORBA layer, incorporating all the business logic, that the applications always use for talking to the database, and have that CORBA layer talk to an RDBMS.
I don't have a lot of experience with OODBMSs - I'd be curious exactly how they work. The closest I've worked with is PostgreSQL, which is object-relational. Are there any intro guides, especially to schema definition and stuff like that?
Is there a free software OODBMS?
Engineering and the Ultimate
was the ability to use already-existent tools to do data mining, reporting, & other similar (not necessarily insignificant) activities...
in all cases that we had gone through rigorous prototypes of products and used ODBMS', it always seemed to come down to the same few things:
1) critical mass (everyone already knew the relational databases very well)
2) tool robustness (there are a wide variety of good tools (most 3rd party supplied) to MANAGE relational instances. i'm referring to more subtle circumstances than managing users & schema here)
3) reporting and data-mining was ALWAYS more difficult (usually by an order of magnitude or more).
now, my last involvement in a prototype is YEARS ago, so i'm absolutely positive things have changed...
the reality remains that people haven't yet gotten past what they learned in their first few experiences and simply haven't re-examined the landscape, just like myself...
a weak excuse, but i'm certain this is a more common answer than we'd all like to admit.
just my 0.02.
Peter
As someone else has pointed out, OODBMSs require a very different skill set. The problem isn't that your typical SQL developer didn't have these skills. The problem is that the things were ever referred to as database systems.
If you walk into a potential customer selling a "database system", then the database guys come and hear what you have to say. They ask about SQL support and point-and-click development tools. They are going to be looking for very high levels of concurrency, at isolation levels below serializable.
Selling a "database system" meant that once we got past the early adopters, we were selling against Oracle and we hit a wall. What we should have done from day one was to sell persistence for C++. We did start out like this, e.g. trying to convince ECAD vendors to build their products on top of ObjectStore. That had some limited success because the customers knew that they needed persistence, but they were C/C++ hackers at heart, and an RDBMS was a poor compromise. A "database for C++ with no impedance mismatch" sounds great to someone writing a 3d modeler. We then went on to apply the same logic selling to satisfied RDBMS users without changing our strategy, and that's when things stalled.
That strategy was necessary in some ways, because we were venture-funded, and the VCs weren't going to be happy with a small niche. They wanted something that would get into every insurance company and bank. However, by aiming high and failing (by VC standards), we abandoned our natural market too soon and avoided becoming a small success in that market.
The issue is that OODBMSs conform neither to current database best practices nor to theory.
As for best practices, you should already know them from other, better-rated comments in this thread: you should design your data before your application, and OODBMSs make that hard; you should strive for independence of the logical and physical layers of your database, keeping data independence and shifting performance optimization issues to the DBMS' optimizer; OIDs hinder the designation of candidate keys, of which the primary key is a special case, and thus hinder a lot of the data integrity checks that should be done by referential integrity. And we could go on and on.
As for the practical implications of not conforming to these best practices, any schema change requires not only recompiling the application but also rethinking the data access path (also known as a query's access plan); you won't be able to keep several logical schemas for different users, and the identity of the user's view with the physical layout will force you to optimize only for the most common case, instead of leaving it up to the DBMS to create the best access plans.
All this is much better explained in Database Debunkings, a site co-maintained by Chris J Date, author of the best database books I've ever read; you can find a list of his available books also there.
As for theory, there is no real substitute for the relational database model theory. Just as Linus Torvalds thinks that microkernels were a good idea but misguided, yielding neither practical nor theoretical improvements, OODBMSs sound nice but offer no real improvements over RDBMSs. This is not to say that everything you will ever need will be handled properly by your SQL DBMS. The point is exactly that people have gone for OODBMSs because they thought that SQL was relational, and found it wanting. The problem is that SQL never was truly relational, just an approximation of it. Date has a whole book on it, called The Third Manifesto.
Summing up, what I am really trying to find is some proper implementation of the relational database model ideals, which should give us practically all of the advantages of OODBMSs without their cons. I have just been informed of Suneido, but have not investigated it fully... it's a pity it is Win32, not POSIX.
--
Leandro Guimarães Faria Corsetti Dutra
DBA, SysAdmin
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
While OODBMS were an obvious choice to me for performance and ease of programming, my consultants told me that finding Oracle talent was so much easier than finding Versant talent (for example) that I would be wasting time and money using OODBMS. This is especially true of DBAs.
And that's precisely the same reason it took Linux so long to catch on in the enterprise, and why it still hasn't invaded small to medium businesses with only 1-2 network-savvy people. I'd love to switch to Linux fileservers instead of upgrading our NT boxes to 2k, but since we can't find anybody with the appropriate experience to manage them when I'm not around, we stick with the point-and-click OS's. Don't flame me for the decision, I'm just stating why we don't always switch to things we all know are best. (Reminds me of OS/2 for some reason.)
What's your damage, Heather?
This reminds me of the famous "GOTO considered harmful" issue.
Relational systems are useful for a wide variety of tasks specifically because they are limited in their expressive power. This limitation in their expressive power means that certain desirable properties are maintained.
The objects that are recognized in the relational programming model are scalars, tuples and tables. Most operations are closed on the set of all tables -- that is to say, they take tables and produce tables. This means that you can compose operations in various kinds of ways and still have more raw material for further operations.
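The closure property described in the comment above can be illustrated with a toy Java sketch, assuming a table is modeled as a list of rows and each row as a map from column name to value. The Table class and its select, project and join methods are hypothetical and only loosely mirror relational restriction, projection and join (duplicates are not removed and the join is a naive nested loop); the point is that each operation returns another Table, so the calls chain freely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class RelationalClosureSketch {
    // A "table" is a list of rows; each row maps column name to value.
    static final class Table {
        final List<Map<String, Object>> rows;
        Table(List<Map<String, Object>> rows) { this.rows = rows; }

        // Restriction (WHERE): table in, table out.
        Table select(Predicate<Map<String, Object>> p) {
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> r : rows) if (p.test(r)) out.add(r);
            return new Table(out);
        }

        // Projection: keep only the named columns (duplicates are not removed).
        Table project(String... cols) {
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> r : rows) {
                Map<String, Object> nr = new HashMap<>();
                for (String c : cols) nr.put(c, r.get(c));
                out.add(nr);
            }
            return new Table(out);
        }

        // Equi-join on one shared column: two tables in, one table out.
        Table join(Table other, String col) {
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> a : rows)
                for (Map<String, Object> b : other.rows)
                    if (a.get(col) != null && a.get(col).equals(b.get(col))) {
                        Map<String, Object> nr = new HashMap<>(a);
                        nr.putAll(b);
                        out.add(nr);
                    }
            return new Table(out);
        }
    }

    // Builds a row from alternating column names and values.
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> r = new HashMap<>();
        for (int i = 0; i < kv.length; i += 2) r.put((String) kv[i], kv[i + 1]);
        return r;
    }

    public static void main(String[] args) {
        Table users = new Table(Arrays.asList(
            row("user_name", "bob", "online_status", "online"),
            row("user_name", "eve", "online_status", "offline")));
        Table contacts = new Table(Arrays.asList(
            row("owner", "Carnage4Life", "user_name", "bob"),
            row("owner", "Carnage4Life", "user_name", "eve"),
            row("owner", "someoneElse", "user_name", "bob")));
        // Every intermediate result is itself a Table, so the steps compose.
        Table onlineContacts = contacts
            .select(r -> "Carnage4Life".equals(r.get("owner")))
            .join(users, "user_name")
            .select(r -> "online".equals(r.get("online_status")))
            .project("user_name");
        System.out.println(onlineContacts.rows); // prints [{user_name=bob}]
    }
}
```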
To take a more modern view of this: relational databases are about the reuse of facts. The process of designing a database is one of analyzing factual relationships so that eventually each fact is stored in one and only one place. This, along with the closed nature of relational operations, facilitates recombining these facts in various novel ways. I believe this is the source of the relational model's sustained popularity.
The cost is that the resultant model is not ideal for any single application. I believe this is the nature of the "impedance mismatch" -- you are dealing with an off-the-rack, one-size-fits-most-applications representation of data. Naturally, for complex applications with severe performance constraints, a more tailored representation is required.
I've never had the cash to hack around with OO databases, so I'd like to learn more. Do they support the kind of composition of operations that you get with relational systems? Presumably objects can be re-used in different applications, but how well does this work in practice?
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Aren't you supposed to design an application before implementing it in any way, including putting data in a DB? I've worked at two companies and had a ton of projects in school, and none involved implementing the database before the application was designed.
I'd like to know what world you are living in. In the real world, most databases are legacy databases and FULL of data. I've had to design applications around databases for years now. In my field (Programming for Engineers) the data is king and people need to access it in multiple ways. True, if you are designing a system from the ground up, then you will be able to design the DB and make it nice and pretty. This is seldom the case in anything but web development.
This is simply hogwash. RDBMSs are by their nature non-generic, especially when one adds foreign keys and constraints to a system, which are necessary for any decent-sized application. On the other hand, the entire point of object oriented programming is creating generic reusable components. With the ability to use inheritance and polymorphism in an ODBMS, I see no reason why you believe an RDBMS is more generic.
'Generic' may be the wrong word here. A better one would be 'simpler'. A lot of applications just don't need all the OO stuff. The reason that RDBMSs are so pervasive is that most data can be represented well, and in an easy to understand way, with just tables and keys.
Learning OODBMS techniques is mainly learning how to use another API in your Object Oriented programming language of choice (well, C++, Java or Smalltalk), versus learning SQL and relational database theory. If people could learn SQL, which is completely unrelated to any other aspect of their programming experience, then adding OODBMS techniques as a skillset would be trivial. Of course, if people don't realize that alternatives to RDBMSs exist, then they won't learn these techniques. That is more important, because management can't find people who have these skills if developers don't go out and learn these techniques.
Developers aren't the only ones who have to query the database. In my shop, we have 10-20 people querying the same database, many of whom have spent a lot of time learning SQL. Most of the people who need to look at the data are not able to pick up a new query language quickly enough. SQL is simple enough to learn. RDBMSs are simple and easy to understand. With an OODBMS, these people have to be trained on what the heck OO is. This is not an easy concept for a non-programmer. On the other hand, tell someone that the database is a collection of tables, and they can easily understand.
Now I just realized you didn't read the article. People have measured gains in the range of a tenfold to a thousandfold increase in performance; these are not incremental. Secondly, the primary benefit is that you have to write less code and don't have to worry about multiple paradigms at once when implementing an application.
Sure. I'll believe it when I see it. This sounds like marketing hype to me.
Sounds like someone who didn't know how to program for an RDBMS wrote some crappy code. Correctly written code for an RDBMS would not experience these kinds of gains when converted to an OODBMS. The overhead for the conversion process could be this large, but only if the original code is crap.
Cache solves some of the problems you point out. It's accessible relationally or via objects. New object interfaces can be added to existing ones, or to relational and non-relational data stores. So, Cache is generic. Complexity -- because Cache can also be accessed as a relational database, you can write a new Java OO app using its object interface and let older apps continue to use its SQL interface. Skills availability -- start relational, have the choice of trying OO.
- - - - -
Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
http://www.e-dbms.com/cache/components/cacheobjects/index.html
I used to work for Excite@Home, in their E-Business Services unit (now defunct; those left are just an engineering adjunct to Excite@Home). We created a web-based store hosting product built entirely on ObjectStore as the back end, with Java for dynamic page generation pulling results from C++ query servers.
Unfortunately, the site became very popular, and with all the orders, order information, store products, etc. stored in it, the database held hundreds of millions of objects (in some cases, very large objects).
We began running up against the 32-bit barrier for address space within ObjectStore. At the time, there was no 64-bit version of ObjectStore (and I don't know if there is now). We would watch performance steadily degrade on our C++ queries over the course of 2 or 3 months, until finally it would nearly grind to a halt because of lack of address space and we would be forced into a 12-14 hour defragmentation routine. Each time we went through this cycle, it would start again, but performance would erode even faster.
Admittedly, we were doing some pretty bizarre stuff. ObjectStore didn't support on-the-fly schema changes, so we hacked some utilities which allowed us to do that (and which ate address space). We also stored all the product orders in the database, and we never fully deleted orders until we defragmented. But fundamentally, ObjectStore had a problem with scalability for extremely large databases (billions of objects).
We went to Oracle, and the problems disappeared. Hello, 64-bit world, hello nearly unlimited address space, bye-bye constant database defragmentation. I'm not saying Oracle is a panacea -- it's not, and is quirky as hell -- but it blew the crap out of ObjectStore in this case.
My two cents.
Matt Barnson
Matthew P. Barnson
I learn what I think when I read what I write
I have used Versant on a medium to large scale development project. For 80-90% of the code, using an ODBMS was a dream. A simple persistent base class for objects that needed to go in and out of the DB was an elegant solution that was a joy to use. I was all ready to become the new ODBMS advocate/zealot for all my future development projects.
Then I ran into a wall. The wall was ad hoc query. For most of our system, traversing an object model was a very elegant way of accessing data. But for that last 10%, we really needed fast, efficient ad hoc queries. Here is where the ODBMS fell flat on its face. The queries were slow, and doing something akin to a "join" was mighty painful. And of course, it turned out that these operations were the most used and the slowest part of our system. Everything came crashing down around me. What had been a joy to develop was a nightmare to use.
Our application was a series of separate distributed apps, all reading and writing to a shared datastore. Although walking between related objects was a dream, finding the "head" of the tree was always a PITA. Our data had a good parent-child-grandchild-etc. set of has-a relationships. But finding the "interesting" parent objects was very, very slow. Once the parents were found, traversing through the related data was fast and easy, but the startup of each operation was a huge bottleneck.
This may just be poor design experience on my and the other developers' parts. Just like a set of C developers can create a truly horrid C++ design, we RDBMS developers may have just abused the OODBMS. But the fact is we had a group of half a dozen experienced OO developers, and we all thought we had a good approach. If a group of developers with good OO programming experience and good RDBMS experience can't figure out how to correctly use an ODBMS, then I don't have much hope for the technology. Either the technology has some serious limitations, or the learning curve is very, very steep. Either way, I've been sticking to my tried-and-true RDBMS ever since.
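The "finding the head of the tree" problem is easy to reproduce in miniature. The Python sketch below is a hypothetical illustration, not Versant's API: `Node`, its `status` attribute and the helper functions are invented for the example. Traversal from a root you already hold is cheap pointer-chasing, but without an index, finding the "interesting" roots means examining every one of them; a secondary index keyed on the attribute (roughly what CREATE INDEX gives you in a relational system) turns that into a single lookup.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical persistent object in a parent-child-grandchild tree."""
    name: str
    status: str
    children: list[Node] = field(default_factory=list)


def count_descendants(node: Node) -> int:
    # Navigational access: once you hold a root, traversal is pointer-chasing.
    return sum(1 + count_descendants(child) for child in node.children)


def find_roots_by_scan(roots: list[Node], status: str) -> list[Node]:
    # No index: every root must be examined to find the "interesting" ones.
    return [r for r in roots if r.status == status]


def build_status_index(roots: list[Node]) -> dict[str, list[Node]]:
    # A secondary index on status: pay once to build it, then look up directly.
    index = defaultdict(list)
    for r in roots:
        index[r.status].append(r)
    return index


roots = [
    Node(f"parent{i}", "open" if i % 1000 == 0 else "closed",
         [Node(f"child{i}", "n/a", [Node(f"grandchild{i}", "n/a")])])
    for i in range(10_000)
]

scanned = find_roots_by_scan(roots, "open")   # touches all 10,000 roots
indexed = build_status_index(roots)["open"]   # a single dictionary lookup
print(len(indexed), sum(count_descendants(r) for r in indexed))
```

Whether a particular ODBMS lets you declare such an index, and whether its query engine actually uses it, varies by product; the sketch only shows why its absence makes every operation's startup expensive.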
6. The world is not object oriented. Even if OO is a useful tool, it is no silver bullet.
Which makes more sense when developing an application in an object oriented programming language: a database that is consistent with the programming paradigm and performs database operations transparently, or one that requires the developer to jump through additional hoops to get data, is generally slower, and involves writing more code?
7. RDBMSs are proven technology and rather well standardised; OODBMSs aren't. Currently there is a proposal for a standard (Java Data Objects), but even that only addresses one platform.
Not only is there a standard but the ODMG standard is on version 3, JDO is merely a Java standard. Please know the facts before flaming.
--
Hi, thanks for the responses, I didn't think anyone would be done reading it so quickly. :)
...and in an OO system some changes can be made with no effect on the existing applications. In the general case, though, an RDBMS is more flexible than an ODBMS.
Complexity. These systems are much more difficult to design than an RDBMS. The application must be designed first, then the data structures must accommodate that. This kind of design is very expensive.
Aren't you supposed to design an application before implementing it in any way, including putting data in a DB? I've worked at two companies and had a ton of projects in school, and none involved implementing the database before the application was designed.
RDBMSs are generic. Since an OO system is designed for a specific application, it's difficult to use that system for anything else. A well-designed, properly normalized RDBMS can be used for many different applications. When a DB is going to fill many terabytes, you don't want to have multiple copies of it for each distinct reporting application.
This is simply hogwash. RDBMSs are by their nature non-generic, especially when one adds foreign keys and constraints to a system, which are necessary for any decent-sized application. On the other hand, the entire point of object oriented programming is creating generic reusable components. With the ability to use inheritance and polymorphism in an ODBMS, I see no reason why you believe an RDBMS is more generic.
Schema changes. As mentioned in the article, schema changes are a nightmare with an OO system. In a relational system, some changes can be made with no impact on existing applications. Others are relatively uncomplicated compared to similar OO changes.
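Here is a minimal sketch of the additive kind of schema change meant here, using Python's built-in sqlite3 module as a stand-in for any RDBMS; the `person` table and its columns are hypothetical. An existing query that names its columns keeps returning the same rows after a new column is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO person (name, email) VALUES ('Ada', 'ada@example.com')")


def legacy_report(c):
    # An "existing application": it names exactly the columns it needs.
    return c.execute("SELECT name, email FROM person ORDER BY id").fetchall()


before = legacy_report(conn)

# Schema change made for a new application: add a column with a default.
conn.execute("ALTER TABLE person ADD COLUMN phone TEXT DEFAULT ''")
conn.execute("UPDATE person SET phone = '555-0100' WHERE name = 'Ada'")

after = legacy_report(conn)
print(before == after, conn.execute("SELECT name, phone FROM person").fetchall())
```

Only some changes are this cheap: code that relies on SELECT *, or changes that rename or drop columns, can still break existing applications.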
Skills availability. Yes, the old management problem. Everyone knows SQL; nobody knows OO.
Learning OODBMS techniques is mainly learning how to use another API in your object oriented programming language of choice (well, C++, Java or Smalltalk), versus learning SQL and relational database theory. If people could learn SQL, which is completely unrelated to any other aspect of their programming experience, then adding OODBMS techniques as a skillset would be trivial. Of course, if people don't realize that alternatives to RDBMSs exist, then they won't learn these techniques. That is more important, because management can't find people who have these skills if developers don't go out and learn these techniques.
It's just not worth it. Given the dramatically higher costs associated with designing and maintaining an OO system, most applications just don't need the incremental performance gains associated with it. Very specialized, very high performance systems would benefit, but smaller or more general systems would not.
Now I just realized you didn't read the article. People have measured gains in the range of a tenfold to a thousandfold increase in performance; these are not incremental. Secondly, the primary benefit is that you have to write less code and don't have to worry about multiple paradigms at once when implementing an application.
Finally, where the heck are you getting this BS that designing an application with a single data model (i.e. one set of UML diagrams) is more expensive than designing one with two data models (i.e. an ER model for the DB and UML for the application)?
--
I've got a better question: why aren't you using the RDBMS?
Many of us who crow about the wonders of OO programming environments don't have a firm grasp of the alternatives, nor do we fully appreciate the problems that those OO environments solve versus the good things they traded away. For building significant, long-lived, scalable, evolvable, administrable, restartable information systems, the RDBMS has not been beaten.
If we start from the opposite side, i.e. we start with the RDBMS and ask what it is that is distasteful about programming in this environment, we might actually get somewhere. If I take Oracle as an example and compare it to e.g. Java, the only shortcoming I see with Oracle's PL/SQL is that it doesn't (to my knowledge) support polymorphism. It does support encapsulation and abstraction (functions, procedures, packages with data hiding), and the biggie: declarative, optimizable association specification. It certainly supports "structured programming". Are you willing to trade away all that RDBMS goodness just to get polymorphism? Seems like a poor tradeoff.
I'll go even further. It is not at all obvious that the OO "model" is superior to the relational one. These observations from this paper by McCarthy apply just as well now to OO models as they did to non-relational (accounting) models back in 1982 (pp. 554-555):
(2) Its classification schemes are not always appropriate. The chart of accounts for a particular enterprise represents all of the categories into which information concerning economic affairs may be placed. This will often lead to data being left out or classified in a manner that hides its nature from non-accountants.
(3) Its aggregation level for stored information is too high. Accounting data is used by a wide variety of decision makers, each needing differing amounts of quantity, aggregation, and focus depending upon their personalities, decision styles, and conceptual structures. Therefore information concerning economic events and objects should be kept in as elementary a form as possible to be aggregated by the eventual user.
What McCarthy is arguing for is dis-encapsulation! Anti-OO. I think there's an important lesson there.
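McCarthy's point about keeping data elementary can be sketched in a few lines: store one row per economic event and let each user aggregate it however they need. The `sale` table and its rows are hypothetical, and sqlite3 stands in for any RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Elementary events, one row each, rather than pre-aggregated account balances.
conn.execute("""CREATE TABLE sale (
    id INTEGER PRIMARY KEY, day TEXT, region TEXT, product TEXT, amount REAL)""")
conn.executemany(
    "INSERT INTO sale (day, region, product, amount) VALUES (?, ?, ?, ?)",
    [("2001-03-01", "east", "widget", 10.0),
     ("2001-03-01", "west", "widget", 5.0),
     ("2001-03-02", "east", "gadget", 7.5),
     ("2001-03-02", "west", "widget", 2.5)],
)

# Different decision makers aggregate the same raw rows differently.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sale GROUP BY region ORDER BY region").fetchall()
by_product = conn.execute(
    "SELECT product, SUM(amount) FROM sale GROUP BY product ORDER BY product").fetchall()
by_day = conn.execute(
    "SELECT day, COUNT(*) FROM sale GROUP BY day ORDER BY day").fetchall()
print(by_region, by_product, by_day)
```

Had the data been stored only as per-region totals inside some encapsulated object, the per-product and per-day views could not be recovered.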
So the question is: can we have that flexibility along with maintainability?
Also, be careful to avoid reasoning from an outdated view of the data type expressiveness offered by the modern RDBMS. All the major vendors are now offering so-called OO/Relational features such as object identifiers, large objects, arrays, structures, sub-tables.
As patchy as the SQL, ODBC, and JDBC standards may be, they have commoditized the DBMS market. Until object databases can do the same (the ODMG standards don't even come close), they lock you into a proprietary solution. Ultimately, if your database doesn't scale as well as you'd like, that will hurt performance.
If I understand correctly, the idea is that the RDBMS is turned into an object persistence store. You pull the object's data from the data store, manipulate the object (which may or may not update the database), and then you can store the object back away.
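The load / manipulate / store cycle described above can be written out by hand as a minimal sketch over Python's sqlite3. `Person`, `load_person` and `save_person` are hypothetical names, not WebObjects/EOF or any particular persistence library. Writing it out makes visible both the mapping code (row to object, object back to row) and the two database round trips that a persistence layer hides from you.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    id: int
    name: str
    email: str


def load_person(conn: sqlite3.Connection, person_id: int) -> Person:
    # Round trip 1: map a row (tuple) onto an object.
    row = conn.execute(
        "SELECT id, name, email FROM person WHERE id = ?", (person_id,)).fetchone()
    if row is None:
        raise KeyError(person_id)
    return Person(*row)


def save_person(conn: sqlite3.Connection, p: Person) -> None:
    # Round trip 2: map the object's state back onto the row.
    conn.execute("UPDATE person SET name = ?, email = ? WHERE id = ?",
                 (p.name, p.email, p.id))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Ada', 'ada@old.example')")

p = load_person(conn, 1)      # pull the object's data from the store
p.email = "ada@new.example"   # manipulate the in-memory object
save_person(conn, p)          # store the object back away
print(conn.execute("SELECT email FROM person WHERE id = 1").fetchone())
```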
The idea seems to be that we should not abstract ("essentialize") database transactions. We shouldn't have to think about transactions with the data store, and by golly, we won't. We'll make the database a 1:1 mapping with the objects, and that'll be that.
It's a terrible idea. You want as tight control over your trips to the database as possible. Sure, if you're running a small app on one machine, you're fine. But if you've got hordes of transactions coming through, it is really important to watch those trips.
When I worked with WebObjects/EOF, we developers were constantly in a tug-of-war with the system, trying to get just the data we wanted. Different object sets and different APIs had different ways of presenting the very simple information we needed.
For example, say I have 500K entries in the People table. We want to view their name and email, and a couple other things, 50 at a time.
With an object store, even though I only want their name and email, I've got to pull out everything else in there. What an incredible waste! There are some 10-20 fields in there!
In proper OO fashion, these folks are in a list. I can't possibly pull out 500K entries, so the API goes through twists and contortions to let me select out just the first 50, and then page through in 50.
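For comparison, this is the kind of query wanted here: fetch only the displayed columns, one page at a time. It is a sketch using sqlite3 with a hypothetical `people` table, scaled down to 500 rows so it runs instantly; the extra columns stand in for the "10-20 fields".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE people (
    id INTEGER PRIMARY KEY, name TEXT, email TEXT,
    address TEXT, phone TEXT, notes TEXT)""")  # stand-ins for the other fields
conn.executemany(
    "INSERT INTO people (name, email, address, phone, notes) VALUES (?, ?, ?, ?, ?)",
    ((f"user{i}", f"user{i}@example.com", "...", "...", "...") for i in range(500)),
)

PAGE_SIZE = 50


def fetch_page(c, page):
    # Project only the two displayed columns and fetch one page of 50 rows.
    return c.execute(
        "SELECT name, email FROM people ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()


first = fetch_page(conn, 0)
third = fetch_page(conn, 2)
print(len(first), first[0], third[0])
```

LIMIT ... OFFSET still makes the database step over all earlier rows, so for deep pages in a 500K-row table, keyset pagination (WHERE id > ? ORDER BY id LIMIT 50, remembering the last id seen) usually scales better.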
Do you see what's happened? OO programming has "degraded" itself into what should be SQL land, though it's doing a damn poor job of it. Sure, it sounds nice to say, "Oh yeah, we'll just make an object in the OOP program an object in the database", but what happens when that object is a linked list of 500K items? Suddenly you have these lazy bindings from your linked lists, and each time you traverse another item in the list, it's making a query!
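The lazy-binding trap is often called the "N+1 query" problem. The sketch below simulates it with sqlite3: `LazyItem` is a hypothetical lazily-bound list node whose `next` property issues a fresh query, and a trace callback (sqlite3's `set_trace_callback`) counts the SELECT statements actually executed. Walking a 200-item list lazily should cost one query per item, while the equivalent set-oriented query costs one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, next_id INTEGER, label TEXT)")
N = 200
# A persistent linked list: each row stores the id of the next row (NULL at the end).
conn.executemany(
    "INSERT INTO item (id, next_id, label) VALUES (?, ?, ?)",
    [(i, i + 1 if i < N else None, f"item{i}") for i in range(1, N + 1)],
)

queries = []


def record_select(sql):
    # Trace callback: remember every SELECT that SQLite actually runs.
    if sql.lstrip().upper().startswith("SELECT"):
        queries.append(sql)


conn.set_trace_callback(record_select)


class LazyItem:
    """Hypothetical lazily-bound list node: following .next issues a query."""

    def __init__(self, c, item_id):
        self.c = c
        self.id, self.next_id, self.label = c.execute(
            "SELECT id, next_id, label FROM item WHERE id = ?", (item_id,)
        ).fetchone()

    @property
    def next(self):
        if self.next_id is None:
            return None
        return LazyItem(self.c, self.next_id)


# Object-at-a-time traversal: one round trip per node.
labels = []
node = LazyItem(conn, 1)
while node is not None:
    labels.append(node.label)
    node = node.next
lazy_queries = len(queries)

# Set-at-a-time: one query for the whole list.
queries.clear()
bulk = [row[0] for row in conn.execute("SELECT label FROM item ORDER BY id")]
bulk_queries = len(queries)

print(lazy_queries, bulk_queries, labels == bulk)
```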
We had to have 5-7 test database servers so that we could make sure our performance was okay, every time we made a change!
Ug! What a terrible idea..!
We would have the friggen SQL written down, EXACTLY like we wanted, and it was terribly frustrating to have to wrestle with this system, trying to get our fucking data out, and nothing BUT our data out..! A lot of programmers just didn't bother. "We'll just pack in more RAM." Eeeee! The database guys hated us.
Because I'm too lazy to read 21846 bytes of text that explains why I should.
Well, given that it was priced and marketed like a high-end, enterprise-grade database system, that kind of seems reasonable.
What we should have done from day one was to sell persistence for C++.
Indeed. But a persistence library for C++ might cost a few hundred dollars per developer and have no or minimal per copy runtime costs. ObjectStore was priced out of that market by orders of magnitude.
In addition to the license costs itself, there are training costs, retooling, and the cost associated with the risk of picking a single vendor solution. Even if you had given ObjectStore away for free it would have been difficult to displace RDBMSs.
The best chance for success I see these days would be to have a simple, reasonably good open source OODBMS and make money on management tools and high performance versions. Still not the stuff of billion dollar companies, but a decent living.
That strategy was necessary in some ways, because we were venture-funded, and the VCs weren't going to be happy with a small niche. They wanted something that would get into every insurance company and bank. However, by aiming high and failing (by VC standards), we abandoned our natural market too soon and avoided becoming a small success in that market.
It's unfortunate that good technology like ObjectStore failed, but ultimately the choice was yours when you accepted the money and the business model.
I just went through this decision-making process with the consultants who are going to build my company's OSS. While an OODBMS was an obvious choice to me for performance and ease of programming, my consultants told me that finding Oracle talent was so much easier than finding Versant talent (for example) that I would be wasting time and money using an OODBMS. This is especially true of DBAs.