Why IBM Open Sourced Cloudscape
An anonymous reader writes "A common and a consistent framework for accessing information enables developers to do more things with more people more often. This article shares how Derby fits into IBM's developer strategy, the Java application stack, its intention to drive more innovation around Java on Linux, and why they want to make the Derby database become as ubiquitous as the Apache HTTP server." (Derby is the new name for the project based on the formerly commercial Cloudscape database.)
Because too much of the underlying code is owned by Microsoft.
IBM cannot just open-source OS/2. There are technologies and copyrights in OS/2 that belong to third-parties (such as Microsoft).
OS/2 is still available and developed as eComStation http://www.ecomstation.com/. I have to say that I think that it is very expensive, on the other hand it is far from dead.
Sometimes on re-start the db process just hangs and you can't connect.
You have to blow away the dbcache directory to get it to start-up. It doesn't occur frequently, but it has happened more than once in an otherwise stable environment.
Remember that little deal a while ago about ibm building some off line web technology that auto syncs when you regain a connection?
The technology we are talking about is called App Play and guess what it uses for data synchronization?
It does not matter if they open sourced it, since they were going to be putting it on tons of clients anyhow.
Got Code?
It requires an implementation of the open (by now) Java specification.
Whether you use a free implementation or a proprietary one is your problem. There could be trouble finding a complete free Java implementation, but the GCJ team is working on it.
Actually, being IBM code it was probably developed using IBM's JVM not Sun's.
SQLite isn't written in Java; it's C. The code may be platform-independent, but the binaries it produces aren't.
A fairer comparison would be Hypersonic SQL, a free, open-source small (~100K) database server.
- Despite popular opinion, I am not perfect.
Cloudscape is a long way down the dependency graph, and you shouldn't expect it for a while. We need to get ant to boot first, which is seemingly a compiler problem.
Due to this, much of OS/2 is in NT and much of NT is in OS/2, which is why OS/2 could run Windows 3.1 apps natively without any user intervention. OS/2 had a Win3.1 VM that worked so well Microsoft had to implement Win95/NT 4.0 style APIs to break the compatibility.
No, none of NT was in OS/2, nor is any OS/2 in NT. That was one of the reasons for creating NT. There is Win 3.1 in OS/2, in the VM that you mentioned, but I hardly think that played much of a role in the decision to create a 32-bit Windows API. After the success of Win 3.1, Microsoft realised that it could succeed without OS/2 or IBM. So it made Win 3.1 32-bit and created Win 95 until NT was ready for mainstream use.
Well.. maybe. Or Maybe not. But Definitely not sort of.
How does Cloudscape/Derby compare with the other open source Java database engine, HSQL?
The big feature that Cloudscape has that I don't see on the HSQL page is XA support. Uninteresting unless you are working with a TM (transaction manager), but when you are, XA can be the difference between "this could be made to work" and "this is a non-starter".
explanation in plain english
Derby seems to be more of a traditional database, in comparison.
True, the SQL syntax for Cloudscape 10.0 is apparently a subset of the DB2 syntax. So there's definitely a migration path there, which is good for DB2. So the focus is still on DB2. This is one way to move customers over to DB2 if Derby doesn't meet their requirements. Similar to what Microsoft has done with MSDE (now SQL Server Express), except that the IBM way is arguably much friendlier since they've open sourced the codebase instead of just allowing free redistribution of Windows-only binaries.
Eric
OS/2 2.0 was a fully 32-bit, reentrant, fully preemptive multitasking kernel in 1992. Linux still has issues with a preemptive kernel! The graphics interface went 32-bit in OS/2 2.1. It is a single-user system, so there is (or was, anyway) little focus on multi-user style security, at least for local users (the HPFS, and especially HPFS386 filesystems were excellent for multi-user security, including full support for extended attributes).
As for the single program locking up the entire system, that was a design decision in the Presentation Manager (the GUI API and program). It had a single input queue: all window messages went through a single queue. This has performance and usability advantages, especially when one window must modify or handle the messages for another.
However, yes, a single program that did not respond to messages could lock the GUI. The computer would run, but the GUI would be locked until you killed it.
That was changed in Warp 4.0. There were a number of user selectable ways that this could be addressed, depending on how much you might need the features of SIQ.
I'm not saying that OS/2 is perfect, or even valuable in the year 2004, but give me a break. You're talking about issues that were addressed between 6 and *12* years ago!
And the Workplace Shell features a level of object orientedness I have never experienced anyplace else, one that worked *extremely* well. The GUI was not pretty, but it was extremely robust, with a collection of very powerful features.
Linux IT Consulting and Domino Development in Michigan
...compared to hsqldb for my purposes. Hsql supports persisting Java objects directly into an Object-type column [preparedStatement.setObject(obj)]. Derby requires that you persist your object manually and stuff it into a (statically-sized) BLOB by manipulating streams - ick!
Also, hsql allowed ps.setObject(1, null) as a shortcut to ps.setNull(1, Types.). This was really handy.
It _looks_ like derby 10 claims JDBC 2.0 support; shouldn't it have the OBJECT data type?
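A minimal sketch of the "persist your object manually" step described above, assuming plain Java serialization is acceptable for the stored objects. `BlobCodec` and its method names are hypothetical helpers, not part of Derby or hsqldb; only the `java.io` classes are real.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper: turns a Serializable object into bytes for a BLOB column.
final class BlobCodec {
    private BlobCodec() {}

    // Serialize the object into an in-memory byte array.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) before the bytes are read.
        return buffer.toByteArray();
    }

    // Rebuild the object from bytes read back out of the BLOB column.
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes("hello");
        // With a live connection, the bytes would be bound to the statement,
        // for example via the standard JDBC call:
        //   ps.setBinaryStream(1, new ByteArrayInputStream(bytes), bytes.length);
        System.out.println(fromBytes(bytes));
    }
}
```

This only covers the stream handling; whether the serialized form fits in the declared BLOB size is still up to the table definition.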
Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
IBM picked this up when they grabbed informix.
It is used extensively within IBM Java-based projects. (WSAD, the WebSphere IDE, comes with Cloudscape and works with Cloudscape by default.)
But it's quite difficult to sell, for two reasons.
One, IBM's database brand is DB2, which these days scales down to small hardware.
Two, Cloudscape's biggest plus is that it is implemented as a single jar file, but how do you collect license fees when anyone can copy and use your jar file?
Old COBOL programmers never die. They just code in C.
So, no, the comparison isn't fair at all.
The Raven
We use both HSQL and Derby and our experience was that while HSQL was great for small databases, it started to become impractical for medium-to-large databases. Just doing a SELECT Count(*) FROM Foo (which should be instant) can take 30 seconds or more on a large table. Also, if you do a lot of updating (incrementing statistics records, for instance) the table size can get out of hand quickly since each update effectively adds a new record to the table file (until you compact it).
Here are some preliminary notes one of our engineers compiled while investigating adding Derby to our project. They were just preliminary notes so I make no guarantees as to accuracy but they might be helpful...
CHAR/VARCHAR/LONG VARCHAR
Derby strictly enforces the size specification in CHAR and VARCHAR fields. CHAR fields are space extended; non-space data that does not fit in the field raises an exception on insert or update. LONG VARCHAR data cannot be ordered, grouped, or indexed. (Really!) I believe that SQLServer (and possibly MySQL) has these stupid limitations, too. It may go all the way back to the SQL-92 spec. HSQLDB, on the other hand, ignores all size specifications, treating CHAR/VARCHAR/LONG VARCHAR as synonyms for java.lang.String.
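To make the CHAR rule above concrete, here is a small Java sketch of the behaviour as these notes describe it: short values are padded with spaces, and a value is rejected only when non-space data would be cut off. `CharColumn.fit` is a hypothetical illustration and not Derby code; the real engine raises an SQLException rather than an IllegalArgumentException.

```java
// Hypothetical model of CHAR(n) storage as described in the notes above.
final class CharColumn {
    private CharColumn() {}

    // Returns the value as it would be stored in a CHAR(size) column.
    static String fit(String value, int size) {
        if (value.length() <= size) {
            // Space-extend short values up to the declared size.
            StringBuilder padded = new StringBuilder(value);
            while (padded.length() < size) {
                padded.append(' ');
            }
            return padded.toString();
        }
        // Anything past the declared size must consist of spaces only.
        for (int i = size; i < value.length(); i++) {
            if (value.charAt(i) != ' ') {
                throw new IllegalArgumentException(
                    "value does not fit in CHAR(" + size + ")");
            }
        }
        return value.substring(0, size);
    }

    public static void main(String[] args) {
        System.out.println("[" + fit("ab", 4) + "]");
        System.out.println("[" + fit("ab   ", 3) + "]");
    }
}
```

Under the HSQLDB behaviour described in the same paragraph, the equivalent function would simply return the value unchanged.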
TOP/LIMIT
Derby does not support the TOP or LIMIT syntax. There appears to be a "FETCH FIRST n ROWS ONLY" syntax that was added to DB2 that never found its way into Cloudscape.
Case sensitivity
Derby appears to treat all columns as case sensitive, and there appears to be no way to change this. HSQLDB, on the other hand, can be configured on a field-by-field basis. (SET IGNORECASE is used for the database default; and VARCHAR_IGNORECASE is used as the data declaration.)
IDENTITY fields
Derby uses the bizarre syntax GENERATED ALWAYS AS IDENTITY. This also does not imply that the field is a primary key. So, "IDENTITY" in HSQLDB becomes "GENERATED ALWAYS AS IDENTITY PRIMARY KEY". Derby allows specification of initial value and increment.
GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 2)
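When porting a schema, the IDENTITY mapping described above could be applied mechanically. The following is a rough Java sketch under the assumption that column definitions are available as plain strings; `IdentityRewriter` is a hypothetical helper, and the regular expression is deliberately naive (it does not parse SQL).

```java
import java.util.regex.Pattern;

// Hypothetical helper for rewriting HSQLDB column definitions for Derby.
final class IdentityRewriter {
    private IdentityRewriter() {}

    // Matches a bare IDENTITY keyword, but not one already preceded by "AS ".
    private static final Pattern BARE_IDENTITY =
        Pattern.compile("(?i)(?<!AS )\\bIDENTITY\\b");

    // Replaces HSQLDB's IDENTITY with the Derby form given in the notes above.
    static String toDerby(String columnDefinition) {
        return BARE_IDENTITY.matcher(columnDefinition)
            .replaceAll("GENERATED ALWAYS AS IDENTITY PRIMARY KEY");
    }

    public static void main(String[] args) {
        System.out.println(toDerby("id INTEGER IDENTITY"));
    }
}
```

A definition that already uses the Derby form is left untouched, so the rewrite can be run more than once over the same schema.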
Performance
Derby is nearly instantaneous for COUNT(*) queries on databases with a large number of rows. HSQLDB appears to count the rows, resulting in very poor performance. Derby appears to have a better architecture for large databases. Queries seem to run in time proportional to the size of the result set. Many simple HSQLDB queries run in time proportional to the size of the database.
CHECK constraints
Derby supports CHECK constraints, e.g.,
size INTEGER DEFAULT 0 NOT NULL CHECK (size >= 0)
disposition CHAR(1) DEFAULT '+' NOT NULL CHECK (disposition IN ('+', '-', 'B', 'M', 'Q'))
FOREIGN KEY constraints
Derby supports inline foreign key declarations with implied column matching, e.g.,
smtpID CHAR(17) NOT NULL REFERENCES InboxEvents ON DELETE CASCADE
HSQLDB requires table-level constraints with explicit column matching:
FOREIGN KEY (smtpID) REFERENCES InboxEvents (smtpID) ON DELETE CASCADE
Cheers,
Brien Voorhees
Red Condor
Corporate anti-spam gateway service for less than $2/user/month