Beyond Relational Databases
CowboyRobot writes "Relational databases were developed in the 1970s as a way of improving the efficiency of complex systems.
But modern warehousing of data results in terabytes of information that needs to be organized, and the growing prevalence of mobile devices points to the increasing need for intelligent caching on the local hardware.
According to the ACM, the future of database architecture must include more modularity and configuration.
Although no concrete solutions are included, the article is a good overview of the problems with modern data systems."
SQL, on the other hand:
1) Reasonably simple API
2) Scales to very large databsaes
3) Cross-platform/architecture
4) Performs very well.
Given the insane amount of inertia SQL has, it will extend into an object model, rather than be replaced by one. (EG: C/C++)
SQL is a language for set operations. By itself it isn't a database or storage utility. There are some different versions similar to what you describe. Oracle's PL/SQL allows you to make temporary tables and materialized views. Neither solves the overall problem the article describes.
SQL by itself doesn't perform. It is based on the database engine, and how good the developer is. I have gotten SQL queries that took minutes to exectue in seconds by adding indexes, analyzing tables, and totally rewriting inefficient code. It is only "cross-platform" if you follow the ANSI SQL standard. Each database has it's own set of handy functions that make the code database centric.
SQL doesn't really have an API. It is a specification that is sometimes followed by database designers, and sometimes ignored. For example, in Oracle you can either use the ANSI joining sytax (LEFT OUTER JOIN) or use the (+) in the where clause.
It scales to large databases only when they are designed properly. I work with 18 terabytes of data. My sql code wouldn't work so hot if the tables weren't designed correctly. Indexing, partitioning, and table structure have more to do with performance at that level than the code. The code can make a large difference too, but if the underlying structure is wrong, even the best SQL won't help you.
/. ++
When relational systems finally began to appear (and I'm thinking specifically about IBM's System R) they were dog slow, and the extant hierarchical and CODASYL network databases of the day ran rings around them. Still do, unless you throw lots of hardware at the RDBMS.
RDBMS have lots of advantages over older technologies, but performance is not among them.
really implementing a relational model to begin with? Then we can decide if the relational model is broken or just the vendor implementation.
How about... a query language that is fully set operations compliant, i.e., something other than ANSI SQL which is a strange mixture of set and bag operations, and a mixture of relational algebra and relational calculas and some other 'extensions'.
How about... realizing that a major design goal for the relational model was data integrity. Modularity and configurability are also good goals but if you are serious about your data, integrity will be at the top of the list.
The biggest problems I see with databases is very few people understand how to use them. Here's a few tips:
1) a table is *not* a class or an object. Tables + constraints + user defined types + constraints etc. when used properly can define domains which are close to classes and objects.
2) Learn how to normalize. A badly (or flat out not) normalized database threatens data integrity by violating the once-and-only once rule. As a rule of thumb if the table has more than 20 fields in it you should review your data model and make sure it is properly normalized.
3) Point 2 is often the consequence of mindlessly slurping in spread sheets or MS Access database tables. Anyone doing this has no business being within 50 feet of an IDE.
4) Ditch Raid 5. 0+1 will give better perfomance in most cases. Manager like Raid 5 because it is cheap, you get what you pay for.
5) Have multiple channels for data, transaction logs, large indices and O/S or user applications to reduce bottle necks. This is expensive but for large databases going cheap will hurt you.
6) Learn a little theory, it won't hurt you. In fact it can save a large amount of time and trouble. Do not be afraid of learning about the technology you are using. After all, technology is what you are good at, right?
7) If it is a read only database, turn off logging for speed (impossible to due under SQL Server 2000 btw). Also, if a table is on a purge and load paradigm (many reporting and/or datawarehouse tables are) turn off logging on the table level if your version of database engine allows you to do so. Likewise, turning off logging on a hand held or other single user system may be appropriate, just make sure two people do not try to use the database at the same time.
8) Avoid XML. Too much bloat.
9) Learn how to use indices on tables.
10) Learn how to read a perfomance monitor/top etc.
Postgresql is both working hard to become truly relational AND is adding support for geographic objects and objects. The MySQL crew is working hard to improve. Oracle has some nice perfomance features but I think their 'Object/Relational' implementation is broken. SQL Server is getting 'long in the tooth'. There is also a great need for temporal databases and lightwieght engines. But remember, there is no 'silver bullet', no short cuts. Just hard work to be done.
putting the 'B' in LGBTQ+
Has anyone noticed that the author of the article is from Sleepycat (which sells commercial licenses for Berkeley DB to embedded systems developers)?
.. it just so happens that Sleepycat's flagship products are Berkeley DB (a flat-file database) and DBXML (an XQuery engine built on top of that).
She puts forth a case against SQL and relational databases in general and claims that many applications (like directory services and search engines) have read-heavy, hierarchial access patterns which favour lighter-weight, non-relational, transaction-optional databases.
And