Beyond Relational Databases

← Back to Stories (view on slashdot.org)

Posted by Zonk on Tuesday May 24, 2005 @07:10AM from the data-access-for-fun-and-profit dept.

CowboyRobot writes "Relational databases were developed in the 1970s as a way of improving the efficiency of complex systems. But modern warehousing of data results in terabytes of information that needs to be organized, and the growing prevalence of mobile devices points to the increasing need for intelligent caching on the local hardware. According to the ACM, the future of database architecture must include more modularity and configuration. Although no concrete solutions are included, the article is a good overview of the problems with modern data systems."

9 of 360 comments (clear)

Min score:

Reason:

Sort:

SQL isn't a database by Nytewynd · 2005-05-24 07:30 · Score: 5, Informative

SQL, on the other hand:
1) Reasonably simple API
2) Scales to very large databsaes
3) Cross-platform/architecture
4) Performs very well.
Given the insane amount of inertia SQL has, it will extend into an object model, rather than be replaced by one. (EG: C/C++)

SQL is a language for set operations. By itself it isn't a database or storage utility. There are some different versions similar to what you describe. Oracle's PL/SQL allows you to make temporary tables and materialized views. Neither solves the overall problem the article describes.

SQL by itself doesn't perform. It is based on the database engine, and how good the developer is. I have gotten SQL queries that took minutes to exectue in seconds by adding indexes, analyzing tables, and totally rewriting inefficient code. It is only "cross-platform" if you follow the ANSI SQL standard. Each database has it's own set of handy functions that make the code database centric.

SQL doesn't really have an API. It is a specification that is sometimes followed by database designers, and sometimes ignored. For example, in Oracle you can either use the ANSI joining sytax (LEFT OUTER JOIN) or use the (+) in the where clause.

It scales to large databases only when they are designed properly. I work with 18 terabytes of data. My sql code wouldn't work so hot if the tables weren't designed correctly. Indexing, partitioning, and table structure have more to do with performance at that level than the code. The code can make a large difference too, but if the underlying structure is wrong, even the best SQL won't help you.

--
/. ++
1. Re:SQL isn't a database by mcrbids · 2005-05-24 08:22 · Score: 3, Informative
  
  SQL by itself doesn't perform. It is based on the database engine, and how good the developer is.
  
  A truth I hold to be self-evident. The language of SQL provides all the tools you need to make your application perform well, as you state.
  
  SQL doesn't really have an API.
  
  Realistically, SQL is an API. It's a highly abstracted interface for communicating between two programs. (your app, and the DB server software)
  
  It is only "cross-platform" if you follow the ANSI SQL standard.
  
  Sorta. See, I can write a script using PHP with a particular SQL call, and do the same thing in Perl, Java, ASP, C, C++, Python, Ruby, and even BASH, on Linux, Windows, Mac, or just about anything else with a tcp stack and a compiler. Sure, SQL implementations are different, with various shortcuts and extensions, but I'd call that cross platform if ever there was one.
  
  Let me ask you this: How often do you see an OSS product (EG: phpwiki) that doesn't offer support for numerous databases?
  
  --
  I have no problem with your religion until you decide it's reason to deprive others of the truth.
Synchronization by 3770 · 2005-05-24 07:30 · Score: 3, Informative

Most mainstream databases support replication. They are designed to be as fast as possible under heavy load.

Synchronization for a mobile device has another main requirement, robustness when the connection to the server is lost. A mobile device has to gracefully handle when the owner runs down into the subway.

--
The Internet is full. Go Away!!!
Re:Relational Filesystems by turgid · 2005-05-24 07:54 · Score: 3, Informative

The Pick OS.

--
Stick Men
Improving the efficiency???? by Dammital · 2005-05-24 07:56 · Score: 5, Informative

"Relational databases were developed in the 1970s as a way of improving the efficiency of complex systems.
Huh? Go back and reread some of Codd's papers (in the late 60's, BTW) and you'll see that efficiency was never a motivator. Simplicity was his aim, filesystem details were made irrrelevant, explicit navigation was obsoleted, and a built-in security model was included.
When relational systems finally began to appear (and I'm thinking specifically about IBM's System R) they were dog slow, and the extant hierarchical and CODASYL network databases of the day ran rings around them. Still do, unless you throw lots of hardware at the RDBMS.

RDBMS have lots of advantages over older technologies, but performance is not among them.
How about.... by plopez · 2005-05-24 08:21 · Score: 4, Informative

really implementing a relational model to begin with? Then we can decide if the relational model is broken or just the vendor implementation.

How about... a query language that is fully set operations compliant, i.e., something other than ANSI SQL which is a strange mixture of set and bag operations, and a mixture of relational algebra and relational calculas and some other 'extensions'.

How about... realizing that a major design goal for the relational model was data integrity. Modularity and configurability are also good goals but if you are serious about your data, integrity will be at the top of the list.

The biggest problems I see with databases is very few people understand how to use them. Here's a few tips:
1) a table is *not* a class or an object. Tables + constraints + user defined types + constraints etc. when used properly can define domains which are close to classes and objects.

2) Learn how to normalize. A badly (or flat out not) normalized database threatens data integrity by violating the once-and-only once rule. As a rule of thumb if the table has more than 20 fields in it you should review your data model and make sure it is properly normalized.

3) Point 2 is often the consequence of mindlessly slurping in spread sheets or MS Access database tables. Anyone doing this has no business being within 50 feet of an IDE.

4) Ditch Raid 5. 0+1 will give better perfomance in most cases. Manager like Raid 5 because it is cheap, you get what you pay for.

5) Have multiple channels for data, transaction logs, large indices and O/S or user applications to reduce bottle necks. This is expensive but for large databases going cheap will hurt you.

6) Learn a little theory, it won't hurt you. In fact it can save a large amount of time and trouble. Do not be afraid of learning about the technology you are using. After all, technology is what you are good at, right?

7) If it is a read only database, turn off logging for speed (impossible to due under SQL Server 2000 btw). Also, if a table is on a purge and load paradigm (many reporting and/or datawarehouse tables are) turn off logging on the table level if your version of database engine allows you to do so. Likewise, turning off logging on a hand held or other single user system may be appropriate, just make sure two people do not try to use the database at the same time.

8) Avoid XML. Too much bloat.

9) Learn how to use indices on tables.

10) Learn how to read a perfomance monitor/top etc.

Postgresql is both working hard to become truly relational AND is adding support for geographic objects and objects. The MySQL crew is working hard to improve. Oracle has some nice perfomance features but I think their 'Object/Relational' implementation is broken. SQL Server is getting 'long in the tooth'. There is also a great need for temporal databases and lightwieght engines. But remember, there is no 'silver bullet', no short cuts. Just hard work to be done.

--
putting the 'B' in LGBTQ+
Hmm .. by ghakko · 2005-05-24 08:38 · Score: 5, Informative

Has anyone noticed that the author of the article is from Sleepycat (which sells commercial licenses for Berkeley DB to embedded systems developers)?

She puts forth a case against SQL and relational databases in general and claims that many applications (like directory services and search engines) have read-heavy, hierarchial access patterns which favour lighter-weight, non-relational, transaction-optional databases.

And .. it just so happens that Sleepycat's flagship products are Berkeley DB (a flat-file database) and DBXML (an XQuery engine built on top of that).
JSR 170 - Unified Data Repository by uss_valiant · 2005-05-24 08:59 · Score: 3, Informative

Another approach to the problem: JSR 170: Content Repository for JavaTM technology API
Standardizing the interfaces to various data resources (filesystem, database, cache, ...).

The expert group reads like a who's who in data management. And it seems to be very near to the final draft.
Re:KISS by ejamie · 2005-05-24 13:09 · Score: 3, Informative

I fully agree with this the parent post. I've built several systems (web or otherwise) running on SQL Server over the last 5 years.

I've learned this from experience myself: There is no reason why a database driven application should be slow, provided the database is layed out appropriately and the built-in performance facilities of the RDBMS are utilized.

Show me a slow performing database-driven application, and I will show you a set of indexes, stored procedures, vertical/horizontal table partitioning, or table normalization/denormalization that will fix the problem.

The other cool things about DB optimization (one of my favorite development subject areas), is that generally putting in effort to optimize in these areas will give you an immediate and many-fold increase in performance. Not so easy to do in non-RDBMS environments!

--
Hey! Stop copying my sig!!! Stop copying my sig!!! Stop copying my sig!!! Stop copying my sig!!!