Slashdot Mirror


Beyond Relational Databases

CowboyRobot writes "Relational databases were developed in the 1970s as a way of improving the efficiency of complex systems. But modern warehousing of data results in terabytes of information that needs to be organized, and the growing prevalence of mobile devices points to the increasing need for intelligent caching on the local hardware. According to the ACM, the future of database architecture must include more modularity and configuration. Although no concrete solutions are included, the article is a good overview of the problems with modern data systems."

20 of 360 comments (clear)

  1. SQL Dying...film at 11...NOT! by Gorm+the+DBA · · Score: 4, Interesting
    Wow..."SQL and Relational Databases to be replaced by new technology"...film at 11.

    See "COBOL to be replaced...." for an example of just how unlikely that is...sure, the latest hip "Tres Kewl" software for business might be written in something else, but SQL will be around for a long, long time.

    Consider just the fact that "Relational Database" technology as laid out by Cobb back in the early days specifically says "You don't *HAVE* to do it this way, but it will be more effecient if you do"...realize that SQL handles Denormalized Warehouse and Datamart tables just as well as it does the 5th normal form model of perfection...and relax...it ain't goin nowhere.

  2. Relational Filesystems by Doc+Ruby · · Score: 5, Interesting

    How about just getting filesystems to be relational? Replace the ancient 1960s-era hierarchical inode database that underlies filesystems with a modern relational one. Then distributed databases can provide a more consistent platform for all our distributed apps.

    Enough stuffing metadata into filenames. Enough shoehorning all data into a file/folder/cabinet model, now less familiar to people than the networked infosystems that mimic them. Enough fake hierarchies inconsistent with accurate data models, forcing whole technologies like Apple Spotlight, GNU Dashboard, and Google Search just to transact basic relatioships buried in the data. Enough reinvention of the wheel with every initial RDBMS schema, just a layer on top of the DB's actual hierarchical filesystem - a shell for an inode database. Enough empty promises of "WinFS" and "OLEDB" vapor - get relational filesystems into developers' hands, and developers will move beyond them, building apps that meet users actual needs, dragging the database tech along.

    --

    --
    make install -not war

    1. Re:Relational Filesystems by AKAImBatman · · Score: 3, Interesting

      How about just getting filesystems to be relational?

      For what purpose? I used to work with a Unisys NX Series (and a predecessor that I don't remember) and it had all data stored in database tables. PC files were stored in new tables with a record length matching the size of the file. It was more a PITA than actually useful.

      Enough stuffing metadata into filenames. Enough shoehorning all data into a file/folder/cabinet model, now less familiar to people than the networked infosystems that mimic them. Enough fake hierarchies inconsistent with accurate data models, forcing whole technologies like Apple Spotlight, GNU Dashboard, and Google Search just to transact basic relatioships buried in the data.

      You don't need a relational model for that. All you need is a metadata area in the FS, and a metadata indexing scheme. You can then save psudeo-directories as metadata searches. i.e. A bit like the Label system used by GMail. This concept has been done to death in the reseach community, but no one has yet had any success in getting users to accept the idea.

    2. Re:Relational Filesystems by RiotNrrd · · Score: 2, Interesting

      How about just getting filesystems to be relational? Replace the ancient 1960s-era hierarchical inode database that underlies filesystems with a modern relational one. Then distributed databases can provide a more consistent platform for all our distributed apps.

      I think that this is what Microsoft was trying to do with WinFS. The only problem would be that the RDBMS that replaced the FAT would be an embedded version of SQL Server.

      Of course this would be *great* for p2p apps!

  3. Re:KISS by mmkkbb · · Score: 1, Interesting

    There is a language called OOSQL, but looking at it makes me squirm.

    --
    -mkb
  4. Re:KISS by eggoeater · · Score: 2, Interesting

    Very true. My group had to rebuild an overly-complicated MIS system for a large call center about 5 years ago. The old system was built with RedBrick DB engine, informatica data transformations (i.e. pretty GUI..oooooohhhhh), and Brio reporting. It was a mess and NEVER worked. We replaced it with SQL Server and some relatively simple SQL stored procs and dumped the POS informatica and RedBrick.

    IMHO, there are too many GUI tools out there to do data transformations. SQL works great but most people don't want to take the time to learn it!

  5. PostgreSQL is object-relational by Anonymous Coward · · Score: 1, Interesting

    I remember the hype surrounding object databases and xml databases. What happened?

    I believe there is simply too much existing data+code that depends on traditional RDBMS infrastructure.

    To get the best of both worlds, I use PostgreSQL for both traditional RDBMS and the less well-publicized ORDBMS features.

    With version 8.0 available as a native Windows software, I think it'll start grabbing marketshare at a much faster pace. Having it available only on Linux and *nix natively kept adoption rates at bay.

    ps

    For many applications, I found sqlite3 to be sufficient. Low overhead and speeds that blow away client/server solutions like PostgreSQL or MySQL. And cdb is even faster than all of these for read-only data.

  6. It's already done by duffer_01 · · Score: 2, Interesting

    "one size no longer fits all"

    This is absolutely correct when referring to databases for different application. However, why do people always assume that they have to choose the Oracle's, MS and IBM's out there? There are already databases that have been tailored for certain application environments. Take for example http://www.ianywhere.com/ who has databases like SQL Anywhere and UltraLite which are tailored for smaller workgroups and mobile devices.
    I don't think the solution to the problem is to build a more complex non-relational system but rather to choose the right tool for the job. Why reinvent the wheel when you don't have to?

  7. A thought on XML documents by It+doesn't+come+easy · · Score: 2, Interesting

    I woke up and had an interesting thought...

    I can imagine XML documents created in such a manner that they could constitute an object from an OOP (Object Oriented Programming) perspective, containing their own schema, characteristics, relationships and data. Further, I can imagine the ability of accepting such an XML document object into a range of other things, such as a modular program (dynamic program extensibility) or a database (temporary database extension). I can imagine the ability to package up such XML documents so that databases can be built simply by linking the XML documents together.

    So (for example), one could send a request to the XML index in the sky, find and link documents containing a subset of know medical facts, and then kick off a data mining process that could discover previously unknown medical relationships. All without needing to know anything other than where to find the XML files.

    Now imagine a tool that could convert all of the terabytes of data the world is generating every day into small, linkable, OOP like XML files... Sounds like a great open source project to me...

    --
    The NSA: The only part of the US government that actually listens.
  8. Re:Multi-Terabyte Data Warehouse and MySQL by AKAImBatman · · Score: 4, Interesting

    I have no doubt that terabytes could be stored in MySQL. My overall point is that MySQL is not designed to effectively manage that much data. For example, the presentation you link to shows that Terabase is the workhorse of the business. Data is then offloaded to a disposable MySQL database for data warehousing analysis. The database is then purged after one week.

    The holy grail of information technology would be to eliminate the need for such cumbersome replications, and instead have a single, reliable data source that can be queried for any information needed at any time. Unfortunately, MySQL isn't it. ;-)

  9. MUMPS by stuffduff · · Score: 3, Interesting
    MUMPS had one data type, a string and one data structure, an array. Commands abbreviable to a single letter. MUMPS ran whole Hospital Information Systems on DEC PDP-11's where MUMPS was both the Operating System and the Language and was called Digital Standard MUMPS or DSM for short. It was way ahead of it's time, it predated SQL, and was one of it's progenitors. Anyone who offers a survey of databases and fails to include it is seriously myopic. Google 'MUMPS Language' for the tip of the iceberg.

    As for the future, I think that while OO databases are all the vogue, that O databases, where each bit of information is represented by it's own object and will have the ability to demonstrate autonomous agency is a good area for research. Then we can just let the data speak for itself! ;^)

    "Every person takes the limits of their own field of vision for the limits of the world."
    - Arthur Schopenhauer

    --
    "Can there be a Klein bottle that is an efficient and effective beer pitcher?"
  10. Sleepycat? by halosfan · · Score: 3, Interesting

    To me, this entire article reads like a plug for Sleepycat, written, not surprisingly, by a Sleepycat person.

    Relational data model is just that -- a data model. It doesn't concern itself much with implementation (and therefore, performance in any particular environment) or with how applications use the data. And that's really the point -- relational databases are application-agnostic. They are designed to store the data that will be possibly accessed by applications that are not yet conceived. That's the reason they put great emphasis on internal correctness of the data. Once you have a database specialized for one application, that's not really a database in the relational sense -- that's a way to persist your application data. And that's where Berkeley DB shines. It doesn't replace relational databases, it just serves a totally different purpose.

    --
    My only problem with Microsoft is the severity of bugs in their software.
  11. Re:KISS by NoOneInParticular · · Score: 2, Interesting

    Yes, you can expect the DB to use the index, which is subtly different from instructing the database to use the index. The whole idea behind SQL is great: a declarative language for specifying the job, and a procedural, smart, engine for performing the job. Unfortunately, databases have failed to deliver on the latter part, leading to implicit techniques to get the performance out. What I claim here is that we should take a step back, not forward, with SQL and get back to explicit programming of the databases, with strong guarentees about the performance. No more "let's add an index and I'm sure it will go faster", but, "instruct the query to first make and then use the index". Much easier to comprehend, and much faster.
    The relational model is great, but what we need for speed is absolute control over the order in which records are stored. If you know that order and you have an underperforming "general" query, you might be able to speed up the thing with smart programming instead of trying to prepare the environment in such a way that the procedural engine does its job. Say you want to insert a bunch of records which have consecutive primary keys. Writing the procedure for adding them is trivial, letting the database sort it out often leads to dissapointing performance.

  12. Issue of dimensions in RDBMS by jonbrewer · · Score: 2, Interesting

    One of the problems of current databases when is that a typical relational database doesn't have enough dimensions. Designing a table to store data is trivial - but what happens when you need to know the intersection of X and Y at time Z?

    This is a fairly common question in data warehousing: What is the data today, what did it look like yesterday, last week, and last year?

    I have seen it worked around in silly ways (snapshot and rename a table every day/week/month) and more clever ways (use separate transaction tables to record changes), but never in a particularly elegant way.

    Wiser colleagues whispered to me the dirty answer "object relational" and scurried away to their dens of Rob Zombie and J2EE. I never got my head around object relational databases before leaving that world, and so am left to ponder papers from IDC with statements like this one:

    "putting object extensions on RDBMSs is tantamount to adding stereo radios and global navigation systems to horse-drawn carriages"

    Ouch, is that a swipe at Oracle? Seems that as far back as 1997 pundits have said that the future is in ODBMS, and not RDBMS or ORDBMS. Hmm...

  13. Re:KISS by WaterBreath · · Score: 2, Interesting
    instruct the query to first make and then use the index

    build multiple indices on a 20M row table, on the fly? Is it worth the effort to build it on the fly if the database can figure out the best way to make and use the indices on its own 99% of the time?

    I think that what you're looking for is a database that will allow, but not require, you to tinker programmatically with creating and using temporary indices on the fly. It's still indespensible to have optimizing index functionality in the database for the 99% of proper uses that don't require the extra oomph. And insisting on reinventing the wheel every time simply because "I can do it better than some stupid program someone else wrote" is a rather naive development philosophy. Yes there are times when it's better. But they are vastly outweighed by the times when it's just not worth it. The need for packaged indexing functionality is there.

  14. No data integrity by Tony · · Score: 2, Interesting
    The problem with MUMPS is the data integrity issue. As essentially a collective of hashed indexes where nodes can contain both data and more branches, it sucks for *any* kind of serious data retrieval. You basically have to create and maintain your own indexes, unless you go with Fileman, which is a massively moby hack, but a hack nonetheless.

    The idea of the relational model is simple: it is based on set theory, which has a strong mathematical model. There is no equivelent model for object databases, nor for tree-based databases like MUMPS. There is no strong mathematical basis by which you can judge the integrity of your data.

    Cache', by Intersystems (a Post-Relational Database!) is based on MUMPS. You've seen their adverts here on Slashdot. They claim to be object-relational, but they are no such thing: they are MUMPS. They went on a buying spree and purchased up most of the failing MUMPS vendors (DSM, MSM, etc), and now they are the big guys in the M world.

    They have some pretty nify hacks which compiles their "object-oriented MUMPS" programming language (I forget what it's called) into straight M. Fun. Doesn't stop it from sucking hard.

    MUMPS is, at best, a fairly bizarre language with persistent storage of global arrays.

    MUMPS drives me nuts. It uses whitespace for blocking just like Python, but they had so much trouble with it, they eventually allowed a '.' to replace the whitespace, so you end up with code like this:
    S node=""
    F S node=$O(^ZZZYYY(node)) D
    .. Q:node=""
    .. W "^ZZZYYY(node)=",node,!
    .. R Pause
    (I stole that from this duscussion.)

    (Sidenote: I have to admit, my exposure to MUMPS is one of the primary reasons I despise Python's whitespace-as-blocking. It seems replaces the poor aesthetics of brace-blocking with something more error-prone and stupid-looking, though more aesthetically pleasing. But all that's just opinion. I'm sure Python is a good language, just as I'm sure MUMPS is not.)
    --
    Microsoft is to software what Budweiser is to beer.
  15. What I'd really like to be able to do... by thewils · · Score: 2, Interesting

    If any smart DBMS developers are listening, is to define a set of queries within the database (like for a _simple_ example "male" and "over 60" and "salary x") and then be able to refer to these criteria by name only, having the database build the query based on these rules as I choose to combine them (select xxx from yyy where 'male' or where 'male' and 'over 60').

    Sort of like stored procedures in implementation - they could be called stored query definitions.

    Because these query definitions would already be parsed, they don't require overhead to re-parse each time the stored query definition is executed.

    Please have this feature ready in about 6 months :)

    --
    Once I was a four stone apology. Now I am two separate gorillas.
  16. Re:I did not RTFA by AKAImBatman · · Score: 2, Interesting

    Also, until db2 v8 came out two years ago, it was pretty far behind oracle 8 & 9. Now, it's in a great competitive position.

    Last time I did a comparison, it was DB2 7.1 vs. Oracle 8. Most of what bit me in the ass with DB2 was its flakey management tools and multitude of minor details that needed tweaking from the days when it was a mainframe database. (e.g. Why does it need a buffer large enough to hold the entire blob chunk that's going to be transferred? That's just stupid. It should pull across as much as the buffer can hold, fill the requesting array, then go back for more and repeat.)

    As I said, it would only start a pointless discussion on who likes what database.

  17. Re:the usual database blatherings by kimanaw · · Score: 2, Interesting
    This "paper" (wasn't there one a couple weeks from some Microsoft dude, which was equally useless?)

    I suspect you mean this article by (the highly regarded) Jim Gray, whose opinion I'm much more inclined to seriously consider than the ramblings of the subject article's author (someone who, based on his company's product, apparently considers a B-Tree to be an innovative solution).

    But I admit I found Gray's article equally disconcerting (perhaps he's been spending too much time on the sailboat in the SF Bay, and MSFT needed a shill to push their "XML Uber Alles" and Longhorn vision?).

    SQL has managed to overcome a lot of derision over the years, and, tho the language has often been a bit slow on the uptake, it has managed to adapt admirably to abstract datatypes, recursive queries, OLAP, etc. Perhaps as importantly, the 90's left us with empirical evidence that the O-R juggernaut may have been ill conceived, with the failures of GemStone (which has since re-invented itself), and the disasterous impact of Informix's Illustra (nee Postgres) adventure.

    Let's face it, SQL just works. Most people I've met who complain about SQL usually

    • (a) don't know how to tune a database and/or
    • (b) don't know how to properly use SQL and/or
    • (c) are stuck using a half-assed DBMS
    There may be an inherent beauty/advantage to a declarative language, in that it can adapt to change by adding a few nouns, some adjectives and/or adverbs, and, as a last resort, one or two new verbs, rather than forcing a redefinition of the entire language (C -> C++, COBOL -> ObjectCOBOL, etc.).

    So, to paraphrase Twain, "The rumors of SQL's death have been greatly exaggerated."

    --
    007: "Who are you?"
    Pussy: "My name is Pussy Galore."
    007: "I must be dreaming..."
  18. Re:KISS by Stu+Charlton · · Score: 2, Interesting

    I disagree that a programmer can manually create a better query plan than (for example -- others apply) Oracle 10g. In certain cases -- perhaps. A couple of questions though....

    A) will a programmer be expected to do statistical analysis of the data, checking for value skew, and also noting the current load of the system before coming up with the best way to perform a query? This is what happens automatically today, each time you parse a SQL statement.

    B) if the skew of data changes, do they have to rewrite their query to switch access algorithms?

    C) if I want to some day perform the query in parallel, do I also have to rewrite it?

    D) if the business needs change, and I have to change the way I look up the data (its access path), do I once again have to rewrite the query?

    E) if I rely on an exact order of rows in the database, doesn't that imply an O(log n) insert / update / delete penalty (assuming a B*Tree structure, which generally is the means of preserving order on disk).

    F) if the volume of data in the table grows, wouldn't I have to rewrite the query at some point?

    These are the reasons we have SQL today. And we have things like optimizer plan stability and now (with Oracle 10g) the ability to actually re-write the plan for a poorly written SQL statement. I think it's best to stick with declarative, logical querying. I'd love a replacement for SQL, but moving back to 3GL-based physical access is a step backwards.

    --
    -Stu