Slashdot Mirror


Are Relational Databases Obsolete?

jpkunst sends us to Computerworld for a look at Michael Stonebraker's opinion that RDBMSs "should be considered legacy technology." Computerworld adds some background and analysis to Stonebraker's comments, which appear in a new blog, The Database Column. Stonebraker co-created the Ingres and Postgres technology while a researcher at UC Berkeley in the early 1970s. He predicts that "column stores will take over the [data] warehouse market over time, completely displacing row stores."

76 of 417 comments (clear)

  1. They're not mutually exclusive. by KingSkippus · · Score: 5, Insightful

    Okay, at the risk of sounding stupid...

    Since when is a column store database and a relational database mutually exclusive concepts? I thought that both column store and row store (i.e. traditional) databases were just different means of storing data, and had nothing to do with whether a database was relational or not. I think the article misinterpreted what he said.

    Also, I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased...

    1. Re:They're not mutually exclusive. by XenoPhage · · Score: 5, Interesting

      Since when is a column store database and a relational database mutually exclusive concepts? I thought that both column store and row store (i.e. traditional) databases were just different means of storing data, and had nothing to do with whether a database was relational or not. I think the article misinterpreted what he said.

      Agreed. It definitely looks like a storage preference. Though column-based storage has definite benefits over row-based when it comes to store once, read many operations. Kinda like what you'd find in a data warehouse situation...

      Also, I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased...

      Hrm.. You must be new here....

      --
      XenoPhage
      Technological Musings
    2. Re:They're not mutually exclusive. by stoolpigeon · · Score: 4, Interesting

      You are exactly right and this is backed up by the home page for c-store. It says: "C-Store is a read-optimized relational DBMS " - c-store is the open source project that apparently is the basis for Vertica - Stonebraker's commercial offering.

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    3. Re:They're not mutually exclusive. by Anonymous Coward · · Score: 5, Funny

      Well I just turned my server on its side and now all my tables are storing in columns. I love new technology.

    4. Re:They're not mutually exclusive. by OECD · · Score: 5, Insightful
      Congratulations! You figured out what an opinion is.

      An opinion is subjective, but it's not necessarily biased. A disinterested observer could have an unbiased opinion.

      --
      One man's -1 Flamebait is another man's +5 Funny.
    5. Re:They're not mutually exclusive. by homb · · Score: 4, Interesting

      I wish we could put this thing to rest once and for all. And I wish so-called "experts" in the field actually were.

      Rule of thumb:
      - you use row dbs for OLTP. They're great for writing.
      - you use column dbs for data mining. They're amazing for reading aggregates (average, max, complex queries...)

      The major problem with column dbs is the writing part. If you have to write one row at a time, you're screwed because it needs to take each column, read, insert into it and store. If you can write in batch, the whole process isn't much more expensive. So writing a single row could take 500ms, but writing 1000 rows will take 600ms.
      Once the data's in, column dbs are the way to go.

    6. Re:They're not mutually exclusive. by Bonewalker · · Score: 2, Interesting

      So, what we need is a morphing database structure depending on need. While I am writing, I want it to be row-based. As soon as I am done and ready for reading, it should switch to a column-based structure. Who's up for starting a new project on sourceforge.net? The shape-shifting relational database! SSRD is the wave of the future!

  2. C'mon, the guy is biased! by winkydink · · Score: 4, Funny

    The name of his blog is The Database Column after all.

    --

    "I'd rather be a lightning rod than a seismometer." -Ken Kesey

    1. Re:C'mon, the guy is biased! by everphilski · · Score: 4, Funny

      database row would sound too much like prison :P

  3. Mod Article -1 (Author doesn't get it) by DrinkDr.Pepper · · Score: 5, Informative

    Relational databases aren't being obsoleted. Some schema design heuristics are.

    --
    0xfeedface
  4. dual-mode db? by 192939495969798999 · · Score: 4, Interesting

    Is there a dual-mode db, that lets you create a row-based or column-based "table"? I imagine cross-mode queries would kill performance, but at least you could have a system front-loaded with row tables, where data comes in, and then archive this data over time into the column-based tables, so that reads were fast.

    --
    stuff |
    1. Re:dual-mode db? by XenoPhage · · Score: 2, Informative

      I believe you can build a storage engine in MySQL that deals with column-based storage. I'm not sure if it's been done yet, but I don't see why it couldn't be.

      --
      XenoPhage
      Technological Musings
    2. Re:dual-mode db? by stoolpigeon · · Score: 2, Funny

      You can get MS SQL Server to store tables differently than the default? It will write columns to disk as opposed to rows? You can store columns in their own files? It's been a few years since I worked with SQL Server - but I really don't remember those features. Is it a SQL Server 2005 thing?

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    3. Re:dual-mode db? by tinkertim · · Score: 2, Informative

      I believe you can build a storage engine in MySQL that deals with column-based storage. I'm not sure if it's been done yet, but I don't see why it couldn't be.

      The FA threw me for a loop a couple of times, I honestly _did_ try to read it :) Correct me if I'm incorrect, but wouldn't having a service for column stores be (usually) not needed on most Unix-like platforms? Since this is mostly reading, I would think such efforts might be better spent on sqlite (or similar)?

      If your in a situation where you're mostly reading with (likely) only one infrequent writer, wouldn't eliminating the overhead of a database service entirely be desirable?

      I can't think of a situation where you would want many frequent writers to a column store schema, again, correct me if I'm off.
    4. Re:dual-mode db? by mr_mischief · · Score: 2, Interesting

      That would actually be a smart way to handle it. Lots of apps write only to a single master (or its fallback) and read only from the slaves already. If you had a row-based master for fast writes and replicated to column-based slaves that could be a real win.

      RLE on the data columns is a pretty big win for column-based stores, too. If the slaves manage RLE during a replication, you could have one hell of a DB farm.

    5. Re:dual-mode db? by einhverfr · · Score: 2, Informative

      Ok, think about it this way:

      If you are doing killer aggregates (tell me the sum of the sales in every month for the last 25 years), you are going to be limited by possibly 2 things: CPU cycles and disk I/O throughput.

      There are several ways of addressing these issues. Basically this means either optimizing or parallelizing. Column-oriented stores are likely to help optimize the disk i/o throughput so you can just thow more processor effort at the problem.

      You can also do what Teradata and BizgressMPP do which is basically break the database into lots of little db's running on different servers and distribute the disk and processor time that way. Iirc Oracle and DB2 also offer this sort of option. Depending on how this is set up, it may be possible to use in an OLTP environment as well.

      On the other hand, suppose you are doing a lot of updates in a high transaction load environment. Which would you rather do? Update each column value (and skip around the disk doing this) or update the entire row on one disk block?

      Column-oriented databases are helpful for some things. They are not the only solution to the problem out there. And they are still relational. Hint: They are still based on Edgar Codd's relational mathematics.

      --

      LedgerSMB: Open source Accounting/ERP
  5. well by stoolpigeon · · Score: 5, Informative

    every article linked makes it clear that this is about warehousing as opposed to oltp. so is the technology dead? no - can it do everything? no

    --
    It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
  6. Rotate by Kozar_The_Malignant · · Score: 5, Funny

    >"column stores will take over the [data] warehouse market over time, completely displacing row stores."

    Hmmmm. So if I rotate my Paradox or Excel table by 90 degrees, I have achieved database coolness? Who knew it was so easy.

    --
    Some mornings it's hardly worth chewing through the restraints to get out of bed.
    1. Re:Rotate by Hijacked+Public · · Score: 3, Informative

      The most recent release (2007) will handle 2^14 columns.

      --
      "Sacrifice for the good of The State" - The State
    2. Re:Rotate by ben+there... · · Score: 5, Insightful

      Excel only handles 255 Columns. It should be noted that if you've designed a database (rather than an Excel abomination) with more than 255 columns, chances are, you're doing it wrong.
    3. Re:Rotate by kpainter · · Score: 2, Funny

      Hmmmm. So if I rotate my Paradox or Excel table by 90 degrees, I have achieved database coolness? Who knew it was so easy. I tried that but my neck gets tired after a few minutes.
    4. Re:Rotate by Anonymous Coward · · Score: 2, Funny

      ". . . more than 255 columns, chances are, you're doing it wrong"

      Because there are only 254 days in the year

  7. The guy... by AKAImBatman · · Score: 5, Interesting

    ...is duping himself and thus Slashdot is duping the stories by extension.

    Stonebraker has been pushing the concept of column-oriented databases for quite some time now, trying to get someone, ANYONE, to listen that it's superior. While I think he has a point, I'm not sure if he really goes far enough. Our relational databases of today are heavily based on the ISAM files of yesteryear. Far too many products threw foreign keys on top of a collection of ISAMs and called it a day. Which is why we STILL have key integrity issues to this day.

    It would be nice if we could take a step back and re-engineer our databases with more modern technology in mind. e.g. Instead of passing around abstract id numbers, it would be nice if we had reference objects that abstracted programmers away from the temptation of manually managing identifiers. Data storage is another area that can be improved, with Object Databases (really just fancy relational databases with their own access methods) showing how it's possible to store something more complex than integers and varchars.

    The demands on our DBMSes are only going to grow. So there's something to be said for going back and reengineering things. If column-oriented databases are the answer, my opinion is that they're only PART of the answer. Take the redesign to its logical conclusion. Let's see databases that truly store any data, and enforce the integrity of their sets.

    1. Re:The guy... by AKAImBatman · · Score: 2, Insightful

      Or in other words column wise read would be fast but write slow. I could see this being an option within DBs in the future. As it is just a data layout problem not a language problem.

      An interesting idea for improving database technology is to actually change the way that database data is mirrored in a disk array. Rather than writing EXACT duplicates of the data, perhaps one set could be written in row-oriented form, while the other set would be written in column-oriented form. This guarantees that the data is always duplicated, but offers a new option to the query engine for retrieving the data.

      The primary issue I see is not creating an issue of "the worst of both worlds". Obviously row-oriented data is going to be faster to write. Thus you can't wait for the column oriented data to finish writing. A secondary process will have to manage that in parallel, with the assumption that the database will report the data as "committed" as soon as the row-oriented write is complete.
  8. IMS--Hierarchical DB Still Exists by curmudgeon99 · · Score: 5, Insightful

    You've all heard of the IBM product called DB2, right? So what was DB1? Answer: IMS, which is a hierarchical database. They were a pain in the ass to use--PSBs and all--but they were/are faster than hell and I doubt any company is going to throw them out for any reason. Same goes for relational databases. They're going nowhere. Sure, we have room for more but nobody is going to displace the RDBMS anytime soon.

  9. Re : Are Relational Databases Obsolete? by littlefoo · · Score: 5, Funny

    No. There, that was easy !

    It's like the packet of crisps that says "Is there a 20 pound note in here !!?" - the answer should always be 'No'.

    Except maybe for one person.

    sed -e 's/crisps/potato chips/' -e 's/pound/dollar/'

  10. that doesn't mean they're going to become obsolete by Arathon · · Score: 5, Insightful

    Obviously, he's biased. But more importantly, he just said that column-store databases are going to take over the WAREHOUSE market. That doesn't mean that row-store databases are going to become obsolete, because there will always be applications out there that do a substantial amount of writing as well as reading.

    In fact, the new wave of user-generated-content websites and webapps seems to me to indicate the exact opposite - if anything, row-store databases, with their usefulness in write-heavy applications, should becoming, if anything, more and more necessary/useful on the web.

    So...chalk this one up to some grandstanding on the part of a guy who wants to put more money in his pockets...

  11. Marketing hype by FUD.. typical by cwford · · Score: 2, Informative

    From TFA:

    "Column-oriented databases -- such as the one built by Stonebraker's latest start-up, Andover, Mass.-based Vertica Systems Inc. -- store data vertically in table columns rather than in successive rows. "

    Marketing hype for his startup.

    What a sleezeball.

  12. Yea, it's all the same. by SatanicPuppy · · Score: 5, Insightful

    Column stores are great (better than a row store) if you're just reading tons of data, but they're much more costly than a row store if you're writing tons of data.

    Therefore, pick your method depending on your needs. Are you storing massive amounts of data? Column stores are probably not for you...Your application will run better on a row store, because writing to a row store is a simple matter of adding one more record to the file, whereas writing to a column store is often a matter of writing a record to many files...Obviously more costly.

    On the other hand, are you dealing with a relatively static dataset, where you have far more reads than writes? Then a row store isn't the best bet, and you should try a column store. A query on a row store has to query entire rows, which means you'll often end up hitting fields you don't give a damn about while looking for the specific fields you want to return. With column stores, you can ignore any columns that aren't referenced in your query...Additionally, your data is homogenous in a column store, so you lose overhead attached to having to deal with different datatypes and can choose the best data compression by field rather than by data block.

    Why do people insist that one size really does fit all?

    --
    ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
    1. Re:Yea, it's all the same. by theGreater · · Score: 5, Interesting

      So it seems to me the -real- money is in integrating an RDBMS which, for usage purposes, is row-oriented; but which, for archival purposes, is column-oriented. This could either be a backup-type thing, or an aging-type thing. Quick, to the Pat(ent)mobile!

      -theGreater

    2. Re:Yea, it's all the same. by stoolpigeon · · Score: 4, Interesting

      Maybe, but I doubt it. The money is in the data warehouse market and the etl tools that move the data from the oltp environment to the warehouse environment. I think what the author points out is not that people are trying to use the same database to do both, but rather that they are trying to use the same product to both. He says it would make more sense to use Oracle (for example) for oltp - and something else for the warehouse, rather than trying to get Oracle to do both well.

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    3. Re:Yea, it's all the same. by KingSkippus · · Score: 5, Insightful

      Why do people insist that one size really does fit all?

      I went back and read the original article. To Michael Stonebreaker's credit, the ComputerWorld article (and the submitter) grossly misrepresents what he said.

      He did not say that RDBMSes are "long in the tooth." He said that the technology underlying them hasn't changed since the 1970's, and that column stores is a better way to represent data in certain situations. In fact, the very name of his original column was "One Size Fits All - A Concept Whose Time Has Come and Gone"

    4. Re:Yea, it's all the same. by GeckoX · · Score: 5, Funny

      I've got it!

      Cell-based storage!!! Best of both worlds!!! Mix of both Row and Column based storage, how can we go wrong!

      Just think about it, what could be better than one file for each column in each row?

      And they said I couldn't have my cake and eat it too, sheesh ;)

      --
      No Comment.
    5. Re:Yea, it's all the same. by jkrise · · Score: 5, Funny

      I went back and read the original article....

      Oh, the horror! That's a heinous crime on Slashdot! Not even the editors do that!!!

      --
      If you keep throwing chairs, one day you'll break windows....
    6. Re:Yea, it's all the same. by Jerry · · Score: 4, Interesting
      Cell-based storage!!! Best of both worlds!!! Mix of both Row and Column based storage, how can we go wrong!


      You are years late. The PICK operating system/db already does that. Back in 1985 I used the DOS based Advanced Revelation to write GAP accounting packages. It used the ASCI 253 character to separate "columns" of data in a cell. Reading and writing was parsed automatically. Invoice information, for example, was stored in a Customer's info table, not in a invoice table, and doing a query on accounts receivable produced very fast results. Symbolic dictionary definitions containing formulas allowed for easy column and row totals.

      In fact KDB/K looks a lot like a PICK system that uses FORTH as the language.

      --

      Running with Linux for over 20 years!

    7. Re:Yea, it's all the same. by SixDimensionalArray · · Score: 2, Informative

      Just to add to that - PICK/D3 is called a "multivalue" database, and there are a lot of interesting kinds of databases out their that use this "multidimensional"/sparse array storage concept, such as the ancient MUMPS legacy system/database (now implemented as Intersystem's Cache product) found in old Veteran's Affairs (VA) systems in the US.

      I actually wonder if some of the current databases such as Microsoft SQL Server, etc. aren't going to actually start morphing into these older styles of databases due to the increase in use of XML files/data, which by their very nature are hierarchical and kind of "multidimensional". Actually, the company that now maintains PICK/D3 (Raining Data) has an interesting XML database (Tigerlogic) that uses some of the old technology and new technology combined.

      This could be a great and possibly painful experience of history repeating itself!

      SixD

    8. Re:Yea, it's all the same. by bjourne · · Score: 2, Interesting

      The sane thing to me seems to be to just skip the oltp step since the data contained therein is a strict subset of the data in the warehouse. Let's say you design a relation database for a company. You have a table with Employees and another with their Positions, a Project table and so on like the orthodox relational model tells you to. Which works great until someone quits, then you have a problem.

      Naturally, you don't want to delete that person because then you lose lots of important archival data. So you introduce two fields in the Employee table which contains the dates when the employee started and quitted. You also need to do the same for almost all other data. If Bob gets a raise, you don't want to simply update his salary field, you want the data to state that Bob had salary $x during the dates $A-$B and $y during the dates $B-$C. So you need even more fields for validity intervals to fully describe your data. It gets incredibly messy, all your queries becomes really complicated and you need to employ stored procedures and triggers to keep your data intact. The relational model breaks down.

      What you need at this point, is some kind of "event based data storage." Like "Bob got a 5% raise 20070801" or "Company address changed from $x to $y 20001010." For these kinds of data, a warehouse is much better suited. A column based storage would probably also be much faster than a row based for it.

    9. Re:Yea, it's all the same. by SatanicPuppy · · Score: 2, Informative

      Not sure how that follows...The column style database would be functionally identical to the row style database, only you'd have column proliferation instead of row expansion.

      The easiest way to deal with proliferating events is to create a very simple table that has a timestamp, your basic audit information (user who made the change, change the terminal was made from, etc), and the change itself.

      So say Bob makes 50,000 dollars. This entry was put in the table when he was hired and contains bob's employee record id, bob's salary, the date, and the audit crap. That's it. Then when bob gets a raise to 55,000 there is another simple entry, id, salary, date, audit crap. Etc, etc. All your data is there, you can easily retrieve the history, you know when the changes were made and by whom.

      It's all about normalization. Why put in two date fields if you don't have to? Two records, each with one date, will give you all the info you need, simplify your queries, whiten your teeth, etc. Whenever you have an event driven model, just throw the event, in the simplest possible form, in a table. If your tables start proliferating out of control, check your normalization and make sure you're not duplicating data across multiple tables. If that's not the problem, try to refine the scope of your database. If it's doing too many things, then try to break things out by their relevance.

      --
      ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
    10. Re:Yea, it's all the same. by Anonymous Coward · · Score: 2, Informative

      No, it's not "PIC", it's "PICK", named after it's creator Richard Pick who originally wrote Generalized Information Retrieval Language System (GIRLS) in 1965 with Don Wilson to help inventory helicopter parts during the Vietnam War.

      http://en.wikipedia.org/wiki/Pick_operating_system

      PIC is something completely different.

    11. Re:Yea, it's all the same. by CodeBuster · · Score: 2, Informative

      PICK has some very serious disadvantages over the relational databases, not the least of which is the rather annoying property that queries often do not return *ALL* of the related records unless the person writing the query has a very intimate knowledge of the data cubes and how the database is connected and even then there are no guarantees. The brilliance of atomicity defined by Codd and the query guarantees that one is able to make when the "one value per cell" rule is enforced are not of trivial benefit. PICK and multidimensional can be convenient in some circumstances, but the marketplace has basically decided that those niches are not worth the hassle of giving up the query benefits of the relational model for more general cases, especially when absolute speed of the query is less important than completeness of the information returned. It has also not helped that multivalue database vendors, the few that are left anyway, have generally lagged behind the relational database vendors in terms of tool support and integration with other systems.

      Note: I may be somewhat biased because of my long use of relational databases and a really bad experience updating a jBase solution that really was "long in the tooth". You might argue that I just saw a bad example of multivalue database application, but I think that it goes deeper than that or else we would all be using PICK today and not some flavor of SQL.

    12. Re:Yea, it's all the same. by c.gerritsen · · Score: 2, Informative

      Actually, Stonebraker did say, and I quote the original article that you went back and read (with a dereferenced pronoun in []),

      [The current major relational DBMSs] should be considered legacy technology, more than a quarter of century in age and "long in the tooth".

      He called the existing major RDBMSs legacy and long in the tooth for not implementing the feature he is trumpeting, column storage, as an option when setting up a DB for use. He laments the fact that these vendors don't give you the option of setting up your DB in a way that provides huge performance gains for some usages that are becoming more common everyday, and says that there will be a revolution.

      He is not saying that we don't still need row-oriented RDBMSs...which makes me wonder, does Vertica support both row and column stores?

    13. Re:Yea, it's all the same. by Allador · · Score: 2, Insightful

      The answer to what you're describing is not to give up relational dbs, but to design your schema correctly.

      There are tried and true approaches to the problems you describe; several actually for most of them, depending on your needs.

      If you run into these problems due to an evolutionary growth into these features, then its time to stop, take a step back, and re-architect your schemas to handle these needs from the get go.

      There's no reason at all to resort to hacks like stored procedures and triggers. These are only used when your schema is fundamentally mismatched to your needs, and so you have to be continuously cleaning up after your data mods.

      Any mature system doesnt ever delete 'entity' objects from the DB. That would just be silly (and wouldnt be possible if you had referential integrity set up correctly). So all of these sorts of entities have an 'active' column, that determines whether they're active or not.

      In the salary example you give, you DO want to store the 'current' salary field in the employees table, but then you also have a 'positions' or 'incumbents' table to store the history of things that change over time.

      I think the core problem you're describing is due to the fact that you're trying to store all of this stuff in one table. And thats not the correct approach.

      The 'event based storage' you're talking about is another table, with one row per event. Some folks advocate another approach, where entity tables are versioned, and all old versions are kept (sometimes moved to another table), so you can see the history of all changes for all time. Thats not a very well normalized solution, but there are times when its appropriate.

  13. Aha! by Stanistani · · Score: 4, Funny

    The next big thing in DBMS:
    turning your head sideways.

  14. Should be, but isn't, and won't. by dada21 · · Score: 4, Interesting

    In my IT business, a vast majority of our top tier clients (grossing over US$100 million annually) are still using antiquated software that is still using a relational database backend. While these companies are generally VERY efficient in terms of providing services or products to their market, their accounting, purchase orders and project management software is decades outdated. Many of the companies that maintain these packages have merely made the interface more current (but still 5+ years old, but are still using terribly outdated software. I can't begin to tell you how often the words "FoxPro" and "MS SQL" come up and it ends up being a relational database "solution" or even worse.

    It is very frustrating because we do have programmers on staff that create third party plug-ins to these databases to try to make solutions that the OEM code doesn't. When you meet younger programmers, many of them are frustrated themselves to work on ancient solutions that have no hope of being upgraded, because these industries we work in are not in a rush to try anything new and shiny, but instead are happy with the status quo.

    I just bid a job a few months back that would cost $150,000 to upgrade their database infrastructure, and likely save the company $300,000+ annually in added efficiency, less downtime, and a more robust report system. Guess what they said? "We all think it is fine the way it is." That's money thrown out the window, employees who are frustrated (without knowing why), and forcing the company to lose efficiency by not being able to compete with newer companies that are utilizing newer technology to better their bottom line.

    Ugh.

    1. Re:Should be, but isn't, and won't. by Orange+Crush · · Score: 2, Insightful

      I've been in the banking industry for the past 6 years and every bank I've worked at has relied on text-only server side applications that we connect to via various terminal emulators. The workstations are all modern, but we don't use anything more taxing than excel and an e-mail app.

      Why have none of them changed beyond a few interface bolt-ons? Well . . . one of them actually did once . . . and it wasn't pretty. Sure it was graphical and point-and-click and more "user friendly" in appearance. But the fact of the matter is that we were a production environment and what could be done with hotkeys in 3 steps was now a 12 step process clicking widgets etc. To be fair, there were still a few hot keys, but they were all different and everyone had to relearn. Productivity never quite got back to what it was under the old system.

      Larger companies will kick and scream to avoid "upgrades." Many of them have had horrifying experiences and they just can't risk getting stuck with software worse than what they have now.

    2. Re:Should be, but isn't, and won't. by Doctor+Faustus · · Score: 2, Informative

      I can't begin to tell you how often the words "FoxPro" and "MS SQL"
      You do know those aren't remotely comparable, right? FoxPro scales to more users than Access (due to tables separated into different files), but they're otherwise on a similar level in terms of what sort of jobs they're appropriate for. MS SQL Server is a full-fledged enterprise RDBMS. It may not scale quite as far as Oracle or DB2, but it get closer every generation, and having worked mostly in Oracle for the last year or so, I've been missing SQL Server.

      I just bid a job a few months back that would cost $150,000 to upgrade their database infrastructure, and likely save the company $300,000+ annually in added efficiency, less downtime, and a more robust report system. Guess what they said? "We all think it is fine the way it is."
      That was probably a polite way of saying "We don't believe you.". Maybe you made a case to them for why relation is obsolete, but you certainly didn't here.

  15. Simple solution. by fahrbot-bot · · Score: 3, Funny

    I tried turning my Oracle server on its side to get column-store access. Strangely, I didn't see any increase in performance. Perhaps I'll try the other side...

    --
    It must have been something you assimilated. . . .
  16. Object Databases by jjohnson · · Score: 3, Interesting

    Are they now officially an also-ran? Has the whole concept failed to be usefully implemented commercially, or will it be another Lisp--elegant, beautiful, and largely unused because it's kind of weird?

    --
    Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
    1. Re:Object Databases by SashaMan · · Score: 2, Insightful

      In a word, yes. I think there are a couple reasons for this:

      1. OR mappers like Hibernate have gotten to the point that they are quite good, so they make the value add prop of object databases less compelling.
      2. Object databases are never going to get the speed of relational databases. This is the real dealbreaker. Suppose an object database can handle 95% of my queries with adequate performance. All well and good, but I'm totally screwed on those other 5%. On the other hand, if I was using a relational database with hibernate, hibernate might handle 95% of the queries with adequate performance, but for those other 5% I can workaround by writing custom SQL. With that setup I get the best of both worlds.

      I don't know of any attempts to use object databases on large enterprise projects that haven't been complete failures, with the failure always due to performance issues.

  17. He may have a point by duffbeer703 · · Score: 2, Interesting

    For data warehousing, a higher or different level of abstraction may be useful and make database design easier, particularly as paralellism becomes more and more common. Storing rich markup language or media in a database might be problematic as well.

    But there's no way that RDBMS's are going away -- relational algebra simply solves too many data storage problems.

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
  18. Are relations obsolete? by roman_mir · · Score: 4, Informative

    Once someone shows that there is no longer a use for any relationship between data entries, then we'll be able to say that RDBMSs are obsolete. Actually both headlines (/. and the linked article) are mistaken about what Michael Stonebraker is saying. He is talking about read intensive applications mostly and he is talking about optimization of data for reading purposes. This does not mean that RDBMSs are obsolete for all uses, just that he sees a faster way to retrieve data for certain uses.

  19. Re:Re : Are Relational Databases Obsolete? by ahem · · Score: 2, Funny
    Your sed is missing two expressions for full i18n:

    sed -e 's/crisps/potato chips/' -e 's/pound/dollar/' -e 's/note/bill/' -e 's/packet/bag/'

    --
    Not A Sig
  20. paradigm shift! by sohp · · Score: 4, Funny

    Along with Procedural Programming, this could REVOLUTIONIZE the software industry!!

  21. rtfa before posting by jilles · · Score: 3, Informative

    I can understand people not reading every link on a slashdot article they comment on. But if you post the bloody link, is it too much asked to actually RTFA?! It's an article about a column. The actual column is quite interesting.

    To add some content, this is about optimal storage for SQL databases in a data warehouse context where there are some interesting products that use something more optimal than the one size fits all solutions currently available from the big RDBMS vendors. The API on top is the same (i.e. SQL and other familiar dataware house APIs), which makes it quite easy to integrate.

    Regarding the obsolescence question, one size fits all will be good enough for most for some time to come. Increasingly people are more than happy with lightweight options that are even less efficient on which they slap persistence layers that reduce performance even more just because it allows them to autogenerate all the code that deals with stuffing boring data in some storage. Not having to deal with that makes it irrelevant how the database works and allows you to focus on how you work with the data rather than worrying about tables, rows and ACID properties. Autogenerating code that interacts with the database allows you to do all sorts of interesting things in the generated code and the layers underneath. For example, the hibernate (a popular persistence layer for Java) people have been integrating Apache Lucene, a popular search index product, so that you can index and search your data objects using lucene search queries rather than sql. It's a quite neat solution that adds real value (e.g. fully text searchable product catalogs are dead easy with this).

    Column based storage is just an optimization and not really that critical to the applications on top. If you need it, there are some specialized products currently. The author of the column is probably right about such solutions finding their way into mainstream products really soon. At the application level, you'll still be talking good old SQL to the damn thing though.

    --

    Jilles
  22. Wrong approach? by Aladrin · · Score: 3, Interesting

    Maybe his approach is all wrong. The database my company uses has MANY tables that are rarely written to, but a few that are written to ALL the time. Instead of trying to cram his 'one size fits all' database scheme down our throats and replace the current 'one size fits all' database scheme, maybe he should be trying to create a database engine that can do both.

    I think you would have to determine the main use of the table beforehand (write-seldom or write-often), but the DB engine could use a different scheme for each table that way. I know some will claim that it can't be more efficient to split things this way, but remember that this guy is claiming 50x the speed for write-seldom operations.

    As for Relational Databases... How is this exclusive to that? This is simply how the data is stored and accessed. If he is claiming 50x speed-up because he doesn't deal with the relational stuff, that's bunk. You could write a row-store database with much greater speed as well, given those parameters.

    --
    "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
  23. Specialized versus generalized? by dpbsmith · · Score: 3, Interesting

    '"In every major application area I can think of, it is possible to build a SQL DBMS engine with vertical market-specific internals that outperforms the 'one size fits all' engines by a factor of 50 or so," he wrote.'

    I know very little about DBMS systems, but I thought it has always been true that you can achieve monumental performance increases by building somewhat specialized database systems in which the internals of the system make assumptions, and are tied to, the structure of the data being modelled. In fact, when RDBMS systems came in, one of the knocks on them was that they were far more resource-intensive than the hierarchical databases they displaced. However, the carved-in-stone assumptions of those models made them difficult and expensive change or repurposed.

    I'm sure I remember innumerable articles insisting that "relational databases don't need to be really all that much terribly slower if you know how to optimize this that and the other thing..."

    In other words, as an outsider viewing from a distance, I've assumed that the increasing use of RDBMS was an indication that in the real world it turned out that it was better to be slow, flexible, and general, than fast, rigid, and specialized.

    So, what is a "column store?" It sounds like it is an agile, rapid development methodology for generating fast, rigid, specialized databases?

  24. Perl Objects have both column and row DB advantage by goombah99 · · Score: 4, Interesting

    Traditionally perl-objects are hashes with one blessed hash per instance. The hash contains all the instance variable values using their names as keys.

    instead one can use blessed scalars holding a single integer value for instances and let the class variable contain all the instance data in arrays indexed by the instances scalar value.

    This technique was originally promoted as an indirection to protect object data from direct manipution that bypassed get/set methods. But it also allows the object to be either row or column oriented internally. that is the class could store all the instance hashes in an array indexed by the scalar. or it could store each instance variable in a separate array that is indexed by the scalar value.

    Thus the perl class can, on-the-fly, switch itself from column-oriented to row-oriented as needed while maintaining the same external interface.

    Of course this is not a perl-exclusive feature and it can implemented in other languages. It just happens to be particularly easy and natural to do in perl.

    --
    Some drink at the fountain of knowledge. Others just gargle.
  25. Soon... by lpangelrob · · Score: 3, Funny

    The near future. Mr. Stonebraker walks into a store.

    Mr. Stonebraker: How much are these plums?
    Checkout girl: Plums? They're $0.99, $1.39, $12.49, $15.99, $26.38, $13.37...

  26. Are Relational Databases Obsolete? Not at all by SpaceLifeForm · · Score: 3, Informative
    The Slashdot article headline is trollish.

    The relational concept will still exist regardless of the underlying storage methods.

    --
    You are being MICROattacked, from various angles, in a SOFT manner.
    1. Re: Are Relational Databases Obsolete? Not at all by lgw · · Score: 2, Interesting

      The "relational" concept existed before SQL, really. It's just a question of whether you want to do the work on the client side or the server side.

      The modern RDBMS is good when the pipe from client to server is much smaller than the pipe from server to backing store/cache. Minimal communication for maximum results. The trade-off, of course, is that the server needs lots of resources because it's doing significant work on behalf of every client.

      "Non-relational" databases still have their place today, however, in embedded/appliance environments. When the pipe between the client and server is very large (shared memory), and the server has only a token amount of memory for cache, a RDMBS is a poor choice. The distinction is only the implementation: nothing stops you from using relational concepts and pushing the work to the client side. But the implementation difference really changes what's fast and what's slow.

      Similarly, a column-based store is just an implementation difference from a traditional RDBMS, but it also makes a real change in what's fast and what's slow, which of course means a real change in what makes a good DB design.

      --
      Socialism: a lie told by totalitarians and believed by fools.
  27. Re:IMS--Hierarchical DB harder to use? by cdn-programmer · · Score: 5, Interesting

    On the contrary.

    From a standard 3rd generation programing language one can read and write into flat files and we can do close to this with a hierachical database.

    We lose this with relational databases because the way the database organises data has no direct mapping to the way it might be set up in a standard programming language.

    What this means is that every transaction to and from the database must go through a literally horrible re-mapping. IE. The language data structures do not correspond to the RDBMS data structures and visa versa.

    As an example - in postgreSQL the last I looked at writing a simple row into a table where there were something like 100 columns in the row...

    In the 3rd generation programming languages this was just a simple structure with 100 entries.

    The data transfer from that structure generated a function call with more than 1000 parameters. This was to be mapped and re-mapped with each call to transfer data, this is even though the structure itself is static and determined at compile time.

    Next: There were about 10 parameters per field (column).

    1: Column name
    2: Column name length
    3: data type
    4: data length
    5: character representation ... etc

    finally 10: Address where the data lives.

    The thing is such a table could be set up very easily and populated with a simple loop that rolls in the required values via say a mapping function with about 10 arguments. This could be done ONCE at run time to prepare for the transfer of data and then the same table could be referenced for each call and simply an address could be sent with the transfer.

    Noooo.. It was dynamic and the data was encoded as parameters on the stack. This means the stack must be build and torn down and rebuilt for each call.

    Next - the implementation was so bad that the program would run in test mode with only a few parameter but it failed when the whole row was to be transfered.

    I gave up on that interface.

    ---------------

    Oracle had pre-compilers. They did the same damn thing. The code generated by the pre-compilers was just awful.

    ---------------

    While there is much good to say about RDBMS's in general. The issue I ran into was the interface from 3rd generation languages took a HUGE step backward. IMHO we should have a high level language statement called DBRead() and DBWrite(). In C this should generally correspond to fread() and fwrite(). If this is too complex then DBWriteStruct() could be implemented with suitable mapping helper function.

    Nooo...

    In the old days one could read and write into a flat file at a given location with a single statement or function call depending on the language. Of course "where" to read and write became a real issue and I do fully understand the complexity of file based tree structures and so forth, especially since I wrote a lot of code to implement these algorithms.

    The thing is now we have RDBMS and other solutions that give us the data organisational abilities we need - and we lose the ease of mapping these structures into a suitable structure or object in the programming language.

    I for one do not think we have stepped forward very far at all.

    -------------

    I'll toss in a case in point made by a good buddy of mine who just happens to be one of the top geophysical programmers in this city.

    One of his clients was running an application hooked to an Oracle database running on a fast SUN. Run times were measured in close to a day.

    Finally they removed the Oracle interface and replaced it with a glorified flat file. They clearly built in some indexing. The result is the run times dropped to under 20 minuets.

    As my buddy says - He will NOT use any RDBMS. He can take 5 views of the data comprising 1000's of seismic lines and the user can click on any trace number, line number, well tie and so forth and in real time he can modify all views of the data on as many as 5 s

  28. Re:IMS--Hierarchical DB harder to use? by Sharkeys-Day · · Score: 2, Informative

    I think a place to start is to ask how to map language structures and RDBMS structures into a common denominator. One should never be looking at function calls with over 1000 parameters. That is just plain stupid. One should also never be dynamically mapping each and every tidbit of every field in a row on the fly at run time and especially so for each row in a table. Quite right, which is why programmers who still have have their sanity use JDBC or DBI. This part of your problem has already been solved at least twice.

    Here's how I write out a customer record:

      $dbh->do(
          'insert into customer (id,name,yada1,yada2,yada3) values (?,?,?,?,?)',
          undef,
          @customer{id,name,yada1,yada2,yada3}
      );

    I think that's even easier than your 3rd-gen code, and I didn't have to write my own indexing code.
  29. Far from it! by jbeaupre · · Score: 2, Funny

    Living in Kentucky, I can tell you a relational database would be handy if actually used. Why I've got in-laws that, well, I won't go into details. Let's just say I suspect my wife married an out of stater for fresh genes.

    What? Not that kind of relational?

    --
    The world is made by those who show up for the job.
  30. Stonebraker's current track record by WindBourne · · Score: 5, Insightful
    1. He helped created THE first relational DB.
    2. He later moved to creating an Object-Oriented Relational DB with Postgres in the 80's. Much of that tech has found its way into other DBs such as Oracle and even helped create the OODBMS world.
    3. Now, he is creating the Column store DB and announces that this will be the next big thing.
    I would listen to him. Biased or not, He has a better track record than most intelligent ppl (and all the wanna-be/hasbeens; dvrack comes to mind) in the tech field.
    --
    I prefer the "u" in honour as it seems to be missing these days.
    1. Re:Stonebraker's current track record by WindBourne · · Score: 4, Informative

      Not Quite. Stonebraker was THE core developer on Ingres which was the second relational DB created (System R was the first developed by Codd et.al using SQL). He operated at both ends of the spectrum. Gray/Mohan did the same as well. While System R disappeared, Ingres was developed into a major company (ingres) and was ultimately bought by CA.
      Later, Stonebraker's work on postgres (theorey AND code) was how to handle different datatypes within databases. He took an OO approach to that. That was directly used in Illustra and then went on to Informix. More importantly, Oracle used a lot of that work to create 8i as has other DBs. IOW, he IS a leading theorists AND knows the code.

      Considering that he has been on top of all the major advances within the DB world, why would you discount what the man says? As it is, you mention Gray and Mohan who both did some good work at IBM, but have not really advanced DBs forward that much. They simply moved relational model DB forward( Bascially, they were red herrings). But Stonebraker is working across ALL the spectrums and contributes heavily to knew models. His work is everywhere.

      Finally, think about what he says. The column major is more useful for data warehousing BECAUSE it allows for data to be compressed quickly, tighter (which makes sense), AND allows you to work with just the data that you need. In a row major, you will end up creating and maintaining indexes to increase the speeds of reads. But an index is for the most part a single (or just a few) columns, which basically makes them a column major. But this requires LOADS of cpu and space to maintain. The column major approach simply keeps the indexes, if you will and discards the rows. This allows for FAST operations if you are doing LOADS of reads, and little changes. That is PERFECT for data warehousing.

      So armed with that knowledge, exactly WHY would you discount his work and his statements?

      --
      I prefer the "u" in honour as it seems to be missing these days.
  31. Common Business Mistake by C10H14N2 · · Score: 2, Insightful


    #1: Assuming what you think your customer needs is what your customer wants.
    #2: Assuming they are the ones who made the mistake when you lost the job.

  32. Re:IMS--Hierarchical DB harder to use? by einhverfr · · Score: 3, Interesting

    On the contrary.

    From a standard 3rd generation programing language one can read and write into flat files and we can do close to this with a hierachical database.

    I think there is a key distinction here. Application object store vs data management. Hierarchical db's are far better at storing object information, but *far* worse at real data managment.

    We lose this with relational databases because the way the database organises data has no direct mapping to the way it might be set up in a standard programming language.

    What this means is that every transaction to and from the database must go through a literally horrible re-mapping. IE. The language data structures do not correspond to the RDBMS data structures and visa versa.

    In LedgerSMB, we solved this by putting a functional interface in the db. Then we dynamically map the objects and their properties into functions and arguments. Works great :-)

    As an example - in postgreSQL the last I looked at writing a simple row into a table where there were something like 100 columns in the row...

    You are either trolling or you need to fire the DB architect who designed that. THere is *no way* that a 100-column table is good DB design. (Ok, mathematically, there is nothing that precludes it being good db design, but I can't even imagine a scenario where this would be OK).

    In the 3rd generation programming languages this was just a simple structure with 100 entries.

    Oh, you were the one who designed the 100-column table. Sorry..... Please go out and get some books on db design. You will thank me :-)

    The data transfer from that structure generated a function call with more than 1000 parameters. This was to be mapped and re-mapped with each call to transfer data, this is even though the structure itself is static and determined at compile time.

    IMO, your problem honestly is in the fact that you are using a monkey wrench as a ball peen hammer. It may sorta work but you are using the wrong tool for the job. If you want a simple object store use BDB or something like it. If you want a real data management solution, build your db *first.* If that is not your goal, use something other than an RDBMS.

    Next: There were about 10 parameters per field (column).

    1: Column name
    2: Column name length
    3: data type
    4: data length
    5: character representation ... etc

    finally 10: Address where the data lives.

    The thing is such a table could be set up very easily and populated with a simple loop that rolls in the required values via say a mapping function with about 10 arguments. This could be done ONCE at run time to prepare for the transfer of data and then the same table could be referenced for each call and simply an address could be sent with the transfer.

    Noooo.. It was dynamic and the data was encoded as parameters on the stack. This means the stack must be build and torn down and rebuilt for each call.

    How is this an issue with RDBMS's?

    Next - the implementation was so bad that the program would run in test mode with only a few parameter but it failed when the whole row was to be transfered.

    Again, this is not a PostgreSQL problem ;-)

    I gave up on that interface.

    From your description, that sounds like a wise choice.

    While there is much good to say about RDBMS's in general. The issue I ran into was the interface from 3rd generation languages took a HUGE step backward. IMHO we should have a high level language statement called DBRead() and DBWrite(). In C this should generally correspond to fread() and fwrite(). If this is too complex then DBWriteStruct() could be implemented with suitable mapping helper function.

    Again, this is an issue with the frameworks you are using. Personally, I tend to do the

    --

    LedgerSMB: Open source Accounting/ERP
  33. ODBMS by Lodragandraoidh · · Score: 2, Informative

    I think the object database management system (ODBMS) will overtake RDBMSs in the future for several reasons (from the link and my own musings):

    1. Object-oriented databases are designed to work well with object-oriented programming languages such as Python, Java, C#, Visual Basic .NET, C++ and Smalltalk. This makes implimentation quick and easy - yet stable and scalable at the same time.

    2. ODBMSs use exactly the same model as object-oriented programming languages.

    3. It is also worth noting that object databases hold the record for the World's largest database (over 1000 Terabytes at Stanford Linear Accelerator Center).

    4. Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly without a search, by following pointers (e.g. the objects are stored in trees for fast retrieval). Dynamic indexing schemes further speeds up retrieval of full text searches.

    5. Provides data persistence to applications that are not necessarily 'always on' - e.g. HTTP based stateless applications.

    I think RDBMSs will be around for some time -- but they will be relegated to more structured situations and smaller data sets. ODBMSs will take over where data is changing, persistence is critical, data types are mostly large binary objects with associated meta-data, and datasets are humongous.

    Right now my favorite ODBMS is the ZODB (Zope Object Data Base) - an ODBMS system tightly integrated with both Python (implimented using Python's native 'pickle' object persistence functionality), and the Zope web application development system - which itself is built with and uses Python. You can learn more about Zope at Zope.org.

    --

    Lodragan Draoidh
    The more you explain it, the more I don't understand it. - Mark Twain
  34. We're talking about this topic at geekSessions by Jim+Ethanol · · Score: 2, Informative

    The next geekSessions is on this topic. If you're interested in hearing about alternatives to RDBMS in and using them in practice, take a look at the site. The event will be held on October 2nd in downtown SF and will also be available via webcast. In addition to the presentations, we'll have a Q&A session along with some food and FREE BEER. Speakers at the event are:

    Josh Berkus from the PostgreSQL core team

    Paul Querna from Apache and Bloglines (wrote his own filesystem for Bloglines)

    Chad Walters from Powerset who is implementing BigTable there.

    Hope to see you there!

  35. Index on every column, how revolutionary! by porneL · · Score: 2, Insightful

    A query on a row store has to query entire rows, which means you'll often end up hitting fields you don't give a damn about

    That's why row-oriented databases have indexes and perform index scans.

  36. Arther makes great point, complex subject by yl_mra · · Score: 3, Informative

    To be a relational database the database must meet a very specific set of requirements. While a standard view of the databases from the DB administrators and normal users view may allow limited ways to manipulate the data, looking at the actual storage structures and how to efficiently use the resultant files can provide some extremely efficient computational methods. For example, if one structures data in a relational database with few fields, perhaps as low as two, per representation where each representation represents a single file then the data read directly from the file, the reads may be via flat file techniques, into an array or efficient storage table, then you achieve the best of both worlds. If the above example requires compressed data, then the relational data must be read from a single file via DB operations with two different mappings overlaying the single database file. The programmer must understand the data orientation so that compression/decompression occurs correctly. Extremely fast reads/writes may be achieved this way. The advantage of a column oriented database is that the files are inherently optimized for data mining without the need to hire an expensive programmer. If the company had multiple requirements for the same data then multiple databases may be required, unless they are willing to hire the expensive programmer. A key problem results when multiple databases loose synchronization. BTW, row based databases optimized for storage size (footprint) compress data by column, not by row or record. Using these techniques, I have achieved far greater data compression than comparable Google stored data.

  37. For most users this does not matter by ChrisA90278 · · Score: 2, Insightful

    Michael Stonebraker is certainly a well respected nae and he was been right on these issues in the past. Coinsidently I'd testing my software with a new version of PostgreSQL as I type. I think colum vs. row storage can be considered simply a option. I can even see it being an option that you specify at the table level. Most DBMS users really don't have much data. Today a 1,000,000 row table can be cached in RAM on even a low-end PC based server. Once cached in RAM row vs. column storage does not matter. I would imagine that 99& of the database table in te world have far fewer then a million rows. This discussion applies only to the very few that are really large.

  38. SSDs and column stores... by Kjella · · Score: 2, Insightful

    Will the imminent transition to SSDs make any difference? Because row-based DBs means you're typically reading large chunks (one raw) sequentially, while columns stores means you're reading many small (number of columns) chunks for every row. I'd think that if random access time was almost none, you'd get almost the same write performance while read performance could be greatly improved because you only read the columns you need. It'd certainly make DB design easier too if you didn't have to worry about putting very light information in the same table as heavy blobs.

    --
    Live today, because you never know what tomorrow brings
  39. Re:IMS--Hierarchical DB harder to use? by jadavis · · Score: 2, Insightful

    From a standard 3rd generation programing language one can read and write into flat files and we can do close to this with a hierachical database.

    We lose this with relational databases because the way the database organises data has no direct mapping to the way it might be set up in a standard programming language.

    What this means is that every transaction to and from the database must go through a literally horrible re-mapping. IE. The language data structures do not correspond to the RDBMS data structures and visa versa.

    You are confusing an RDBMS with a persistent storage mechanism. It's really not hard to just keep data persistent any more. You don't need a 3GL or anything fancy, just some hooks to record your modifications on permanent storage, and keep a small working set cached in memory. It's an easy, trivial, solved problem. And it was solved before relational databases were invented.

    RDBMSs do a lot more. Here are just a few advantages:

      * different applications can access the same data
      * guarantee integrity via declarative constraints that can validate against all of the data at once, not just the single record in question
      * different applications can have the same guarantees of integrity, and a bug in the first application can't break the guarantee for the second application

    RDBMSs were invented for a reason. Many, many software bugs can be traced back to a bad data state -- some invariant that was broken and uncaught. Often, these bugs are not caught until long after the insert has taken place, and often cause a cascade of new bad data and you don't find out until many records are wrong. A lot of code is imperative, and re-stating the invariant declaratively (i.e. a database constraint) helps catch a lot of those bugs.

    Trying to put these declarative constraints in the application is a bad idea. When should they be checked? And in which applications should they be checked (all of them, one would hope)? If you see a declarative constraint, are you sure it's correct, or might it have been added after inconsistent data was entered and before the constraint was actually run?

    Databases solve this by making some promises. If you put a ".. CHECK (age > 0)" on an attribute, it will check all the records before applying it, and then all the records afterward will need to pass through that constraint. That's a lot better of a guarantee, and you know it's true for all applications. Someone else's bug or quick hack won't violate it, so your application can rely on that as the truth. Same with UNIQUE or FOREIGN KEY.

    If you think about your reasoning for a moment, it's very narrowly focused on storing and retrieving single records. Presumably, anything needing to look at the data as a whole would need to read it fully into the application and process it from application code.

    You don't take into account other readers of data who might require consistent reports or anything else that needs to look at more than one record. You also don't take into account the horrible mess you have when the application is wrong and stores bad data, or when you need to do data format changes. In the types of databases you describe, almost any change requires reorganizing the data physically. In an RDBMS, you can make many changes without physically changing the physical layout.
    --
    Social scientists are inspired by theories; scientists are humbled by facts.
  40. Re:Perl Objects have both column and row DB advant by tgv · · Score: 2, Insightful

    I don't think that has anything to do with the article. That is about storage on disk, not about manipulating pointers in memory to such an extent that a programming language that should never have been invented in the first place becomes even more ununderstandable.

  41. Re:IMS--Hierarchical DB harder to use? by NoOneInParticular · · Score: 2, Insightful

    Ouch. And this exactly why SQL should die as the primary interface to the RDMBS. How the hell is my compiler going to help me find simple typing errors when the interface to use the RDBMS is built upon ... strings! You are generating code from code, that's not integration, that's a hack! It's useful, as there is no sound other way to approach an rdbms, but it's not a pretty sight.