Slashdot Mirror


Are Relational Databases Obsolete?

jpkunst sends us to Computerworld for a look at Michael Stonebraker's opinion that RDBMSs "should be considered legacy technology." Computerworld adds some background and analysis to Stonebraker's comments, which appear in a new blog, The Database Column. Stonebraker co-created the Ingres and Postgres technology while a researcher at UC Berkeley in the early 1970s. He predicts that "column stores will take over the [data] warehouse market over time, completely displacing row stores."

417 comments

  1. They're not mutually exclusive. by KingSkippus · · Score: 5, Insightful

    Okay, at the risk of sounding stupid...

    Since when is a column store database and a relational database mutually exclusive concepts? I thought that both column store and row store (i.e. traditional) databases were just different means of storing data, and had nothing to do with whether a database was relational or not. I think the article misinterpreted what he said.

    Also, I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased...

    1. Re:They're not mutually exclusive. by XenoPhage · · Score: 5, Interesting

      Since when is a column store database and a relational database mutually exclusive concepts? I thought that both column store and row store (i.e. traditional) databases were just different means of storing data, and had nothing to do with whether a database was relational or not. I think the article misinterpreted what he said.

      Agreed. It definitely looks like a storage preference. Though column-based storage has definite benefits over row-based when it comes to store once, read many operations. Kinda like what you'd find in a data warehouse situation...

      Also, I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased...

      Hrm.. You must be new here....

      --
      XenoPhage
      Technological Musings
    2. Re:They're not mutually exclusive. by theGreater · · Score: 1

      Exactly what I was thinking. There is nothing (to my knowledge) in the Relational Model which specifies row vs. column store....

      -theGreater.

    3. Re:They're not mutually exclusive. by stoolpigeon · · Score: 4, Interesting

      You are exactly right and this is backed up by the home page for c-store. It says: "C-Store is a read-optimized relational DBMS " - c-store is the open source project that apparently is the basis for Vertica - Stonebraker's commercial offering.

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    4. Re:They're not mutually exclusive. by Denis+Troller · · Score: 1
      Completely true.

      The article itself even points to http://en.wikipedia.org/wiki/Column-oriented_DBMS that states that Row storage and Column Storage are two technologies for RDBMS.

      --
      That's not a nick, that's my NAME.
    5. Re:They're not mutually exclusive. by Anonymous Coward · · Score: 5, Funny

      Well I just turned my server on its side and now all my tables are storing in columns. I love new technology.

    6. Re:They're not mutually exclusive. by reddburn · · Score: 0

      Right or wrong, his opinion can't exactly be considered unbiased... Congratulations! You figured out what an opinion is.
      --
      "Those who believe in telekinetics, raise my hand" - Kurt Vonnegut, Jr.
    7. Re:They're not mutually exclusive. by OECD · · Score: 5, Insightful
      Congratulations! You figured out what an opinion is.

      An opinion is subjective, but it's not necessarily biased. A disinterested observer could have an unbiased opinion.

      --
      One man's -1 Flamebait is another man's +5 Funny.
    8. Re:They're not mutually exclusive. by homb · · Score: 4, Interesting

      I wish we could put this thing to rest once and for all. And I wish so-called "experts" in the field actually were.

      Rule of thumb:
      - you use row dbs for OLTP. They're great for writing.
      - you use column dbs for data mining. They're amazing for reading aggregates (average, max, complex queries...)

      The major problem with column dbs is the writing part. If you have to write one row at a time, you're screwed because it needs to take each column, read, insert into it and store. If you can write in batch, the whole process isn't much more expensive. So writing a single row could take 500ms, but writing 1000 rows will take 600ms.
      Once the data's in, column dbs are the way to go.

    9. Re:They're not mutually exclusive. by bobcat7677 · · Score: 1

      Yeah, I looked into Vertica. Was pretty excited at first as my company does alot of data warehouse and data mart stuff. The potential performance gains were significant enough to start looking at converting alot of our reporting, ect to it. So I gave them a call and started asking some questions related to the key usage one would expect to make of a "warehouse" or datamark type of database.

      1st Q: "Can you run MDX queries against the Vertica DB?" A: "No, we don't support MDX queries."

      The conversation went downhill from there. At the time you could only get the DB as a limited beta and looking at their website it doesn't appear that has changed. Basically the DB is no where near ready for prime time. Even when it is ready, it's not going to make traditional RDBMS obsolete, just create a niche for itself in the warehouse/datamart corner. And I wouldn't be surprised if Microsoft came out with a new version of SQL server in short order (as well as others) that allowed you to create column oriented DB objects which would make maintaining two different database server technologies for the different purposes somewhat silly.

    10. Re:They're not mutually exclusive. by Lars+T. · · Score: 1

      Okay, at the risk of sounding stupid...

      Since when is a column store database and a relational database mutually exclusive concepts?

      It doesn't. The original blog is about Row-oriented DBMS vs. Column-oriented DBMS, and the author of the article (or his know-it-all-better editor) confused himself enough to believe somebody abbreviated that as RDBMS which of course means Relational DBMS. The submitter probably not reading the Wikipedia article he linked to didn't help either.
      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

    11. Re:They're not mutually exclusive. by Anonymous Coward · · Score: 0

      his opinion can't exactly be considered unbiased...

      Neither can mine, since I've been using row-stores since dBase II. I've used that, Clipper, FoxPro, Access (yuck), and Nomad (on the mainframe) and probably some I forgot about and there is a horrible problem with every one of them, which is that the data aren't readable without the underlying program.

      Damn it, people, there is no excuse for that, and no reason save product lock-in. If only these row-based databases were like the examples in the linked Wikipedia article - "1,Smith,Joe,40000;2,Jones,Mary,50000;3,Johnson,Ca thy,44000;", although I'd prefer fixed length fields; hard drives are dirt cheap these days.

      My favorite DBMS was Nomad. I absolutely fucking HATE Access. But the only way I'd go to a column based DBMS would be if my employer forced me to.

      BTW, I notice that the Wiki article says Sybase is column based, so column based DBMSes aren't exactly new (weren't they around last decade?) and there is little reason to think they'll be the "next big thing", save perhaps Sybase's advertising budget. Yes, if you aren't indexing your data (what DBMS doesn't?) and you're not writing much data, I'll take someone's word that column based is faster, even though the how and why wasn't explained, but that is a very narrow range of uses.

      I'm incredibly skeptical. Now if you'll excuse me I have to get back to this (shudder) Excel spreadsheet...

      -mcgrew

    12. Re:They're not mutually exclusive. by fm6 · · Score: 1

      Okay, at the risk of sounding stupid...

      Since when is a column store database and a relational database mutually exclusive concepts?
      Well, if "stupid" means grasping that storage is an implementation feature and that what makes a database relational is how you access the data, then yeah, you're a total idiot.
    13. Re:They're not mutually exclusive. by B'Trey · · Score: 1

      Also, I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased...

      It's actually worse than that. From the linked article:

      NDOVER, Mass., Sept. 5, 2007 - Vertica Systems today launched a unique, multi-author blog that offers database architects and administrators, CIOs and other IT professionals expert insight and opinion about database evolution and emerging technologies. The Database Column (www.databasecolumn.com) features weekly contributions from six well-known database visionaries discussing enhancements that improve database performance and scalability, decrease database administration overhead and system costs, and help organizations make more informed decisions.

      The entire blog on which it appeared is a corporate shill for Vertica, the makers of a column store database. Which isn't to say that what's said there is incorrect. It may not be. But there should be absolutely no expectation of lack of bias.

      --

      "The legitimate powers of government extend only to such acts as are injurious to others." Thomas Jefferson.

    14. Re:They're not mutually exclusive. by Bonewalker · · Score: 2, Interesting

      So, what we need is a morphing database structure depending on need. While I am writing, I want it to be row-based. As soon as I am done and ready for reading, it should switch to a column-based structure. Who's up for starting a new project on sourceforge.net? The shape-shifting relational database! SSRD is the wave of the future!

    15. Re:They're not mutually exclusive. by Anonymous Coward · · Score: 0

      Hrm.. You must be new here....

      Know, ewe muss bee knew hear, Looser!

    16. Re:They're not mutually exclusive. by jd · · Score: 0, Offtopic

      How on earth can anyone call C-Store open source? It hasn't featured on Slashdot, there are no Freshmeat records, it's not part of Fedora Core, the author hasn't declared himself a god - I mean, these are vital features present in all true F/L/OSS projects!

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    17. Re:They're not mutually exclusive. by georgewilliamherbert · · Score: 1

      BTW, I notice that the Wiki article says Sybase is column based, so column based DBMSes aren't exactly new (weren't they around last decade?)

      Sybase IQ / IQ-M, not basic Sybase.

      Sybase has two completely different code base DB systems. "Sybase", the row-store RDBMS, and "Sybase IQ", the column-store data warehousing RDBMS.
    18. Re:They're not mutually exclusive. by Anonymous Coward · · Score: 0

      Some people insist on doing things the hard way! Yeesh. I just put my monitor in landscape mode.

    19. Re:They're not mutually exclusive. by Goldarn · · Score: 1

      Right or wrong, his opinion can't exactly be considered unbiased... But, is he putting his money where his mouth is, or his mouth where is money is?
    20. Re:They're not mutually exclusive. by Cal+Paterson · · Score: 1

      Also, I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased...
      This is no logic. His biases don't really matter; if what he says makes sense, and you agree with it, then he's right (IYO). This is like saying to Einstein; "Well, you would say E=MC^2!", "You've been researching it!".
    21. Re:They're not mutually exclusive. by Junior+J.+Junior+III · · Score: 1

      I thought that's what the perpendicular bits stored on my new SATA HDD did.

      --
      You see? You see? Your stupid minds! Stupid! Stupid!
    22. Re:They're not mutually exclusive. by sco08y · · Score: 1

      Okay, at the risk of sounding stupid...

      Not as stupid as the author of the column. Seriously, what he's saying is like saying that the advent of powerful graphics cards means that the von Neumann model is going away. Well... I'm interpreting "the one-sized fits all concept" to mean "the theoretical underpinnings of general purpose DBMSs."

      But since he's so vague about exactly what he's making obsolete, it's impossible to really contradict him. So give yourself credit for this: by risking sounding stupid, you said something worth saying.

      Since when is a column store database and a relational database mutually exclusive concepts?

      They're not comparable. Stonebreaker is making the standard mistake of confusing logical conception with physical implementation.

      The relational model dictates a mathematical structure, relations, and an algebra that works with and only with those structures. It deals with the logical aspect of a DBMS (where S stands for System!) and says nothing about how the data is to be stored on a computer. It doesn't require that the DBMS use SQL (which is not really relational to begin with) and it doesn't care what kind of data you put in the relations, whether it is numbers, text, pictures or video.

      A column store DBMS (btw, a DBMS is different from a database in the same way word.exe is different from foo.doc) is simply a particular implementation of the relational model. It shouldn't even be called a column store or row store DBMS because a DBMS shouldn't be classified by how it stores data. (And there's really no reason why one DBMS can't do both!) I understand that operating systems are often called, for example, 32-bit or 64-bit but a DBMS is specifically designed to hide implementation issues. It should be named after the logical model it implements because that's going to tell you what you can do with your data.

    23. Re:They're not mutually exclusive. by OldHawk777 · · Score: 1

      mit URLink vielen dank.

      I am very old school ... I look at an array as a table/2D-matrix of discreet values to be treated/used as defined by a FORTRAN program.
      The row for me (long ago) was always in the old dBaseI, I used to define a releated dataset, and the column was a variable identifier.

      The last time I created a database, most folks had just started calling 'rows' records and 'columns' fields. I forget what they called an nD-array/matrix. But to this day it drives me a little crazy when someone sends me a ledger/spreadsheet and call it a database.

      I will look/read at mit's c-store for more knowledge ...?

      --
      Unaccountable leaders are masters, and unrepresented people are slaves. How do US and EU fare?
    24. Re:They're not mutually exclusive. by Anonymous Coward · · Score: 0

      It is also interesting that the leaders in MPP type systems such as Teradata and DB2 still base their technology on row based systems. I agree that the poster of this article doesn't understand that whether an entity in a database is row or column based, its still relational. The article also noted that Sybase IQ is based on column based relations and it IS a relational database. If relational databases are obsolete it is not for the reasons posed in this article.

    25. Re:They're not mutually exclusive. by tm2b · · Score: 1

      I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased.
      People love to chain things together like this, and it's deceptive and unfair. Why assume that he didn't think about the topic before deciding what to work on? Instead, try this:

      Having concluded that column store databases are going to be the Next Big Thing, it's no surprise that Michael Stonebraker co-founded a company to develop column store database software. He seems to be putting his money where his mouth is.
      --
      "It is our blasphemy which has made us great, and will sustain us, and which the gods secretly admire in us." - Zelazny
    26. Re:They're not mutually exclusive. by Pollardito · · Score: 1

      that must be what those accounting types mean when they talk about pivot tables

    27. Re:They're not mutually exclusive. by stonecypher · · Score: 1

      Since when is a column store database and a relational database mutually exclusive concepts?
      Uh, since always. It's kind of like how noobs always get confused by the difference between a map and a hash: why, they both store key->value pairs, how are they different? Just because they serve the same end goal doesn't mean the underlying implementation is remotely similar. For example, column tables - which, amusingly, are actually older than RDBMSes, showing exactly the depth of the expertise of the original author, whose sole reason for thinking RDBMSes are old hat is because of when they were invented - store data by column. "This file contains all the address 2 fields." "This file contains all the phone numbers." That's how most databases worked in the Really Old Days. Then in the 1970s, someone realized that most data was fetched a row at a time, and started orienting the databases towards identity fields that were tagged by field correlation - relationships.

      The difference is this: you have a 20x2000 table. Are you going to store horizontal stripes, vertical stripes or something fancy? Column is vertical, almost all RDBMSes are horizontal, and the real database evolution is taking place in products the article author has never even heard of.

      Welcome to 1960s tech.
      --
      StoneCypher is Full of BS
  2. C'mon, the guy is biased! by winkydink · · Score: 4, Funny

    The name of his blog is The Database Column after all.

    --

    "I'd rather be a lightning rod than a seismometer." -Ken Kesey

    1. Re:C'mon, the guy is biased! by everphilski · · Score: 4, Funny

      database row would sound too much like prison :P

    2. Re:C'mon, the guy is biased! by Tablizer · · Score: 1

      database row would sound too much like prison

      That's where they send us if we do a Cartesian Join.

    3. Re:C'mon, the guy is biased! by Refenestrator · · Score: 1

      Not as much as database cell.

    4. Re:C'mon, the guy is biased! by jsiren · · Score: 1

      So he's making a row in his column. What is he, an oracle preaching at his ALTER TABLE...?

      --
      Usage: km/h for speed (kilometers per hour); kph for very slow impulses (kilopond hours).
    5. Re:C'mon, the guy is biased! by ScrewMaster · · Score: 1

      database row would sound too much like prison :P

      No, a row is what were having in this thread right now.

      --
      The higher the technology, the sharper that two-edged sword.
  3. Mod Article -1 (Author doesn't get it) by DrinkDr.Pepper · · Score: 5, Informative

    Relational databases aren't being obsoleted. Some schema design heuristics are.

    --
    0xfeedface
    1. Re:Mod Article -1 (Author doesn't get it) by dami99 · · Score: 0

      Yep.

      Stupid article, skip to the blog only to see what they are really talking about.

      It's not about relational databases being obsolete at all.

    2. Re:Mod Article -1 (Author doesn't get it) by ben+there... · · Score: 1

      And the important part left out with all the sensationalism: the API wouldn't change. It would still be SQL. It would still be an RDBMS. It would essentially just have a different storage engine. Products like MySQL seem to get along just fine supporting multiple options for storage engine. I don't see why an additional option to improve performance in certain cases would obsolete anything.

    3. Re:Mod Article -1 (Author doesn't get it) by jadavis · · Score: 1

      Products like MySQL seem to get along just fine supporting multiple options for storage engine.

      Except that in MySQL, changing the storage engine means that it will not "still be SQL" it will be a different variant of SQL, with different features available and different semantics. The storage engines aren't independent of the logical level. That is one of the biggest failings of MySQL.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    4. Re:Mod Article -1 (Author doesn't get it) by krow · · Score: 1

      Hi!

      MySQL is compliant with the SQL core 92 specification, and has some of 2003. It is as compliant as any other database on the Market, and just like all of the other databases it has extended the SQL grammar.

      --
      You can't grep a dead tree.
    5. Re:Mod Article -1 (Author doesn't get it) by jadavis · · Score: 1

      My point had nothing to do with adherence to the SQL standard (although in my opinion, MySQL's default behavior is further away than other databases). My point was that changing the storage engine (the physical storage mechanism) changes the semantic behavior of commands. MyISAM behaves differently than InnoDB, and you might need to change your application code if you want to change storage engines.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
  4. dual-mode db? by 192939495969798999 · · Score: 4, Interesting

    Is there a dual-mode db, that lets you create a row-based or column-based "table"? I imagine cross-mode queries would kill performance, but at least you could have a system front-loaded with row tables, where data comes in, and then archive this data over time into the column-based tables, so that reads were fast.

    --
    stuff |
    1. Re:dual-mode db? by XenoPhage · · Score: 2, Informative

      I believe you can build a storage engine in MySQL that deals with column-based storage. I'm not sure if it's been done yet, but I don't see why it couldn't be.

      --
      XenoPhage
      Technological Musings
    2. Re:dual-mode db? by jellomizer · · Score: 1

      It can be done with MS SQL. and other you build a cursor that creates a SQL Call. and you execute the SQL Call mixing case statemts and agrate functions and you are all set. Is it Easy no not really is it supported no but it can be done, and I have done it.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    3. Re:dual-mode db? by Anonymous Coward · · Score: 0

      Do you mean like using replication... row-based as the master and column-based on the slaves?

    4. Re:dual-mode db? by stoolpigeon · · Score: 2, Funny

      You can get MS SQL Server to store tables differently than the default? It will write columns to disk as opposed to rows? You can store columns in their own files? It's been a few years since I worked with SQL Server - but I really don't remember those features. Is it a SQL Server 2005 thing?

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    5. Re:dual-mode db? by stoolpigeon · · Score: 1

      Nice - in other words - no, but you could write one on (open source rdbms of your choice)

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    6. Re:dual-mode db? by jellomizer · · Score: 1

      Yes in Windows 2000 if you don't want to look behind the curtains.
      No but you can make views and stored procedures that can do the trick, that makes it look like it and aids in programming and can save time.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    7. Re:dual-mode db? by tinkertim · · Score: 2, Informative

      I believe you can build a storage engine in MySQL that deals with column-based storage. I'm not sure if it's been done yet, but I don't see why it couldn't be.

      The FA threw me for a loop a couple of times, I honestly _did_ try to read it :) Correct me if I'm incorrect, but wouldn't having a service for column stores be (usually) not needed on most Unix-like platforms? Since this is mostly reading, I would think such efforts might be better spent on sqlite (or similar)?

      If your in a situation where you're mostly reading with (likely) only one infrequent writer, wouldn't eliminating the overhead of a database service entirely be desirable?

      I can't think of a situation where you would want many frequent writers to a column store schema, again, correct me if I'm off.
    8. Re:dual-mode db? by Vancorps · · Score: 1

      Actually yes it is a SQL 2005 thing although there was a way to do it SQL 2000 called Data cubes from my understanding. You end up having multiple data files just like you would in an Oracle situation. It's easier to explain in Oracle terms as you'd just create a tablespace for column based tables and a tablespace for row-based tables. Then away you go, both storing files as you see fit.

      I guess in SQL 2005 terms you'd be creating another database on the same server and just use server linking to get your data from one place to another. Either through an ISIS package or a SQL Job.

    9. Re:dual-mode db? by mr_mischief · · Score: 2, Interesting

      That would actually be a smart way to handle it. Lots of apps write only to a single master (or its fallback) and read only from the slaves already. If you had a row-based master for fast writes and replicated to column-based slaves that could be a real win.

      RLE on the data columns is a pretty big win for column-based stores, too. If the slaves manage RLE during a replication, you could have one hell of a DB farm.

    10. Re:dual-mode db? by einhverfr · · Score: 2, Informative

      Ok, think about it this way:

      If you are doing killer aggregates (tell me the sum of the sales in every month for the last 25 years), you are going to be limited by possibly 2 things: CPU cycles and disk I/O throughput.

      There are several ways of addressing these issues. Basically this means either optimizing or parallelizing. Column-oriented stores are likely to help optimize the disk i/o throughput so you can just thow more processor effort at the problem.

      You can also do what Teradata and BizgressMPP do which is basically break the database into lots of little db's running on different servers and distribute the disk and processor time that way. Iirc Oracle and DB2 also offer this sort of option. Depending on how this is set up, it may be possible to use in an OLTP environment as well.

      On the other hand, suppose you are doing a lot of updates in a high transaction load environment. Which would you rather do? Update each column value (and skip around the disk doing this) or update the entire row on one disk block?

      Column-oriented databases are helpful for some things. They are not the only solution to the problem out there. And they are still relational. Hint: They are still based on Edgar Codd's relational mathematics.

      --

      LedgerSMB: Open source Accounting/ERP
    11. Re:dual-mode db? by runderwo · · Score: 1

      How about this, use different virtual memory maps to "view" the data as rows or as columns, so you only need to keep one copy of the data and just update pointers (virt-to-phys mappings) when data is modified.

    12. Re:dual-mode db? by Anonymous Coward · · Score: 0

      That would be a good idea, except we're talking about column and row storage because they have different reading and writing performance on disk...

    13. Re:dual-mode db? by jadavis · · Score: 1

      If your in a situation where you're mostly reading with (likely) only one infrequent writer, wouldn't eliminating the overhead of a database service entirely be desirable?

      RDBMSs provide many services (good ones do, anyway):

        * resource management for all the queries that happen to be running at that particular time
        * sophisticated query optimization, based on your actual data, not just "rules of thumb," but statistics on your actual data
        * a relational query language, which is good for asking questions which you might never have considered at the time you stored the data
        * useful operators on useful types
        * much more...

      SQLite doesn't even provide real data types. If you need an index on a non-text type, you will have a problem. If you need to sort non-text data, you have a problem. If you want to do any real manipulation before returning to the client, you will have a problem.

      SQLite essentially assumes that you will pull all the data into the application and then process it from there. That's a non-starter performance-wise, for many applications.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    14. Re:dual-mode db? by tinkertim · · Score: 1

      SQLite essentially assumes that you will pull all the data into the application and then process it from there. That's a non-starter performance-wise, for many applications.


      Thanks for sharing :) My typical use is extremely simple in all practical senses so I'm not aware of all of the quirks.

      I'm about to do some really, really big archiving projects, I've got to get quite a bit higher on the learning curve rather quickly.

      As far as free RDBMSs, all I can find from searching and research is many people arguing on how it should be done. I think I might just learn as I always do, the old fashioned way :

      Break it, Break it, Break it, get it right.
    15. Re:dual-mode db? by jadavis · · Score: 1

      I'm about to do some really, really big archiving projects, ... Break it, Break it, Break it, get it right.

      I don't envy that task ;)

      With a topic like databases, there are a million people pulling in different directions and they have different perspectives -- and different business incentives. There are two things that always work to get to the bottom of any discussion super-saturated with opinion:

      (1) Try to understand a lot of perspectives.
      (2) Look for people who have a very good grasp of logic, a lot of knowledge, and a good reputation, and learn some fundamentals from them. When you do this, you can cut through 90% of the crap in no time.

      I personally recommend reading anything by C.J. Date. He's a leader in relational theory, he's very logical, and has a writing style that works for me. It took me a while to appreciate a lot of his ideas, however. It might not help on your current project, except perhaps giving some hints on best practices (assuming that you buy into his arguments). Note: I own 4 of his books, so I'm probably violating rule #1, so whatever you do, don't consider me a member of group #2 :)

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
  5. well by stoolpigeon · · Score: 5, Informative

    every article linked makes it clear that this is about warehousing as opposed to oltp. so is the technology dead? no - can it do everything? no

    --
    It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    1. Re:well by lucabrasi999 · · Score: 1

      Wait a minute!!! Are you suggesting that the submitter actually reads the article before submitting it? Blasphemer!

    2. Re:well by jimstapleton · · Score: 1

      You mean... My row based MySQL tables can't wash my dishes for me?

      --
      34486853790
      Connection too slow for X forwarding? Try "ssh -CX user@host"
    3. Re:well by stoolpigeon · · Score: 1

      From what little exposure I've had to MySQL - it can't do much of anything for you or anyone else. I think if you move to PostgreSQL you will find that it will do your dishes for you and make you a better person.

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    4. Re:well by jimstapleton · · Score: 1

      down religeous db fanatic! down!

      both have their pros and cons, now stop trying to take a stupid joke an make it into your personal database soapbox

      --
      34486853790
      Connection too slow for X forwarding? Try "ssh -CX user@host"
    5. Re:well by TechyImmigrant · · Score: 1

      >From what little exposure I've had to MySQL - it can't do much of anything for you or anyone else. I think if you move to PostgreSQL you will find that it will do your dishes for you and make you a better person.

      Well it stores my data and meets my performance requirements. Is there something else I need it to do, given that I already own a dish washing machine.

      --
      Evil people are out to get you.
    6. Re:well by Scaba · · Score: 1

      Is there something else I need it to do, given that I already own a dish washing machine.

      Yes. You need to take your dish washing machine out to dinner occasionally, and buy her flowers. And tell her she looks beautiful.

    7. Re:well by Anonymous Coward · · Score: 0

      Obviously you need it to take the more difficult steps of putting the dishes into the washer, and putting them away when clean.

    8. Re:well by stoolpigeon · · Score: 1

      i was just making a stupid joke of my own -- relax

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    9. Re:well by jedidiah · · Score: 1

      "storing" is about all it does. Now if you want it to enforce interesting structural rules, that's an extra bolt on.

      MySQL: Because data corruption is no big deal.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    10. Re:well by smellotron · · Score: 1

      MySQL: Because data corruption is no big deal.

      Patently false!!! It just silently truncates data. That's not a bug, it's a feature!

  6. Rotate by Kozar_The_Malignant · · Score: 5, Funny

    >"column stores will take over the [data] warehouse market over time, completely displacing row stores."

    Hmmmm. So if I rotate my Paradox or Excel table by 90 degrees, I have achieved database coolness? Who knew it was so easy.

    --
    Some mornings it's hardly worth chewing through the restraints to get out of bed.
    1. Re:Rotate by MightyMartian · · Score: 0

      Just wait until next week, when they'll be stored diagnoally, or the week after, when the individual bytes are chopped, reordered, and then sent to /dev/ram.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    2. Re:Rotate by jellomizer · · Score: 1

      Excel only handles 255 Columns. It is anoying. It is stupid and antiquated. But no one is sayis is Excel Obsolete... I guess it is because we already know the answer.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    3. Re:Rotate by Hijacked+Public · · Score: 3, Informative

      The most recent release (2007) will handle 2^14 columns.

      --
      "Sacrifice for the good of The State" - The State
    4. Re:Rotate by ben+there... · · Score: 5, Insightful

      Excel only handles 255 Columns. It should be noted that if you've designed a database (rather than an Excel abomination) with more than 255 columns, chances are, you're doing it wrong.
    5. Re:Rotate by kpainter · · Score: 2, Funny

      Hmmmm. So if I rotate my Paradox or Excel table by 90 degrees, I have achieved database coolness? Who knew it was so easy. I tried that but my neck gets tired after a few minutes.
    6. Re:Rotate by Anonymous Coward · · Score: 2, Funny

      ". . . more than 255 columns, chances are, you're doing it wrong"

      Because there are only 254 days in the year

    7. Re:Rotate by Anonymous Coward · · Score: 1, Insightful

      You're doing it wrong.

    8. Re:Rotate by himself · · Score: 1

      Kozar_The_Malignant wrote:
      >
      > Hmmmm. So if I rotate my Paradox or Excel table by 90 degrees, I have achieved database coolness?
      >

            And if you do it again, do you get one of those "data cubes" I hear so much about? Even cooler!

    9. Re:Rotate by Tablizer · · Score: 1

      It should be noted that if you've designed a database (rather than an Excel abomination) with more than 255 columns, chances are, you're doing it wrong.

      I used to think the same thing, but I've actually encountered a situation where it seems to make the most sense. It is montly statistics for each "office" (how many people got sick, etc), and there are about 250 items to collect info on (yes, its a lot of items). One could do it row-wise with the item category part of the primary key, but since one uses it in mostly a column-wise format, such as exports for spreadsheets, row-a-tizing it doesn't seem to make sense. A row-wise solution would certainly be livable, but the 250+ columns is the simplest approach.

    10. Re:Rotate by garyok · · Score: 1

      Welcome to SAP. And all the columns are 5-letter mnemonics for German words.

      --
      One of the penalties for refusing to participate in politics is that you end up being governed by your inferiors - Plato
    11. Re:Rotate by Pseudonym · · Score: 1

      You jest, but there are currently 243 fields in the MARC21 standard, and the list isn't getting shorter. And I haven't counted how many use attributes are in Z39.50 Bib-1, but it must be pretty close to that.

      The moral of this story is that the "something wrong" that you may be doing is using a RDBMS for something that it's not good at.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    12. Re:Rotate by sco08y · · Score: 1

      It should be noted that if you've designed a database (rather than an Excel abomination) with more than 255 columns, chances are, you're doing it wrong.

      Or that particular entity has a lot of attributes. I've seen schemas that had a ton of columns and gone through the normalization process... and often find that only 10% or so have any dependency, functional, join or otherwise. For example, if you have a personnel table the fact is that there tend to be a lot of simple facts about a person, like date of birth, height, etc. The Right Way to handle these is as attributes because each person has one and only one associated with him/her.

      The big problem is usually that the UI for the DBMS doesn't handle lots of columns well. They should allow "bundling" of columns, which could be entirely orthogonal to the relational model. That way "the hat size of Bob" can be bundled with "the inseam of Bob" rather than splitting them into a clothing sizes table and going through the rigmarole of enforcing a 1:1 relationship.

    13. Re:Rotate by espressojim · · Score: 1

      That's all great until you decide you need to record something new about Bob. Then you're stuck...adding a column to your table? One of the projects I've been working on is for phenotype storage (which is obserable characteristics of an individual.) For example: record a person's height, weight, eye color, age. What you do in essense is have types of data (height, weight), with values for a person (6, 150), and units (feet, pounds.) That way, you can store as much info as you need. These type-value combinations are extremely powerful, as you can store just about anything, and you can properly index them to get very good performance.

      If you have huge amounts of data that isn't going to change, then maybe you could have a warehouse with a lot of attributes all flattened out (which would certainly increase query speed), but you sacrifice a TON of flexibility.

    14. Re:Rotate by gringer · · Score: 1

      There's a bit of stuff around about how to do this well in things such as questionnaire databases. The general method is something like storing as {PersonID, AttributeID, Data}, with a primary key being the combination of PersonID, AttributeID. It works really well for data retrieval, as long as you're not scared off by using joins (e.g. to work out what AttributeID 6 means) in each query.

      I've run into this very problem (not having enough columns) with a database I have been working on. The questionnaire had around 300 individual questions, which exceeded [shudder] Microsoft Access' limitations. The solution by the person who worked on this database previously was to split the questionnaire up into sections, resulting in many different tables for the same questionnaire. This was very difficult to pull data out of (I had to know each column name before I could pull out the data, and in most cases it wasn't obvious) and maintain (Each new question added needed a custom designed field in the forms associated with it).

      I ended up spending around 2 weeks redesigning the database (with a few tables in the 3-column format mentioned) to be easier to query, and then mentioned to various people that a better solution was required (i.e. not Access). One year later, that's pretty much implemented, and hopefully we now have a good working solution that is both easy to query, easy to modify, and can support our data warehouse needs for a long time.

      --
      Ask me about repetitive DNA
    15. Re:Rotate by smellotron · · Score: 1

      For example, if you have a personnel table the fact is that there tend to be a lot of simple facts about a person, like date of birth, height, etc. The Right Way to handle these is as attributes because each person has one and only one associated with him/her.

      Ironically, personnel attribute management is one of the flagship examples of LDAP databases. LDAP is basically a hierarchical storage mechanism... a tree with predefined structure. Come to think of it, SNMP is also hierarchical, and it's also used to store and transmit large amounts of statistical data (mostly network monitoring attributes). Also, DNS... they even managed to combine load distribution based on the hierarchy. Basically, The Right Way is not to use a relational database at all for something like this, unless you're actually actively performing column aggregates (average height of all people, most common eye color).

      The trouble is that I've never seen a particularly good solution for a dataset that is partially relational and partially hierarchical. The best solution to be likely adopted (from a worse is better point of view) is probably some sort of hierarchy embedded in a relational DB, with special syntax or functions for accessing via a.b.c.d notation

    16. Re:Rotate by Anonymous Coward · · Score: 0

      You know, you could have saved yourself a lot of grief by using Microsoft's MSDE/SQL Server Express (which are free) which doesn't have the terrible limitations of access, plus all the good features like views, etc, which make pulling data from joined tables even more trivial than it is to begin with.

    17. Re:Rotate by ichigo+2.0 · · Score: 1

      Hmmmm. So if I rotate my Paradox or Excel table by 90 degrees, I have achieved database coolness? Who knew it was so easy.
      Make that 180 degrees and it should be future-proof.
    18. Re:Rotate by Corporate+Troll · · Score: 1

      5-letter mnemonics for German words.

      You mean that's a standard? I thought the SAP people at a German company I was working with, were being moronic....

    19. Re:Rotate by Anonymous Coward · · Score: 0

      Why don't you just have a entity/attribute table rather than x-hundred columns in the entity table?

      Thats kinda, like, the point of relational.

  7. The guy... by AKAImBatman · · Score: 5, Interesting

    ...is duping himself and thus Slashdot is duping the stories by extension.

    Stonebraker has been pushing the concept of column-oriented databases for quite some time now, trying to get someone, ANYONE, to listen that it's superior. While I think he has a point, I'm not sure if he really goes far enough. Our relational databases of today are heavily based on the ISAM files of yesteryear. Far too many products threw foreign keys on top of a collection of ISAMs and called it a day. Which is why we STILL have key integrity issues to this day.

    It would be nice if we could take a step back and re-engineer our databases with more modern technology in mind. e.g. Instead of passing around abstract id numbers, it would be nice if we had reference objects that abstracted programmers away from the temptation of manually managing identifiers. Data storage is another area that can be improved, with Object Databases (really just fancy relational databases with their own access methods) showing how it's possible to store something more complex than integers and varchars.

    The demands on our DBMSes are only going to grow. So there's something to be said for going back and reengineering things. If column-oriented databases are the answer, my opinion is that they're only PART of the answer. Take the redesign to its logical conclusion. Let's see databases that truly store any data, and enforce the integrity of their sets.

    1. Re:The guy... by Anonymous Coward · · Score: 0

      If you mean for too many products use a crappy mysql backend, I'll agree.

      Personally I use real relational databases, made by folks who never said stupid things like "you don't need transactions", and I don't have any key integrity issues.

    2. Re:The guy... by Otter · · Score: 1
      ...duping himself and thus Slashdot is duping the stories by extension.

      I read the blurb and thought "Haven't we had the same 'debate' over the same guy a bunch of times before?" The name stuck in my head as I always envision the former Notre Dame linebacker and his famously low GPA turning to a career in database architecture.

    3. Re:The guy... by oh_my_080980980 · · Score: 1


      No, there's something to be said for EDUCATION. The fact that you think a Relational Database can only store integers and varchars screams how ignorant you are. I would suggest reading, "An Introduction to Database Systems" by C.J. Date. Then you will be informed enough to make a comment about the state of affairs in Relational Databases.

    4. Re:The guy... by Anonymous Coward · · Score: 0

      It is an interesting idea however it has basically a 180 degree problem on what is wrong with it vs 'traditional' dbs.

      When writing it is easy to 'add new' for traditional. But for column wise you need at least the number of disk seeks in it as there are columns. Instead of just blasting the whole row to 1 sector. You need N number of writes vs 1.

      For reading with traditional it is the opposite. It is number of seeks per row read is = to the number of columns. You need to seek for the number of rows in the DB.

      For data that changes a lot (either on the data element or added) column wise would be crap performance wise. For data that is fairly 'static' and unchanging such as legacy data it would smoke.

      Or in other words column wise read would be fast but write slow. I could see this being an option within DBs in the future. As it is just a data layout problem not a language problem.

      He may have a point now that I sit and think about it.

    5. Re:The guy... by AKAImBatman · · Score: 1

      The fact that you think a Relational Database can only store integers and varchars screams how ignorant you are.

      Glad you're paying attention. Not. :-/

      The relational model can store just about anything you want. It's just math based on sets. But that's not what I'm talking about. What I'm talking about is that today's *batch* of DBMSes are terrible at storing data that is not an integer or varchar.

      In addition, today's DBMSes have no protection against bad data that isn't explicitly engineered into the data model. This provides an opportunity for human error to creep in. Thus you see issues with keys that aren't referenced/restricted, incorrect keys getting inserted because the ID exists in the referenced table, nulls that shouldn't be allowed, transactions that aren't closed in transactions, multiple rows for the same secondary key when it should be unique, data accesses that shouldn't be allowed, etc. There are solutions for nearly all of these in your average DBMS product, but they all require that the Database Administrator play dictator to enforce.

      A better solution would be to take the same tack that computer languages and compilers have been taking: Make a mistake impossible as early in the process as possible. Just as modern compilers can catch a variety of errors before the code is even executed, it would be better if our databases offered more up-front protection and solutions for these problems. As a bonus, the database would "know" more about the data and thus be able to plot better data storage and retrieval schemes automatically.
    6. Re:The guy... by nuzak · · Score: 1

      Stonebraker has been pushing the concept of column-oriented databases for quite some time now, trying to get someone, ANYONE, to listen that it's superior.

      Oh, more than just Stonebraker: column-oriented databases have been getting pushed going on at least a decade, probably two. It comes up every few years in breathless statements like "row storage is obsolete legacy technology". Nevermind that most OLTP demands are a bit more of a hybrid thing, and that vertical partitioning usually does the trick pretty well.

      As far as ISAM goes, I think the only thing still using it is MySQL, and I'm pretty sure it's variable-length, so it's closer to VSAM. I guess you could count Paradox and all the other xBase stuff too as still using it. But legacy is legacy, it's hardly an indictment of today's technology.

      Concerning "abstract ID numbers", that's probably surrogate keys you're talking about -- which are technically a design flaw (or perhaps a "design smell" to crib an XP term), but they're kind of unavoidable for efficiency (I sure as hell don't want to sling around whole arrays of street/city/state/post in my CRM app). Any proper ORM or even good SQL should keep you from ever having to reference id columns except perhaps in a join (in case you're too cowardly to use NATURAL JOIN, which is actually pretty prudent given the surprises it can hit you with). PostgreSQL tried to get rid of these "rowid keys" with the OID column, but this didn't turn out well, and OID's are heavily deprecated in postgres now. OODBs and "document DBs" like CouchDB usually do a better job at hiding ID columns, but until those address their orthogonality problems, RDBMSs are here to stay, warts and all.

      --
      Done with slashdot, done with nerds, getting a life.
    7. Re:The guy... by AKAImBatman · · Score: 2, Insightful

      Or in other words column wise read would be fast but write slow. I could see this being an option within DBs in the future. As it is just a data layout problem not a language problem.

      An interesting idea for improving database technology is to actually change the way that database data is mirrored in a disk array. Rather than writing EXACT duplicates of the data, perhaps one set could be written in row-oriented form, while the other set would be written in column-oriented form. This guarantees that the data is always duplicated, but offers a new option to the query engine for retrieving the data.

      The primary issue I see is not creating an issue of "the worst of both worlds". Obviously row-oriented data is going to be faster to write. Thus you can't wait for the column oriented data to finish writing. A secondary process will have to manage that in parallel, with the assumption that the database will report the data as "committed" as soon as the row-oriented write is complete.
    8. Re:The guy... by Doctor+Faustus · · Score: 1

      Instead of passing around abstract id numbers, it would be nice if we had reference objects that abstracted programmers away from the temptation of manually managing identifiers.
      I've thought about that, and I think it would be great within a stored procedure language, but it wouldn't work so well when you're pulling in data from an external program.

    9. Re:The guy... by morgan_greywolf · · Score: 1

      Data storage is another area that can be improved, with Object Databases (really just fancy relational databases with their own access methods) showing how it's possible to store something more complex than integers and varchars. First off, databases can store something more complex than integers and varchars. Secondly, what you're describing is the heart and soul of a Product Data Managment (PDM) or Document Management system, such as those available from companies like UGS. The database contains pointers to the raw data, with the database storing all of the metadata. Special daemons retrieve the raw data from various vault servers.
    10. Re:The guy... by AKAImBatman · · Score: 1

      Sorry, perhaps I should have been more clear. Today's relational DBMSes can't store non-integer or varchar data with a *bleeping* *bleep* of a *bleep*. The data gets so incredibly bloated and is so incredibly SLOW to access that you're better off creating a custom data interface for storing your CLOBS/BLOBS in flat-file form in the File System.

      Or perhaps look into a PDMs like you mentioned. Either way, today's relational DBMSes are not well equipped to handle the data. :)

    11. Re:The guy... by jahudabudy · · Score: 1

      Thus you see issues with keys that aren't referenced/restricted, incorrect keys getting inserted because the ID exists in the referenced table, nulls that shouldn't be allowed, transactions that aren't closed in transactions, multiple rows for the same secondary key when it should be unique, data accesses that shouldn't be allowed, etc. ... A better solution would be to take the same tack that computer languages and compilers have been taking: Make a mistake impossible as early in the process as possible.

      The problem with that is most things that are a problem are only a problem for a particular design. So I as the architect need to explicitly "tell" the DBMS that I want unique secondary keys. If I do so correctly, most DBMSes will correctly enforce that restriction. But if I don't explicitly make that a restriction, then presumably it is because my design either allows for or even requires non-unique values on that secondary column . The same with NULLs, incorrect data accesses, etc. How should the DBMS know that something is a mistake unless I tell it that in this particular design, this is a mistake?

      --
      ...sometimes, in order to hurt someone very badly, you have to tell that person terrible lies. - PA
    12. Re:The guy... by tieTYT · · Score: 1

      Instead of passing around abstract id numbers, it would be nice if we had reference objects that abstracted programmers away from the temptation of manually managing identifiers. (By developers, I'm assuming you're talking about developers that USE databases) This is not the only way to do things: If you read the stuff by Joe Celko, you'll realize that using arbitrary numbers for primary keys is the subject of a religious debate in the database community. Joe Celko takes the side that using them is a bad thing. One of the many reasons he gives is that it simply makes everything in your database harder to understand.

      Joe Celko's book: http://www.amazon.com/Joe-Celkos-SQL-Smarties-Prog ramming/dp/0123693799/ref=pd_bbs_sr_1/002-2120092- 0648857?ie=UTF8&s=books&qid=1189101055&sr=8-1
    13. Re:The guy... by mattgreen · · Score: 1

      Quite interesting. What does he suggest using in its place? Some sort of semi-readable GUID?

    14. Re:The guy... by einhverfr · · Score: 1

      I don't know. I think the reason we have issues with keys to this day is that data engineers don't tend to understand the math involved. Abstract id's while OK should probably not be used as primary keys (use a natural primary key where possible).

      As for object abstraction... object representation may be functionally dependent on referential representation, the same cannot be said of the converse. :-) The best approach I have found is to put as much of the mapping logic into the db as possible using stored procedures and then you dont have to worry about it in your application.

      --

      LedgerSMB: Open source Accounting/ERP
    15. Re:The guy... by epine · · Score: 1


      In most systems I've encountered where it is practical to codify the formal requirements up front either does so, or the design of the product has taken it so deeply into account it becomes easy to forget the product implements just one approach to a larger and messier problem.

      The systems where it is possible to write down the formal requirements, but where the architecture doesn't reflect this, tend to the ones where the formal requirements are worthless in practice. There is more to a product than satisfying a formal integrity constraints: it must also conduct a practical computation to get there. When the number of approaches to practical computation explodes, you typically end up with a system where the user is given a lot of control over how the computation unfolds.

      Databases are hideously complex in this aspect. Performance characteristics have many dimensions: memory use, disk throughput, disk seek rate, transaction throughput (amortized, worst case, under normal load, under extreme load, for common queries, uncommon queries, on past hardware, on present hardware, on possible future hardware, WRT serialization bottlenecks, etc.) How soon do you need the integrity checks to take place? That can impact a multitude of metrics.

      If you had HAL, and you told HAL all this information about your desired performance profile, as well as your formal integrity requirements, then I'm sure HAL could automatically make all these choices for you. But if you already had HAL, no one would be working at that level anymore. Yes, it's an intellectual salt mine, but until computation begins to grow faster than the size of data warehouses / complexity of the queries desired, it's probably going to stay that way.

      Now if we had a quantum dot where we could multiplex a terabyte of data into the state of a single quantum particle, almost any query you desired to run could be returned in constant time. In this regime, you might be able to get away with a pure formal specificiation, without any decoherence of your Platonic ideal from requirements in the time domain.

    16. Re:The guy... by dkf · · Score: 1

      For data that changes a lot (either on the data element or added) column wise would be crap performance wise. For data that is fairly 'static' and unchanging such as legacy data it would smoke. Having seen a number of databases that have column-oriented store management, it is my (meagre) experience that in fact columns work quite well even when you get into more updates since you can avoid writing those columns that hold the default value (many datasets have many default values). Moreover, writes are almost always less common than reads (an update may involve both, of course.)

      As an aside, column-wise storage allows the DB to do more disk space optimization it seems. I never quite understood why though; I had someone explain it to me, but I still didn't grok it.
      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    17. Re:The guy... by rho · · Score: 1

      At what point are OIDs required for normal interoperation with a Postgres database? At some level I guess they're hanging around, but not to anybody who's not writing a C extension or something.

      WRT DB storage, this kind of optimization probably makes Knuth cry.

      --
      Potato chips are a by-yourself food.
    18. Re:The guy... by smellotron · · Score: 1

      What I'm talking about is that today's *batch* of DBMSes are terrible at storing data that is not an integer or varchar.

      Modern programming languages are generally based on a combination of primitive data types (integers, floating-point numbers, and arrays for C/C++) and aggregates (objects)... heck, even a floating-point number is "just integers"... an exponent and a mantissa. In the same way, relational databases use basic data types, and then aggregate those types (as rows) to form more complex objects. While I may be completely wrong, it sounds like you've either A) not been reading up on the capabilities of your RDBMS, or B) using MySQL/Excel/Access as your RDBMS.

      There are other database engines that give you the power to do things with aggregates, like enforcing arbitrary contraints or providing arbitrary functions (gosh... almost like treating a record as an object). Or, you just get more powerful primitives... Off the top of my head, Postgres allows MAC addresses, IP addresses, timestamps (but who doesn't?), time intervals, and uniformly nested/typed arrays. It also allows defining new datatypes based on old ones with CHECK constraints, so e.g. if I can create a function that validates a URL (or email address, or domain name, or mailing address, or conjugatable verb, ...), I can create a URL datatype and provide a complete set of functions with it.

      In addition, today's DBMSes have no protection against bad data that isn't explicitly engineered into the data model.

      How is that different from any other programming task? If I write a PHP script or a Java servlet, I have to put in my own protections against all sorts of bad user input. No language knows my data model. If I say "write an integer here ____" and the user writes nothing, is that a NULL? or zero? It depends on the domain, and it can't always be automated. I'd say that SQL-based constraints are actually some of the better ways of providing protection against bad data. Define a few datatypes with appropriate validation functions, and then just leave it at that. Want to know if "FOO BA ZUFYE" is a valid MyMagicType? Send it to the database... if it complains, there's something wrong. But no system is going to do the work of a data modeler.

      here are solutions for nearly all of these in your average DBMS product, but they all require that the Database Administrator play dictator to enforce... A better solution would be to take the same tack that computer languages and compilers have been taking: Make a mistake impossible as early in the process as possible.

      So.... you don't want a DBA to play dictator by creating database constraints... but you want the database to enforce constraints? Constraints don't just appear out of thin air—someone has to create them. That's why DBAs exist in the first place: to take a logical set of constraints from the domain at hand, and then translate that into an executable form via our handy-dandy fully-featured RDBMS.

    19. Re:The guy... by jadavis · · Score: 1

      Let's see databases that truly store any data, and enforce the integrity of their sets.

      Postgres has the ability to CREATE TYPE. Not only can you create any type you want, you can specify the input/output representation, and any arbitrary operators or functions of that type. It can even store the data of that type in variable width, up to about 1GB for a single value, stored out-of-line (not in the original table) transparently. You can even make indexes on non-trivial types, such as full text indexes (implemented on two different generalized index access methods) or indexes over arrays or multi-dimensional types (points in 3D, etc). Indexes aren't limited to equality and ranges, by the way, but can also do more sophisticated things with these more sophisticated types (contains, overlaps, consistent, etc).

      What more could you ask for?

      I may have misunderstood your post, but you seem to be confusing object classes (types, domains) and relation variables (table definitions).

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    20. Re:The guy... by jadavis · · Score: 1

      surrogate keys you're talking about -- which are technically a design flaw (or perhaps a "design smell" to crib an XP term), but they're kind of unavoidable for efficiency (I sure as hell don't want to sling around whole arrays of street/city/state/post in my CRM app)

      If by "efficiency" you mean performance, the surrogate keys are a physical implementation detail and should be hidden from the application, in which case you don't have any surrogate keys at all.

      If by "efficiency" you mean that you prefer to deal with data by assigning new logical names (e.g. an employee ID) that the application/user actually care about, then that's not a surrogate key either, but a real key. Even though my driver's license number is generated, I still know what it is, and still may need to identify myself by that number in some situations. So it's not a surrogate key.

      A surrogate key is when you start muddying the distinction between the logical and physical layers. A surrogate key is something that appears in the logical layer but has absolutely no meaning to anything outside the database. Why present something at the logical level that is meaningless at the logical level? It's essentially like the database spitting out physical addresses to the application code ("INSERT successful, record stored on block number 1234 on volume 4576, and it was also at memory address 0x12345678 for a while ...).

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    21. Re:The guy... by nuzak · · Score: 1

      > At what point are OIDs required for normal interoperation with a Postgres database?

      Never. Postgres 8.x defaults to not even creating them.

      --
      Done with slashdot, done with nerds, getting a life.
    22. Re:The guy... by nuzak · · Score: 1

      If by "efficiency" you mean performance, the surrogate keys are a physical implementation detail and should be hidden from the application, in which case you don't have any surrogate keys at all.

      Sure. In an ideal world, I might indeed sling tuples of address data around, but we're often dealing with languages that don't even have a concept of tuples. It's only fancy ORMs that let me treat data without always having to refer to its "handle", and even then it's not guaranteed.

      An employee ID is necessary because names aren't unique enough (I've worked at two companies that have had people with the same first and last name as me), or they may vary (especially in the case of marriage. A driver's license number is actually serving its purpose as a unique ID, and while in some instances it may be a surrogate key for your very identity as a person, we're not at any point where a database can quite encode your ineffable "you-ness" :)

      It's my understanding that you have surrogate keys whenever a row would otherwise be perfectly unique but you use an extra column for the unique id anyway. Purely from a theory angle, that id is superfluous, and isn't really part of the tuple, hence a surrogate. I guess it's only a Bad Thing when you expose them to the application, but I still cringe a little every time I find myself having to explicitly join 'id' columns, especially when I've already established foreign key relationships.

      --
      Done with slashdot, done with nerds, getting a life.
  8. Differnt Solutions by jythie · · Score: 1

    *headdesk* heaven forbid that solutions might be different depending on the situation and the data....

    Nope! Apparently there is a new method and thus it must be the real One Twue Way.

  9. IMS--Hierarchical DB Still Exists by curmudgeon99 · · Score: 5, Insightful

    You've all heard of the IBM product called DB2, right? So what was DB1? Answer: IMS, which is a hierarchical database. They were a pain in the ass to use--PSBs and all--but they were/are faster than hell and I doubt any company is going to throw them out for any reason. Same goes for relational databases. They're going nowhere. Sure, we have room for more but nobody is going to displace the RDBMS anytime soon.

    1. Re:IMS--Hierarchical DB Still Exists by Anonymous Coward · · Score: 0

      Today we call the hierarchical DB an "XML database"

  10. Re : Are Relational Databases Obsolete? by littlefoo · · Score: 5, Funny

    No. There, that was easy !

    It's like the packet of crisps that says "Is there a 20 pound note in here !!?" - the answer should always be 'No'.

    Except maybe for one person.

    sed -e 's/crisps/potato chips/' -e 's/pound/dollar/'

  11. that doesn't mean they're going to become obsolete by Arathon · · Score: 5, Insightful

    Obviously, he's biased. But more importantly, he just said that column-store databases are going to take over the WAREHOUSE market. That doesn't mean that row-store databases are going to become obsolete, because there will always be applications out there that do a substantial amount of writing as well as reading.

    In fact, the new wave of user-generated-content websites and webapps seems to me to indicate the exact opposite - if anything, row-store databases, with their usefulness in write-heavy applications, should becoming, if anything, more and more necessary/useful on the web.

    So...chalk this one up to some grandstanding on the part of a guy who wants to put more money in his pockets...

  12. Re:IDG *owns* Slashdot These Days... by kevin_conaway · · Score: 0

    And of course there should be disclosure so that visitors to Slashdot realize they are reading advertisements and not an article submitted by a "real" user...

    Meh, content is content. As long as the linked "article" is informative and sparks discussion, I'm happy.

  13. Marketing hype by FUD.. typical by cwford · · Score: 2, Informative

    From TFA:

    "Column-oriented databases -- such as the one built by Stonebraker's latest start-up, Andover, Mass.-based Vertica Systems Inc. -- store data vertically in table columns rather than in successive rows. "

    Marketing hype for his startup.

    What a sleezeball.

    1. Re:Marketing hype by FUD.. typical by thatskinnyguy · · Score: 1

      Would you expect anything better out of Computerworld?

      --
      The game.
    2. Re:Marketing hype by FUD.. typical by Anonymous Coward · · Score: 0

      Yeah, but this is the cost of enjoying Slashdot.

      Consider that it is a dynamics site that handles traffic that routinely knocks over many other static ones. It isn't cheap to maintain that kind of capacity so I doubt advertising revenue pays all the bills. Subscribers are few and far between.

      Without Slashvertisements I don't think Slashdot would still be here in its current form.

  14. No. by Just+Some+Guy · · Score: 1

    Relational databases will be around as long as humans generate relational data. Take the classic example of an invoice that may have many entries, each entry referencing an inventory item. This sort of thing is likely to exist forever, and RDBMSes model that pretty well.

    As far as whether the backend is row- or column-oriented - who cares? As long as I can use the one most appropriate to my access pattern, the implementation details just don't interest me enough to get worked into a furor. Don't get me wrong - I think that there are some neat developments in the works - it's just that I don't have a strong general preference on how my information is physically laid down on the platters.

    A more interesting question for me is whether SQL is obsolete. For the most part, I'd say that it is in the sense that most people need never use it directly. We use SQLAlchemy instead of writing raw SQL, and the Java folks seem to be fond of Hibernate. I still look at the generated queries sometimes to convince myself that it's sane or for debugging or optimization purposes, but if they inserted a new middle layer between Alchemy and PostgreSQL that used something completely different, our code wouldn't notice the difference.

    --
    Dewey, what part of this looks like authorities should be involved?
    1. Re:No. by dada21 · · Score: 1

      Your invoice example is one that historically seems to only work with a purely relationship database, but I beg to differ because I've seen systems in place (custom coded) where an invoice front end was way more interactive in how you entered data. Searches were faster, and for large companies, the shoehorning to get data into place can be almost violent because they have such a variety of what they want to enter, or search for, or assemble by, etc.

      We have one customer, a large contractor, who is trying a new PO system that isn't relational, and uses a ton of non-persistent data. Each supplier, each job, each inventory item might be completely different on a particular job, and the relational database system fails because it is so complex to try to make everything work with one another properly.

      I'm NOT saying that relational isn't the best solution for a simple invoice or inventory system, but it isn't the perfect solution for many companies that have a need to provide a different "mapping" for items that might seem similar. I recall one example where the company orders items from 3 suppliers, with those same invoices being distributed to different jobs that had different invoicing structures and different billing situations. It was a mess if we just said "10,000 items from supplier A, 10,000 items from supplier B, etc" because of the sheer number of fields involved. The system we provided for them, while proprietary to my business, is a much more fluid and adapting system that made the entry clerks and the project managers happy because the system WAS so fluid and capable of dealing with really sticky situations of having 15 suppliers for 35 jobs and 20 different processes to invoice, bill, collect, process and track. What we did was "revolutionary" according to the client, and I'm hoping to integrate it for future clients as well. We tried for years to work these problems out relationally, but it never worked, even hiring some top database consultants who said "just do it the way it has always been done."

      I really think the relational database WILL eventually die as we find more ways to balance the idea of persistent data versus non-persistent data versus how we look things up, store them, and regurgitate them in a different way for different tasks. I am not a database programmer, but I always try to attack ongoing problems using newer tools and unique ways to look at things.

    2. Re:No. by plague3106 · · Score: 1

      You are pretty vague about what data you couldn't handle relationally, but it sounds more to me like a problem with your 'top consultants' not knowing how to do proper entity relation design. If the solution they provided didn't work for your problem domain, they didn't do their job properly.

      I'd like to hear more; I'm not saying relational is perfect, but I find it odd that you had to abandon RDBMs all together.

    3. Re:No. by Just+Some+Guy · · Score: 1

      I am not a database programmer, but I always try to attack ongoing problems using newer tools and unique ways to look at things.

      This is hard to phrase without sounding awful, but I mean it genuinely: does that also mean that you have not studied database design? Is your opinion because you know the research and that what you want isn't possible from an RDBMS on a theoretical level, or because in your experience you've never seen it be made to work?

      I ask because I've heard similar ideas before, but generally from people who didn't know the theory behind what they were doing and had gotten frustrated by the problems that others had already solved, unbeknownst to them. If that's not the case for you, I apologize. If it is, though, you might want to get some good math-based books on the subject before you dismiss it.

      --
      Dewey, what part of this looks like authorities should be involved?
    4. Re:No. by dada21 · · Score: 1

      I'm not a database programmer, so it is difficult for me to use all the key phrases properly, although I think I have a distinct understanding of the technology under it. When I sell a project, I always draft my thoughts on paper (flow chart or object-oriented style).

      For me, the most difficult aspect of the relational database has been what the database elite call impedance mismatch: how to take data in a relational database back end and utilize it properly in the front end. Much of the data that seems logical to flow in a persistent manner may actually be better suited to an object-oriented manner, where a single top level object, say a particular invoice, may actually point to a variety of objects that aren't relational in the 2D sense. Sure, business data entry has always been simplified in almost a spreadsheet manner (when querying the database overview), but we've found that while this system has always been intuitive from a simplicity perspective, it doesn't actually facilitate objects that are way more complicated than the simple relational format forces you to accept. I know that is a jumble of words, but read it again and you'll see where I'm coming from.

      For simple clients, simple databases make sense. But most of our clients are large businesses with a massive amount of data, and shoehorning that data into a relational format leaves a LOT to be desired, even if it makes logical sense from a tiny example perspective. Once you have data that is readily adaptable to different needs for different jobs, you open the door to having software that adapts better to changing needs, no matter how complex those needs are. We have individuals within companies who want varying degrees of complexity for data input, relating data to other data, and even report generating, and no matter who I've subbed work out to, I've often heard "Oh, that can't be done." When I ask why not, they said that the database just can't handle all the different directions the various individuals at the client's office want. Yes, it is complicated, but only from a relational view. Once you open the system up to allowing more complexity in how objects refer to one another, you give individuals a better opportunity to enter data that THEY see important, even if others within the organization would simply ignore it or not enter it all together.

      Again, I'm not a programmer, so I'm not familiar with all the issues, but I have worked hard in the past few years to develop (on paper, and eventually in code through programmers) systems that take the flat/relational invoice idea to another level, giving employees at different levels of the organization the ability to convey information they feel is important into a database, so others CAN utilize the information if needed, but it is still there for that particular complex employee's needs. Not all our systems work well, yet, but they work better than before even with the glitches.

      Object-oriented databases are scary, they can be a real mess, and they can make simple tasks slightly more complex, but I think the added efficiency of allowing people to work with a larger dataset in various manners is worth the initial mess of coding what I want into reality. From what I can tell, most younger database programmers seem to WANT more complexity because they also can see the added incentive of a short term inefficiency (design and coding) that leads to a long term added efficiency (more adaptable data sets, entry and reporting) because the objects are tied down relationally.

      If that didn't make sense, I'm sorry -- it is had to convey what I want to do without giving away the cow to someone else to beat me to market :) Googling my name and some of my industries might give one of my competitors a lead in what we're trying to accomplish, ha.

    5. Re:No. by Anonymous Coward · · Score: 0

      It sounds more like your 'DBA' had no idea how to do schema design for a well designed and normalized database. Nothing about what you've vaguely described wouldn't fit well in a well designed RDBMS.

      People who don't really know RDBMS's dont' want to use them. Shocker. I'd hate to be the one to have to come in and clean up all their data after you guys get done with them. If it's not relational, it's most likely just a matter of time till they find things aren't adding up correctly.

    6. Re:No. by LurkerXXX · · Score: 1

      So you admit, you aren't a programmer, and aren't trained as a DBA either, yet you have taken upon yourself to decide that RDBMSs are all wrong, without really understanding them or their theory in detail.

      Thank you for creating lots of work for the rest of us cleaning up the mess after you have trashed your clients data most thoroughly. That's lots of billable hours for us to clean it up and put it in a proper relational format.

    7. Re:No. by dada21 · · Score: 1

      I've had the unfortunate life that I've butted heads with databases since I was a teenager, going back to the old dBase junk on contracts.

      I have studied database design extensively over the years, but not to the mastering stage. The label "just enough to be dangerous" applies to me, and yes I know that database programmers HATE that.

      I've also taken steps to study RDBMS on a theoretical level, and I think it IS adaptable to the object-oriented ideas that I have, but I think it is more of a kludge than an efficient long term solution. Maybe relational database can be adapted to be more object-oriented, but I don't think they're the solution. When you deal with TONS of information on a given project, information that is important to 2 users out of 400, how do you shoehorn it into a structure without making the relational structure overwhemlingly complicated to manager? I can't begin to tell you how many "experts" we've hired that have said that relational is not the solution to what I envisioned, and eventually we found other solutions to the problem (on one occasion, we took steps backwards to hierarchical (sp) databases which worked great).

      Slashdot is a great learning too when subjects like this come up, because of the vast amount of really intellectual readers and posters. I appreciate that, and I'd say I learned more about database solutions from the infrequent database topics here than I have from books or hands-on experience with my programmers. What I am trying to accomplish for my clients is to take steps in NEW directions, even if others have failed before. It is why we can charge what we charge, because we look to adapt to future technological growth ahead of when that growth becomes commonplace. Do we fail? Yes. Are some of my ideas simple on paper but impossible in reality? Yes. But when you have someone like myself who is not a layman, knows enough to be dangerous, and can think outside the box with a team of good programmers, I think we make inroads to a field that resists these kinds of changes. You should see the look on the faces of the OEM code providers when I explain what we're going to do with a secondary database outside of theirs, using our own report writers, data entry software, and management summaries. Eventually, things work, they work more efficiently, and the clients are satisfied even if my programmers get wigged out by the ideas.

      The web is a great example of what I feel is a good direction. How could you take a subset of a website, its outbound links, and all of its users and stick it into a relationship database and have it function? You can't. We use dozens of databases just to handle slashdot and the sites it links to, but I think there is a object oriented structure ABOVE the actual databases and backend code used that COULD allow you to encode that subset into its own non-relational structures. Maybe a combination of a hierarchical (sp) database with top objects and sub-objects having external linked-list options? I don't know, I'm not a programmer. But it can be done, and it should be done, so individuals can enter relevant data, and utilize relevant data of others, to produce the reports they need.

    8. Re:No. by dada21 · · Score: 1

      That's lots of billable hours for us to clean it up and put it in a proper relational format.

      Most of our work adding features onto another OEMs software has been provided for without trashing their datasets by incorporating secondary non-relational databases on the side and using our own front end to query their database and ours and stick information back to the proper databases. This way, accounting can deal with the OEMs terribly outdated front end, while management, field workers, and company officials can enter their own data, and run their own reports, without trashing accounting's data. It works, but it's a little bit of a kludge.

      By the way, I love giving work to local database programmers rather than offshore ones (in this case because of the language issue). So I don't mind throwing you all that extra work when we do make a mistake :)

    9. Re:No. by Retric · · Score: 1

      Chances are the "experts" did not know what they where talking about. Let's say you're designing a billing system that handles 10 years of historical data on 1000 transactions a second and interfaces with an average of 5,000 billing systems with 3-5 new billing systems a week. With a little work you can solve this problem using a moderately complex backend DB. The problem is separating the basic functions of a billing system from the implementations of each billing system.

      And the simplest way to do this is to dynamically generate one or more tables for each new system which also fits the basic needs of your billing system. Yes, that's right you can dynamically update your database schema at any time. Granted the larger the table the longer this takes but as long as you separate your core functions from the details you just prevent new data from entering that table and have at it.

      PS: Now I know you like using the next new thing so if it makes you feel better feel free to use XML to keep track of your DB schemas but the point of DB's is they increase speed, stability, and data integrity and let you work on harder parts of the problem at hand.

    10. Re:No. by Retric · · Score: 1

      The problem is not the RDBMS model it's people who try to use it without understanding LISP. People write OO software in C++, Java, C# etc which stores data in a flat file(RAM) though the use of a compiler. They then try and save a subset of this data into a static Database and assume they need to a few tables on a monster machine but try thinking about this.

      Take a table with ID an AGE and a Type.
      101, 10 years old, Person
      102, 10 months old, Dog
      103, 10 second old, Thread

      You then have a Thread table a Dog table and a Person Table but you can easily find an age for any ID. Now your Person and Dog tables might have gender but it's meaningless for a Thread yet everything inherits its Age data from the parent object. What's my point you don't shoehorn it into a structure you build a structure that fit's you data it's easy if your object has inheritance do the same thing in your database.

      PS: The point about LISP. Think about this odd but valid table Invoice Items.
      Invoice ID, Item ID, Cost, Type, Company
      101, 731, 25k, bike, BMW
      101, 732, 25k, bike, BMW
      102, 101, 50k, Invoice, BMW
      102, 971, 50k, car, Honda
      103, 102, 100k, Invoice, Composite

      My point this simple looking table can handle any level of recursion. Now if you want the total costs you select from Invoice Items where Type not Invoice. Granted in the real world you need time stamps, status, etc but you get the idea.

    11. Re:No. by Anonymous Coward · · Score: 0

      I think you will enjoy some reading on the relational theory for example c.j.date 's databases in depth. Also a question how are you at math. Not to well probably that does not matter you are probably better at less boring things but please do not assume that the reports you think up automatically make sense logically because they make sense to you. Knowing some relational theory can really be enlightening.

    12. Re:No. by plague3106 · · Score: 1

      What you are talking about is ORM, object-relational mapping. Basically, how do you get relation data into objects. The solution is to build objects which ONLY statisfy a single use case. The business object then knows how to get to the data it cares about. There are some problems with OO databases, and reporting out them is hell.

      ORM is greatly simplified if you build your business objects correctly, but its hard to discuss without getting more technical.

    13. Re:No. by Anonymous Coward · · Score: 0

      How could you take a subset of a website, its outbound links, and all of its users and stick it into a relationship database and have it function? You can't.

      Psst. Where do you think the post you just made is stored?

  15. Yea, it's all the same. by SatanicPuppy · · Score: 5, Insightful

    Column stores are great (better than a row store) if you're just reading tons of data, but they're much more costly than a row store if you're writing tons of data.

    Therefore, pick your method depending on your needs. Are you storing massive amounts of data? Column stores are probably not for you...Your application will run better on a row store, because writing to a row store is a simple matter of adding one more record to the file, whereas writing to a column store is often a matter of writing a record to many files...Obviously more costly.

    On the other hand, are you dealing with a relatively static dataset, where you have far more reads than writes? Then a row store isn't the best bet, and you should try a column store. A query on a row store has to query entire rows, which means you'll often end up hitting fields you don't give a damn about while looking for the specific fields you want to return. With column stores, you can ignore any columns that aren't referenced in your query...Additionally, your data is homogenous in a column store, so you lose overhead attached to having to deal with different datatypes and can choose the best data compression by field rather than by data block.

    Why do people insist that one size really does fit all?

    --
    ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
    1. Re:Yea, it's all the same. by theGreater · · Score: 5, Interesting

      So it seems to me the -real- money is in integrating an RDBMS which, for usage purposes, is row-oriented; but which, for archival purposes, is column-oriented. This could either be a backup-type thing, or an aging-type thing. Quick, to the Pat(ent)mobile!

      -theGreater

    2. Re:Yea, it's all the same. by stoolpigeon · · Score: 4, Interesting

      Maybe, but I doubt it. The money is in the data warehouse market and the etl tools that move the data from the oltp environment to the warehouse environment. I think what the author points out is not that people are trying to use the same database to do both, but rather that they are trying to use the same product to both. He says it would make more sense to use Oracle (for example) for oltp - and something else for the warehouse, rather than trying to get Oracle to do both well.

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    3. Re:Yea, it's all the same. by KingSkippus · · Score: 5, Insightful

      Why do people insist that one size really does fit all?

      I went back and read the original article. To Michael Stonebreaker's credit, the ComputerWorld article (and the submitter) grossly misrepresents what he said.

      He did not say that RDBMSes are "long in the tooth." He said that the technology underlying them hasn't changed since the 1970's, and that column stores is a better way to represent data in certain situations. In fact, the very name of his original column was "One Size Fits All - A Concept Whose Time Has Come and Gone"

    4. Re:Yea, it's all the same. by jhantin · · Score: 1

      A query on a row store has to query entire rows, which means you'll often end up hitting fields you don't give a damn about while looking for the specific fields you want to return.

      So build a covering index. What they seem to be driving at is aggregation performance, though, not just raw read performance; for that you need to build a materialized view, and not all RDBMS support those conveniently so you end up building a horrid mess of triggers or just rebuilding your proto-aggregate table out of a cron job. Column stores have an easier time of aggregation (just scan the column and fold) unless you are doing something that really is row-oriented, like SUM(A.x + B.y * B.n) over some huge wretched join with conditions have absolutely nothing to do with what you're aggregating.

      Probably a better policy is to have a DBMS that supports a variety of table store formats. MySQL wins that point, and would probably run away with it if it supported a column store format. Second best is to have a variety of index store formats, like Postgres. Or you can always have just one trick but optimize it to the hilt.

      --
      ...when you're writing a game...tweak the difficulty of "Easy" to something [your mother] can cope with. -- onion2k
    5. Re:Yea, it's all the same. by GeckoX · · Score: 5, Funny

      I've got it!

      Cell-based storage!!! Best of both worlds!!! Mix of both Row and Column based storage, how can we go wrong!

      Just think about it, what could be better than one file for each column in each row?

      And they said I couldn't have my cake and eat it too, sheesh ;)

      --
      No Comment.
    6. Re:Yea, it's all the same. by Anonymous Coward · · Score: 0

      Quick, to the Pat(ent)mobile! So that it won't happen for 20 more years! Except in Europe, China, Russia, India, South America and Canada! Genius!

    7. Re:Yea, it's all the same. by jkrise · · Score: 5, Funny

      I went back and read the original article....

      Oh, the horror! That's a heinous crime on Slashdot! Not even the editors do that!!!

      --
      If you keep throwing chairs, one day you'll break windows....
    8. Re:Yea, it's all the same. by Anonymous Coward · · Score: 0

      But what if, instead, you just proto-aggregated the sum of the RDBMS gizinas with the mean of the SQL values of the hypercottles? I mean you don't REALLY want to get into how the column width offsets in Open Office are going to merge with the Row values stored in row-row-row your boat. We should just delete the delta of the data's cosine values and be done.

      Sorry, your post sounds very erudite - and I wouldn't presume to dispute it for a minute. I just had no idea what you were talking about (not being a database guru) so I thought I would represent it how it sounded to me...

    9. Re:Yea, it's all the same. by Jerry · · Score: 4, Interesting
      Cell-based storage!!! Best of both worlds!!! Mix of both Row and Column based storage, how can we go wrong!


      You are years late. The PICK operating system/db already does that. Back in 1985 I used the DOS based Advanced Revelation to write GAP accounting packages. It used the ASCI 253 character to separate "columns" of data in a cell. Reading and writing was parsed automatically. Invoice information, for example, was stored in a Customer's info table, not in a invoice table, and doing a query on accounts receivable produced very fast results. Symbolic dictionary definitions containing formulas allowed for easy column and row totals.

      In fact KDB/K looks a lot like a PICK system that uses FORTH as the language.

      --

      Running with Linux for over 20 years!

    10. Re:Yea, it's all the same. by Anonymous Coward · · Score: 0

      Parent is not adding anything. All he did was take a sentence from the article, then rewrite it into two paragraphs that say the same thing, then finish it up with a re-write of the title of the column from the second link.

    11. Re:Yea, it's all the same. by hackstraw · · Score: 1

      Column stores are great (better than a row store) if you're just reading tons of data

      Sounds reasonable. Plus you could store the columns on different disks or machines for scaling. Sounds fine, but only evolutionary, not revolutionary.

      Now, mod me to oblivion if I'm a moron, but years ago when I did DB stuff, it seemed common that I wanted a single cell in a table to contain 0 to potentially "lots" of values where each one was indexed. Oracle did have a hack that allowed something similar, but it was not indexed, and it was slow as molasis. By storing data by columns, this kind of feature would be possible.

      Am I the only one that wanted such a feature, or was I missing something back then?

    12. Re:Yea, it's all the same. by Retric · · Score: 1

      Add a second table.
      T1 Bob
      (ID), (Has Items),...
      123, false,...
      124, true,...

      T2 Items
      (Bob ID), (Item ID),...
      124, asdfa,...
      124, asdfb,...

      You then do a join on whatever you're searching for.

    13. Re:Yea, it's all the same. by Doctor+Memory · · Score: 1

      when I did DB stuff, it seemed common that I wanted a single cell in a table to contain 0 to potentially "lots" of values where each one was indexed Something like a single DB column that has an array data type? This is generally used in data warehousing situations to provide access to aggregate details (e.g., you have a row that contains the total of all sales for a quarter, but there's a field in there that contains all the sales by region). It's also generally wished for by people who don't understand relational concepts and want to put all the data they need in one table so they don't have to do any joins.

      I'd prefer to believe you just did a lot of data warehouse work... ;)
      --
      Just junk food for thought...
    14. Re:Yea, it's all the same. by SixDimensionalArray · · Score: 2, Informative

      Just to add to that - PICK/D3 is called a "multivalue" database, and there are a lot of interesting kinds of databases out their that use this "multidimensional"/sparse array storage concept, such as the ancient MUMPS legacy system/database (now implemented as Intersystem's Cache product) found in old Veteran's Affairs (VA) systems in the US.

      I actually wonder if some of the current databases such as Microsoft SQL Server, etc. aren't going to actually start morphing into these older styles of databases due to the increase in use of XML files/data, which by their very nature are hierarchical and kind of "multidimensional". Actually, the company that now maintains PICK/D3 (Raining Data) has an interesting XML database (Tigerlogic) that uses some of the old technology and new technology combined.

      This could be a great and possibly painful experience of history repeating itself!

      SixD

    15. Re:Yea, it's all the same. by bjourne · · Score: 2, Interesting

      The sane thing to me seems to be to just skip the oltp step since the data contained therein is a strict subset of the data in the warehouse. Let's say you design a relation database for a company. You have a table with Employees and another with their Positions, a Project table and so on like the orthodox relational model tells you to. Which works great until someone quits, then you have a problem.

      Naturally, you don't want to delete that person because then you lose lots of important archival data. So you introduce two fields in the Employee table which contains the dates when the employee started and quitted. You also need to do the same for almost all other data. If Bob gets a raise, you don't want to simply update his salary field, you want the data to state that Bob had salary $x during the dates $A-$B and $y during the dates $B-$C. So you need even more fields for validity intervals to fully describe your data. It gets incredibly messy, all your queries becomes really complicated and you need to employ stored procedures and triggers to keep your data intact. The relational model breaks down.

      What you need at this point, is some kind of "event based data storage." Like "Bob got a 5% raise 20070801" or "Company address changed from $x to $y 20001010." For these kinds of data, a warehouse is much better suited. A column based storage would probably also be much faster than a row based for it.

    16. Re:Yea, it's all the same. by Anonymous Coward · · Score: 0

      I went back and read the original article.

      What is this "reading" of the "original article" you speak of?

    17. Re:Yea, it's all the same. by Anonymous Coward · · Score: 0

      they might be able to do that and still present a virtual 'row' based model to the public via SQL.

      There is a product called Relavance which has each item (can have embedded attributes) is universally addressable via a 4 ordinal vector and it can be 'associated' on any number of 'axes of association' (Each gets a name) to other items for any reason. So you for example, associate Employees to their respective Bosses on the ReportsTo axis. In this way, it doesn't use tables, but sticks the vectors of the associates directly in an embedded attribute for the ReportsTo axis, and just has to read that to know where the others are, so you don't have to pass over a row set and compare values for a match. For example if you run a 'query' you find your source item, the Boss, and simply read all of it's associates on the right axis, for example "DirectReports", and you have the keys of all the Employees that report to that guy in one go. You can even take it a step further, and associate them all to the item representative of the query, JimsEmployees, and give it an axis called "Members" so you only have to read the associates of a single item to get your results of your entire query. If you add a new report later, you can just add a new one, no restructuring of any existing items or data, or whatever.

      If you are clever about how you balance the embedded attributes to associates, then you can produce a system that will be usable as both online transaction and functions like a datawarehouse, simply because you can just add more associates to the 'in place' data without touching whatever else has to read it. It's a really obscure concept for those used to relational databases, but its' actually quite useful if you learn it. It can be an independent storage system, but it's also often used as a BI piece on top of your existing databases (it sniffs transaction packets and mirrors their activity in the associative store) so that you can do realtime DWH functions on your otherwise relational database.

    18. Re:Yea, it's all the same. by stoolpigeon · · Score: 1

      Columns read fast - rows write fast. It's that simple. Then just look at what the system needs to do. If there are lots of updates and inserts - go with rows. If the data is being loaded in huge chunks that bypass all the normal restrictions and the only consideration is reading quickly - go with columns. There really isn't much need (as far as I can see) to make it any more complicated than that.
       
      In my experience - hiring, firing, changing (raises, promotions, lateral moves, etc.) all involve enough writing that oltp is where it is at. But more on target - it is also more efficient for building and processing orders. But once those orders have been shipped and what not - some manager is going to want to dig around in that data - and she wont be writing anything - just reading. So somebody is going to be shipping all those rows from all those normalized tables into a star schema or something - and then columns may make sense as another method to help make that data accessible for bi and what not.
       
      Those two functions, as I said, wont be occurring in the same place, so there is no need to choose between the two options. Use them both.

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    19. Re:Yea, it's all the same. by Some+Pig! · · Score: 1

      So it seems to me the -real- money is in integrating an RDBMS which, for usage purposes, is row-oriented; but which, for archival purposes, is column-oriented.

      Sybase does this. It integrates its relational server (ASE) with its columnwise server (IQ). Not the only ones doing it, just an example I know something about.

      IQ has been on the market since the mid-90's, and may have been the first commercial columnwise database server. But sales have taken off only recently.

    20. Re:Yea, it's all the same. by SatanicPuppy · · Score: 2, Informative

      Not sure how that follows...The column style database would be functionally identical to the row style database, only you'd have column proliferation instead of row expansion.

      The easiest way to deal with proliferating events is to create a very simple table that has a timestamp, your basic audit information (user who made the change, change the terminal was made from, etc), and the change itself.

      So say Bob makes 50,000 dollars. This entry was put in the table when he was hired and contains bob's employee record id, bob's salary, the date, and the audit crap. That's it. Then when bob gets a raise to 55,000 there is another simple entry, id, salary, date, audit crap. Etc, etc. All your data is there, you can easily retrieve the history, you know when the changes were made and by whom.

      It's all about normalization. Why put in two date fields if you don't have to? Two records, each with one date, will give you all the info you need, simplify your queries, whiten your teeth, etc. Whenever you have an event driven model, just throw the event, in the simplest possible form, in a table. If your tables start proliferating out of control, check your normalization and make sure you're not duplicating data across multiple tables. If that's not the problem, try to refine the scope of your database. If it's doing too many things, then try to break things out by their relevance.

      --
      ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
    21. Re:Yea, it's all the same. by hackstraw · · Score: 1

      Something like a single DB column that has an array data type?

      Something like that. That rings a bell. I tried an array datatype in Oracle, and it was slow as molasis. I don't remember the details, but I do remember that it came up more than once that it would have been nice to have an indexed array datatype or something like that. Its been years.

    22. Re:Yea, it's all the same. by Twinbee · · Score: 1

      Write an entry in Wikipedia if this is so much better than the other database systems (or even if it's as good). You might as well :P

      --
      Why OpalCalc is the best Windows calc
    23. Re:Yea, it's all the same. by TheRealMindChild · · Score: 1

      "PIC", not "PICK". You used PIC BASIC, and the machines were outfitted with a PIC Controller.

      --

      "When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
    24. Re:Yea, it's all the same. by kv9 · · Score: 1

      I just had no idea what you were talking about (not being a database guru) so I thought I would represent it how it sounded to me... I believe what you meant is this
    25. Re:Yea, it's all the same. by Anonymous Coward · · Score: 2, Informative

      No, it's not "PIC", it's "PICK", named after it's creator Richard Pick who originally wrote Generalized Information Retrieval Language System (GIRLS) in 1965 with Don Wilson to help inventory helicopter parts during the Vietnam War.

      http://en.wikipedia.org/wiki/Pick_operating_system

      PIC is something completely different.

    26. Re:Yea, it's all the same. by daVinci1980 · · Score: 1

      A query on a row store has to query entire rows, which means you'll often end up hitting fields you don't give a damn about while looking for the specific fields you want to return.


      This is untrue. In virtually all row databases, the index stores the offset for each field for each table. Records are fixed width, and the offset to each field within a record is at a known offset from the beginning. If I query for the last name of a person, I read only the last name of a person for each entry in the table. I read nothing I don't want.

      It's no accident that to move from one record to the next requires a single assembly instruction (an add) with periodic bulk reads from disk.

      This is why having variable length text in DBs screws up the perf, what I've said is no longer true. However, most DBs store text as a fixed size pointer to another buffer elsewhere to avoid precisely this problem.

      --
      I currently have no clever signature witicism to add here.
    27. Re:Yea, it's all the same. by Firehed · · Score: 1

      It's also GAAP, not GAP. I'd strongly question this guy's credibility.

      --
      How are sites slashdotted when nobody reads TFAs?
    28. Re:Yea, it's all the same. by leenks · · Score: 1

      This is ideal - not everyone reads the article :)

    29. Re:Yea, it's all the same. by CodeBuster · · Score: 2, Informative

      PICK has some very serious disadvantages over the relational databases, not the least of which is the rather annoying property that queries often do not return *ALL* of the related records unless the person writing the query has a very intimate knowledge of the data cubes and how the database is connected and even then there are no guarantees. The brilliance of atomicity defined by Codd and the query guarantees that one is able to make when the "one value per cell" rule is enforced are not of trivial benefit. PICK and multidimensional can be convenient in some circumstances, but the marketplace has basically decided that those niches are not worth the hassle of giving up the query benefits of the relational model for more general cases, especially when absolute speed of the query is less important than completeness of the information returned. It has also not helped that multivalue database vendors, the few that are left anyway, have generally lagged behind the relational database vendors in terms of tool support and integration with other systems.

      Note: I may be somewhat biased because of my long use of relational databases and a really bad experience updating a jBase solution that really was "long in the tooth". You might argue that I just saw a bad example of multivalue database application, but I think that it goes deeper than that or else we would all be using PICK today and not some flavor of SQL.

    30. Re:Yea, it's all the same. by Sancho · · Score: 1

      Right. The person to whom you replied seemed to want logic put into the database. A database is meant to hold data. Interpretation of the data is left to the program using the database.

    31. Re:Yea, it's all the same. by SatanicPuppy · · Score: 1

      One of the most common things I see in a database is people trying to store data in such a way that they never have to put multiple joins in their queries. For example, a single record with all of Bob's information that has to be updated every time something about Bob changes, and then maybe copied, whole cloth, to a "history" table so they they can query that history table directly if they need Bob's record for the year 2005.

      Just a mess, and incredibly inefficient.

      --
      ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
    32. Re:Yea, it's all the same. by SatanicPuppy · · Score: 1

      There is still file access overhead because it's two dimensional...The index may store the offsets, but you're going to have to parse them to get to that field before you can start returning elements. A column style database is effectively one dimensional...The only records in the file belong to the field you're looking for.

      More significantly, it's one data type per file, so you can use the most effective compression for that particular data type, rather than having to use a type that suits a non-homogenous row database.

      --
      ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
    33. Re:Yea, it's all the same. by bjourne · · Score: 1

      So say Bob makes 50,000 dollars. This entry was put in the table when he was hired and contains bob's employee record id, bob's salary, the date, and the audit crap. That's it. Then when bob gets a raise to 55,000 there is another simple entry, id, salary, date, audit crap. Etc, etc. All your data is there, you can easily retrieve the history, you know when the changes were made and by whom.

      Let's say Bob was hired 20000101 with a salary of $50000. Bob get raise 20050101 to $55000. Bob gets fired 20060606, Bob gets hired again 20070501 with the salary $10000. What is Bob's accumulated salary and how much has he earned on average every month? What was the average salary of the whole company in 1999? How do you prevent someone from inserting a row stating that Bob's salary was $99000 in 20070101?

      The answer to all of the above is some very complicated queries, stored procedures and triggers. When you are dealing with temporal data, the relational model doesn't fit anymore. You need something else and better.

    34. Re:Yea, it's all the same. by c.gerritsen · · Score: 2, Informative

      Actually, Stonebraker did say, and I quote the original article that you went back and read (with a dereferenced pronoun in []),

      [The current major relational DBMSs] should be considered legacy technology, more than a quarter of century in age and "long in the tooth".

      He called the existing major RDBMSs legacy and long in the tooth for not implementing the feature he is trumpeting, column storage, as an option when setting up a DB for use. He laments the fact that these vendors don't give you the option of setting up your DB in a way that provides huge performance gains for some usages that are becoming more common everyday, and says that there will be a revolution.

      He is not saying that we don't still need row-oriented RDBMSs...which makes me wonder, does Vertica support both row and column stores?

    35. Re:Yea, it's all the same. by bgspence · · Score: 1

      He did say 'The current major relational DBMSs (DB2, SQLserver, Oracle)...' 'should be considered legacy technology, more than a quarter of century in age and "long in the tooth".'

    36. Re:Yea, it's all the same. by Anonymous Coward · · Score: 0

      Wasn't Don Wilson in the Beach Boys?

    37. Re:Yea, it's all the same. by gasmasher · · Score: 1

      Read a book ya whippersnapper. PICK is exactly what he meant.

    38. Re:Yea, it's all the same. by Gorobei · · Score: 1

      Having spent the last few years designing and building temporal databases, I'm with you on this one. You just can't do it with SQL (that's your assembly language, not the programmer environment.)

      Most people don't need, want, or understand the stuff that temporal models can provide, but those that do tend to react with dropped jaws and "OMFG"s when you show them a system that does this stuff.

    39. Re:Yea, it's all the same. by Nefarious+Wheel · · Score: 1
      is often a matter of writing a record to many files...Obviously more costly.

      Uh, careful with that assumption Eugene -- that presumes a single disk, where your statement would be true. In a large RAID-5 or RAID-1+0 array it could be as quick, or even quicker to write to multiple files depending on where the allocation table resides (Yipes, Stripes!).

      Other than that, you're making sense, but I'd add -- how does retrieval stack up relative to index generation? You have to give up something to get that speed, RDBMS' have a lot of in-built power, refined over a long period of time. Is this shift all due to XML storage doubling up on metadata? Are we tending towards external index builds due to the upsurge in search technology? Will this fly like the leap from CODASYL to Relational DBMS' or will it um, "achieve a flat trajectory" like OODB? Tune in next week...

      --
      Do not mock my vision of impractical footwear
    40. Re:Yea, it's all the same. by sco08y · · Score: 1

      Therefore, pick your method depending on your needs. Are you storing massive amounts of data? Column stores are probably not for you...Your application will run better on a row store, because writing to a row store is a simple matter of adding one more record to the file, whereas writing to a column store is often a matter of writing a record to many files...Obviously more costly.

      And the Way It Should Work is that the DBMS offer both modes of storage, profile query performance and choose the best mode of storage without user intervention.

    41. Re:Yea, it's all the same. by jasen666 · · Score: 1

      I'd like to see a good example of one. Recommend a good place to research further?

    42. Re:Yea, it's all the same. by Anonymous Coward · · Score: 0

      haha... i read olpc instead of oltp and thought you were going a different direction with that post...

    43. Re:Yea, it's all the same. by canuck57 · · Score: 1

      You can read more on it here: http://en.wikipedia.org/wiki/Pick_operating_system

      Now it runs as a DB in UNIX. Had to deal with it for 2 clients. Also known as UniVerse with /u2 file systems, Ultimate Systems. Yep, I know that one.

      What an I/O pig. The more disk I/O bandwidth you can give this dinosaur the better. No kidding. I couldn't recommended PIC/PICK to my worst enemy, as they would come after me. I added up the disk I/O blocks need to simply do monthly financial statements for just 200,000 accounts and could not believe the numbers I was seeing. And this application can be bought today!!

      Put it on Linux with Postgres with 1/10 the hardware cost and enjoy. As unless you can run PICK (or derived systems) entirely out of RAM with abosolutely no page/swap forget it.

      Or if your management is dumb enough to buy it, suggest the sales people load up a system with all your data, do a real report and watch. It doesn't even have transactional safety. Be sure to crash test it half way through.

    44. Re:Yea, it's all the same. by Gorobei · · Score: 1

      About the only the good place to research futher is Wall Street. There might be some minor players like the DoD, etc.

      Almost nothing has been published, but a few people at all the big firms know the issues: data is 2d (it has an arrival time and an effective time,) restatements are common, we need multi-dimensional time-travel to explain variables, etc.

      Want to give it a shot?

    45. Re:Yea, it's all the same. by Anonymous Coward · · Score: 0

      If you only care about a twenty-byte field out of each hundred-byte record, the data you're after is now spread across five times as many sectors as it would be if stored contiguously, interleaved with the 80% crap the drive will decode from the platter and transmit to the controller (and probably DMA into the page cache) only to be ignored by the database.

    46. Re:Yea, it's all the same. by bytesex · · Score: 1

      That's not a database; that's a log. If your database is supposed to never remove things from it, you're better off trying to invent a system that can cheaply apply deltas to itself, than forever keep pushing things on top of itself.

      --
      Religion is what happens when nature strikes and groupthink goes wrong.
    47. Re:Yea, it's all the same. by jadavis · · Score: 1

      Probably a better policy is to have a DBMS that supports a variety of table store formats. MySQL wins that point, and would probably run away with it if it supported a column store format.

      I disagree. Changing the storage engine in MySQL changes the semantic behavior -- enough to introduce and eliminate major features.

      I support the idea of being able to store relational data in a variety of ways. Not just "store this table in this engine, that table in that engine" but also more sophisticated storage schemes, like storing one table in many files (perhaps to improve UPDATE performance) or many tables in one file (perhaps to improve join performance on those tables). However this should not change semantic behavior, only performance. The whole point is achieving performance without sacrificing your logical design or constraints.

      MySQL is a step backwards when it comes to that goal.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    48. Re:Yea, it's all the same. by NoelProf · · Score: 1

      I've programmed and maintained a number of "Pick" applications as well as applications using SQL databases such as PostgreSQL. I think of Pick as a poor implementation of a poor idea. You can easily get yourself in trouble designing multi-value databases in Pick. You'll always be jumping into procedure land to cull through your data. Even with ideal table design (called files in Pick), try doing a sub select. Or an outer join with a having clause. The only nice think I could say about Pick and it's variants is that with low order data sets, applications and hardware, pick was very efficient.

    49. Re:Yea, it's all the same. by MariusBoo · · Score: 1, Informative

      Seemed interesting to me too. Try http://www.cs.aau.dk/TimeCenter/pub.htm. Lots of stuff, not obvious where to begin.

    50. Re:Yea, it's all the same. by RAMMS+EIN · · Score: 1

      Isn't all this just a tornado in a glass of water? I mean, who cares
      how the DBMS maps records to files, or if it even maps them to files at
      all? I'd think that after, what, 50 years of having relational databases,
      the implementers should know how to make things efficient.

      If you open up a random book about databases, chances are it is full of
      information on exactly how database implementations work around various
      performance gotchas in operating systems, never mind that many of these
      have been worked around in operating systems (at least Linux) by now,
      as well.

      And I thought the whole reason we're still using RDBMSes, rather than
      persistant object stores (which map much better to the data models of
      our programming languages) is that RDBMSes can be and have been made
      efficient. Any idiot can come up with the idea to make a database that
      stores objects just the same way his favorite programming language does
      it, thus obviating the need for shoehorning your data into SQL's data
      model. However, there aren't as many idiots who can stamp out such a
      database and make it competitive with the established RDMBSes in
      terms of performance.

      Column-stored vs. row stored? I don't even know what that means. Store
      columns together instead of rows? Yeah, whatever, just make what I am
      doing fast. I can't believe people only thought of this recently.

      So, to sum up my thoughts in a somewhat less ranty manner:

      1. As far as I know, relational databases are far from dead. I know
      few people who wouldn't want to trade the data model for something that
      fits better with their programming languages', but, as far as I understand,
      we still end up using existing RDBMSes because they are _fast_.

      2. Column-oriented vs. row-oriented databases are just another optimization.
      People are optimizing DBMSes all the time. I hope this isn't going to
      become some heavily advertised feature...might as well start advertising that
      your operating system does inode caching, or whatever.

      --
      Please correct me if I got my facts wrong.
    51. Re:Yea, it's all the same. by Bastard+of+Subhumani · · Score: 1

      The sane thing to me seems to be to just skip the oltp step
      I don't know what your definition of OLTP is, but I don't see how a company that doesn't do any operations stays in business.

      since the data contained therein is a strict subset of the data in the warehouse.
      What point are pretty reports and graphs if the company isn't doing anything? Where would the warehouse get its data from?
      --
      Only three things are certain; death, taxes, and apocryphal quotations - Ben Franklin.
    52. Re:Yea, it's all the same. by Hognoxious · · Score: 1

      Those two functions, as I said, wont be occurring in the same place, so there is no need to choose between the two options. Use them both.
      The trick is using the right one for the right job ...
      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    53. Re:Yea, it's all the same. by Stooshie · · Score: 1

      ... event based data storage ...

      Sheesh, if only someone would invent triggers. Oh, they have?

      --
      America, Home of the Brave. ... .and the Squaw.
    54. Re:Yea, it's all the same. by dintech · · Score: 1

      Are you storing massive amounts of data? Column stores are probably not for you...

      Not always. I work in the financial services industry and manage a database called kdb+. It's a column store database which for us stores market data. We collect around 10 gigs a day. The reason that a column store database is used over row store is that analysis on a single column (most importantly time) is very straightforward and way more performant than what could be archived by row store.

    55. Re:Yea, it's all the same. by Anonymous Coward · · Score: 0

      Sybase has separate database servers to do just that: ASE (Adaptive Server Enterprise - formerly SQL Server | row storage) for OLTP, IQ (Intelligent/Integrated Query - a competitor to Teradata | column storage) for data warehousing.

    56. Re:Yea, it's all the same. by Mr.Ned · · Score: 1

      That's what his company, Vertica, is developing. Their database is aimed at the data warehouse market - yes, they update information continuously or on some regular schedule, but most of their heavy usage is reads. They've got a small, traditional row-based store for inserting data, and it's periodically incorporated into the large, column-based store. It's all transparent to the user, and looks pretty slick.

    57. Re:Yea, it's all the same. by Tarwn · · Score: 1

      Or you could look at manufacturing plants that have been storing, retrieving, and deriving time series data for as long as their have been time series historians. And even though "Wall Street" might have some nifty new systems, the data providers they connect to (or I should say the largest provider of stock pricing data) use SQL Server, an RDBMS.

      --
      Whee signature.
    58. Re:Yea, it's all the same. by TheRealBurKaZoiD · · Score: 1

      Your comment about SQL Server and it's new support for XML and other non-relational formats, and a previous comment made by someone regarding using different products for different purposes (oracle for olap, and something else for the warehouse, for example) made me wonder about this: so where (or how) exactly would the non-relational features of an RDBMS fit into this picture? My question is not worded well, but the example I'm thinking of is Oracle and it's support for nested tables (a table within a column). Does this type of feature fit into this "column-based" approach?

    59. Re:Yea, it's all the same. by Anonymous Coward · · Score: 0

      I, and the person who had my job before me at my last place of work , would concur with every word you just said. It was our function to migrate new customers from their old POS application, which stored all data in D3 over to our Access/MS-SQL backed app. I can't even begin to tell you how excruciating it was to pull out data in a format we could use, especially as each and every customer had a slightly different version of the application and a slightly different way of storing data in the database.

      Again, perhaps I just saw a really bad example, but I will never again touch anything like it again. *shudder*

    60. Re:Yea, it's all the same. by Stooshie · · Score: 1

      I think the GGP was saying that the difference between column based and row based databases were like the difference between driving automatic and driving stick! ;-)

      --
      America, Home of the Brave. ... .and the Squaw.
    61. Re:Yea, it's all the same. by Stooshie · · Score: 1

      I am presuming (perhaps wrongly) that indexes can be created agains columns in column stored DBs which would speed things up.

      --
      America, Home of the Brave. ... .and the Squaw.
    62. Re:Yea, it's all the same. by SatanicPuppy · · Score: 1

      Don't do it in SQL. That would be silly. Do it in your programming logic.

      I'd be interested if you would have another way to do it in the database. Myself, I'd create an entirely new employee record for bob, and have a single field referencing his previous employee entry. There would be no reason to carry over all the previous bob data if bob was re-hired for some vastly different position...It's no longer an apples to apples comparison, because as you cogently pointed out, the data are wildly disparate.

      And there is obviously no way of preventing someone with access from inserting any row they like...Same with any other database. That's what the audit information is for. With any other method, you're going to have a much much narrower audit history. If you create a new entry for every salary change, rather than updating an old entry, then you have an extremely detailed history of every change, and every user who made that change. If you were trying to catch fraud, that would be an invaluable tool. It would also be very easy to use pure SQL to catch records that changed too often, or records that changed by too large an amount.

      --
      ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
    63. Re:Yea, it's all the same. by mksql · · Score: 1

      > Why do people insist that one size really does fit all?

      Because people are lazy, and techies are no different. Many want to learn one technology, so every problem can be addressed with a minimum of intellectual effort. Look at any current whiz-bang buzzword tech, and you will see it is marketed as the solution to all your problems.

      "Learn different languages for UI and data access. The hell you say!"

      --
      I should have been a Geologist.
    64. Re:Yea, it's all the same. by SixDimensionalArray · · Score: 1

      Phew, that's a good question. I'm not sure there is a direct relationship between the column-oriented and non-relational features - it depends on how the non-relational features such as the XML datatype in SQL Server 2005 are actually implemented. I think the XML data type of SQL 2005 can serialize the XML and also index it into something resembling a row-oriented table hidden behind the scenes.

      However, taking into account that the non-relational features seem to be leaning towards storing specific data types which may be more complicated, like XML, binary objects, other tables, who knows, it looks like we are perhaps bordering on object-relational databases.

      So if you apply the strengths of a column oriented data store (fast retrieval speed, strongly typed columns, possibly very efficient compression due to data type/structure similarity), what it seems like is you get lists of objects, one object type per file, in a column store.

      In a row-oriented DB, you would get a tuple object which consists of other objects, but all related objects are stored in the same tuple, thus increasing write speed but possibly at the cost of read speed (you have to read the whole tuple to find a specific column). Basically, here you have lists of tuples, and tuples are objects themselves - very close to a column oriented data store with only one column of tuples - not necessarily as efficient for reading as partitioning the data into separate files by column/data type of each column in the tuple.

      So, maybe my answer is not worded well, but maybe the non-relational features are an admission to the fact that people want to store more complicated data types in new and different ways. Whether or not the storage is column or row based has to do with how one partitions and separates the data. To me these concepts are loosely related.

      Or I'm full of hot air. LOL ;)

      SixD

    65. Re:Yea, it's all the same. by jonadab · · Score: 1

      > Column stores are great (better than a row store) if you're just reading
      > tons of data, but they're much more costly than a row store if you're
      > writing tons of data. Therefore, pick your method depending on your needs.

      To me, this is really just performance tuning, which the RDBMS and/or the DB admin should handle behind the scenes. The application sitting on top of the database shouldn't need to know or care whether the underlying storage is organized by row or by column. The RDBMS should handle such things automagically, and the database administrator can tune such performance settings as necessary, but the application developer (to say nothing of the user) shouldn't have to be concerned with such matters.

      --
      Cut that out, or I will ship you to Norilsk in a box.
    66. Re:Yea, it's all the same. by Allador · · Score: 2, Insightful

      The answer to what you're describing is not to give up relational dbs, but to design your schema correctly.

      There are tried and true approaches to the problems you describe; several actually for most of them, depending on your needs.

      If you run into these problems due to an evolutionary growth into these features, then its time to stop, take a step back, and re-architect your schemas to handle these needs from the get go.

      There's no reason at all to resort to hacks like stored procedures and triggers. These are only used when your schema is fundamentally mismatched to your needs, and so you have to be continuously cleaning up after your data mods.

      Any mature system doesnt ever delete 'entity' objects from the DB. That would just be silly (and wouldnt be possible if you had referential integrity set up correctly). So all of these sorts of entities have an 'active' column, that determines whether they're active or not.

      In the salary example you give, you DO want to store the 'current' salary field in the employees table, but then you also have a 'positions' or 'incumbents' table to store the history of things that change over time.

      I think the core problem you're describing is due to the fact that you're trying to store all of this stuff in one table. And thats not the correct approach.

      The 'event based storage' you're talking about is another table, with one row per event. Some folks advocate another approach, where entity tables are versioned, and all old versions are kept (sometimes moved to another table), so you can see the history of all changes for all time. Thats not a very well normalized solution, but there are times when its appropriate.

    67. Re:Yea, it's all the same. by stonecypher · · Score: 1

      Wow. Grab Mr. Peabody's Wayback machine and go get yourself hired by Oracle in 1981.

      --
      StoneCypher is Full of BS
  16. Aha! by Stanistani · · Score: 4, Funny

    The next big thing in DBMS:
    turning your head sideways.

    1. Re:Aha! by lucabrasi999 · · Score: 1

      Damn you! You owe me a cup of coffee.

  17. Should be, but isn't, and won't. by dada21 · · Score: 4, Interesting

    In my IT business, a vast majority of our top tier clients (grossing over US$100 million annually) are still using antiquated software that is still using a relational database backend. While these companies are generally VERY efficient in terms of providing services or products to their market, their accounting, purchase orders and project management software is decades outdated. Many of the companies that maintain these packages have merely made the interface more current (but still 5+ years old, but are still using terribly outdated software. I can't begin to tell you how often the words "FoxPro" and "MS SQL" come up and it ends up being a relational database "solution" or even worse.

    It is very frustrating because we do have programmers on staff that create third party plug-ins to these databases to try to make solutions that the OEM code doesn't. When you meet younger programmers, many of them are frustrated themselves to work on ancient solutions that have no hope of being upgraded, because these industries we work in are not in a rush to try anything new and shiny, but instead are happy with the status quo.

    I just bid a job a few months back that would cost $150,000 to upgrade their database infrastructure, and likely save the company $300,000+ annually in added efficiency, less downtime, and a more robust report system. Guess what they said? "We all think it is fine the way it is." That's money thrown out the window, employees who are frustrated (without knowing why), and forcing the company to lose efficiency by not being able to compete with newer companies that are utilizing newer technology to better their bottom line.

    Ugh.

    1. Re:Should be, but isn't, and won't. by Anonymous Coward · · Score: 0

      You want them to go without relational databases? There is a reason they said no.

      There are places (data warehousing, mining, etc) where having the data in a non-relational format is fine, but for most database uses a company has, a relational database IS needed.

    2. Re:Should be, but isn't, and won't. by nuzak · · Score: 1

      In my IT business, a vast majority of our top tier clients (grossing over US$100 million annually) are still using antiquated software that is still using a relational database backend.

      No, antiquated is using a non-relational backend, like raw ISAM. You know what you use to work with ISAM? COBOL. Now COBOL isn't bad for its verbosity, which is what most people complain about, because you can bind most of it to abbreviations in your editor. No, what makes COBOL so awful is that it accumulated so much weird cruft over the years. If you find the word "ALTER" in a COBOL source, go grab a bottle of extra-strength advil, you're going to need it)

      (This is all second-hand: I never had to work with it, but I had a housemate that made beaucoup bucks doing COBOL well before Y2K)

      --
      Done with slashdot, done with nerds, getting a life.
    3. Re:Should be, but isn't, and won't. by Orange+Crush · · Score: 2, Insightful

      I've been in the banking industry for the past 6 years and every bank I've worked at has relied on text-only server side applications that we connect to via various terminal emulators. The workstations are all modern, but we don't use anything more taxing than excel and an e-mail app.

      Why have none of them changed beyond a few interface bolt-ons? Well . . . one of them actually did once . . . and it wasn't pretty. Sure it was graphical and point-and-click and more "user friendly" in appearance. But the fact of the matter is that we were a production environment and what could be done with hotkeys in 3 steps was now a 12 step process clicking widgets etc. To be fair, there were still a few hot keys, but they were all different and everyone had to relearn. Productivity never quite got back to what it was under the old system.

      Larger companies will kick and scream to avoid "upgrades." Many of them have had horrifying experiences and they just can't risk getting stuck with software worse than what they have now.

    4. Re:Should be, but isn't, and won't. by Doctor+Faustus · · Score: 2, Informative

      I can't begin to tell you how often the words "FoxPro" and "MS SQL"
      You do know those aren't remotely comparable, right? FoxPro scales to more users than Access (due to tables separated into different files), but they're otherwise on a similar level in terms of what sort of jobs they're appropriate for. MS SQL Server is a full-fledged enterprise RDBMS. It may not scale quite as far as Oracle or DB2, but it get closer every generation, and having worked mostly in Oracle for the last year or so, I've been missing SQL Server.

      I just bid a job a few months back that would cost $150,000 to upgrade their database infrastructure, and likely save the company $300,000+ annually in added efficiency, less downtime, and a more robust report system. Guess what they said? "We all think it is fine the way it is."
      That was probably a polite way of saying "We don't believe you.". Maybe you made a case to them for why relation is obsolete, but you certainly didn't here.

    5. Re:Should be, but isn't, and won't. by Anonymous Coward · · Score: 0

      I work on a COBOL-based ERP system, it thinks all it's data is stored in ISAM files, altho it's really stored in Oracle (there is a database driver it uses.) My biggest beef with COBOL is absolutely it's verbosity, and setting up my editor to abbreviate stuff only works if I'm WRITING code, but most of the time I'm READING it. I spent HOURS working on a fairly subtle bug that turned out to be a simple logic error in an IF/ELSE statement, but it took a while "animating" (ie, stepping thru the code w/a debugger) before identifying the issue. I literally commented out one line and it worked.

      I'm currently spending my free-time learning Java, tomorrow's COBOL. It's extremely frustrating professionally to be stuck in the past. Even though I make a ton of dough for what I do (mainly undeserved too), that's not what I'm in this line of work for. I'm working to re-tool my skillset and get back to REAL software development, not this mickey mouse horseshit, even if it does run all the critical parts of Fortune 100 companies.

      - A

    6. Re:Should be, but isn't, and won't. by LWATCDR · · Score: 1

      You got it.
      Point and click is great for doing things that you have to do rarely. For tasks you have to often the keyboard rules.
      Think about typing as an example. If you took a person that didn't know how to type at all then a keyboard on the screen with the letters in alphabetical order would be much faster then teaching them how to type.
      A system that has been in use for years has probably evolved in to a very efficent system. The companies are willing to put in the time up front to train people to use them so speed is more important being intuitive.
      Think of programs like vi, emacs, and Unix vs Wordpad and Windows.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    7. Re:Should be, but isn't, and won't. by Anonymous Coward · · Score: 0

      Ha. I know exactly what you mean. I work in the banking industry and we are the people that time forgot when it came to IDEs and such. Vi and make are all we need. It's like the 1980s every day,

    8. Re:Should be, but isn't, and won't. by rtaylor · · Score: 1

      Microsoft and Apple focus on making programs easy to learn NOT efficient to use. Companies who train employees want programs that are efficient to use and have less concern over whether they are easy to learn.

      Sounds like the redesign sacrificed efficiency in order to make it easier to pick-up. This is a common issue.

      No reason it couldn't be both. Have a command line area in the new GUI application that supports all of the hot-keys and quick data entry methods but with improved presentation of the data.

      --
      Rod Taylor
  18. Re:slightly OT by MemoryDragon · · Score: 1

    The main problem is SQL is just a description language for set data, and a relational database is exactly that a set data.
    The main problem is so far nobody really has brought out something more reable to deal with sets in a mathematical sense, you could use mathematical operators but then things would become even less readable than SQL is.

    All approaches on the programming side I have seen (criteria objects etc...) make things only easier in some domains, after that you revert to plain sql and its derivates.
    Elminiating SQL would mean you probably have to eliminate the data storage model of sets as well.

  19. They might... by Anonymous Coward · · Score: 0

    Even Slashdot has become obsolete.

  20. Simple solution. by fahrbot-bot · · Score: 3, Funny

    I tried turning my Oracle server on its side to get column-store access. Strangely, I didn't see any increase in performance. Perhaps I'll try the other side...

    --
    It must have been something you assimilated. . . .
    1. Re:Simple solution. by QuantumRiff · · Score: 1

      Don't be so gentle when you turn it, and make sure the DB server is running.. If you happen to do it just right, you can make enough hard drives in your raid array hiccup, and lose the whole array. Then it can read the entire DB in a millisecond!

      --

      What are we going to do tonight Brain?
    2. Re:Simple solution. by bar-agent · · Score: 1

      You also have to make sure you're using left joins and right joins properly. See, if you use left joins, but turn the server counter-clockwise, all those joins have to run uphill to execute, which is slower. Right joins will work much faster.

      Unfortunately, sometimes you have both left and right joins. That's a pain, because the server's no longer load balanced -- some queries run uphill and some downhill. That's why a good row-store databases also include upper and lower joins. If you use those in preference to left and right joins, when the server is rotated, the queries are horizontal, which means no net up/down load.

      --
      i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]
    3. Re:Simple solution. by Demerara · · Score: 1

      I tried turning my Oracle server on its side to get column-store access. Strangely, I didn't see any increase in performance.

      No, no, no - you have to re-mount the disk drives at 90 degrees - turning the entire server won't work.

      --
      Backward%20compatibility%20is%20over-rated
    4. Re:Simple solution. by aquatone282 · · Score: 1

      I threw my Oracle server out the window and while performance didn't approve, my attitude did.

      --
      What?
  21. Well, he WOULD say that by Random+BedHead+Ed · · Score: 1

    Let me get this straight by paraphrasing: Column databases are the wave of the future, says a column database distributor on his new column database blog. And Red Hat would recommend you run your new column database on Red Hat Enterprise Linux, perhaps? I wonder what brand of kit Dell would recommend I run RHEL on ...

  22. Re:slightly OT by plague3106 · · Score: 1

    Would you like it better with IronPython and Linq?

  23. I don't think this makes sense. Or does it? by w4rl5ck · · Score: 1

    From the perspective of an application developer, this is pure nonsense. I practically don't care wether my DB stores data in columns or rows or whatever.

    What I need is a good, consistant layer that can handle object-based tree structures - nothing more, nothing less. I want to dump my Java/Objective-C/C#/C++/PHP/Python objects in some storage layer, and I want to be able to get it back, search for it, etc.pp.

    Yes, relational databases are (or should be) dead for most modern application designs. But not because of RDBMs are going to be replaced by column-oriented DBMS (which is, from application perspective, no difference - IMHO), but because OODBs solves most application problems better (not that good solutions exists, yet... *sigh*)

    On the other hand, I never got any master degree. Shame on me. Just won a best paper award for a paper I did not even wrote. Maybe I just don't have the wits for this stuff ;)

    1. Re:I don't think this makes sense. Or does it? by DamnStupidElf · · Score: 1

      From the perspective of an application developer, this is pure nonsense. I practically don't care wether my DB stores data in columns or rows or whatever.

      You do care; you want it to be as fast as possible. This has less to do with how you interface with the database than it does with how fast reads are versus writes.

      What I need is a good, consistant layer that can handle object-based tree structures - nothing more, nothing less. I want to dump my Java/Objective-C/C#/C++/PHP/Python objects in some storage layer, and I want to be able to get it back, search for it, etc.pp.

      Look into operating systems with a single level store. They are not very common (one example is OS/400), but they are the perfect design for what you want. Objects are stored at permanent virtual addresses and everything that can't fit in memory is swapped out to secondary storage. No concept of loading or storing objects; they just exist until deleted.

    2. Re:I don't think this makes sense. Or does it? by Anonymous Coward · · Score: 0

      This is an interesting point, because it very much follows the row/column database debate.

      As an applications developer, you're primarily interested in transactional data, which means that your data model and underlying storage methodology probably end up reflecting your object model. This makes perfect sense for you.

      However, if you were a reports developer, you'd probably sing a different tune. You'll be far more interested in aggregate data and quick methods of retrieving it. You probably wouldn't give a rat's ass about the object model or if the data was organized in a similar fashion, because your output isn't based on it.

      Of course, the interesting bit is that large applications will usually have both transactional and reporting requirements, but no one-size-fits-all solution that satisfies them both.

  24. Object Databases by jjohnson · · Score: 3, Interesting

    Are they now officially an also-ran? Has the whole concept failed to be usefully implemented commercially, or will it be another Lisp--elegant, beautiful, and largely unused because it's kind of weird?

    --
    Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
    1. Re:Object Databases by SashaMan · · Score: 2, Insightful

      In a word, yes. I think there are a couple reasons for this:

      1. OR mappers like Hibernate have gotten to the point that they are quite good, so they make the value add prop of object databases less compelling.
      2. Object databases are never going to get the speed of relational databases. This is the real dealbreaker. Suppose an object database can handle 95% of my queries with adequate performance. All well and good, but I'm totally screwed on those other 5%. On the other hand, if I was using a relational database with hibernate, hibernate might handle 95% of the queries with adequate performance, but for those other 5% I can workaround by writing custom SQL. With that setup I get the best of both worlds.

      I don't know of any attempts to use object databases on large enterprise projects that haven't been complete failures, with the failure always due to performance issues.

    2. Re:Object Databases by afidel · · Score: 1

      I know of quite a few that worked out fine. here is a list of successful projects from one of the biggest object database companies. For the areas where they fit in the performance of object database can be significantly better than traditional RDBMS's.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    3. Re:Object Databases by Anonymous Coward · · Score: 0

      Lisp databases were abandoned very early on when it was discovered they took twice the storage space of other databases, and that the last half of the data consisted entirely of close-brackets...

    4. Re:Object Databases by LoveMe2Times · · Score: 1

      1. OR mappers like Hibernate have gotten to the point that they are quite good, so they make the value add prop of object databases less compelling.
      Actually, it's that object databases just haven't kept up with technology. 10 years ago, object databases were simpler and easier to use--a significant value add--than any ORM solution that I have seen today. I worked with POET Object Database in the late 90s, and ORM solutions today seem feeble in comparison. If object databases kept up to better take advantage of introspection or meta programming, there would be no contest. BOOST type libraries for a C++ interface, or annotation possibilities in Java, or the fun and games possible in modern scripting languages, would put ORM to shame.

      2. Object databases are never going to get the speed of relational databases. This is the real dealbreaker.

      This is absolutely and completely false. It depends entirely on what you're doing. In particular, whether you're doing something that's kind of object based or something a little more relational... If your objects explicitly maintain their relationships to each other through pointers/references, and your goal is to occasionally look up and object and then mostly walk its reference chain, an object database will positively kill a relationally mapped version of the same thing. Sure, search times with OQL (the object version of SQL) might be slower to get the first object(s), but then walking the object graph doesn't require further queries. While "SELECT * FROM mydepedency WHERE id = 10" might be a quick query, it just doesn't stand up to *no* query. And yes, you can do joins and what not to get the dependent information up front in the first query, but even 10 years ago object databases were good about lazy lookup.


      However, if the relations between the fields are more important, like if you're data mining or pulling reports rather than making an application, then yeah, the relational model will win out.


      As many in this discussion have mentioned, you might use a row based DB for writes archive to a column based for reads. Similarly, you might use an object based DB for the interactive application that then dumps through to a relational db for reports and mining. Having worked with both, I can definitely say that I mostly see people using RDBMSs because that's "all there is" when they'd probably be better off using an object database. Unfortunately, while there's lots of great F/OSS RDBMSs, I don't know of any F/OSS Object DBs. But I work in an area with relatively small amounts of data where it's important that the interactive app be responsive and it's ok if the reports take a while. Meaning that I would push for a switch to an object DB in a heartbeat if I knew a good F/OSS one. We're already stuck on a niche proprietary platform, so I'm not about to recommend another one!

    5. Re:Object Databases by Anonymous Coward · · Score: 0

      har

  25. Re:slightly OT by lucabrasi999 · · Score: 1
    Can someone please declare SQL legacy technology? pretty please?

    TRUNCATE TABLE SQL_LANGUAGE;

    COMMIT;

    There, feel better?

  26. He may have a point by duffbeer703 · · Score: 2, Interesting

    For data warehousing, a higher or different level of abstraction may be useful and make database design easier, particularly as paralellism becomes more and more common. Storing rich markup language or media in a database might be problematic as well.

    But there's no way that RDBMS's are going away -- relational algebra simply solves too many data storage problems.

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
  27. Are relations obsolete? by roman_mir · · Score: 4, Informative

    Once someone shows that there is no longer a use for any relationship between data entries, then we'll be able to say that RDBMSs are obsolete. Actually both headlines (/. and the linked article) are mistaken about what Michael Stonebraker is saying. He is talking about read intensive applications mostly and he is talking about optimization of data for reading purposes. This does not mean that RDBMSs are obsolete for all uses, just that he sees a faster way to retrieve data for certain uses.

    1. Re:Are relations obsolete? by yerM)M · · Score: 1
      It depends on the application. I have done quite a bit of work with Jean-Claude Wippler metakit database. It is essentially a column based store with some clever underpinnings to make it relational as well. We mainly used it to track molecules and assay results coming off of our plate-readers. The nice thing is that this streaming data can be saved to disk and used in our LIMS system. What it does really well is scan and sort columns, the throughput is really quite amazing. It gets pretty slow for relational queries, but has the benefit that results of queries are persistant in the sense you can chain them together in a relational algebra fashion.


        When I was getting interested in this, I also came across kdb a relational database based on the K programming language. kdb is also a column based store with one huge benefit, the interpreter that analyzes the columns,K fits inside a level 1 cache. The throughput of kdb is immense which is why it is mainly used to track and analyze financial data and other streaming and real-time data. It used to be you could try it out for free, I'm not sure what the state is currently, but I suggest at least giving it a try, it's not cheap but it certainly changed my view of databases.


      Just my two cents.

  28. SenSage is earlier example of column-oriented DB by GringoGoiano · · Score: 1

    SenSage built a column-oriented DB in 2001 and has had much success with the approach for their fast-input, fast-query, high-density, multi-TB databases. Stonebreaker was on their technical advisory board. Interesting that he now centers his own startup on the same principles. See http://en.wikipedia.org/wiki/Column-oriented_DBMS.

  29. Re:Re : Are Relational Databases Obsolete? by ahem · · Score: 2, Funny
    Your sed is missing two expressions for full i18n:

    sed -e 's/crisps/potato chips/' -e 's/pound/dollar/' -e 's/note/bill/' -e 's/packet/bag/'

    --
    Not A Sig
  30. Re:Re : Are Relational Databases Obsolete? by Anonymous Coward · · Score: 0

    sed -e 's/pound/pound note' -e 's/dollar/dollar bill'
    To be applied to your sedscript. That's right, I sed'd your sed.

  31. Yeah, but look at all the publicity by Colin+Smith · · Score: 0, Flamebait

    Vertica's getting a load of it free.

    Hey, you never know, some good may come of it. Maybe some of the "Nobody ever got fired for buying Oracle/SQL Server/DB2" people will have to explain why they're using inappropriate technology.

    --
    Deleted
  32. sed -e 's/packet/bag/' by Anonymous Coward · · Score: 0

    There, finished that off for ya. :)

  33. This will be the year by Tribbin · · Score: 1

    This will be the year of the column based database.

    --
    If you mod this up, your slashdot background will turn into a beautiful sunset!
  34. Re:slightly OT by Anonymous Coward · · Score: 0

    Staying OT with you, I do not agree with your opinion on SQL in general, although using it for basic CRUD in an object-oriented application is tedious and bug-prone. For that, I warmly recommend the up-and-coming Python-based ORM called SQLAlchemy. See: http://www.sqlalchemy.org/.

  35. paradigm shift! by sohp · · Score: 4, Funny

    Along with Procedural Programming, this could REVOLUTIONIZE the software industry!!

    1. Re:paradigm shift! by bigmouth_strikes · · Score: 1

      Yeah, procedural programming with Erlang, with column store. I'm just a bit torn on if it's going to be running on big iron or on thin clients...

      --
      Oh, I can't help quoting you because everything that you said rings true
  36. You have to have a successor to be obsolete by Anonymous Coward · · Score: 0

    Yep, I'd say they're obsolete given the wild success of their replacement(s)....oh...wait....

  37. Careful by HangingChad · · Score: 1

    The Dvorak keyboard is more efficient by a factor of 10 and you don't see it taking over the keyboard layout landscape.

    Just because something is "better", even in technology, doesn't mean it's going to take over.

    I've also lived through the decline of mainframes...still around. The internet was going to replace faxes...I still have a fax machine.

    Linux is better than Windows, columns are better than rows but I wouldn't get all a-twitter over either of them just yet. Particularly from someone selling column based data stores.

    --
    That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage
    1. Re:Careful by corvair2k1 · · Score: 1

      The Dvorak keyboard is more efficient by a factor of 10 and you don't see it taking over the keyboard layout landscape.

      I currently type at least 110 words per minute, no errors. Do you mean to tell me that I can type 1100 words per minute with Dvorak?

      Actually, I'm not aware of a well-constructed study that proves the Dvorak layout as superior. Here is an interesting article on this subject: http://www.reason.com/news/show/29944.html

    2. Re:Careful by DavidHumus · · Score: 1
      Apparently, use of the Dvorak keyboard hurts math skills. Looking for evidence of the assertion

      The Dvorak keyboard is more efficient by a factor of 10 I found this site http://www.theworldofstuff.com/dvorak/ which starts with a similar assertion about the Dvorak keyboard:

      It is estimated to be 12 to 20 times more efficient.... and followed it with several statements that don't add up, like

      Some have seen a 200-300% increase in their speed. and

      A Dvorak typist typically makes half as many mistakes.... and

      ...in school. I could only do about 30 WPM. ... Now I'm typing over 60 WPM.... and

      ...typing 57 to 62 WPM in comparison to typing 45 to 50 WPM....
    3. Re:Careful by rossifer · · Score: 1

      The Dvorak keyboard is more efficient by a factor of 1.05
      There. Fixed that for you. Dvorak is a little better than Querty, but you're fooling yourself if you actually believe it's hugely better.
  38. rtfa before posting by jilles · · Score: 3, Informative

    I can understand people not reading every link on a slashdot article they comment on. But if you post the bloody link, is it too much asked to actually RTFA?! It's an article about a column. The actual column is quite interesting.

    To add some content, this is about optimal storage for SQL databases in a data warehouse context where there are some interesting products that use something more optimal than the one size fits all solutions currently available from the big RDBMS vendors. The API on top is the same (i.e. SQL and other familiar dataware house APIs), which makes it quite easy to integrate.

    Regarding the obsolescence question, one size fits all will be good enough for most for some time to come. Increasingly people are more than happy with lightweight options that are even less efficient on which they slap persistence layers that reduce performance even more just because it allows them to autogenerate all the code that deals with stuffing boring data in some storage. Not having to deal with that makes it irrelevant how the database works and allows you to focus on how you work with the data rather than worrying about tables, rows and ACID properties. Autogenerating code that interacts with the database allows you to do all sorts of interesting things in the generated code and the layers underneath. For example, the hibernate (a popular persistence layer for Java) people have been integrating Apache Lucene, a popular search index product, so that you can index and search your data objects using lucene search queries rather than sql. It's a quite neat solution that adds real value (e.g. fully text searchable product catalogs are dead easy with this).

    Column based storage is just an optimization and not really that critical to the applications on top. If you need it, there are some specialized products currently. The author of the column is probably right about such solutions finding their way into mainstream products really soon. At the application level, you'll still be talking good old SQL to the damn thing though.

    --

    Jilles
  39. Re:slightly OT by Just+Some+Guy · · Score: 1

    For that, I warmly recommend the up-and-coming Python-based ORM called SQLAlchemy.

    Exactly. If you're using Python, you're not allowed to complain about SQL because there are good alternatives. Besides Alchemy, Django has a very nice object mapper of its own. Both of those have progressed to the point that writing raw SQL is simple unnecessary for almost any application development.

    --
    Dewey, what part of this looks like authorities should be involved?
  40. Wrong approach? by Aladrin · · Score: 3, Interesting

    Maybe his approach is all wrong. The database my company uses has MANY tables that are rarely written to, but a few that are written to ALL the time. Instead of trying to cram his 'one size fits all' database scheme down our throats and replace the current 'one size fits all' database scheme, maybe he should be trying to create a database engine that can do both.

    I think you would have to determine the main use of the table beforehand (write-seldom or write-often), but the DB engine could use a different scheme for each table that way. I know some will claim that it can't be more efficient to split things this way, but remember that this guy is claiming 50x the speed for write-seldom operations.

    As for Relational Databases... How is this exclusive to that? This is simply how the data is stored and accessed. If he is claiming 50x speed-up because he doesn't deal with the relational stuff, that's bunk. You could write a row-store database with much greater speed as well, given those parameters.

    --
    "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
  41. Specialized versus generalized? by dpbsmith · · Score: 3, Interesting

    '"In every major application area I can think of, it is possible to build a SQL DBMS engine with vertical market-specific internals that outperforms the 'one size fits all' engines by a factor of 50 or so," he wrote.'

    I know very little about DBMS systems, but I thought it has always been true that you can achieve monumental performance increases by building somewhat specialized database systems in which the internals of the system make assumptions, and are tied to, the structure of the data being modelled. In fact, when RDBMS systems came in, one of the knocks on them was that they were far more resource-intensive than the hierarchical databases they displaced. However, the carved-in-stone assumptions of those models made them difficult and expensive change or repurposed.

    I'm sure I remember innumerable articles insisting that "relational databases don't need to be really all that much terribly slower if you know how to optimize this that and the other thing..."

    In other words, as an outsider viewing from a distance, I've assumed that the increasing use of RDBMS was an indication that in the real world it turned out that it was better to be slow, flexible, and general, than fast, rigid, and specialized.

    So, what is a "column store?" It sounds like it is an agile, rapid development methodology for generating fast, rigid, specialized databases?

    1. Re:Specialized versus generalized? by Anonymous Coward · · Score: 0

      Instead of storing a "row" (of bytes) for a particlular table (file) you would store EACH column (field) in its own file. It is an auto-indexed data table basically. Much faster for reading of general data (and yes complete rows). I can't really see why it would be all that much slower to write as well. If the server hardware is fast/capable you can write each (eh hmm, field) column out in a different thread. I'm going to write a new DBMS system like this, it should be fun. Anyone else?

  42. Mildly Confused by Duffy13 · · Score: 1

    While I didn't particularly pay much attention in my database class, or go to it that often, from my current work with databases and a quick skim of some definitions for RDBMS, it strikes me that a good portion of people in this thread and the articles are using the term RDBMS incorrectly. (Though some of the posts appear to be in agreement with me) As far as I can determine RDBMS is solely (and simplified) the concept of relating data between different tables to decrease the repetition of said data. It's a method, a widely applied method, but just a method, not an actual type of database storage.

    Sooo, wtf does RDBMS have to do with storing data with either columns or rows in a file?

    --
    "Now you know, and knowing is half the battle!"
    1. Re:Mildly Confused by jjohnson · · Score: 1

      You're correct in theory, but in practice all major RDBMSs are row-based, so the two terms are synonymous in use.

      --
      Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
    2. Re:Mildly Confused by Duffy13 · · Score: 1

      That brings me to question #2, can the relational model be applied to column-based DBs?

      --
      "Now you know, and knowing is half the battle!"
    3. Re:Mildly Confused by jjohnson · · Score: 1

      As you pointed out, the relational principle is orthogonal to row vs. column. The latter idea has to do with the most efficient way to store the data for use on a disk, while the former is a theoretical model about how to structure data to eliminate redundancies and (false) dependencies.

      In a nutshell, row based storage means storing records together, while column based means storing columns together. Cutting aside the marketing hype in the OP's link, column based storage has some obvious efficiencies for data warehousing, where it's written once and read many times, usually selectively by column. Row based is still obviously superior where writing is frequent, though, because records are stored together. Nothing in the relational model favours one over the other necessarily, and I'm certain that in ten years the major vendors will have you select row vs. column based storage when setting up a new database (or some fusion of the two). The OP trumpetting row based storage is justi hyping a performance feature of his software.

      --
      Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
  43. Perl Objects have both column and row DB advantage by goombah99 · · Score: 4, Interesting

    Traditionally perl-objects are hashes with one blessed hash per instance. The hash contains all the instance variable values using their names as keys.

    instead one can use blessed scalars holding a single integer value for instances and let the class variable contain all the instance data in arrays indexed by the instances scalar value.

    This technique was originally promoted as an indirection to protect object data from direct manipution that bypassed get/set methods. But it also allows the object to be either row or column oriented internally. that is the class could store all the instance hashes in an array indexed by the scalar. or it could store each instance variable in a separate array that is indexed by the scalar value.

    Thus the perl class can, on-the-fly, switch itself from column-oriented to row-oriented as needed while maintaining the same external interface.

    Of course this is not a perl-exclusive feature and it can implemented in other languages. It just happens to be particularly easy and natural to do in perl.

    --
    Some drink at the fountain of knowledge. Others just gargle.
  44. Elegant? by Anonymous Coward · · Score: 0

    I would hardly use the word elegant to describe object databases? They seem elegant, right up until you try to version a schema or point a reporting tool at them. Interesting, yes. Elegant, hell no.

  45. Soon... by lpangelrob · · Score: 3, Funny

    The near future. Mr. Stonebraker walks into a store.

    Mr. Stonebraker: How much are these plums?
    Checkout girl: Plums? They're $0.99, $1.39, $12.49, $15.99, $26.38, $13.37...

    1. Re:Soon... by Anonymous Coward · · Score: 0

      Where the hell did he find a store with human checkout staff?

    2. Re:Soon... by lpangelrob · · Score: 1

      They still have them in most stores, (sometimes in addition to self-checkouts). It's probably easier to steal stuff if you just have self-checkouts.

  46. kdawson, you asshole by Sloppy · · Score: 1

    Stop doing it.

    --
    As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
  47. This can't be right... by QunaLop · · Score: 1

    I can see how column stored data is faster for querying, but for these benefits, it will be slower for writes, and the quote claims that databases this way will replace row-format databases. In many cases it may, however, there are many write intensive applications. Additionally the index features of current row-store database engines is pretty much equivalent to column store, (including possible indices on column store tables.)

    Ideally a RDBMS would allow a admin to select the primary storage strategy. To me, ideally lookup tables could be column and transaction tables could be row... seamlessly. And as mentioned previously, the term RDBMS really is used incorrectly in the summary, though I suspect it was meant as "traditional RDBMS"

    1. Re:This can't be right... by DigitalSorceress · · Score: 1

      Yep, that's more or less exactly what I was thinking: an index IS a column-based view, so with an RDBMS, you get the best of both worlds.

      Generally, I'm suspicious anytime someone declares a widely-used technology obsolete. Look at IPv4... IPv6 may be vastly superior, but millions upon millions of networks still rely on the tried and true. How about all those declarations about how business will be totally paperless. It's an admirable goal - many companies are shifting, but paper's going to be with us for a long time yet.

      Same's true for databases. Sure, maybe this approach WILL revolutionize data warehouses, but to me that just means we will be using column-based stuff as yet another tool. I've played around in Object-oriented databases... I know some folks who swear by them... did they completely replace traditional RDBMS? nope... just another tool.

      "If the only tool you have is a hammer, you tend to see every problem as a nail." --Abraham Maslow

      --

      The Digital Sorceress
  48. Are Relational Databases Obsolete? Not at all by SpaceLifeForm · · Score: 3, Informative
    The Slashdot article headline is trollish.

    The relational concept will still exist regardless of the underlying storage methods.

    --
    You are being MICROattacked, from various angles, in a SOFT manner.
    1. Re: Are Relational Databases Obsolete? Not at all by lgw · · Score: 2, Interesting

      The "relational" concept existed before SQL, really. It's just a question of whether you want to do the work on the client side or the server side.

      The modern RDBMS is good when the pipe from client to server is much smaller than the pipe from server to backing store/cache. Minimal communication for maximum results. The trade-off, of course, is that the server needs lots of resources because it's doing significant work on behalf of every client.

      "Non-relational" databases still have their place today, however, in embedded/appliance environments. When the pipe between the client and server is very large (shared memory), and the server has only a token amount of memory for cache, a RDMBS is a poor choice. The distinction is only the implementation: nothing stops you from using relational concepts and pushing the work to the client side. But the implementation difference really changes what's fast and what's slow.

      Similarly, a column-based store is just an implementation difference from a traditional RDBMS, but it also makes a real change in what's fast and what's slow, which of course means a real change in what makes a good DB design.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    2. Re: Are Relational Databases Obsolete? Not at all by jotaeleemeese · · Score: 1

      The Slashdot article headline is trollish. You jest surely. This is /. !

      --
      IANAL but write like a drunk one.
    3. Re: Are Relational Databases Obsolete? Not at all by mattpalmer1086 · · Score: 1

      I think I get what you mean, but you are kind of mixing up a discussion of implementing a client-server architecture in different environments with the logical relational database model. I guess your choice of database implementation may have performance implications in different networked environments, but this has nothing to do with the relational model per se.

    4. Re: Are Relational Databases Obsolete? Not at all by lgw · · Score: 1

      I guess your choice of database implementation may have performance implications in different networked environments, but this has nothing to do with the relational model per se. Yes, that was my point about column-oriented storage exactly. :) Just because it's "still relational" doesn't mean it's not a significant change, when it comes down to what's going to solve a given problem (with perormance requirements), and what's not.
      --
      Socialism: a lie told by totalitarians and believed by fools.
  49. Keeping It Simple... by Prototerm · · Score: 0, Troll

    IMHO, databases would be much, much different today if IBM had extended the VSAM file type from the mainframe to the PC (for the uninitiated, think of a combination flat-file table with one built-in index). In my experience, the vast majority of database requirements of office workers are simple, so simple that even applications like Access are overkill. What do people use a database for? Scan through an entire table looking for one specific piece of data using only one key. VSAM files would be perfect!

    And things like the column-based database table (which strangely enough strikes me as an attempt by a spreadsheet user who naturally thinks sideways instead of down to create a sideways-oriented database) is even greater overkill.

    Besides, everyone knows that the database query users *really* want to run is: "SELECT * FROM *", and get upset when told they can't do it. That's another reason they prefer flat file databases to relational. It's easier to print the data out on fan-fold paper and do searches the old fashioned way.

    --
    "My country, right or wrong; if right, to be kept right; and if wrong, to be set right." --Senator Carl Schurz (1872)
    1. Re:Keeping It Simple... by nuzak · · Score: 1

      > IMHO, databases would be much, much different today if IBM had extended the VSAM file type from the mainframe to the PC

      They did, it's called OS/2. The race is not always to the swift nor the battle to the strong...

      --
      Done with slashdot, done with nerds, getting a life.
  50. When is any Data technique obsolete? by kildurin · · Score: 1

    Sorry, but I still believe in B-Trees, ISAM files, fseek(), ftell() and text configuration files. Every time something new comes out somebody calls the old thing dead. I think a huge # of uses of databases could be handled by the techniques previously mentioned but we hit up Mysql causes its easiest to get going. Will this ever end? This is one of the main reasons I see for people getting that shiny new 3.2Ghz system and seeing that it isn't really that much faster than the 200Mhz Pentium. Lemmings use the newest ideas which are actually take more computing power to use but less brain power to implement. Not actually sure on this technique but I bet if I dig I'll find it.

  51. I would love to have a DB that stores it both ways by Boap · · Score: 1

    I would love for a DB to store in rows when I have data that gets written to a lot but where the data is mainly just read I would love it to be stored in the column format.

    I would select it based upon the table that way I could control the data easily without making it too complicated.

    This would give the best of both worlds IMO

  52. Re:IMS--Hierarchical DB harder to use? by cdn-programmer · · Score: 5, Interesting

    On the contrary.

    From a standard 3rd generation programing language one can read and write into flat files and we can do close to this with a hierachical database.

    We lose this with relational databases because the way the database organises data has no direct mapping to the way it might be set up in a standard programming language.

    What this means is that every transaction to and from the database must go through a literally horrible re-mapping. IE. The language data structures do not correspond to the RDBMS data structures and visa versa.

    As an example - in postgreSQL the last I looked at writing a simple row into a table where there were something like 100 columns in the row...

    In the 3rd generation programming languages this was just a simple structure with 100 entries.

    The data transfer from that structure generated a function call with more than 1000 parameters. This was to be mapped and re-mapped with each call to transfer data, this is even though the structure itself is static and determined at compile time.

    Next: There were about 10 parameters per field (column).

    1: Column name
    2: Column name length
    3: data type
    4: data length
    5: character representation ... etc

    finally 10: Address where the data lives.

    The thing is such a table could be set up very easily and populated with a simple loop that rolls in the required values via say a mapping function with about 10 arguments. This could be done ONCE at run time to prepare for the transfer of data and then the same table could be referenced for each call and simply an address could be sent with the transfer.

    Noooo.. It was dynamic and the data was encoded as parameters on the stack. This means the stack must be build and torn down and rebuilt for each call.

    Next - the implementation was so bad that the program would run in test mode with only a few parameter but it failed when the whole row was to be transfered.

    I gave up on that interface.

    ---------------

    Oracle had pre-compilers. They did the same damn thing. The code generated by the pre-compilers was just awful.

    ---------------

    While there is much good to say about RDBMS's in general. The issue I ran into was the interface from 3rd generation languages took a HUGE step backward. IMHO we should have a high level language statement called DBRead() and DBWrite(). In C this should generally correspond to fread() and fwrite(). If this is too complex then DBWriteStruct() could be implemented with suitable mapping helper function.

    Nooo...

    In the old days one could read and write into a flat file at a given location with a single statement or function call depending on the language. Of course "where" to read and write became a real issue and I do fully understand the complexity of file based tree structures and so forth, especially since I wrote a lot of code to implement these algorithms.

    The thing is now we have RDBMS and other solutions that give us the data organisational abilities we need - and we lose the ease of mapping these structures into a suitable structure or object in the programming language.

    I for one do not think we have stepped forward very far at all.

    -------------

    I'll toss in a case in point made by a good buddy of mine who just happens to be one of the top geophysical programmers in this city.

    One of his clients was running an application hooked to an Oracle database running on a fast SUN. Run times were measured in close to a day.

    Finally they removed the Oracle interface and replaced it with a glorified flat file. They clearly built in some indexing. The result is the run times dropped to under 20 minuets.

    As my buddy says - He will NOT use any RDBMS. He can take 5 views of the data comprising 1000's of seismic lines and the user can click on any trace number, line number, well tie and so forth and in real time he can modify all views of the data on as many as 5 s

  53. It's a simple trick, really by MarsDefenseMinister · · Score: 1, Insightful

    This guy isn't really making a prediction, he's just generating publicity for himself, and that requires a gimmick. Creating the gimmick is easy, and it's done like this:

    First, pick a dualism. Any dualism. Here's some examples:

    1) You can type in a word processor, and you can type in a spreadsheet.
    2) You've got computers, and you've got networks.
    3) Skirts can be short, or skirts can be long.

    Now, make the unsupported claim that the dualism is really a continuum. It might be true, or it might be bullshit. Like this:

    1) Things you can type in resemble word processors or spreadsheets, to certain degrees.
    2) You can do computing activity on an isolated computer, or you can do it on a totally networked system.
    3) Skirts can be any length between long or short.

    Now, assert that there is movement. Look at which end of the bullshit continuum you've created that your asserted movement is pointing at. There's your prediction for the future. If you hype that like a pro, you too can be a tech pundit on Slashdot or even somewhere important.

    Here's the predictions that result:

    1) Over time, word processors are going to become more like spreadsheets. Entire sections of documents are going to be calculated by formulas.
    2) In the future, the network will be the computer.
    3) Next year we're going to have short skirts. Yay!

    How often do we see this kind of thing? Very often. And sometimes they can even be good ideas, worthy of making products out of. But even if they are terrible ideas, they're useful for hyping yourself.

    Here's some more:
    1) This article - Rows and columns. You assert that rows are more important that columns. Except in the glorious future where columns take over.
    2) We've got Embedded processors outnumbering desktop computers. In the future, no desktops, just embedded processors in our skulls.
    3) Paper is giving way to computers. In the future, completely paperless office!
    4) Mouse, Joystick. In the future, you tilt your mouse.
    5) Structured and object. Obviously, everything's going objects in the future. The death of structured programming means that loops and decisions are obsolete.

    And so on.

    I know this because I was a consultant. It's a basic skill.

    --
    No weapon in the arsenals of the world is so formidable as the will and moral courage of free men.-Ronald Reagan
  54. I agree by GamblerZG · · Score: 1

    I agree. In web development column-oriented storage is a way to go. I even tried to simulate stuff like that with PHP and MySQL, and it worked, but the maintenance of such databases was too difficult. The problem is, whenever you suggest that there might be something wrong with relational databases (for some application areas) there is always some smart-ass who accuses you of "not getting" it and promoting "outdated" hierarchical or network models. "ZOMG, u want to have efficient tree support with variable nodes? U r in teh stone age. We, on the other hand, are firmly rooted in 70s!"

  55. Easy out by Anonymous Coward · · Score: 0

    "He predicts that "column stores will take over the [data] warehouse market over time, completely displacing row stores."

    Can't I just turn my head sideways?

    (PS - postgresql rocks)

  56. Uh, isn't the problem finding the correct row? by w3woody · · Score: 1

    Here's what I don't understand. Most relational databases provide the ability to create one or more indexes upon one or more columns within a database--essentially creating an alternate, column-specific file which maps column values into the row data object. In other words, an 'index' upon a column is a column-oriented database object.

    So how is it that creating a column store makes reading a database more efficient, when indexing a row-oriented database essentially creates a column store alongside the row-oriented data?

    The only place where I can see column-oriented databases would be more efficient is by using uniformity of datatype across the data in a column can make finding un-indexed data far more efficient. In other words, if you have an un-indexed column storing integers, the file becomes dirt-simple: it's essentially an array of values on disk, and finding the 30th row in a particular column of integers is as simple as a fseek() followed by an fread().

    1. Re:Uh, isn't the problem finding the correct row? by Aladrin · · Score: 1

      So you are asking the difference between storing, then creating and maintaining an index vs just storing it that way to begin with? I don't think that even deserves an answer.

      Instead, it sounds like indexes could be made for column-store data that optimize the slow process for it: Writing. As most databases do more reading than writing, it makes sense to optimize for reading first, then writing, unless writing is many, many, many times slower. Of course, until someone has written optimized Databases both ways (they have now, apparently) you can't measure the performance differences. This is a non-article and there's no point in arguing it until multiple people have run good tests on the performance of each.

      --
      "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
    2. Re:Uh, isn't the problem finding the correct row? by w3woody · · Score: 1

      So you are asking the difference between storing, then creating and maintaining an index vs just storing it that way to begin with? I don't think that even deserves an answer.
      But think about it for a second: the only difference between creating and maintaining an index with column data and "storing it that way to begin with" is...what, exactly? Somewhere you have a data structure representing a table which gives the file name of the column data/index data (which I assume here is exactly the same thing, since an index is essentially column-oriented data stored using a data structure (B-Tree, hash table, etc) which optimizes read operations), and, for a row-oriented database, a paginated file containing the row data itself.

      The only thing a column-oriented database seems to do is eliminate the row-oriented file and provide column-oriented information for each column instead.

      Now for a database with a number of fixed-width fields, looking up a particular row should be relatively straight foward; each column data item is a fixed index into the file. But if you have a variable-width data field, you're going to have to store those data fields in some sort of paginated file structure anyway--which will look like a row-oriented database but with a whole bunch of small objects instead of a few large ones.

      And if you have a table with five variable-width columns and perform a select statement on those five columns, you're going to wind up doing five times the work looking up the five column records within five paginated files.

      I don't think that even deserves an answer.
      Excuse me, but when you have something this complicated--which you admit when you wrote: "This is a non-article and there's no point in arguing it until multiple people have run good tests on the performance of each."--treating my question like it's a dumb one and like I'm an idiot is rather rude.

      I'll bet you're also one of those people who write 'lusers' to talk about the customers of the code you write/manage/play with, too.
  57. They are not merely a backend preference by Jayson · · Score: 1

    The differences between column and row orientation is not merely in the backend. Most column dbs support a more extensive query language that is made possible by the column orientation. They tend to be far better at ordered data (time series, symbol groupings, etc) and have extensions to exploit the ordering, such as moving average functions that aren't poorly bolted on.

    SQL is also a huge part of the problem, and one of the benefits of column dbs is that is goes beyond SQL and allows more expressive queries.

    Also I don't know where this poor idea keeps coming up that column-oriented tables aren't good for reading. They are often used for things like stock market data that spin tremendous amounts of data into a table very quickly, far faster than any row-oriented db could handle (especially if you need to index the row-based table).

  58. This is pretty ridiculous by hey! · · Score: 1

    Relational or not has nothing to do with how data is stored. Nor does it have anything to do with performance.

    It's a standardized abstraction of a record keeping system. As such it has advantages and disadvantages.

    The strategy behind relational databases is what we in modern parlance would call "separation of concerns". The application programmer concentrates on what he wants. The RDBMS systems programmer concentrates on storage and optimization. The DBA concentrates on tweaking and balancing the viewpoints of different users of the data.

    The payoff for this is that data is no longer strongly coupled to a single purpose. It also turns out that by working on a small number of standard paradigms for years and years, the RDBMS developers can provide application programmers with systems that give them what they want faster than the app programmers would bother to make it in most instances. Certainly more reliably.

    The cost is that the people who do this have to work within a common model or set of abstractions. It has always been possible to obtain faster results by throwing out the idea of a common abstraction and coding closer to the metal. It always will be. Nor has it ever been realistic to assume that the relational model would be optimal for all application domains; merely most (although we weren't so cognizant of this back in the 80s).

    There are three things that have changed since the advent of the relational model, none of which obsolete the model, but do somewhat change its slice of the application domain pie.

    (1) The vision of Grand Unified Databases in which all an organization's data assets live seamlessly has proven to be marketing malarkey. It's proven necessary to separate transaction processing from analytical processing of course, but even more fundamentally organizations have structure in part to hide irrelevant or sensitive data or to control the scope of policy making in various areas. Even though some vendors (notably Oracle) are somewhat better at creating massive database management systems with distributed administration, in the end the vision of enterprise GUDs would never have worked because people in different parts of an organization can't spend their lives in meetings hashing out each other's data models. Everybody's heard the analogy of the blind men and the elephant, but the reason organizations have parts is because it's more efficient to specialize and have a trunk expert who thinks of an elephant as snaky.

    In general, the need to share data between databases is greater than we thought it would be. In part this is a consequence of the impracticality of Grand Unified Databases, but also because everybody is, in effect, networked to everybody else these days.

    (2) There are new application domains which involve processing huge amounts of data for relatively narrow sets of operations. Examples would be DNA databases in bioinformatics or document fingerprinting for web indexing. These new applications don't diminish the need for relational databases in their traditional application domains.

    (3) There's just too much friggen stuff for people to learn. I've worked with people out of master CS programs who could do amazing XSLT transformation, could draw UML for common design patterns blindfolded, but could not give a cogent explanation of what NULL means in a relational database system. The so called object-relational impedance mismatch has always been exaggerated in my opinion; the problem is finding people who really understand the relational paradigm and tools.

    Overall, I think the relational model is a huge success, and will continue to be, although other models may be useful in restricted problem domains. However, the way we use the relational paradigm should be realistic.

    One example is the concept of identity in the relational model. Nearly every relational database design flaw eventually boils down to getting the criteria for ide

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    1. Re:This is pretty ridiculous by Anonymous Coward · · Score: 0

      Relational or not has nothing to do with how data is stored.

      Thank you, I think you're the only commentor here that HAS A FUCKING CLUE.

      To repeat: the relational *model* DOES NOT say ANYTHING about storage. It is an abstract model for expressing data management tasks. Take some data stored in relations, representing facts, and derive new facts from them. THAT'S IT.

      The relational model therefore, says NOTHING about performance. To say "relational databases are slow" is like saying "a bridge designed using calculus is expensive". It makes no sense whatsoever.

      The relational model also DOES NOT specify a type system. Storing complex objects, integers, or XML text within individual attributes are all 100% acceptable to the relational model. You just need to have *A* type system. You could use Java's type system for instance. You just need a finite set of possible values for each type, and an equality test. That's all.

      The relational model also DOES NOT specify any design criteria. The relational model does not care if your tables are normalized. The relational model does not care how many columns in your table. The relational model does not care if you store your company's data entirely within a single column in a single row.

      The relational model IS NOT A PRODUCT. SQL DBMS systems, which are often called RDBMs, violate the relational model in at least a dozen ways. THE RELATIONAL MODEL IS A *MODEL*. You use it to understand data management, to work with it in an abstract level with proven theorems, and to describe data management tasks with a general vocabulary.

      Of course we can use the relational model to *discuss* all these topics and more.

      Whew. The level of database ignorance among supposed professionals is astounding, so I feel these points are worth repeating over and over again.

      Nor has it ever been realistic to assume that the relational model would be optimal for all application domains; merely most (although we weren't so cognizant of this back in the 80s).

      Don't confuse people, the relational model is a (and at the moment *the only*) general model for data management. It can apply to all *data management* problem domains. Give a counterexample, if you don't believe it.

      The vision of Grand Unified Databases in which all an organization's data assets live seamlessly has proven to be marketing malarkey.

      The relational model has no problem with this however. The RM includes the highly powerful concept of *views*, which would make it perfectly rational to have a single logical data store with multiple physical storage mechanisms, multiple interfaces (SQL and hierarchical, for instance), all kept consistent. But no such products exist, that I know of. In order to implement them, you'd need to have a deep appreciation of the abstraction that the RM provides. As soon as more people understand the RM, they will begin to demand these products.

      There's just too much friggen stuff for people to learn.

      Then don't learn bullshit like UML and XSLT until you understand *fundamentals*. They are *implementations*, which come and go, but the RM can be used to understand any data management system, past, present, or future.

      but could not give a cogent explanation of what NULL means in a relational database system

      If you come up with the answer to that, let me know. :-)

      One example is the concept of identity in the relational model.

      The relational model has no problem with identity. Identifying things is a fundamental part of it (that's why you need an EQUALITY TEST for your type system, at a bare minimum.. so you can differentiate one thing from another).

      It's bad OO programming and web URLs that have driven the use of single "universal identifiers", but this is bad design: a rule of thumb in data management is that your *logical* model (database operations) should be very close to your *conceptual* model

  59. How is this visible to the user? by Anonymous Coward · · Score: 0

    You can store data internally in either row-major or column-major order, as you wish, and still present it to the user in exactly the same way onscreen. So how does this sort of internal storage change make the relational database obsolete?

  60. unbiased opinions bore me by Anonymous Coward · · Score: 0

    I am more interested in biased facts.

  61. Re:IMS--Hierarchical DB harder to use? by Sharkeys-Day · · Score: 2, Informative

    I think a place to start is to ask how to map language structures and RDBMS structures into a common denominator. One should never be looking at function calls with over 1000 parameters. That is just plain stupid. One should also never be dynamically mapping each and every tidbit of every field in a row on the fly at run time and especially so for each row in a table. Quite right, which is why programmers who still have have their sanity use JDBC or DBI. This part of your problem has already been solved at least twice.

    Here's how I write out a customer record:

      $dbh->do(
          'insert into customer (id,name,yada1,yada2,yada3) values (?,?,?,?,?)',
          undef,
          @customer{id,name,yada1,yada2,yada3}
      );

    I think that's even easier than your 3rd-gen code, and I didn't have to write my own indexing code.
  62. If i had $1 for everyone that predicted... by pjr.cc · · Score: 1

    If i had $1 for everyone that predicted the end of the RDBMS i'd have about $20 by now... hmmm, thats not very impessive is it?

    If i had $1,000,000 for everyone that predicted the end of the RDBMS i'd be a rich man by now. Ahhhh, thats better.

    Seriously though, i can remember any number of dbms concepts that have come out and supposedly been potential for replacing rdbms as "the thing" for data storage. OODBMS springs to mind when c++ started to take hold, and even more so when java took off, "yeah, we need an OODBMS"

    In reality, rdbms is and probably always will be flexible enough to handle almost any role. Maybe when we all have enterprise (starship that is) AI in our homes we'll have developed some form of natural data storage that will become the norm, but until then - i'll be a cynic!

    1. Re:If i had $1 for everyone that predicted... by NoOneInParticular · · Score: 1
      The big thing lacking in an RDBMS is a proper way to steer it from an application. Currently, there's a gap, where strings are fired into the RDBMS, which will respond with an exception if something goes wrong. Nice for independence, lousy for productivity. There is no common interface that lets the application programmer figure out errors that are deducible from static information. If you know the schema (should be definable in the application language), many errors can be statically caught by the compiler. If the schema in the application does not match the schema in the database, the application should fail with a linker error. It should not be possible (from a statically typed language) to even attempt to write a string in an integer.

      Currently such interfaces can be build (viz. Hibernate), but it still needs be implemented by the very very slow process of: firing strings to the RDBMS, parsing the string in the RDBMS, creating an execution plan, executing the plan, wrapping the result in strings, sending back the strings, unwrapping the strings, looping over the result set, putting the row strings into cells, putting the cells into the applications primitive types, and resume execution. With native RDBMS support for working together (instead of against) programming languages, dynamic linking to tables/views, strong type checking, possibly static constraint checking, you might be able to imagine that the whole world of hurt that is writing database applications might become easier over time. Instead, we're at a standstill of over 30 years.

      Currently, he state of affairs is still pathetic.

  63. Heard this one before... by Anonymous Coward · · Score: 0

    I've been hearing people ringing the death bell of RDBMSs since the mid 90's. If I had a nickle for every Object Database I was forced to use because it was the next big thing... well, I'd have probably 20 or 30 cents, but the important thing is that every single one of those systems never because "the big thing". Some of the OLAP systems I've seen look promising because #1, they interoperate with relational databases nicely, and #2, they actually solve some performance problems. Still, does the storage engine really equal a paradigm shift?

  64. From the wikipedia link by sheldon · · Score: 1

    In practice, row store architectures are well-suited for OLTP-like workloads which are more heavily loaded with update transactions (writes) rather than queries (reads). This is because row stores are extremely "write friendly", in that adding a row of data to a table requires a simple file appending I/O. On the other hand, column stores are well-suited for OLAP (data warehouse)-like workloads. Such workloads can be characterized as "read-mostly", primarily because of the relatively high cost of queries (which might perform complex analysis of several TB of data) vs transactions (which primarily consist of simple insertions).


    So apparently the answer to the /. question "Are Relational Databases Obselete?" is...

    NO.

    Considering we're talking about row store versus column store, and the layout of data is dependent upon whether you are doing OLTP versus OLAP.
    1. Re:From the wikipedia link by wwilbee · · Score: 1
      I do not think the big database vendors are worried. The point that the workload should drive the storage architecture is well taken.

      I am sure the major Database vendors (Microsoft, Oracle, etc) have active research in these areas.

      In the mean time for most conventional applications the current crop of products serve well. Also there is a lot of work being done to add kewl new features to the existing products. The FileStream datatype is pretty cool.

  65. Non relational database models by Edward+Ka-Spel · · Score: 1

    This really isn't my field, so I'm curious. What are some examples of non-relational database models?

  66. Columns stores are RDMB by suv4x4 · · Score: 1

    When it becomes mainstream, it'll be just a checkbox I tick when I create a table on my Oracle, MS SQL, PGSQL or MySQL store to use it.

    Stop with the nonsense.

  67. Re:slightly OT by Ark42 · · Score: 1


    I really quite like -some- aspects of M / Mumps / Cache in how it stores data. A large chunk of the healthcare industry still relies on programs which use exclusively this weird non-SQL language for reading and writing to the "database". Think of it as more like a permanent global disk-based array of arrays of data. There are no rows/columns, just an associative array which can contain any level of nested arrays or data, but is always synchronized with the disk.

  68. Overused, Not Obsolete by CyberLife · · Score: 1

    I don't think relational databases are obsolete, as there are a great many applications which the technology fits nicely. I do, however, think they are greatly overused. From what I can tell, people are being trained to use RDBMS anytime they need to persist data. I can't count the number of times I've asked somebody why they're using one and the response is, "It's what I know."

    I read a very good statement about relational technology. It said one should not think of an RDBMS as a persistence tool. Sure, it can be used for that, but it's better to think of it as an integration tool. It should be used as a mechanism to permit the uniform sharing of information between different applications, built using different languages, running on different machines, hosted by different operating systems. That's, the statement argued, is the real power.

    I'm inclined to agree. If you've got a simple blogger app that doesn't need distribution capabilities, doesn't need language-independence, doesn't need OS-independence, just needs to persist an ordered list of very basic data, an RDBMS is way overkill.

  69. One size fits all by DragonWriter · · Score: 1

    Stonebreaker's complaint is that modern RDMBS's are "one size fits all" when different approaches can outperform them in particular niches. The problem here is that real enterprises often need solutions that fill multiple roles (OLTP, OLAP, etc.) and integrate seemlessly. Having a solution that fills one role with less fuss and overhead and DBA workload than existing RDMBS's is of limited utility if it makes all the other things that need to connect to it take correspondingly more fuss than they would with a "one-size fits all" database system.

    Of course, if it can be wrapped up in a way which is transparent to the user and integrates seemlessly "behind the scenes" with existing systems, this stops being an issue, but then you still have an RDBMS, just with a different underlying implementation of the storage engine. And its hardly as if optimized storage implementations for particular roles are something new to the RDBMS world.

  70. Do it. by rs79 · · Score: 1

    I met Michael Stonebreaker once. He has a very very fine mind and while wha he comes up with may look like whackjob conclusions, when you hear him explain them they, and he make perfect sense.

    I suggest very strongly you talk to him directly and tell him what you told us. I promise that time will not be wasted.

    --
    Need Mercedes parts ?
  71. Re:slightly OT by Anonymous Coward · · Score: 0

    If I recall a commit after a truncate isn't necessary. No undo info is generated so the operation can not be rolled back.

  72. Far from it! by jbeaupre · · Score: 2, Funny

    Living in Kentucky, I can tell you a relational database would be handy if actually used. Why I've got in-laws that, well, I won't go into details. Let's just say I suspect my wife married an out of stater for fresh genes.

    What? Not that kind of relational?

    --
    The world is made by those who show up for the job.
  73. I've looked at Vertica... by puppetman · · Score: 1

    which appears to be the front-runner.

    The concepts are interesting - they claim speed, and significantly lower disk-space requirements. They have a video on their site explaining the concepts (http://vertica.com/techoverview - you'll need to provide contact info to see it).

    But the last time I talked to their sales representative, they did not support triggers, stored procedures, or any sort of high-availability options.

    While the concept is interesting, until they roll out what are now standard database features (even MySQL has views, triggers, and stored procedures now), it won't bet taken seriously. Even then, they might go the way of object-oriented database systems (we tried Matisse without much luck - too much locking). Row-based database systems have a lot of momentum.

  74. What do you think should replace the RDBMS with? by einhverfr · · Score: 1

    You seem to think that RDBMS's are obsolete. What do you think they should be replaced with? Even Stonebreaker isn't suggesting that set-theory-based math isn't the right way to go-- he is talking aobug physical storage choices (which work better in some cases), not the theoretical math basis for the information management.

    --

    LedgerSMB: Open source Accounting/ERP
  75. Abstraction vs BS by Anonymous Coward · · Score: 0

    What's the difference. Don't you guys feel the term relational database is actually the only thing dated here.

    A "data" "base" is just a big pile of data. How you access it truly defines the database, not simply how the data is stored.

    In the world of practicality we may have names for efficient ways to access the data stores, but ultimately they are just naming conventions that means little. The term relational database simply doesn't really mean anything. It's just a BS term made to describe one set of logic for access more or less the same data in the same format.

    In it's most abstract and original form 'relational database' means nothing more than a file with fields that 'relate' to each other. So I'd argue that realistically, most any database can be defined relational. If in fact the fields can be shown to have a relational value in any sense to each other then it can be considered a relational database.

    Therefore, this guy is obviously not the master of database knowledge or is trying to hype some new technology as so often is the case on slashdot.

    Saying relational databased are obsolete is like MS saying procedural programming is obsolete because object oriented programming is the future. The reality is that object oriented programming contains procedural programming, not that they are exclusive. The same can be applied for new forms of indexing relational databases. The fact the field relate in an intelligent fashion does not change based on how your search the data. What you are seeing here is the corruption of terminology for the sake of selling a new technology. While we call standard SQL databases relational that doesn't mean other formats are not also relational even though they are significantly different at either the point of formating the data or accessing it. Saying something impressive like quantum database would still imply the fact that the field relate. A quantum database would redefine the methods of access, but none the less would not redefine the fact that databases are all relational.

    This is just a term used to sell technology. All too often Slashdot focuses on this type of technological terminology.

  76. assert(Are you joking == RDBMS is legacy); by Anonymous Coward · · Score: 0

    I've heard similiar stories being blurted for years. First it was object relational data stores, then it was various abstraction layers.

    Now somehow tilting a table 90 degrees will replace the 'legacy' RDBMS. Modern OLAP implementations have already surpassed whatever niche need these people think they will be filling...

  77. In Korea... by Anonymous Coward · · Score: 0

    ... only old people use relational databases!

  78. Not quite by shis-ka-bob · · Score: 1

    If you are doing this by altering your SQL, it has nothing to do with column store. Column store means, as others have noted, that the physical memory is organized by columns, not rows. Writing clever SQL that mimics a 'pivot table' is not going to alter the internal structure of the data. As a practical matter, I would achieve a 'dual mode' database by using PostgreSQL for 'row store' and Vertica when I wanted 'column store'. Then I would at least be able to write portable SQL since they use the same PostgreSQL front end. Since the front end has the SQL parser, I would be able to move my work product from one DB to the other with a maximum likelihood of success (query optimization is another story, of course). In practice, this would mean that I use PostgreSQL for transaction processing and Vertica for analytics. Sybase also offers both sorts of databases, so that would also be an option. But SQL Server doesn't support column store, see the Wikipedia article on column oriented DBMS at http://en.wikipedia.org/wiki/Column-oriented_DBMS.

    --
    Think global, act loco
  79. show me a database that isn't relational? by DragonTHC · · Score: 1

    Aren't relationships the core fundamental of a database?

    I've got a database in about 6 shoeboxes in my bedroom. Old bills and documents. It's not relational.
    Is that a better way than a relational database?

    I think what we're referring to here is a mass of data with tags that are searchable. This method, while more modern, does not mean better.
    Those tags have to be applied. And, they're still a relation.

    I still think that, until someone presents a more efficient and reliable method, we should follow Codd's 12 rules.

    --
    They're using their grammar skills there.
    1. Re:show me a database that isn't relational? by CyberLife · · Score: 1

      This method, while more modern, does not mean better.

      You're right, it doesn't necessarily mean better. Once again, there's a right tool for the right job. As the article describes, a column-oriented database is optimized for a much different purpose than a row-oriented one. The following example may help to clarify this point.

      Consider a database such as the one employed by Google Images. Each record represents a single image, and it contains, among other things, the format (e.g. JPEG, GIF, PNG), the dimensions, the file size, the date/time spidered, and of course the pixel data in both original and thumbnail sizes. Now suppose you wanted to see the dimensions of all JPEG images. In a row-oriented system, each JPEG record would have to be loaded from disk in its entirety (including the possibly multi-megabyte image data) just to extract a few bytes of dimension information. Using Google's column-oriented Bigtable system, however, only the file containing the dimension data is accessed. The image data isn't even thought of.

      Now many people using row-oriented systems for such applications tend to keep their image data outside of the database for this very reason. However, one could reasonably argue that this is nothing more than a pseudo-column-oriented approach.

  80. NRDBMS by WED+Fan · · Score: 1

    Not only is the Relational DBMS dead, it is buried.

    We now have the Non-relational DBMS. Data is stored in a manner not as it relates to other data, but how it does not relate to other data.

    Features of the NRDBMS:

    • AQL (Accidental Query Language) - Results of an AQ will return all results not requested, thus leaving the user to determine what the unrelated dated represents
      • Throw - Data is thown against an AQWall filter.
      • Stick - Returns all data that Sticks to the AQWall: return unit is refered to as a MESS
      • Residue - Results of a funtion that determines what is missing from the MESS
    • GUIDO - This is the replacement for a CRUD system. Guess, Insert, Update, Destroy, Offer.
    • INCEST Layer - Watchdogs the DBMS for accidental pairing of related elements

    The NRDBMS is based on a non-column, non-row, non-cell oriented theoretical system outlined in paragraph 1D.10.T of the Savant manual.

    --
    Politics is the art of looking for trouble, finding it everywhere, diagnosing it incorrectly and applying the wrong fix.
    1. Re:NRDBMS by Phil06 · · Score: 0

      What about SDBMS (Spreadsheet Database Management System) where you have all your employees (none of whom have ever taken an Excel class) keep all your key information in separate spreadsheets, with multiple copies with non-descriptive names stored in hackneyed folder hierarchies, all with every possible formating trick in addition to numbers and dates stored as text making them impossible to sort, with needless pivotables up the wazoo, numerous unintelligible graphs, with every effort put into just making it print okay and requiring the entire worksheet to be reformatted when you want to make one small change.

      --
      "...and yet, I blame society" Duke - Repo Man
    2. Re:NRDBMS by smittyoneeach · · Score: 1

      Dude, SDBMS was replaces by PDBMS, the Presentation Database Management System. Not only did the datasets get smaller, so they would be legible on a slide, but they now have spiffy transitions.
      When used to display something like a project Gantt chart, PDBMS has the additional benefit of decoupling from the data, because you know those notional dates are going nowhere save to the right.
      PDBMS also has way more swell hardcopy options than SDBMS.
      When you are sufficiently Tufte, you'll upgrade from that wimpy SDBMS.

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
  81. Database conversion? by gorckat · · Score: 1

    Can I do this by turning my monitor on its side so the old rows are the new columns? Does it matter if I turn it left or right?

  82. Sideways Computer, bad data storage by Dareth · · Score: 1

    Everyone knows the best way to store data requires columns labelled with letters, and rows labeled with numbers!

    If you turn the box sideways, that is all messed up. Well, I guess if you turn your monitor sideways too it would correct this... I am a genius!

    --

    I only look human.
    My mother is a halfling and my dad is an ogre, so that makes me an Ogreling
  83. Stonebraker's current track record by WindBourne · · Score: 5, Insightful
    1. He helped created THE first relational DB.
    2. He later moved to creating an Object-Oriented Relational DB with Postgres in the 80's. Much of that tech has found its way into other DBs such as Oracle and even helped create the OODBMS world.
    3. Now, he is creating the Column store DB and announces that this will be the next big thing.
    I would listen to him. Biased or not, He has a better track record than most intelligent ppl (and all the wanna-be/hasbeens; dvrack comes to mind) in the tech field.
    --
    I prefer the "u" in honour as it seems to be missing these days.
    1. Re:Stonebraker's current track record by KingSkippus · · Score: 1

      Yeah, but the headline and the ComputerWorld article make him sound kind of kooky and grossly misrepresents what he actually said. I've since gone back to read the original article, and I'm more inclined to agree with you now.

    2. Re:Stonebraker's current track record by Tablizer · · Score: 1

      and even helped create the OODBMS world.

      What OODBMS world? OODBMS kind of fell off the edge of the world, except for some really nichy stuff.

    3. Re:Stonebraker's current track record by WindBourne · · Score: 1

      Oracle (and DBs like Postgres) killed them. That, and they were overhyped. For the way many of those were developed, it is easier to stash an Object in Oracle/PG and read it from there. Also a great deal faster.

      --
      I prefer the "u" in honour as it seems to be missing these days.
    4. Re:Stonebraker's current track record by Anonymous Coward · · Score: 0, Troll

      Stonebraker is an academic. There are two categories of early DB pioneers. There were people
      like Jim Gray and C. Mohan who actually did the work and came up with useful (practical) algorithms.
      Then there are people like Date, Codd, and Stonebreaker who theorized a lot. The theory was
      useful (Codd's, at least), but much of it was quietly worked around to get a viable
      product developed. Even if (a) the article agreed with the summary, and (b) he didn't have
      financial interest in column stores, I wouldn't listen to his prophesies too seriously. I'd
      love to be proven wrong though. Wouldn't be the first time.

    5. Re:Stonebraker's current track record by jbplou · · Score: 1

      Because Object Oriented databases were so successful, I think his best work is behind him.

    6. Re:Stonebraker's current track record by WindBourne · · Score: 4, Informative

      Not Quite. Stonebraker was THE core developer on Ingres which was the second relational DB created (System R was the first developed by Codd et.al using SQL). He operated at both ends of the spectrum. Gray/Mohan did the same as well. While System R disappeared, Ingres was developed into a major company (ingres) and was ultimately bought by CA.
      Later, Stonebraker's work on postgres (theorey AND code) was how to handle different datatypes within databases. He took an OO approach to that. That was directly used in Illustra and then went on to Informix. More importantly, Oracle used a lot of that work to create 8i as has other DBs. IOW, he IS a leading theorists AND knows the code.

      Considering that he has been on top of all the major advances within the DB world, why would you discount what the man says? As it is, you mention Gray and Mohan who both did some good work at IBM, but have not really advanced DBs forward that much. They simply moved relational model DB forward( Bascially, they were red herrings). But Stonebraker is working across ALL the spectrums and contributes heavily to knew models. His work is everywhere.

      Finally, think about what he says. The column major is more useful for data warehousing BECAUSE it allows for data to be compressed quickly, tighter (which makes sense), AND allows you to work with just the data that you need. In a row major, you will end up creating and maintaining indexes to increase the speeds of reads. But an index is for the most part a single (or just a few) columns, which basically makes them a column major. But this requires LOADS of cpu and space to maintain. The column major approach simply keeps the indexes, if you will and discards the rows. This allows for FAST operations if you are doing LOADS of reads, and little changes. That is PERFECT for data warehousing.

      So armed with that knowledge, exactly WHY would you discount his work and his statements?

      --
      I prefer the "u" in honour as it seems to be missing these days.
  84. think that's fast? by myatmpinis1234 · · Score: 1

    I want a Beowulf cluster of column databases.

  85. Stonebraker is wrong by Master+of+Transhuman · · Score: 0, Flamebait


    This sort of crap was dug up when "object-oriented databases" were invented,

    Every little while somebody who doesn't even understand relational theory suggests that relational databases aren't "necessary", "efficient", "legacy", or some other crap.

    Here are the facts: short of some sort of AI conceptual processing being developed, relational databases are the ONLY known method to CORRECTLY reflect the realities of modeling the real world as data (and even then there are problems.)

    Go talk to C. J. Date or Fabian Pascal - they'll set you straight on that nonsense.

    Go to the Database Debunkings site and learn something.

    --
    Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    1. Re:Stonebraker is wrong by CyberLife · · Score: 1

      ... relational databases are the ONLY known method to CORRECTLY reflect the realities of modeling the real world as data ...

      The only known method or the only method you know of? Not only that, but what exactly is the definition of correctly reflecting the realities of modeling the real world as data? Relational technology may be based upon sound mathematics, but the application thereof is not. Any application of technology to a particular task is subjective. Every situation is different, and everybody involved has different ideas of what is important.

      Don't get me wrong. As my other posts in this discussion illustrate, I don't for a minute agree with the notion that relational technology will be completely replaced by this. That's just good old-fashioned FUD, designed to stir up demand for something different. All I'm saying is that there is no one technology that is the best or the only for all situations.

    2. Re:Stonebraker is wrong by Master+of+Transhuman · · Score: 1

      The point of relational theory is to do exactly that: correctly model the real world in data - subject to the technological limitations of computers, as I said. If you have AI, you can do better. We don't, so we use relational theory.

      I also didn't address the application of relational theory. If you study Date and others, you realize that there isn't a completely relational database anywhere on the market. All of them make serious compromises with the theory, either because of perceived performance needs or simple lack of understanding of the theory. This doesn't mean these databases are of no use - it just means that in certain data modeling situations, they will not correctly represent the real world and the gaps will have to be adjusted by procedural code or other fixes that reflect their incomplete implementation of the theory - or you just have to accept the occasional wrong answer to a query.

      Object orientation has no mathematical theory behind it, so it flails. It's less likely to accurately model real-world data than relational theory. It may be fine for applications, but not for databases. Date and others have covered these issues in some detail.

      And I don't know of any other technology that aims to compete with relational theory other than OO - except Stonebraker's concepts. I'll admit I haven't studied his stuff closely, but Date and others have complained about his concepts before, so I think I'm safe to say it's unlikely he's made any new breakthroughs.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
    3. Re:Stonebraker is wrong by NoOneInParticular · · Score: 1
      Wow, that is one hopeless site. Not only do you have to know up front what 'vociferous' means, but it presupposes that everyone is aware of the abbreviation EAV in various postings. I was going there, hoping to get some actual information about the myths that perpetrate relation data management. Unfortunately, I need to be a convert, and 'in the know', before I can be expected to understand the information provided.

      The authors of that site should really take a look at the meaning of debunking: it is not 'putting an extra floor on my ivory tower'.

    4. Re:Stonebraker is wrong by Master+of+Transhuman · · Score: 1

      Ignore that - Fabian Pascal is not noted for being "user friendly".

      The articles there also presuppose a considerable knowledge of relational theory.

      It's a good site, nonetheless - a lot of good stuff there from CJ Date.

      Persevere - it's worth it.

      --
      Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
  86. re: rotate - one word by villy · · Score: 1

    Transpose.
    Done.
    Profit.
    42.
    You get the idea.

  87. perhaps it's good for websites too by fmoliveira · · Score: 1

    A site like slashdot have a lot more reads than writes. Perhaps these column thingies could help here too.

  88. Only a new storage engine for MySql by xluap · · Score: 1

    It could become a new storage engine for MySql, so applications that benefit from a column store can use it instead of one of the other available storage engines.

    1. Re:Only a new storage engine for MySql by jadavis · · Score: 1

      It could become a new storage engine for MySql,

      Every current storage MySQL engine choice changes the semantics and forces you to give up features.

      I wonder what basic features you will have to sacrifice in order to use this hypothetical new storage engine.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
  89. Common Business Mistake by C10H14N2 · · Score: 2, Insightful


    #1: Assuming what you think your customer needs is what your customer wants.
    #2: Assuming they are the ones who made the mistake when you lost the job.

    1. Re:Common Business Mistake by myowntrueself · · Score: 1

      Oooh! Oooh!

      I know the answer to this one...

      #1: Assuming what you think your customer needs is what your customer wants.

      Apple.

      #2: Assuming they are the ones who made the mistake when you lost the job.

      Microsoft.

      --
      In the free world the media isn't government run; the government is media run.
  90. Re:IMS--Hierarchical DB harder to use? by SixDimensionalArray · · Score: 1

    I know, I know, it's Microsoft.. BUT - check out the upcoming LINQ (Language Integrated Query) and DLINQ (LINQ for databases) in the .NET languages - I think this might be a step in the direction you desire.

    SixD

  91. Column oriented database = IMS? by scottsk · · Score: 1

    What a "column oriented database" (never heard that term before) sounds like is IBM's old IMS, which grouped columns together (they didn't call them columns because relational databases didn't exist back then) to make it easy to process all of the same fields for all records in the table (they didn't call it a table). For example, you could iterate through all the phone numbers for every record. Mike Murach's company still sells a book on IMS DB (two volumes, but you only need part one) if you're interested in the power of this approach. IMS was designed in the days when config files were coded in assembler and assembled into a program because of efficiency constraints, so it is major ugly to code in. Basically, IMS was a framework (again, they didn't use that name) which loaded your assembled file definitions and ran against them. The IMS query language was so bizarre it's worth looking at if you've never seen it just to see something that bizarre. (And I mean bizarre, such as a space being part of the relational operator -- you have to code a space if you use a one-byte operator like "> " instead of a two-byte one like ">="!) If someone could dust this off and make it work in a more modern way, it would be one of the most remarkable comebacks in computer history. With modern partitioning for relational databases, and the fact that you'd have to have some sort of query language, I don't think this could replace general-purpose RDBMSes, but it could be a good extension to them some day.

  92. Re:IMS--Hierarchical DB harder to use? by einhverfr · · Score: 3, Interesting

    On the contrary.

    From a standard 3rd generation programing language one can read and write into flat files and we can do close to this with a hierachical database.

    I think there is a key distinction here. Application object store vs data management. Hierarchical db's are far better at storing object information, but *far* worse at real data managment.

    We lose this with relational databases because the way the database organises data has no direct mapping to the way it might be set up in a standard programming language.

    What this means is that every transaction to and from the database must go through a literally horrible re-mapping. IE. The language data structures do not correspond to the RDBMS data structures and visa versa.

    In LedgerSMB, we solved this by putting a functional interface in the db. Then we dynamically map the objects and their properties into functions and arguments. Works great :-)

    As an example - in postgreSQL the last I looked at writing a simple row into a table where there were something like 100 columns in the row...

    You are either trolling or you need to fire the DB architect who designed that. THere is *no way* that a 100-column table is good DB design. (Ok, mathematically, there is nothing that precludes it being good db design, but I can't even imagine a scenario where this would be OK).

    In the 3rd generation programming languages this was just a simple structure with 100 entries.

    Oh, you were the one who designed the 100-column table. Sorry..... Please go out and get some books on db design. You will thank me :-)

    The data transfer from that structure generated a function call with more than 1000 parameters. This was to be mapped and re-mapped with each call to transfer data, this is even though the structure itself is static and determined at compile time.

    IMO, your problem honestly is in the fact that you are using a monkey wrench as a ball peen hammer. It may sorta work but you are using the wrong tool for the job. If you want a simple object store use BDB or something like it. If you want a real data management solution, build your db *first.* If that is not your goal, use something other than an RDBMS.

    Next: There were about 10 parameters per field (column).

    1: Column name
    2: Column name length
    3: data type
    4: data length
    5: character representation ... etc

    finally 10: Address where the data lives.

    The thing is such a table could be set up very easily and populated with a simple loop that rolls in the required values via say a mapping function with about 10 arguments. This could be done ONCE at run time to prepare for the transfer of data and then the same table could be referenced for each call and simply an address could be sent with the transfer.

    Noooo.. It was dynamic and the data was encoded as parameters on the stack. This means the stack must be build and torn down and rebuilt for each call.

    How is this an issue with RDBMS's?

    Next - the implementation was so bad that the program would run in test mode with only a few parameter but it failed when the whole row was to be transfered.

    Again, this is not a PostgreSQL problem ;-)

    I gave up on that interface.

    From your description, that sounds like a wise choice.

    While there is much good to say about RDBMS's in general. The issue I ran into was the interface from 3rd generation languages took a HUGE step backward. IMHO we should have a high level language statement called DBRead() and DBWrite(). In C this should generally correspond to fread() and fwrite(). If this is too complex then DBWriteStruct() could be implemented with suitable mapping helper function.

    Again, this is an issue with the frameworks you are using. Personally, I tend to do the

    --

    LedgerSMB: Open source Accounting/ERP
  93. ODBMS by Lodragandraoidh · · Score: 2, Informative

    I think the object database management system (ODBMS) will overtake RDBMSs in the future for several reasons (from the link and my own musings):

    1. Object-oriented databases are designed to work well with object-oriented programming languages such as Python, Java, C#, Visual Basic .NET, C++ and Smalltalk. This makes implimentation quick and easy - yet stable and scalable at the same time.

    2. ODBMSs use exactly the same model as object-oriented programming languages.

    3. It is also worth noting that object databases hold the record for the World's largest database (over 1000 Terabytes at Stanford Linear Accelerator Center).

    4. Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly without a search, by following pointers (e.g. the objects are stored in trees for fast retrieval). Dynamic indexing schemes further speeds up retrieval of full text searches.

    5. Provides data persistence to applications that are not necessarily 'always on' - e.g. HTTP based stateless applications.

    I think RDBMSs will be around for some time -- but they will be relegated to more structured situations and smaller data sets. ODBMSs will take over where data is changing, persistence is critical, data types are mostly large binary objects with associated meta-data, and datasets are humongous.

    Right now my favorite ODBMS is the ZODB (Zope Object Data Base) - an ODBMS system tightly integrated with both Python (implimented using Python's native 'pickle' object persistence functionality), and the Zope web application development system - which itself is built with and uses Python. You can learn more about Zope at Zope.org.

    --

    Lodragan Draoidh
    The more you explain it, the more I don't understand it. - Mark Twain
    1. Re:ODBMS by Anonymous Coward · · Score: 0

      I think the object database management system (ODBMS) will overtake RDBMSs in the future for several reasons (from the link and my own musings):

      No, because the ODBMS is not general enough. The relational model is a superset of all data management models. That's like saying "integers will overtake real numbers in the future". Certainly, ODBMS are useful for certain tasks, but since they aren't as general as what can be described by the relational model, they can't possibly "overtake" it.

      Example: The relational model associates data via boolean assertions. ODBMs use 1-to-1 "pointers". Which is more general? What if your data is just floating point numbers, no pointers?

      Object-oriented databases are designed to work well with object-oriented programming languages.... This makes implimentation quick and easy - yet stable and scalable at the same time.

      "Quick and easy"? Who cares? I design data systems that are meant to last decades. I prefer to get the DBMS right, and let the programmers be as "Agile" as they like. In a few years, all their code will be replaced by the next new fad, or in "maintainance mode" anyway. Data is king. As for "stable and scalable", that has nothing to do with ODBMS vs. RDBMS. In my experience, the opposite is true.

      Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly without a search, by following pointers (e.g. the objects are stored in trees for fast retrieval). Dynamic indexing schemes further speeds up retrieval of full text searches.

      The physical implementation of a DBMS is completely irrelevant to the model used to access it.

      Be careful with that "joins are often not needed" BS. It is a typical pattern in ODBMs to retrieve only a single "column" of objects per query, and then programmatically retrieve the related objects, again a single "column" at a time, using "navigational" techniques. For instance get a list of customers matching a criteria, and then a list of orders for each. With a relational query, you get the data in one declarative query, which can be optimized by the DBMS or a future upgrade of the DBMS. With the ODBMs, the *programmer* determines the query plan, at coding time. I'll leave it to you to ponder which is more powerful and flexible, and which can be studied abstractly outside of the system.

      I'm not sure what difference it makes that you have full text search. There's nothing "unrelational" about a search operator for your types.

      Provides data persistence to applications that are not necessarily 'always on' - e.g. HTTP based stateless applications.

      Meaningless mumbo-jumbo BS. I'd like to see any DBMS that doesn't "provide data persistence".

      I think RDBMSs will be around for some time -- but they will be relegated to more structured situations and smaller data sets.

      "More structured situations"? You don't know what you're putting in your DBMS? Data is structured, if it's not, then it's noise and you can't store it or identify it. Think about it.

      Give me a formal way to determine what data set is "too small" for an RDBMs, otherwise I call BS.

      ODBMSs will take over where data is changing, persistence is critical, data types are mostly large binary objects with associated meta-data, and datasets are humongous.

      You really have no clue what you're talking about. "Data is changing"? "Persistence is critical"? You're describing basic data management, not any superiority of one model over another. "Large binary objects"? Wait, weren't you storing data from object-oriented languages before? Now you've just got BLOBs? No thanks, I'd like to look inside the objects when making queries.

      Listen pal, data management is a SCIENCE. There are theorems and abstractions and precise language. If you can't defend your blabbering, or even use terms consistently, stick to programming. Also, wikipedia's data management entries are complete garbage, guarded by people who probably started their data management career with Ruby on Rails 6 months ago. Pick up a copy of "Database in Depth" instead.

    2. Re:ODBMS by dcam · · Score: 1

      I think that is rather unlikely.

      1. You can only access the the data through the API you provide. You need to do the work of writing reporting, updating etc.
      2. You have to write the API for other code to access

      The point is you remove all generalised means of accessing the data. Also you end up writing a lot of code rather than relying on a thrid party provider to write common code.

      --
      meh
    3. Re:ODBMS by Tablizer · · Score: 1

      I am a fervent skeptic of OOP. I've yet to see OOP work well for biz apps and challenged many OO fans to produce decent evidence. They couldn't. The best they can claim is that OO fits their own particular individual mind (which by definition I cannot refute). OODBMS have been rightfully compared to the "nagivational" databases from the late 60's that relational rightfully killed and replaced. CoocooCachoo.

      Access to data can be faster because joins are often not needed.... This is because an object can be retrieved directly without a search, by following pointers

      This is one reason why I call OOP the GoTo of the next generation: it is a shanty town of chaotic pointers. When inheritance turned out not powerful enough for realistic biz modeling, OO'ers turned to objects referencing objects (pointers) and created messes.

      5. Provides data persistence to applications that are not necessarily 'always on' - e.g. HTTP based stateless applications.

      I don't see why an RDBMS-based solution could not also do such.

    4. Re:ODBMS by Lodragandraoidh · · Score: 1

      "More structured situations"? You don't know what you're putting in your DBMS? Data is structured, if it's not, then it's noise and you can't store it or identify it. Think about it.


      I have thought about it - for years - dealing with venders and internal data modelers who assume that you can define every property of a given *changing* system before hand. When dealing with dynamic changing systems - you can not define 100% of the relationships - you can only make an approximation:

      The information in the world is not composed of one set ontology, except in very proscribed areas. For the most part information is gathered and categorized on the fly - what is relevent to one group is not to another or even to the same group on different day - you will never be able to capture that in an RDBMS. Your framework is already irrelevant by the time you build it. You do not serve your users by stating 'data is king' - and leaving them without a solution.

      Most information is like this - there is a very small subset that falls clearly into those categories that allow you to normalize the data nicely. Then there is the vast sea of information that is unclassifiable, or conversely is classified in various ways because of its volatility - all of which you want to capture.

      In fact human beings think this way - we don't have a strict set of classifications for a given subject. We have many different and sometimes conflicting ideas about how things are related and how they are not. Chaos is a given, and RDBMSs can not deal with it as well as ODBMSs. RDBMS advocates have been telling me I can't do 'X' for years, and I just go forward and do it anyway using ODBMSs - and it works. Most of what I deal with can not effectively be managed in an RDBMS. From my perspective, RDBMSs will continue to be a smaller subset of my solutions when compared to ODBMSs. YMMV.

      It must be nice to work in an environment where the structure of the data doesn't change for '100' years. I don't have that luxury

      Wait, weren't you storing data from object-oriented languages before? Now you've just got BLOBs? No thanks, I'd like to look inside the objects when making queries.


      Try looking inside of an MPEG, MP3 or JPEG - you will have a bit of meta information in a header, but the vast majority of the data is a blob. Most people are collecting information in these formats today - in fact I would say the vast majority of data is in these kinds of formats. You are basically saying 'if it isn't alphanumeric I'm not going to deal with it'. This illustrates perfectly, the close mindedness of the hard-core RDBMS crowd. The world is changing - you aren't going to stop it by sticking your nose in the air, and refusing to deal with it.

      If there is a better way to deal with dynamic/changing systems and multimedia objects - please show me the light. From my own experience RDBMSs are not the solution.
      --

      Lodragan Draoidh
      The more you explain it, the more I don't understand it. - Mark Twain
  94. Are computers obsolete? by bb5ch39t · · Score: 1

    Or just humans?

  95. Slower on write because by einhverfr · · Score: 1

    disk seek time is a real killer.

    Instead of writing all the data in one operation, you get to seek, write, seek, write, seek, write, seek, write, etc.

    Moving parts are slow ;-)

    --

    LedgerSMB: Open source Accounting/ERP
  96. Huh!? by Anonymous Coward · · Score: 0

    ...they should be considered legacy technology, more than a quarter of century in age and "long in the tooth".
    http://www.databasecolumn.com/2007/09/one-size-fit s-all.html

    The point is clear. People could(and in many cases should) be getting a lot of use out of column oriented databases. The title of his post is "One Size Fits All - A Concept Whose Time Has Come and Gone". His argument basically comes down to saying that blindly throwing Oracle at all of your database problems is stupid and outdated. There are better solutions for many situations out there, and custom built solutions can offer many benefits as well.
  97. Especially not the editors by blueZ3 · · Score: 1

    I think there's some sort of "editor TOS" that specificaly prohibits any "article-based" editing. Or correction of grammar/spelling/typos. Or checking of whether links point to some random profit-blog.

    Heck, I think the editor TOS pretty much reads: post 'em if you got 'em

    --
    Interested in a Flash-based MAME front end? Visit mame.danzbb.com
  98. Re:slightly OT by Deadplant · · Score: 1

    Indeed, I am using Django for several projects and it really has freed me from reading/writing SQL queries. Django has actually made my life easier and more pleasant in so many ways...

  99. Thanks for the post! by lakeland · · Score: 1

    I've recently shifted from a cutting-edge linux shop to a behind-the-times microsoft shop and have been struggling to find decent MS tools rather than giving up and just going with what I know.

    PS: looking at the code sample

            int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

            var lowNums =
                    from n in numbers
                    where n 5
                    select n;

    Doesn't it remind you of a PROC DATA step in SAS?

  100. Re:Re : Are Relational Databases Obsolete? by OldManAndTheC++ · · Score: 1

    sed -e 's/crisps/potato chips/' -e 's/pound/dollar/'

    Oh, you English and your funny language! A chip is a "crisp", and a truck is a "lorry". Next you'll be saying a wrench is a "spanner", a car's trunk is a "boot", and the President of the United States is a "wanker".

    --
    Soylent Green is peoplicious!
  101. IQ by PigIronBob · · Score: 1

    Column-oriented database systems are not new. Sybase has successfully sold its column-based IQ database for years as a high-performance business intelligence solution.

    I have worked with IQ, they should have called it 'DUMB', it was bought on the premise that all 'TRANSACTSQL' was 100% compatible with SYBASE, well it wasn't, in fact it didn't even come close, what a pig.

    --
    You never catch me alive
  102. Sounds Like Krap... by Nom+du+Keyboard · · Score: 1
    Sounds like Krap to me. The argument that because RDBMSs are now "long in the tooth" is no good reason why they now have to be replaced.

    And besides, I thought Object Databases were the next new thing.

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
  103. Re:IMS--Hierarchical DB harder to use? by cdn-programmer · · Score: 1

    With regard to say 100 fields in a row.

    In many cases your observation would be true that once you are getting up to 100 fields in a row that this is really pushing it.

    However with geological data this is not always the case. Consider a borehole in a mining setting. The hole lives at a certain location and has directional information. The borehole is identified by an ID. It was drilled by such adn such a company at such and such a time and hence it still makes sense to carry all this data in a row identified by the borehole ID. If one breaks this data into multiple rows - then a merge may need to take place and this can be costly compared to the much easier job of just ignoring a few fields which are not needed.

    Add to this it issue that if you have split the rows up then some of the data may be missing and some may be duplicate because the data structure allows it.

    Now if you had two contractors drilling the same borehole then I would agree and say split the contractor data into one table and the technical data into another. In fact that was done even though there is no possiblity of having two or more contractors on a borehole. If something funny happens one can always assign a new borehole ID anyway.

    Look at the technical data.

    Isotopic signatures may be taken. There are over 100 isotopes alone.

    Various mineral assays can be conducted. The number of fields here is also large.

    In short it is quite easy to end up with more than 100 fields.

    ------------------

    I'd like to address your assertion that an RDBMS is a "data managment tool" and not an "applications development tool". In my mind - one should not preclude the other.

    By the time one has defined a view of the data it looks just like a flat file. We have a fixed number of columns of data - some of which may be dupicated and of course some of which may be composite data such as if we average something.

    I have no problem with this at all.

    My point is that if the data looks like a flat file and can be printed as a flat file by pretty much any SQL query engine, then why can't the RDBMS software present this to a 3rd generation programming language as a flat file that can be read. In fact it can.

    One _could_ for instance fire up a child process and feed it the query and have it hand this off to some SQL query engine which then writes the data out to a flat file which the application then opens and reads with standard old 3rd generation programming statements like fread().

    This would actually be fairly quick to program.

    The thing is that its a horrible way to write an application. So why doesn't the RDBMS provide what looks like a flat file interface but do it in a fashion that is at least as elegant as the good old read and write logic that's been around for 30 years or more?

    Similarly for writing data into the database.

    One should not have to stand on their head and do back flips simply because there is so little attention to the interface between the programming language and the data management software.

    The example I cited was from PostgreSQL where over 1000 parameters were generated by their precompiler. It was poorly implemented and I have not looked at it since.

    Of course I do agree that what I am talking about is the API to the RDBMS.

    Still, in my mind this is just as important as the data management capabilities of the system.

    ----------------

    Oh - I want to address the comment that is a language like Perl you use a float.

    For monetary values it is rare that it is valid to EVER use a float. In C and C++ you do have floats available. It is just worng under almost all circumstances to use them.

    The reason is that when you add floats you cannot guarantee that rounding errors will crop up which will render what you are doing incorrect. For instance, if you wish to tally the items in an invoice then the items should in fact add up to the total you print. With floats this cannot

  104. Re:IMS--Hierarchical DB harder to use? by Anonymous Coward · · Score: 0

    I'm not sure if I get you correctly. Are you looking for some Object-Relational Mapper perhaps? So, for Python there is SQLObject, SQLAlchemy and some other; from what i know Java has Hibernate which is more or less the same. Google for "ORM" plus your language of choice.

  105. Re:Yea, it's all the same.-EXCUSE ME BUT... by Nom+du+Keyboard · · Score: 1

    A query on a row store has to query entire rows, which means you'll often end up hitting fields you don't give a damn about while looking for the specific fields you want to return. With column stores, you can ignore any columns that aren't referenced in your query...Additionally, your data is homogenous in a column store, so you lose overhead attached to having to deal with different datatypes and can choose the best data compression by field rather than by data block.

    Excuse me but, isn't that what an index is for?

    --
    "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
  106. Re:IMS--Hierarchical DB harder to use? by Tablizer · · Score: 1

    What this means is that every transaction to and from the database must go through a literally horrible re-mapping. IE. The language data structures do not correspond to the RDBMS data structures and visa versa.

    I suggest you use dynamic languages (or type-free language) and just have the query API map a given row into an associative array (aka "map", "dictionary array"). Same with saving it: change the map as needed, and then push it to a SaveRow function. (Saving may require a data dictionary because generated SQL statements need to know whether each item is a string or not.)

        resultSet = query(std, "select * from fooTable where bar=7");
        while (rowMap = getNextRow(resultSet)) {
            print("Name: " . rowMap["name"]);
            rowMap["name"] = "I changed the name";
            saveRow(std, rowMap, "fooTable"); // data dictionary knows keys and types
        }


    Finally they removed the Oracle interface and replaced it with a glorified flat file. They clearly built in some indexing. The result is the run times dropped to under 20 minuets.

    What a file system (hierarchical database) does is improve performance on one search factor at the *expense* of others. Yes, that particular operation may have been faster, but RDBMS are designed to work well under a *variety* of search "paths", something hierarchical stores (like files) cannot do well. Plus, there are often ways to tune RDBMS for common search paths if one wants to trade a single search path (factor) performance for general purpose. It may take some expertise though.

    For example, I often want to search my files by date changed, regardless of folder. This takes a long time because the folder tree is "indexed" based on file/folder names, not other attributes such as dates. "Cross tree" queries are slow and combersome. (Some tools will index on other attributes, but this turns it into a network or navigational database instead of a hierarchical one).

    Using RDBMS effectively does take experience and education. They are not a simple technology to use. But, they can be very powerful.

  107. We're talking about this topic at geekSessions by Jim+Ethanol · · Score: 2, Informative

    The next geekSessions is on this topic. If you're interested in hearing about alternatives to RDBMS in and using them in practice, take a look at the site. The event will be held on October 2nd in downtown SF and will also be available via webcast. In addition to the presentations, we'll have a Q&A session along with some food and FREE BEER. Speakers at the event are:

    Josh Berkus from the PostgreSQL core team

    Paul Querna from Apache and Bloglines (wrote his own filesystem for Bloglines)

    Chad Walters from Powerset who is implementing BigTable there.

    Hope to see you there!

  108. Column Store Lite by walkerp1 · · Score: 1

    If I desire the benefits of a column store in my relational database, I simply create an index. For warehouse-flavor queries, I might create a multi-column indexes. Robust RDBMS's will choose multi-column indexes as exclusive read targets if all of the queried columns reside within that index.

  109. Something completely different by uggedal · · Score: 1

    If you gents and ladies want to try out a completely different approach to organized data storage I would suggest that you give CouchDB a try.

  110. tons of problems... by ebunga · · Score: 1

    First off, Stonebreaker has a vested commercial interest in the matter. Take everything he says with a grain of salt. Actually, take the whole damned salt lick. Secondly, whether data is stored by columns or rows is not a concern for the interface of an RDBMS. That is a physical implementation detail that any RDBMS could do. Don't forget Codd's 12 Rules. Pay close attention to Rule 8.

  111. Relational approach is there to stay by trondotcom · · Score: 0

    Some other alternatives like object approach has appeared, but the thing is that relational approach is the one it is mainly used without competition (for the moment)

  112. How are column-stores searched or indexed? by Anonymous Coward · · Score: 0

    QUESTION: How is a column-store DB better than a row-store one using single-column indexes?

    WHAT I THINK COLUMN-STORE DB ENGINES DO:
    I don't see how column-stores make retrieval any easier. Most search conditions have more than one column involved. Won't several column-files have to be opened & read to match the conditions?

    SELECT * FROM T WHERE ID BETWEEN 1 AND 10 AND LAST_NAME LIKE 'S%';

    There are two column-files here, ID & LAST_NAME. Assume the data is sorted, but there are no indexes. ID is not a PRIMARY KEY!

    The DB engine has to open two files. Then, it binary-searches each file. It has to hit ID twice for 1 and 10. Next, it sequentially checks all matches by the search pointer. Finally, it has to open & read the row-slices of all other column-files in that range.

    Isn't all this what a single-column index provides? We can get a very small set of candidate rows. Yet, a row-store DB has all the related data in the same file, and WHERE can check all the other columns at once.

    Can someone expalin the data-warehouse advantage here?

  113. Re: Row / Column based storage hybrids by butlerm · · Score: 1

    Row/column hybrids are a good idea, but there are a half dozen technical reasons why data in different formats won't work for redundancy. The big problem is those functions are in radically different sections of the system. The disk controller doesn't know anything about the database and vice versa.

    Also, modern databases never have to wait for data blocks to actually hit the disk during normal operation. They only have to wait for log records to actually hit the disk (at transaction commit time). Once the log records are persistent, if the system crashes, they can be used during the database recovery process to bring all the data blocks back up to date.

    Journalling file systems are similar, except usually only with regard to meta data changes.

  114. Re:Yea, it's all the same.-EXCUSE ME BUT... by SatanicPuppy · · Score: 1

    Well, kinda, but you're still storing the original table, so while the index speeds stuff up, you're still left with the original bloated table hanging around backstage.

    Column-style databases offer some of the advantages of an index, but without the storage-inefficient back end.

    --
    ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
  115. Re: Natural vs. Synthetic keys by butlerm · · Score: 1

    Of course there are tradeoffs, but a natural key only policy in a relational database of any sophistication is more or less insane. The biggest problem is that natural keys add at least one column for each level you go down in a database hierarchy. It is not uncommon for the natural key at the detail level of a typical snowflake table structure to have four or five columns.

    And of course all of these columns have to be carried in all foreign keys, including primary foreign keys as well. So suddenly, simple association tables have *ten* primary key columns. That is a programming nightmare. Any relational database that requires more than two columns per table join on average is going to be difficult to manage, develop for, and maintain. It is also going to waste a lot of unnecessary resources.

    In addition, a no synthetic key policy forces users to come up with unique identifiers instead of the database in cases where they do not care and where it is unnecessary. That is bad user interface design, plain and simple. Names and descriptions are usually unique - certainly enough for user purposes, but what kind of system uses a free form text field as a primary or foreign key?

    Worse, if anything the user can change is used as a primary key, cascading updates will be required. Cascading updates are inefficient and most databases do not do them automatically.

  116. Index on every column, how revolutionary! by porneL · · Score: 2, Insightful

    A query on a row store has to query entire rows, which means you'll often end up hitting fields you don't give a damn about

    That's why row-oriented databases have indexes and perform index scans.

  117. Re:slightly OT by Anonymous Coward · · Score: 0

    > If I recall a commit after a truncate isn't necessary. No undo info is generated so the operation can not be rolled back.

    Maybe in your toy RDBMS. Mine can even roll back DDL.

  118. Re:IMS--Hierarchical DB harder to use? by tknd · · Score: 1

    If one breaks this data into multiple rows - then a merge may need to take place and this can be costly compared to the much easier job of just ignoring a few fields which are not needed.

    I'm not sure I follow that form of logic. Yes, sometimes it's easier just to have a big number of columns, but almost always is the first quick death to RDBMS. The strength of RDBMS comes from the relationships portion, not the tables portion. In fact, when you enter that large a number of columns per a row, and you query that row, the database has to read the entire row and think that you want all of that data. Even if every field is not in use, they may still take up space on the disk, so you're losing storage space AND time to transfer the data back to the application because you're essentially trying to communicate that "50 fields in this row are unused."

    Let's consider your borehole example for a moment. You would have a table representing each unique borehole followed by a way of identifying and attributes tied to the borehole. At first glance, your approach is right, but at second glance when all attributes are not in use, it isn't necessarily a good idea. So take for example your approach, your table would look something like:

    Borehole ID
    Company
    Date
    DirectionX
    DirectionY
    Directi onZ
    ...
    Attribute100

    A simple structure, right? Except the notion that you don't necessarily use each column all the time. That alone is a hint that perhaps the columns are not always necessary. Instead, I would organize the data into two tables: one to identify the borehole and key pieces of information directly associated with the borehole and another table to store the attributes:

    -BOREHOLE TABLE-
    Borehole ID
    Date
    Name

    -BOREHOLE ATTRIBUTES TABLE-
    Attribute ID
    Borehole ID
    Attribute_Name
    Attribute_Value

    Now to reconstruct the data, I need to perform a join. Luckily, the Borehole ID is a primary key foreign key relationship so the database should have built some index to quickly map the pieces of data together so the join isn't that bad at all. It only becomes bad if I happen to ask for all of the data (in this case one table *might* be faster) or there is no index or relationship between the joined columns.

    So here comes the critical question: how much of the data do you need? If you are querying for all of the data at a time, but do not use any of the SQL query features, why did you use RDBMS in the first place? The benefit of RDBMS is that you can look at slices of the data quickly and easily. For example if I wanted to know what companies operated on which boreholes, that is an easy query even with my proposed data arrangement:

    select b.NAME from BOREHOLE b, BOREHOLE_ATTR batr where batr.borehole_id = b.borehold_id and batr.attribute_name = 'Company' and batr.attribute_value = 'ACME'

    If implemented as a flat file, I lose that functionality and have to implement it myself. In fact, in a flat file format, that operation can be horrible because I would have to search the entire flat file. In Oracle, I can use indexes and optimizer hints to speed up the query.

    The hard part now becomes what happens when you want to query on multiple attributes? For example if I wanted to query on boreholes by company ACME and on directionX = 0 and directionY = 1, now I have to use the set operations. One straight-forward way to do that is:

    select NAME from BOREHOLE where BOREHOLE_ID in (
    select borehole_id from BOREHOLE_ATTR where attribute_name = 'Company' and attribute_value = 'ACME'
    intersect
    select borehole_id from BOREHOLE_ATTR where attribute_name = 'directionX' and attribute_value = 0
    intersect
    select borehole_id from BOREHOLE_ATTR where attribute_name = 'directionY' and attribute_value = 1)

    Now I

  119. It's the moron at computerworld you ignore.. by Anonymous Coward · · Score: 0

    Actually **his** article is fine - he is simply saying the current horozontal approach of RDMBS is not as efficient as vertical for data wharehousing applications. He is saying the concept "one size fits all" is obsolete as data wharehouses provide different stresses than OLTP applications. Nowhere does he say that RDBMS as a concept is obsolete or anything like that.

    It's the moron at computerworld that misread his article and thought that he's saying that vertical is superior to RDMBS, which complete nonsense (that's like arguing TCP is better than Ethernet - meaningless!). Stonebraker is arguing about the on-disk layout used by common RDBMSs and saying what they currently use is less than ideal for some applications, which I can appreciate. He argues that his product, Vertica, outperforms conventional RDBMSs for datawharehousing.

    As proof that the computerworld monkey is the moron, Vertica itself is described as "a brand-new column-oriented RDBMS". Yeap, Vertica is a RELATIONAL DATABASE. Sheesh.

  120. Queries that don't make sense by cdn-programmer · · Score: 1

    One might occasionally want to query by company or some other attribute in order to find and correct data.

    However with geological data in general queries like a borehole drilled by company X in such and such a direction makes no sense whatsoever.

    In fact - this is where we really run into problems with data like this. In general - one is interested in data that falls within some 3-D space. It might just be 2-D mind you - like data that intersects a zone or is above a zone or below a zone.

    The thing is that in all cases, one is generally best to determine what data could fall within a volume or zone of interest and then simply scan the data after that. This means a flat file approach actually works pretty good for data of this type. Geophysical data falls into this.

    So with the borehole data one would want to carry the xyz or top hole location and bottom hole location with the main data for simple holes and then add a flag if there is a deviation survey. Sometimes these holes do look like a snake.

    Then one wants to group this data via an artificially determined key which is 3-D in nature if possible to eliminate all data which cannot possibly participate in the query.

    The borehole ID field doesn't accomplish much other than to let someone look up and possibly correct some data. As such - one is generally not even interested in how efficient this is. The number of updates and lookups by borehole ID are very low.

    Its the areal retrievals and the volume scans that take the time. This is the mapping side of the picture. You are correct that flat files actually can make more sense because spinning through even the best RDBMS's is quite slow when compared to reading a flat file.

    So then the question becomes, how long are the records. Does it make sense to split them up. The answer to this depends on the type of query. In our case we have a suite of queries that go through and generate flat files of certain formats. When one looks at these flat files it didn't make much sense to split them up.

    The thing is in an RDBMS if you read from two tables your disk heads may seek constantly between the two read points = and then there is the merge operation. Even a fairly large record can be fetched from the disk within a few reads... a few for the index and one or two to fetch the data.

    My point is that the power of an RDBMS is really not applicable to this type of data. However RDBMS's do confer other advantages such as standardized interfaces, maintanance tools, ability to do ad hoc queries and updates. All in all the RDBMS is a decent choice.

    My experiances however is that the API sux and this is a major distraction.

  121. Arther makes great point, complex subject by yl_mra · · Score: 3, Informative

    To be a relational database the database must meet a very specific set of requirements. While a standard view of the databases from the DB administrators and normal users view may allow limited ways to manipulate the data, looking at the actual storage structures and how to efficiently use the resultant files can provide some extremely efficient computational methods. For example, if one structures data in a relational database with few fields, perhaps as low as two, per representation where each representation represents a single file then the data read directly from the file, the reads may be via flat file techniques, into an array or efficient storage table, then you achieve the best of both worlds. If the above example requires compressed data, then the relational data must be read from a single file via DB operations with two different mappings overlaying the single database file. The programmer must understand the data orientation so that compression/decompression occurs correctly. Extremely fast reads/writes may be achieved this way. The advantage of a column oriented database is that the files are inherently optimized for data mining without the need to hire an expensive programmer. If the company had multiple requirements for the same data then multiple databases may be required, unless they are willing to hire the expensive programmer. A key problem results when multiple databases loose synchronization. BTW, row based databases optimized for storage size (footprint) compress data by column, not by row or record. Using these techniques, I have achieved far greater data compression than comparable Google stored data.

  122. Re:Perl Objects have both column and row DB advant by Corwn+of+Amber · · Score: 1

    What?

    Replying just to tag the comment... hoping I won't post 23 more before I understand the first two paragraphs.

    --
    Making laws based on opinions that stem up from false informations leads to witch hunts.
  123. For most users this does not matter by ChrisA90278 · · Score: 2, Insightful

    Michael Stonebraker is certainly a well respected nae and he was been right on these issues in the past. Coinsidently I'd testing my software with a new version of PostgreSQL as I type. I think colum vs. row storage can be considered simply a option. I can even see it being an option that you specify at the table level. Most DBMS users really don't have much data. Today a 1,000,000 row table can be cached in RAM on even a low-end PC based server. Once cached in RAM row vs. column storage does not matter. I would imagine that 99& of the database table in te world have far fewer then a million rows. This discussion applies only to the very few that are really large.

  124. Oh Dr. Mark Pauker, where art thou? by Anonymous Coward · · Score: 0

    Dr. Pauker, with Borland's Dr. Kahn, gave us Paradox for DOS. It was the column oriented Query by
    Example. Just another implementation of Relational Algebra for SQL, but it was great for us business users.

    I paid $800 for it, still use it. With 20 million records, it gives an answer before your pinkie comes off the enter key.

    Don't forsake me.

  125. Re:IMS--Hierarchical DB harder to use? by einhverfr · · Score: 1

    With regard to say 100 fields in a row.

    In many cases your observation would be true that once you are getting up to 100 fields in a row that this is really pushing it.

    My experience is that db tables of more than about 10 columns tend to be indications that something may be wrong. The only tables I have ever seen with more than 50 columns have all been badly designed (thus far). While this is not a mathematical rule, it does seem to match my experience.

    However with geological data this is not always the case. Consider a borehole in a mining setting. The hole lives at a certain location and has directional information. The borehole is identified by an ID.

    Ok, so far we have latitude, longitude, and id.

    It was drilled by such adn such a company at such and such a time and hence it still makes sense to carry all this data in a row identified by the borehole ID. If one breaks this data into multiple rows - then a merge may need to take place and this can be costly compared to the much easier job of just ignoring a few fields which are not needed.

    Ok, we add a timestamp, and a company identifier (which joins to another company table).

    So far we have:

    create table borehole (
    id int not null unique,
    latitude numeric not null,
    longitude numeric not null,
    company_id int not null references company (id),
    time_drilled timestamp not null,
    PRIMARY KEY (latitude, longitude, time_drilled)
    );

    Add to this it issue that if you have split the rows up then some of the data may be missing and some may be duplicate because the data structure allows it.

    THe above seems very straight-forward. Note the NOT NULL constraints, and the foreign key constraints.

    Note that if oyu put all company information in the same row you are adding duplicate information, and this can cause data management problems.

    Now if you had two contractors drilling the same borehole then I would agree and say split the contractor data into one table and the technical data into another. In fact that was done even though there is no possiblity of having two or more contractors on a borehole. If something funny happens one can always assign a new borehole ID anyway.

    No, I think you have that backwards. If you can have two holes drilled by the same contractor, you want to split the contractor data into another table. If you have two contractors drilling the same hole, you will need to create a pivot table for the many->many mapping.

    Look at the technical data.

    Isotopic signatures may be taken. There are over 100 isotopes alone.

    Each isotope signature record belongs in a different row.

    Various mineral assays can be conducted. The number of fields here is also large.

    Same as above.

    What happens if you want to start looking for another isotope or run a different mineral assay? Do you want to alter the schema of the table? Or rather add a field into an isotope or assay table that tells people about the test so you can store the results in another one.

    In short it is quite easy to end up with more than 100 fields.

    You still need to look up normalization.

    Ok, take the above table for borehole and add the following tables:

    create table isotope (
    id int not null unique,
    element_number int not null,
    element_mass int not null,
    PRIMARY KEY (element_number, element_mass)
    );

    create table assay (
    id int not null unique,
    name text not null,
    description not null,
    PRIMARY KEY (name)
    );

    CREATE TABLE isotope_result (
    borehole_id int not null references borehole(id),
    isotope_id int not null refere

    --

    LedgerSMB: Open source Accounting/ERP
  126. Re:IMS--Hierarchical DB harder to use? by ozphx · · Score: 1

    This problem has been solved many many times over with modern OR mapping tools. It used to be the case where your options were pretty much:
    * Stuff around with a bunch of mapping files and write the code by hand. (Time consuming)
    * Use a template code generator (shitloads of code - gets you like 90% of the way there, and then you are stuffed and its impossible to maintain)

    Now theres a third option, which is a plugin for your environment of choice that keeps some definitions synchronised behind the scenes and reads a bunch of metadata so nothings hardcoded and it can optimise queries at runtime.

    Case in point: Earlier this week we bought a copy of Diamond Binding. Configuration was pretty minimal - it might not be the most customisable of tools, but I was never one for learning mapping theory anyway. Basically the setup time was ticking what tables I wanted in my data layer - it detected all the relationships and just worked out the box. I dont recall the price, but it was a bit over $500AU - which is about 2 days salary here.

    We're a .net shop and afaik DB was for VS2005/.net2 only - but I have no doubt theres similar tools for Netbeans or whatever your language of choice is half a google away.

    So basically if you want to be tight and not spend any money, then its going to suck for you. Same as coding without an IDE. The tools are there though - and it seems that "mainstream" OR/M tools are out there - or at least "instant data layer" stuff for your average noobie VB.Net programmer thats actually got a solid OR/M foundation (instead of template codegen BS).

    Actually its probably a bad thing. Anything that makes it easier for your average coder is just going to drive my rates down /cry

    --
    3laws: No freebies, no backsies, GTFO.
  127. Again with the computerworld blah, blah by dgun · · Score: 1

    Having read Slashdot for a couple of years now, I have come to the conclusion that computerworld magazine is a trashy rag of a publication. Just from recent memory, I recall articles on text messaging replacing email and a top ten list of obsolete programming languages that included C. Now RDMS, the most critical piece of technology underlying our information age, has been marked for deletion. Flame us no longer, computerworld, with your tales of misery and woe, and just stick with what you guys do best: hailing the latest Microsoft "innovation". (My Cheerios had a strange taste this morning. And the milk looked kinda yellowish.)

    --
    FAQs are evil.
  128. SSDs and column stores... by Kjella · · Score: 2, Insightful

    Will the imminent transition to SSDs make any difference? Because row-based DBs means you're typically reading large chunks (one raw) sequentially, while columns stores means you're reading many small (number of columns) chunks for every row. I'd think that if random access time was almost none, you'd get almost the same write performance while read performance could be greatly improved because you only read the columns you need. It'd certainly make DB design easier too if you didn't have to worry about putting very light information in the same table as heavy blobs.

    --
    Live today, because you never know what tomorrow brings
  129. Search engines work this way and do more by Anonymous Coward · · Score: 0

    This is exactly how search indexing works only with proper multi-node scaling and a ton of different ways to query it and push/extract structured data from non-structured data. So congratulations to this guy and his company. This is about 20% of the software most people in enterprise search/'data warehousing' have been using for over a decade.

  130. Just like PIC... by FlyingGuy · · Score: 1

    Now there was another reply concerning the PIC system. I worked with Ultimates ( such hubris! ) version of PIC and DBasic in the late 80's. It was BLAZINGLY fast at reading data, but as others have mentioned writing data was another story.

    There is an issue of preallocating your files. In PIC's implementation there was the concept of overflow frames as well. These were to handle the growth of Multivalued fields, which were in fact row based data. Now you have to understand that this was a time when hard disk storage was VERY expensive and nothing was counted in Gigabytes.

    But even these days when you are talking about truly HUGE databases running into the Tera bytes of data both the preallocation overflow to be contended with. Now PIC had a pretty fine solution for this, but it waisted disk space. Tera Byte sized hard drives are on their way but I still do not think that the aforementioned problems are going to go away, in fact I think they will be amplified and here is why:

    Databases of yesterday, and this new proposal is in fact not new, but its rather an old idea were there before CLOBS and BLOBS. Imagine a situation when you attempt to store a CLOB or BLOB with a row oriented approach. You will either have to have a massive overflow capability or you will have to preallocate your file to such a large extent as to make the technique cost prohibitive as your enormous, yet empty file occupies almost all of the available disk space. Additionally if you anticipated CLOB or BLOB storage estimates are wrong the file(s) will need resizing and reallocation on a fairly regular basis and this is not optimal as doing so requires you to have at least the size of the current data file in empty storage available.

    I like the idea, but row oriented storage is not a prevalent design feature in any of the current databases out there, and I think for good reason.

    --
    Hey KID! Yeah you, get the fuck off my lawn!
  131. bullshit by m2943 · · Score: 1

    Row stores and column stores are an implementation issue; both can (and often are) used with relational calculus. That row stores were a stupid idea in many application areas was obvious to many people already even when Stonebreaker was still pushing them. Now, he seems to have fallen into the other extreme. The biggest thing wrong with relational databases is probably SQL. SQL has become complex, and it is hard to predict performance of particular expressions across different implementations.

    People will continue to use row stores, column stores, array stores, object stores, link stores, and simple persistent hash table, and they'll use it with relational and non-relational models. None of those approaches are obsolete, and none of them will be in the foreseeable future.

  132. Physical level vs. data model? by harmonica · · Score: 1

    I don't get it. Storing data by column instead of row is a change at the physical level and has nothing to do with the data model. In other words, storage by column can be implemented with the relational data model, there is no contradiction.

    If a column-based DBS is something entirely new, what data model does it use?

  133. No!!! Not attribute-value modelling!!! by einhverfr · · Score: 1

    Ok, attribute-value modelling has its uses. We use it in LedgerSMB in exactly 2 places (argument lists for functions called on menu element activation and system settings). However it is *horrible* when you are trying to impose a natural structure on data.

    --

    LedgerSMB: Open source Accounting/ERP
  134. not just a specialized technology by neonsignal · · Score: 1

    What the blog refers to is an alternative to row based databases, not just a database catering to a niche part of the market. The author plays up the 'faster' bit too much (it turns into an advertisement).

    However, the more important reason that relation databases fail to meet many application requirements is the lack of query flexibility (only noted in passing in the blog).

    I don't think it is controversial to say that when one designs a relational database, you need to have an idea of what queries are likely to be made on that data. If the data is highly orthogonal, the queries don't matter much; there is only one way to factor the data. But in real life, collections of data are often not orthogonal, and the information being gathered may change over time. In a relational database, this means prophetic planning. Or refactoring the database every few years, a painful process for everyone concerned.

    I would suggest that novel database architectures can be widely relevant even if they are not faster than the relational databases of our youth... Twenty years ago the relational database was necessary because of the slow speed of the hardware; now that speed has become a niche requirement, it is time for new architectures.

  135. XML DB by Anonymous Coward · · Score: 0

    The XML database will take over eventually.

    1. Re:XML DB by _Shad0w_ · · Score: 1

      God I hope not.

      --

      Yeah, I had a sig once; I got bored of it.

  136. Re:IMS--Hierarchical DB harder to use? by Anonymous Coward · · Score: 0

    The example I cited was from PostgreSQL where over 1000 parameters were generated by their precompiler. It was poorly implemented and I have not looked at it since. You are looking at libpq. I don't know how one is looking at it right now, but you would be far better off by normalizing your database, and then building object hierarchies using relational queries.

    Actually, it sounds like he is using ECPG or another such precompiler. I'm sure those things have their place, but I'd personally never use anything like that. It's just too restrictive and all of the interfaces like that I've encountered are downright archaic, mostly due to being invented 3 decades ago.

    Straight libpq is much nicer.

  137. Re:IMS--Hierarchical DB harder to use? by jadavis · · Score: 2, Insightful

    From a standard 3rd generation programing language one can read and write into flat files and we can do close to this with a hierachical database.

    We lose this with relational databases because the way the database organises data has no direct mapping to the way it might be set up in a standard programming language.

    What this means is that every transaction to and from the database must go through a literally horrible re-mapping. IE. The language data structures do not correspond to the RDBMS data structures and visa versa.

    You are confusing an RDBMS with a persistent storage mechanism. It's really not hard to just keep data persistent any more. You don't need a 3GL or anything fancy, just some hooks to record your modifications on permanent storage, and keep a small working set cached in memory. It's an easy, trivial, solved problem. And it was solved before relational databases were invented.

    RDBMSs do a lot more. Here are just a few advantages:

      * different applications can access the same data
      * guarantee integrity via declarative constraints that can validate against all of the data at once, not just the single record in question
      * different applications can have the same guarantees of integrity, and a bug in the first application can't break the guarantee for the second application

    RDBMSs were invented for a reason. Many, many software bugs can be traced back to a bad data state -- some invariant that was broken and uncaught. Often, these bugs are not caught until long after the insert has taken place, and often cause a cascade of new bad data and you don't find out until many records are wrong. A lot of code is imperative, and re-stating the invariant declaratively (i.e. a database constraint) helps catch a lot of those bugs.

    Trying to put these declarative constraints in the application is a bad idea. When should they be checked? And in which applications should they be checked (all of them, one would hope)? If you see a declarative constraint, are you sure it's correct, or might it have been added after inconsistent data was entered and before the constraint was actually run?

    Databases solve this by making some promises. If you put a ".. CHECK (age > 0)" on an attribute, it will check all the records before applying it, and then all the records afterward will need to pass through that constraint. That's a lot better of a guarantee, and you know it's true for all applications. Someone else's bug or quick hack won't violate it, so your application can rely on that as the truth. Same with UNIQUE or FOREIGN KEY.

    If you think about your reasoning for a moment, it's very narrowly focused on storing and retrieving single records. Presumably, anything needing to look at the data as a whole would need to read it fully into the application and process it from application code.

    You don't take into account other readers of data who might require consistent reports or anything else that needs to look at more than one record. You also don't take into account the horrible mess you have when the application is wrong and stores bad data, or when you need to do data format changes. In the types of databases you describe, almost any change requires reorganizing the data physically. In an RDBMS, you can make many changes without physically changing the physical layout.
    --
    Social scientists are inspired by theories; scientists are humbled by facts.
  138. Re:Perl Objects have both column and row DB advant by tgv · · Score: 2, Insightful

    I don't think that has anything to do with the article. That is about storage on disk, not about manipulating pointers in memory to such an extent that a programming language that should never have been invented in the first place becomes even more ununderstandable.

  139. No with comments by Anonymous Coward · · Score: 0

    If only I could add tags...

  140. Metakit is a great column-oriented database engine by fbonnet · · Score: 1

    ... and best of all, it's Open Source and free!

    Official page
    Wikipedia info

    The real interest of column-based storage is that each column only carries a given data type. This means that allocation can be optimal (up to the bit level). Moreover, memory-mapping a column gives a contiguous array of uniform data, which is randomly accessible. Scanning a column consumes much less memory than for row-based tables, since only the relevant data has to be present in memory. Here is a great intro to its view model.

    Another big advantage of Metakit (and probably other column-based DBMSs) is that it supports hierarchical tables (dubbed subviews). A column type can thus be a table itself. This means that rows can contain variable data, which eliminates most of the needs for table joins and relational algebra. Moreover the table definitions can be self-referential: one can easily represent tree data in a single table, such as an XML document or a file hierarchy. For example, the following table definition is sufficient to store a whole tree of ints: tree[value:I,children[^]] where ^ references the table itself.

    I've used Metakit with success in several commercial projects. The learning curve is a bit steep, mostly because of the sometimes obscure documentation, the radical choices, and terminology which often depart from that used in traditional RDBMS. I had to experiment a bit to make the right choices. However, once in hand, it is lightning fast. I was able to design an object-based system where all data were stored in memory-mapped storage, and were accessed in a fully transactional manner. This means that in-memory objects were only wrappers around memory-mapped Metakit storage: setting an object property actually altered the data in the right column inside the storage.

  141. No. by Anonymous Coward · · Score: 0

    Stonebraker is a crank.

    None of his previous efforts have set the database world on fire, this will be nothing different.

    I wish there was a "Bile Blog" for database stupidity.

  142. Re:Perl Objects have both column and row DB advant by jandersen · · Score: 1

    Blessed hash, you say? Are you some kind of Rastafari?

  143. Re:Perl Objects have both column and row DB advant by aproposofwhat · · Score: 1
    I think he's probably a Perl monk - anyone who can use the words 'naturally' and 'perl' in the same sentence without crossing their fingers would surely qualify ;)

    --
    One swallow does not a fellatrix make
  144. No by chriseyre2000 · · Score: 1

    Relational databases are not obsolete.

  145. Not obsolete, never implemented by Channing · · Score: 1

    Are there any Relational databases available commercially? That is, a DBMS that actually implements the relational model? Not flaming, I genuinely want to know.

    Here's an interesting site on the subject: DB Debunk.

  146. What is it with tech people? by Anonymous Coward · · Score: 0

    Sometimes, I think they shouldn't be allow to speak in public. Making statements like this simply shows he's either a) ignorant or b) bias for some reason (which has already been discussed).

    Anyone who works with databases professionally knows there is always a way between making the database fast for people entering data and making it fast for people reading the data. There's no 'right' answer and the entire statement of this article should have been laughed off - it certainly isn't news nor does it present a solution to the problem. I suspect that the overhead of adding information far outweighs the advantages of retrieval else there would already be widley available and in use solutions using it.

    Column oriented DBMS's aren't some sort of new tech. For whatever reason, row oriented simply rules the market at this time. I remember when XML was the 'answer' to everything for a while when it was first 'discovered' (My buddies and I looked it over - weren't particularly impressed - and just sat and listend to the rantings) - everyone thought (stupidly) that we'd start storing everything in XML - which is assinine.

  147. Column Store Database = Index by MarkH · · Score: 1

    Last time I checked you create Index's on columns which are regularly accessed in a serial manner for lookups

    And you make a balance between the number of index's you need versus the type of queries to be done ( more index's = faster lookup but slower row by row insertion )

    Does sound like Column Store Database is a subset of the functionality of a half decent Relational Database

  148. Everything is obsolete in the future by overlook77 · · Score: 0

    Perhaps we should consider RDBM's obsolete when they aren't widely used anymore.

  149. Re:Perl Objects have both column and row DB advant by goombah99 · · Score: 1

    I don't think that has anything to do with the article

    1) those variables can be perl-ties to disk files
    2) There are profound simmilarities.
    Namely when storing variables by column they are (generally) the same type, allowing much greater access speed, serial processing speed, and compression.
    Object oriented programming usualy has problems with speed when one must iterate over some deeply nested attribute value in an large collection of objects (for example to sort on a number). Column oriented storage already has those attributes collected in a single place.

    So this has everything to do with the original article

    --
    Some drink at the fountain of knowledge. Others just gargle.
  150. consider mysql and db2 by kpharmer · · Score: 1

    Well, mysql allows multiple storage engines. Each has its own proprietary file structures - anything from key-value pairs to isam.

    DB2 used to support ISAM (and perhaps still does in its mainframe version?), but now mostly uses one main engine (which can store data in file system directories, in raw devices or raw volumes) but also has a completely separate xml storage engine that's pretty good. And just within the primary storage engine allows many different table types:
        - mdc (multi-dimensional clustering) tables (like oracle - table is split into smaller tables on one server)
        - compressed tables (gets you about 80% compression with old, nasty rows)
        - typed tables (object tables)
        - range partitioned tables (like oracle - table is split into smaller tables on one server)
        - hash partitioned tables (partitions are split across many servers)

    So, although I'm sure it would take work - I can't imagine they couldn't also offer a column-oriented table.

  151. Do I misundersand something here? by brokeninside · · Score: 1

    It looks to me like this argument is conflating an abstraction (relational theory) with its physical implementation. An RDBS shouldn't really care about whether data is stored by rows or columns.

  152. but keep in mind how a warehouse really works by kpharmer · · Score: 1

    Ok, so aside from warehouse vs mart discussions - when we're talking about a data warehouse we're generally talking about a star-schema. And in this data model you've typically got:
    1. many dimension tables
            - columns: many wide character columns
            - rows: few (generally less than 5,000)
            - examples: time, location, product, organization
            - bottom line - column-orientation would be great here - except these tables are typically only a few mbytes anyway and are easily cachable.
    2. very few (maybe 1) central fact table
            - columns: a couple dozen integer columns - mostly just keys to the dimensions
            - rows: many (often billions)
            - examples: product sales, security events, etc
            - bottom line: column-orientation doesn't buy you much - since the columns are so small and are just integers.
    3. a dozen or more summary tables (pre-aggregated subsets of the fact table)
            - columns: less than a dozen
            - rows: typically 1% or less than what's in the fact table
            - bottom line: column-orientation doesn't buy you much here either - since the columns are small and the number of rows aren't that significant.

    Now consider:

    a. partitioning: data is often partitioned by time ranges since time-series or current-version queries are the most common. This works extremely well with row-orientation since all the rows can be put into their own virtual table. But it won't work with column orientation. So, for example when you need to get all data for march 2007 - you won't be able to just scan that 2% of the data, you'll have to scan 100% of the column-oriented data.

    b. compression: in db2 you can expect to achieve 80% compression of your fact tables. All those integers are just simple recurring patterns. Column orientation would certainly improve on that. But by how much?

    c. writing: data warehouses may be read-mostly, but more and more there's a desire to get close to real-time reporting. The warehouse that I support gets loaded every hour for one set of data, and every minute for another. And users expect one set of data to be available for reporting within 30 minutes of being created. That set comes in batches of 100,000 or more rows. If the write-delay was sufficient we wouldn't be able to meet that requirement.

    So, I'd consider that there may be some advantages to column-orientation. However, if you've got a good star-schema data model, are using partitioning and compression - then the benefits shouldn't be that significant and the loss of fast writes and easy partitioning may completely offset those meagre benefits.

  153. Just one comment... by foniksonik · · Score: 1

    Anyone for a game of Tetris? I'm betting it's Stonebraker's favorite as the goal of each stage is to rack up points by using columns to build and destroy rows....hehehehe

    --
    A fool throws a stone into a well and a thousand sages can not remove it.
  154. A small explenation by einhverfr · · Score: 1
    First, the reason why many or most languages' floating point types is that they internally represent their data as binary numbers. The rounoff error occurs when you convert a decimal floating point number into a binary floating point number. No base system is entirely free from these. For example, one woul dnot be able to convert 0.1 base 3 to base 10 without truncation.

    Not all languages have this problem, however because it is not the only way to procede. Basically in any base number system, you have an ordered list of digits each of which is related to its neighbors by being a multiplier to the base raised to a power related to its position. So if I write: 1034.23, what I really have is 1x10^3 + 0*10^2 + 3*10^1 + 4*10^0 + 2*10^-1 + 3*10^-2. Addition is then straight-forward, and multiplaction is based on its distributive property. Since any integer digit can be stored in binary, one can build arbitrary base systems (and operate with them) even though the data is actually stored in binary. Decima-safe floating point systems tend to use a power of 10 as its base. For example, PostgreSQL uses base 1000 for its numeric data type (larger bases are more efficient when it comes to binary representation because you have fewer bits lost).

    I know that C and Javascript use floating point types which are not decimal-safe. Perl, Python, and Ruby seem to have decimal safety built-in.

    AI would also note that arbitrary precision libraries can give you roundoff errors too but for a different reason. These libraries are generally designed to work primarily with scientific measurements where you have to take into account the accuracy and precision of those measurements. Operations like rounding numbers can have unexpected effects down the road if you are not familiar with these.

    For example, suppose we take:

    $float = Math::BigFloat::new(10.011);
    $float->bfround(-2); # produces 10.01
    $float->bdiv(10); # produces 1.00 due to the fact that we now have a set precision (-2) on the number!
     
    $float2 = Math::BigFloat::new(10.001);
    $float2->bsub($float ); # produces 9.00 because of the fact that float has a precision of -2!
    Hope this makes more sense.
    --

    LedgerSMB: Open source Accounting/ERP
  155. Re:IMS--Hierarchical DB harder to use? by NoOneInParticular · · Score: 2, Insightful

    Ouch. And this exactly why SQL should die as the primary interface to the RDMBS. How the hell is my compiler going to help me find simple typing errors when the interface to use the RDBMS is built upon ... strings! You are generating code from code, that's not integration, that's a hack! It's useful, as there is no sound other way to approach an rdbms, but it's not a pretty sight.

  156. Egads! by swusr · · Score: 1

    ... having worked mostly in Oracle for the last year or so, I've been missing SQL Server.

    This would be the first time I read such a comment.

    I went from SQL Server to Oracle/PostgreSQL and would never consider going back, ever.

    T-SQL is the most obtuse PL I've ever had the disgrace to use. Compared to it, both PL/SQL and PL/pgSQL are a breeze.

    --
    - Sw Usr
    1. Re:Egads! by Doctor+Faustus · · Score: 1

      What was the last version you used? Table variables were introduced in 2000, and they're a big improvement. I haven't used 2005, but the error handling is supposed to be drastically improved in that version.

      Anyway, I won't deny that PL/SQL has some major advantages over T-SQL. The "For record In (Query) Loop... End Loop" construct is wonderful, you can just directly read and write files without dealing with DTS nonsense, you have TCP and HTTP libraries (I've actually written a couple of web service clients in PL/SQL), anchored variables are very nice, triggers are more versatile, etc. It's clearly a far superior procedural language to Transact-SQL.

      Oracle's downsides are more subtle than T-SQL's very limited feature set. While the "@" sigils in T-SQL variable names seem silly, they make name collisions between columns and variables impossible. T-SQL is just one language, while Oracle programming is split between SQL, PL/SQL and SQL*Plus, with hard boundaries between them, and different functionality in code depending on whether you're working in SQL*Plus, SQL Developer, TOAD, ADO/JDBC, or some other client. While Oracle is better at procedure work, T-SQL lets you do a lot more in set-based operations, with table variables and much easier use of temporary tables. In T-SQL, whether you're in a stored procedure or not, any SELECT query with no target is just returned to the client connection (especially handy for ad-hoc data analysis in Query Analyzer), versus a reference cursor system in Oracle that I still haven't figured out the syntax for. Oracle makes everything all caps internally, and displays all caps, forcing names into a__naming_style I outgrew when I was 17 years old and have avoided since I discovered ABetterOne. Oracle names can't be longer than 32 characters. Blank strings in Oracle are automatically turned into Nulls. PL/SQL has a real binary integer type, but Oracle SQL doesn't, so it can't be put into a table, or used as the return type of a function you want to use in a query. Oracle SQL doesn't have SQL Server's (admittedly non-standard) "Update Table1 Set Field1='' From Table1 Join Table2 On Table1.ID = Table2.FK" syntax, necessitating more subqueries and/or loops instead.

      I won't say T-SQL is better than PL/SQL right now, but I do think it's going to improve more in the future by adding features, while Oracle is basically stuck with the poor model they've got.

  157. Well, bless my hash! A fellow Perl programmer by Oryx3 · · Score: 1

    Your comments are well received. Of course, Perl hashes are in-memory data structures, and in-memory structures are infinitely more flexible than on-disk structures (not just in Perl). The topic is actually about comparing methods of on-disk storage.

    Of course, don't expect any Java weenies to understand the beauty and flexibility of the inside-out objects technique.

  158. SQL Servers just need multidimensional keys by tjstork · · Score: 1

    I wrote a time series database, Commodity Server, and I've come to the conclusion that the only thing a SQL database really needs to do this is to have the notion of a key as a range, rather than a discrete value.

    Say, for example, I want to store:

    5/1/2007 - 6/1/2007, 22
    5/8/2007 - 5/10/2007 10

    I would expect the history to be, on query:

    5/1/2007 - 5/7/2007, 22
    5/8/2007 - 5/10/2007 10
    5/11/2007 - 6/1/2007 22

    (neglecting time).

    Storing that in SQL Server can be done in several ways, and, none of them are all that great. One way would be to store each row as 5/1/2007 - 5/7/2007, 22, and so forth, and another might be 5/1/2007, 22, 5/8/2007 10. In both cases, you get the problem. There's no real way within SQL to ask "4/1 - 10/1" and get a sane answer back.

    To get that, you really just need to have something that hacks on the little tiny part of how keys are searched. You need to do what time series databases do, and understand ranges of dates. But, really, ultimately, to do it the right way, you need a sort of relational algebra that describes keys not in terms of discrete values but as linear functions of x or at least ranges of values across 2..n dimensions. Applications would be far greater than just time series and stock markets if you solved this problem. It would be good for maps, neural networks and all sorts of other problems as well. And THAT would make relational databases as we know them today obsolete for certain jobs.

    --
    This is my sig.
  159. Re:IMS--Hierarchical DB harder to use? by Anonymous Coward · · Score: 0

    You are either trolling or you need to fire the DB architect who designed that. THere is *no way* that a 100-column table is good DB design.



    Never say never. I have parts that I am legally required to keep nearly 300 pieces of information on for each and every part. I have a table with just over 300 columns that is 3NF. It is good DB design. Of course if some of the information only applied to certain types of parts then it would be possible to normalize and use smaller tables.



    I've seen several other systems that needed huge tables because every item in the row needed every piece of information. It is not bad design. It's just a fact of life. Calling someone a bad designer because they're accurately modeling the real world is ridiculous.


  160. of course that's just my opinion; I could be wrong by Gazzonyx · · Score: 1
    Funny you say that... I was just thinking about an hour ago how much I hate SQL. The exact thought that shot through my mind was "SQL can only be qualified as a language when we consider Pig Latin to be a language!", or something to that effect.

    Upon thinking about it, I've realized that it probably has to be somewhat ugly, being that it's a glue layer. We're stuck interfacing on both sides of the call, and interfacing interfaces to interfaces, via an interface (I know... just let it go, you know what I'm saying) is ugly because you're trying to create that piece of the puzzle that makes everything else fit and you're doing it in about 4 dimensions at the same time, sometimes blindfolded (see DAO documentation if you can find it).

    And on top of that, there are 'dialects' for SQL, and JDBC is a slow implementation of a slow protocol. Check out ODBC vs. JET with a 3.6 MDAC, JET has committed a SQL query before ODBC has the connection fully open (I'm exaggerating, but not by much) - and JET is from the Access '95 era! What we need is to stop gluing interfaces and working between 5 layers. Let's simplify this mess by starting fresh with an RFC or something. 1 language, 1 engine programmed in that language, 1 dialect, 1 database object with which to interface.

    I know it can't be done, and I know it's a bad way to do it, but it would make my life so much easier than trying to fix old Access '95 (upgraded to '97, upgraded to 2K) DAO code. I'm so sick of memory leaks from recordsets of workspaces of databases that haven't been explicitly closed in the correct order or scope (not that Access really has correct scoping, but we can pretend for my sanities sake) or whatever. Rant over. Thank you, I feel better, now... what was the question, again?

    --

    If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

  161. Re:IMS--Hierarchical DB harder to use? by einhverfr · · Score: 1

    I was not aware that mathematics changed depending on what the regulations were. The normal forms are defined by cold hard mathematics.

    One of the main points of third normal form is that there are no transitive functional dependencies. While it is not mathematically impossible that you could have a 300-row table which has no transitive funcitonal dependencies, I would find it highly unlikely.

    Suppose, for example, that you are required to keep the supplier's current address and tax id for a part (and you only order from one supplier). First normal form would allow you to put all these in the same table. Third normal form would not because you have a transitive functional dependency. Part_id -> supplier_tax_id -> supplier_current_address. Hence you would have to break off the supplier information into a new table.

    --

    LedgerSMB: Open source Accounting/ERP
  162. Re:IMS--Hierarchical DB harder to use? by Anonymous Coward · · Score: 0

    Are you trolling or simply just inexperienced? If I have >200 floating point measurements I have to keep for each part that is described by the row, how could I possibly normalize further? Are you claiming that because I have a 5.51111 in two different rows that I would be required to move that to a single row in another table to be 3NF? That is absolutely not true.



    Suppose, for example, that you are required to keep the supplier's current address and tax id for a part (and you only order from one supplier)



    Nice strawman. It's annoying to argue with idiots. They get into bad habits like this. They get tired of losing arguments so they get into the bad habit of building strawmen to knock down that are so ridiculously simple that even they can understand their made-up issue completely. Of course your simple example is true. The real world is much more complicated. In most nontrivial systems you will have tables with a large number of columns that are 3NF.


  163. Re:IMS--Hierarchical DB harder to use? by einhverfr · · Score: 1

    Are you trolling or simply just inexperienced? If I have >200 floating point measurements I have to keep for each part that is described by the row, how could I possibly normalize further? Are you claiming that because I have a 5.51111 in two different rows that I would be required to move that to a single row in another table to be 3NF? That is absolutely not true.

    I said that nothing mathematically precluded the possibility, but that I think it would be very difficult to imagine a situation where even a 100-column table would not have transitive functional dependencies within the table.

    If you are telling me that there are definitely *none* of these, then I will take your word for it. BUt if it were my application, I would look for transitive dependencies.

    Secondly my example, was only an example of the math to illustrate the concept of transitive dependencies. I will now give a clearer math example.

    Basically, in algebra, suppose Y is a function of X. This means that for every X there is *exactly* one value of Y. Correct? It also means that Y is functionally dependant on X (i.e. Y -> X)

    Now, suppose that at the same time, Z is a function of Y. This means that for every Y there is exactly one value of Z, correct?
    (Z -> Y)

    This also means that for every X, there is exactly one function of Z because functional dependencies are transitive. Hence in this case, 2NF would allow:
    (Z -> Y -> X implies Z -> X)

    create table coordinates (
      x float not null,
      y float not null,
      z float not null,
      primary key (x)
    );
    However, to reach 3NF, we should do:

    create table x_y (
      x float not null,
      y float not null,
      primary key (x)
    );
     
    create table y_z (
      y float not null,
      z float not null,
      primary key (y)
    );
    Hope this helps a little more.

    Again, there is no matheamtical limit to the number of direct functional dependencies one can have, but in general, you are going to have to start justifying dependencies when you start to have a lot of columns.
    --

    LedgerSMB: Open source Accounting/ERP
  164. New ODBMS debate offers $300 purse by LeeAn · · Score: 1

    A beta site called DemocraSay has posted a purse on this very subject, with a $300 purse split 50/50 among the winning author and the commenters. The competition lasts through 9/16. If you're interested, give it a try: http://www.democrasay.com/node/202. / L