Slashdot Mirror


The Future of Databases

gManZboy writes "Ever wonder where database technology is going? This is something that Turing award winner Jim Gray from Microsoft has given a lot of thought to. He recently published an article in which he looks at the many forces pushing database technologies forward, and what those new technologies will look like. Gray writes, 'the greatest of these [research challenges] will have to do with the unification of approximate and exact reasoning. Most of us come from the exact-reasoning world -- but most of our clients are now asking questions that require approximate or probabilistic answers.'"

25 of 315 comments

  1. Why complicate things so much? by bigberk · · Score: 5, Interesting

    How many times have we heard of huge sites going down because databases become corrupt or unrecoverable, or of the huge resource strain (memory and CPU) from a large database?

    In my opinion, the future of databases is nothing so complicated as pitched here -- but rather a move to simpler, more reliable back ends where the filesystem is the database. This is certainly the vision pitched by Hans Reiser and reiserfs, which aims to put more database-like intelligence within the filesystem. So you eliminate extra unnecessary layers that just eat up resources and create fragile databases.

    1. Re:Why complicate things so much? by quinnharris · · Score: 3, Interesting

      Hans Reiser's vision is about unifying namespaces (filesystem, relational database, XML, etc...) by providing the functionality in a filesystem to make this reasonable. In other words, making the file system better than current databases.

      Do we evolve the file system into a database (Reiser approach) or evolve a database into a file system (Microsoft WinFS approach)?

    2. Re:Why complicate things so much? by Nutria · · Score: 2, Interesting

      databases that are 150 GB large with hundreds of thousands of records

      That's not very big. It's downright small, in fact.

      These figures, on one of many systems I manage, are about 30 minutes old. And they don't include index space, rollforward logs, etc, etc.

      Names have been changed for privacy, of course.

      TABLE_NAME  CARDINALITY        TOT_BYTES
      TABLE_1     850,719,662  195,665,522,260
      TABLE_2     756,309,106  223,867,495,376
      TABLE_3     317,181,446   72,951,732,580
      TABLE_4     179,099,344   11,462,358,016
      TABLE_5     103,419,546    4,343,620,932
      TABLE_6      95,075,479    9,222,321,463
      TABLE_7      67,378,918   20,820,085,662
      TABLE_8      64,940,525   12,598,461,850

      Since I am fully aware that "my" databases are nowhere near the biggest, this is not the beginning of a pissing contest.

      --
      "I don't know, therefore Aliens" Wafflebox1
    3. Re:Why complicate things so much? by Anonymous Coward · · Score: 1, Interesting

      I think really that WinFS is Microsoft's implementation of Oracle's iFS, the Internet File System. Remember that? (Oracle 8/8i days, but still installable from Oracle 9i).

      Again, until there are sort of reliable and meaningful machine-based classification engines that can peep into all file types and extract and index some sort of "meaning", then it's going to always rely on a human to do the classifications.

      No one uses document properties in MS Office documents, even though Office has its own indexing engine, and even though other COM structured storage docs, XML files, etc. can carry the same kind of properties. So how is a computer system going to keep up with things, especially when a meaning or context that is relevant today is totally different in 6 months?

  2. A real problem comes full circle by Anonymous Coward · · Score: 5, Interesting

    Data is data. Just data. Save it, read it, sort it how you like. Efficient results mean having rapid, low-latency access to data.

    Add code to it, and you have data+code.

    Of course, code is data, and thus data can be treated as code, and handled by other code. LISP does this moderately well.

    But you can't avoid the fact that, as it stands, databases are just engines for keeping your data structures outside your code, or when you add code to them, engines for reading your data structure for you so that you don't have to think about how to do it. ... except that you still do, because SQL isn't a way of avoiding logical errors. ... and that they still don't save time. At best, they allow for some parallelism, external access to the data, and a separation of concerns.

    I'm getting rather tired of the fad that databases should be tacked on to everything, ranging from a shopping list to guidance systems. When did adding overhead become the mark of skill?

    1. Re:A real problem comes full circle by Anonymous Coward · · Score: 1, Interesting

      Thank you for your kind assessment of my abilities; something I had had great difficulty in achieving with the help of professional training.

      Courtesies aside, I'm very well aware of the uses of encapsulation. I am in fact in the process of writing just such a system right now (Oracle, Python to handle a CGI frontend, and no, the choice of technologies cannot really be said to be my own).

      I have been soul-searching in my effort to establish what, precisely, Oracle brought to the party. I identified two changes effected by its presence:

      1) It permits readily scripted queries on accumulated data. This isn't as big a gain as it might seem, since the same could be said of a directory full of flat files. Or files in a format suitable for reading in and analysis by a custom-written program. But I will allow that this benefit does exist.

      2) It adds a lot of overhead. Yes, Oracle is a beefy beast. So is MS-SQL, and Informix, and DB2, and Sybase, and many others.

      The trouble is that they also add overhead in terms of manpower. All of a sudden your system administrators have more machines to watch. (You do run critical data on redundant systems, don't you? I mean, it is critical, right?)

      They also add overhead in terms of cost. The packages themselves are expensive, and the machines to run them are expensive, and the electricity is expensive, and the cooling is expensive, and the floorspace is expensive, and when you need more, simply expanding isn't all that easy any more ...

      They add a need for DBAs. It has been amply observed that your DBA can't be a fool; it's not a game for amateurs. Especially not with enterprise systems. Especially not these days.

      And so on, and so forth. I'm sure anyone as deeply knowledgeable as you plainly are with the day-to-day realities of Fortune 50 companies will be able to add quite a few more line items in terms of cost, man hours, internal communication needs and so on.

      And what did it buy? Anything which can't be achieved with a few well-chosen data structure libraries and routines? Anything which a developer, for all the hair-pulling over SQL's quirks on different platforms (and the quirks are there, rest assured) could not have done for the same trouble?

      Maybe, if you're storing terabytes of data and utterly require hard guarantees of retrieval times to any given piece of data. I have worked in such an environment, and the funny thing is that for all the vaunted scalability and power of these enterprise database systems ... they have real trouble scaling to that level. So does everything else, or almost everything, but I find it interesting that at that level even the mighty Google rolls a lot of their own code. Specialised need? Yes, but look how flexible they've been in application of it.

      Ultimately, I'm not saying the emperor has no clothes. He just looks rather natty in his polkadotted speedo.

    2. Re:A real problem comes full circle by Anonymous Coward · · Score: 1, Interesting

      Uh, no, you misunderstood the problem. The point of fuzzy database queries is to get queries that return faster or give early partial results. You can't do that efficiently outside the database architecture because you don't have access to all the internal stats the DB is using.

      So you can't just "add some code between you and the database."

      This is a real-world problem, not just something he made up. Today we have enormous terabyte databases that can take forever to do a query, so finding ways to get faster, but approximate, answers is something I know many people are researching currently.
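      For what it's worth, here is a rough Python sketch of the "early partial results" idea: visit rows in random order and report a running estimate of an average long before the full scan finishes. The data, the name running_estimates and the checkpoint sizes are all made up for illustration. It runs outside any database, so it has none of the internal statistics the parent says a real engine would need.

```python
import random
import statistics

def running_estimates(values, checkpoints, seed=0):
    """Yield (rows_seen, estimated_mean) at each checkpoint.

    Rows are visited in random order, so every prefix is a random sample
    and its mean is an unbiased estimate of the true mean.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)
    targets = set(checkpoints)
    total = 0.0
    for seen, value in enumerate(order, start=1):
        total += value
        if seen in targets:
            yield seen, total / seen

# Hypothetical column of 100,000 order amounts.
data_rng = random.Random(42)
amounts = [data_rng.uniform(1, 500) for _ in range(100_000)]
exact = statistics.fmean(amounts)

estimates = list(running_estimates(amounts, [1_000, 10_000, 100_000]))
for seen, est in estimates:
    print(f"after {seen:>7} rows: mean ~ {est:8.2f} (exact {exact:.2f})")
```

      The first estimate arrives after 1% of the rows. Whether that is good enough depends on how much error the person asking can tolerate.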

  3. Re:Umm, Yep! by Sinus0idal · · Score: 4, Interesting

    Imagine, for example, you want to use a database to store information about packets flowing through your network. That's all dandy on normal network links, but if we are talking about a multigigabit link, it is likely that your hard disk can't keep up with storing that data. Or that the hardware to do so would cost too much. So instead you could take every second packet and look at that, and approximate. This particular example is referring to data stream management systems.
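    A minimal sketch of the every-second-packet idea, in Python. The packet sizes and the function names are invented for illustration; a real data stream management system would do this on the capture path, not on a list in memory.

```python
from itertools import islice

def sample_every_nth(packets, n=2):
    """Keep one packet out of every n, starting with the first."""
    return islice(packets, 0, None, n)

def estimate_total_bytes(packet_sizes, n=2):
    """Estimate total traffic by scaling up what the sample saw."""
    sampled_bytes = sum(sample_every_nth(packet_sizes, n))
    return sampled_bytes * n

# Made-up packet sizes in bytes.
sizes = [1500, 1400, 40, 60, 576, 600, 1500, 1300]
print("exact total:", sum(sizes))
print("estimate from every 2nd packet:", estimate_total_bytes(sizes, 2))
```

    Only half the packets are stored, and the total is approximate. Note that fixed-interval sampling can be badly off if the traffic happens to alternate in step with the sampling interval, so random sampling is often the safer choice.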

  4. I predict... by rainman_bc · · Score: 3, Interesting

    Better indexing, faster lookups...

    That's... about... it...

    Object relational was the "new thing" that didn't really take off as well as they'd hoped.

    Hell, I work with people who still can't handle compound keys and joins well...

    --
    09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
  5. I want clustered databases for high-availability by SpecialAgentXXX · · Score: 5, Interesting

    The "next great advancement" in databases will be when I can set up 2 or more Linux servers and have them act as a single database server. Our database server is the most expensive item in our datacenter because it's an N-way IBM server.

  6. It warms my heart... by Baldrson · · Score: 2, Interesting
    To see Bill Gates' organization of "really smart people" spinning their wheels so energetically really warms my heart.

    I can see they've no hope of being any competition at all come the real db revolution.

  7. Re:moving past relational model? I thinketh not by Spiked_Three · · Score: 2, Interesting

    I doubt if he is "confusing two issues" as he probably knows a lot more about it than you and I. He may indeed have a different opinion, but that is not confusing two issues.
    I will admit I was around before relational databases. Back then there were good old hierarchical databases, and they did a damn good job of what a relational database does 50% of the time these days. The problem was the other 50% they couldn't do. So along came relational databases. Now to think that there is nothing beyond relational is like IBM saying no one will ever need more than 16 colors on their PC: shortsighted.
    The part I really wish would die is SQL. It was invented as a way end users could enter data queries. It was then adapted to be embedded in COBOL programs, and the fact that it's at the center of most enterprise applications today is hideous. I don't care when the next database technology comes along, but please get rid of the SQL dinosaur.
    Personally, I'd just as soon get rid of databases. I have already designed my business logic; why must I now design and code ways to store objects? Yes, I know, some technology is already out (and I use it), but it is not mainstream yet. That is what I would like to see sooner: persistent object oriented databases becoming mainstream.

    --
    slashdot troll = you make a compelling argument I do not like the implications of.
  8. Hmmm Databases by Chitlenz · · Score: 4, Interesting

    As a 15 year DBA, currently we are working with some of the would-be far reaching (to most people) concepts described in this paper. The idea of a TRUE SQL Debugger is like, so big it's sick. Quest offers some tools that kinda sorta do this for Oracle systems, but a true realtime debugger would save me YEARS of work during my career as a SQL coder. For an idea of scale, the last replication project I wrote for an employer propagated over Oracle DB_LINKS via triggers to synchronize a dataset in two cities, log it, and do something with errors. Because this particular system was a PeopleSoft installation, it was a subset of 6800 tables and 15k lines of code give or take some triggers, with NO debugger. OMG, it's like a "finally" moment to have someone even claim to be fixing this soon in their architecture.

    Next, there was some inane reference to reiserfs above, which clearly ignores what a database fundamentally both is, and is becoming. It really began (and I hate to admit this as a former Solaris/Oracle admin) with SQL Server 7 and Oracle 8, and the concept that a database should be object programmable. Reiser is not going to be streaming still frames of image data fast enough to a remote client to rebuild seamlessly into a movie, for instance. Or recalculate all of a company's business logic for point of sale systems so that, for instance, the wrong type of credit card gets rejected, or so a supply chain gets populated, the list is endless. Reiser, and for that matter VFS and the other myriad of database enhanced filesystems, are tools. Good ones, but tools...

    It's interesting to note that MS has finally figured out that the "n-tier" was a dumb idea. It's almost like, well you take all this shit, then sell it through a middle man, but expect to not have to pay him anything for brokering. Like, duh. We actively benchmarked this process, in fact, and discovered that it does, not surprisingly, take time to pass data through an extra server.

    Workflow is life. It's what makes this page exist (SD is, I believe, run on MySQL). The idea of publishing-subscribers with atomic transactions is hardly new, but I agree with the authors that this is the direction of the market, simply because businesses now are getting spread all over. Read - If your job just went to India, learn to be a DBA, cuz when all that shit they sent over there comes back, you can bet it's going to be a mess (and is a mess actually already, which is why, in particular, people in ERP fields that intertwine with mine (as a DBA) demand and receive very large salaries; US$200 an hour is not unusual). The reason this particular ramble is relevant is that lots of global companies are either looking at, or are already implementing, the idea of data grids, where all the data servers inside a global network stay in sync. Suzy the secretary checks out a document in Baltimore, and that document flags as in use in Madrid through transactional replication within a kind of database trust-relationship network. It's a very very good way for companies with lots of data to keep it all together, but today it's still a pain in the ass to manage.

    Vertical partitioning is pretty much worthless except to data warehousing installations, most of whom are probably running on strong equipment already (to have that much data). Not to mention, I believe (I'd have to check, since it's not a feature I'd really use) Oracle's 10g product allows for this already if you really want it. Materialized views is another point here that raises my hackles. This guy is writing about the wonder of materialized views and column partitions, which ARE a cool performance cheat in large systems, but make no mistake that by the time you get to this point, you are probably rearranging deck chairs on the Titanic anyway. Essentially, materialized views precache SQL resultsets into a temporary table which gets constantly updated so it can always provide a full resultset without having to parse the parent table. This is processor and space expensive. Vertical par
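
    To make the precaching description concrete, here is a small Python/sqlite3 sketch. SQLite has no native materialized views, so this fakes one with a summary table kept current by a trigger. The table names (sales, sales_by_region) and the data are invented for illustration, and only inserts are handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);

-- Hand-rolled stand-in for a materialized view: totals per region.
CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total INTEGER);

-- Keep the precomputed totals current on every insert into the parent table.
CREATE TRIGGER sales_after_insert AFTER INSERT ON sales
BEGIN
    INSERT OR IGNORE INTO sales_by_region (region, total)
    VALUES (NEW.region, 0);
    UPDATE sales_by_region SET total = total + NEW.amount
    WHERE region = NEW.region;
END;
""")

conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 5), ("east", 7)],
)

# Reads hit the small precomputed table instead of scanning sales.
precomputed = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
recomputed = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(precomputed, recomputed)
```

    The cost described above shows up directly: every write to the parent table now does extra work, and the summary table takes extra space. Updates and deletes would need their own triggers.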

    --
    Imagination is the silver lining of Intelligence.
    1. Re:Hmmm Databases by julesh · · Score: 3, Interesting

      [XML] isn't bad as an over-the-wire protocol.

      Yes, it is. I've worked on a project that allowed offline modification of a database by replicating a copy to users' PCs, and it originally used XML as the format for data transfer. We got a 30% speedup by switching to tab-separated values with a line of metadata at the start of each chunk of the stream. Any technology that costs that much in overhead and provides little or no perceivable benefit is a waste of time. (Of course, if your data isn't relational, this is probably not much use to you, but then... what are you storing it in? XML documents?)
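
      A quick Python sketch of the two wire formats, for anyone who wants to see the difference. The row layout, column names and helper functions are invented for illustration, and it only compares bytes on the wire, not the end-to-end speedup measured on that project.

```python
import xml.etree.ElementTree as ET

columns = ["id", "name", "balance"]
rows = [
    {"id": i, "name": f"customer{i}", "balance": i * 10}
    for i in range(1000)
]

def to_xml(rows):
    """One <row> element per record, one child element per column."""
    root = ET.Element("rows")
    for row in rows:
        elem = ET.SubElement(root, "row")
        for col in columns:
            ET.SubElement(elem, col).text = str(row[col])
    return ET.tostring(root, encoding="unicode")

def to_tsv(rows):
    """One metadata line naming the columns, then one line per record."""
    lines = ["\t".join(columns)]
    lines += ["\t".join(str(row[col]) for col in columns) for row in rows]
    return "\n".join(lines) + "\n"

def from_tsv(text):
    """Rebuild records (as strings) using the metadata line."""
    header, *body = text.splitlines()
    names = header.split("\t")
    return [dict(zip(names, line.split("\t"))) for line in body]

xml_text, tsv_text = to_xml(rows), to_tsv(rows)
print(len(xml_text), "bytes as XML;", len(tsv_text), "bytes as TSV")
```

      The XML version repeats every column name twice per record, which is where the bulk goes. The TSV version assumes values never contain tabs or newlines; real data would need an escaping rule.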

      The only justification for XML is that there are a lot of tools out there that work with it -- I use it as an intermediate interchange format between different environments because the libraries available make it easy with just about anything I want to access the data with.

  9. Re:Atomicity in filestores is a great benefit by Unordained · · Score: 2, Interesting

    ... oh, and the file system should also verify the integrity of the files, and the system as a whole -- make sure that your changes are "allowed" (both state-constraints and transition-constraints), make sure that everything works together (imagine your FS making sure that your changes to your mail server config match up with your changes to the user list?) ... ... oh, and allowing multiple users to modify files at the same time, and know enough about the file formats to reconcile possible conflicts (not stupidly like CVS does, where everything is either binary or treated as sequences of lines of text delimited by a carriage return) ... ... oh, and maybe we should resolve the issue of putting the type of the file in the filename (variables have names, values have types) ... ... oh, and don't forget support for, say, two-phase commits, nested transactions, and all those other things ... which, by the way, Jim Gray has one of the authoritative books on.

  10. Re:moving past relational model? I thinketh not by Anonymous Coward · · Score: 1, Interesting

    comments are better when the articles are actually read

    True, but this article is very difficult to read, it's just a barrage of noise.

    He seems to be confusing two issues-

    He is confusing *many* issues: Model. Implementation. Language syntax. Application features. He even talks about web browser architecture and query optimizers! It's all over the freakin' map.

    A database is a way to identify unique things by their properties, literally, a way to distinguish "things" from each other.

    A database is a set of assertions about the real world.

    The relational model is not going anywhere- and that's what every database is - an implementation of the relational model; some better than others.

    Exactly, the relational model is a theoretical model for data storage, manipulation, and retrieval, perhaps the only one, and any data storage/retrieval system can be described as some subset or sloppy implementation of it.

    Gray is a great researcher, and maybe he was talking about the need to make current databases PRODUCTS "better"

    I saw nothing in that paper that makes me think he's "great" at anything. And he's definitely not a researcher, he's a VENDOR, no matter what his title. His paper was simply incoherent at worst, and at best, a metaphor for the state of the IT and DB industries today.

  11. The Future Of Databases? by brian_olsen · · Score: 2, Interesting

    I see the future now and it will happen in two phases: getting rid of SQL and then replacing it with something half-way decent (like a properly implemented relational algebra.)
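
    For anyone who hasn't seen it, a toy Python sketch of the three core relational algebra operators, with relations as lists of dicts. The function names and sample data are made up, and this is a teaching toy, not a proposal for the replacement language.

```python
def select(relation, predicate):
    """Restriction: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicates."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result

def natural_join(left, right):
    """Natural join: pair tuples that agree on all shared attributes."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                result.append({**l_row, **r_row})
    return result

employees = [
    {"name": "Ann", "dept": 1},
    {"name": "Bob", "dept": 2},
    {"name": "Cy", "dept": 1},
]
departments = [
    {"dept": 1, "title": "Storage"},
    {"dept": 2, "title": "Query"},
]

storage_people = project(
    select(natural_join(employees, departments),
           lambda row: row["title"] == "Storage"),
    ["name"],
)
print(storage_people)
```

    Each operator takes relations and returns a relation, so expressions compose freely, and projection removes duplicates, which SQL does not do by default.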

  12. The future of databases is... no Database at all!! by vhogemann · · Score: 2, Interesting

    Picture this... memory nowadays is a hell of a lot cheaper than a full Oracle Licence. So, instead of investing in a DBMS why not buy massive quantities of ECC memory and keep all instances of your data in-memory for near instant access?

    Crazy idea, huh? What if I said that this can be up to 8000 times faster than Oracle? And 3000 times faster than MySQL!

    Crash recovery? No big deal, keep a serialized version of your in-memory-objects, and a transaction log and you're set!
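
    A bare-bones Python sketch of that recovery scheme: an in-memory dict, a command log written before each change, and an occasional snapshot. This is not Prevayler's actual API (that is a Java library); the class TinyStore, the file names and the set/delete commands are invented here for illustration.

```python
import json
import os
import tempfile

class TinyStore:
    """In-memory dict made durable by a snapshot file plus a command log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "commands.log")
        self.data = {}
        self._recover()

    def _apply(self, command):
        op, key, value = command
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)

    def _recover(self):
        # Load the last snapshot, then replay commands logged after it.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    self._apply(json.loads(line))

    def execute(self, op, key, value=None):
        # Force the command to disk before changing memory.
        with open(self.log_path, "a") as f:
            f.write(json.dumps([op, key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply([op, key, value])

    def snapshot(self):
        # Serialize everything, then start a fresh log.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.snapshot_path)
        open(self.log_path, "w").close()

workdir = tempfile.mkdtemp()
store = TinyStore(workdir)
store.execute("set", "alice", 100)
store.snapshot()
store.execute("set", "bob", 50)
store.execute("delete", "alice")

recovered = TinyStore(workdir)  # simulates a restart after a crash
print(recovered.data)
```

    Reads never touch the disk, which is where the speed comes from. Every write still pays for an fsync, and the whole dataset has to fit in RAM.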

    Read more at:
    http://www.prevayler.org/

    --
    ---- You know how some doctors have the Messiah complex - they need to save the world? You've got the "Rubik's" complex
  13. Google or something like it by Anonymous Coward · · Score: 1, Interesting
    Will Google or a post-Google answer questions like:

    Show me the cost of airline tickets when The Who were touring during winter and compare that, inflation-adjusted, to airline tickets that I can purchase today.

    Don't laugh, these people have one goal in mind - answering questions based on data on your disk or on the web.

  14. Database as file system by bananahead · · Score: 2, Interesting
    The only force that can change the nature and architecture of current database technology is a fundamental change in the way they are used. Change the requirements and the technology will change to meet the new requirements. Change the requirements in a radical way and you will get radically new technology.

    The use of a database as a file system will require radical new technological advances in database theory as the current methods break down under the new requirements. The functionality of the file system will change as the capabilities of an underlying database are realized. The two forces together will create an interesting discontinuity in the industry, the kind the venture capitalists look for.

    It's all good. Pray for WinFS.

    --
    A most overlooked advantage to owning a computer is if they foul up there's no law against wacking them around a bit.
  15. Re:In other words ... by jallen02 · · Score: 2, Interesting

    Tragic really. I have seen it as well. I always held CS people to a higher standard for coding.. what with the hours of courses you spend just learning how to design and implement basic things like data structures. How do people make it out of college without being able to code these things? It always amazes me.

    Jeremy

  16. What ever happened to OODBs? by elgee · · Score: 2, Interesting

    At one time, I thought object oriented databases were going to be the next big thing.

  17. A difference between "DBA" and "clown" by Moraelin · · Score: 4, Interesting

    Yes, a good DBA and/or Database Developer is a very valuable addition to any team.

    The problem is that in a lot of corporations (e.g., the one I work for), they -- and all other admins -- have been taken and put in a different building. And more importantly they don't actually have to cooperate with any team.

    Their job's goal is no longer the same as the developers': to get a program done by a deadline. They've been turned into a bureaucracy whose only job is to see that the servers run. No more.

    That's an _awful_ job description, because it directly makes the developers their enemy. I'm not even talking "slippery slope", but direct cause-effect. Instead of being "the other half of the team that will make this program work", developers just become "those assholes who crash our servers."

    It's not hard to get from that point of view to pathological cases like the admin that limited our production servers to 3 connections per server. He kept his own servers running perfectly (which is his job description) at the expense of making the company's production programs grind to a halt (hey, it's not in his job description to care about those.)

    That's the problem with that kind of internal organization. As one BOFH-wannabe once said "The source of the problems on my network are the users. Would you prefer that I cut your access? Then there wouldn't be any problems any more." Another one threw a hissy fit that we dared ask that he does his job, during work hours. Yeah, how dare we bother him by asking if he could please reboot the test server he's managing.

    That's the underlying problem. Instead of providing a service _to_ the users, a whole caste has been created whose job is to serve the computer, and the users are just those pesky assholes disturbing his majesty the computer. That's a very unproductive situation to create.

    Worse yet, a bunch of companies invented the devastating practice of internal invoices. The admins in one department won't even go to the toilet unless they can send a bill to another department for it.

    They won't even talk to each other (e.g., the WebSphere admin telling the DBA and the Unix admin that he needs a Solaris patch and a newer version of Oracle for the "transactionBranchesLooselyCoupled" setting.) No, you have to personally talk to all three of them, because otherwise they can't send three bills for it.

    And predictably, they'll do _nothing_ more than the bare minimum that was requested and billed. E.g., you have to tell the DBA explicitly to set this and that, to this and that value, because she won't do that on her own. Which basically means you already need to have all the knowledge of a DBA, and she is just acting as a proxy over the phone... and sending you a bill for it.

    Basically if you're not that kind of a DBA, you have my respect. All I'm saying is that when you read about "teams of clowns" or about people who'd rather invent their own storage than deal with a DBA... well, they're not necessarily avoiding _your_ kind, but the kind of clown I've described above.

    --
    A polar bear is a cartesian bear after a coordinate transform.
  18. The future may be open source databases by Anonymous Coward · · Score: 2, Interesting

    There's a lot of talk in database circles about the fact that open source databases may do to commercial databases what Linux did to commercial Unixes, i.e. wipe them out. Recently LazyDBA, one of the most well-known websites for database administrators, started supporting open source databases. Add to that the fact that Oracle is going on an app-buying fest (PeopleSoft and now maybe Siebel), and database people see that the commercial database is in danger.

  19. Re:Pretty long by Woody77 · · Score: 2, Interesting

    (this is serious)

    Aside from the access mechanism on top, really, what's the difference? I've used both (OODBs heavily), and really, I've always looked at it as a bunch of tables with columns for member variables and rows for objects.
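
    The "columns for member variables, rows for objects" view can be sketched in a few lines of Python with sqlite3 and dataclasses. The Customer class and the helper names are invented for illustration, and load_all assumes the class has an id field to sort on.

```python
import sqlite3
from dataclasses import dataclass, fields, astuple

@dataclass
class Customer:
    id: int
    name: str
    balance: float

def create_table(conn, cls):
    """One table per class, one column per member variable."""
    cols = ", ".join(f.name for f in fields(cls))
    conn.execute(f"CREATE TABLE {cls.__name__} ({cols})")

def save(conn, obj):
    """One row per object."""
    marks = ", ".join("?" for _ in fields(obj))
    conn.execute(
        f"INSERT INTO {type(obj).__name__} VALUES ({marks})", astuple(obj)
    )

def load_all(conn, cls):
    """Rebuild objects from rows (assumes an id column for ordering)."""
    cols = ", ".join(f.name for f in fields(cls))
    rows = conn.execute(f"SELECT {cols} FROM {cls.__name__} ORDER BY id")
    return [cls(*row) for row in rows]

conn = sqlite3.connect(":memory:")
create_table(conn, Customer)
save(conn, Customer(1, "Ann", 12.5))
save(conn, Customer(2, "Bob", 0.0))
restored = load_all(conn, Customer)
print(restored)
```

    For flat objects like this the mapping is mechanical. It leaves out references between objects, inheritance and methods, and those are where any real difference would have to show up.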

    Is it really all that different under the hood? Or is this more marketing hype/spin?