Slashdot Mirror


Streaming a Database in Real Time

Roland Piquepaille writes "Michael Stonebraker is well-known in the database business, and for good reasons. He was the computer science professor behind Ingres and Postgres. Eighteen months ago, he started a new company, StreamBase, with another computer science professor, Stan Zdonik, with the goal of speeding access to relational databases. In 'Data On The Fly,' Forbes.com reports that the company's software, also named StreamBase, reads TCP/IP streams and uses asynchronous messaging. Streaming data without storing it on disk gives them a tremendous speed advantage. The company claims it can process 140,000 messages per second on a $1,500 PC, when its competitors can only deal with 900 messages per second. Too good to be true? This overview contains more details and references."

6 of 194 comments

  1. Seriously, Michael by Anonymous Coward · · Score: 4, Insightful

    How much does Roland Piquepaille pay you to link to his shitty articles?

    It must be a lot, since the pay for play is so obvious.

  2. Has nothing to do with relational databases by Wesley+Felter · · Score: 5, Insightful

    If Roland had RTFA, he'd have realized that this StreamBase thing is not a relational database and does not do the job of a traditional relational database. The whole point is that it uses a different architecture to solve problems that don't map well to relational databases.

    1. Re:Has nothing to do with relational databases by Anonymous Coward · · Score: 5, Insightful

      I'm not sure that is an accurate critique of Roland. He likely did RTFA; he just didn't UTFA.

  3. Re:Duh by Anonymous Coward · · Score: 4, Insightful

    You've possibly misunderstood the point of this software.

    At no time is the data 'stored' in any way. As it's collected (or INSERTed), it passes through a collection of preconfigured SELECT statements and then disappears. There are no tables full of data, only tables as defined structures for handling incoming and outgoing data.

    You cannot query anything that happened in the past, because the program doesn't remember it.
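To make the idea concrete, here is a minimal Python sketch of that model: queries are registered up front, each message is tested against them once as it arrives, and nothing is kept afterward. The class name `ContinuousQueryEngine`, the query names, and the message fields are all hypothetical; this is not StreamBase's API, just an illustration of "standing queries over passing data".

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Message = Dict[str, Any]
Predicate = Callable[[Message], bool]


class ContinuousQueryEngine:
    """Hypothetical engine: holds queries, never holds data."""

    def __init__(self) -> None:
        # The only state is the list of preconfigured queries.
        self._queries: List[Tuple[str, Predicate]] = []

    def register(self, name: str, predicate: Predicate) -> None:
        """Register a standing query (the analogue of a preconfigured SELECT)."""
        self._queries.append((name, predicate))

    def process(self, stream: Iterable[Message]) -> Iterator[Tuple[str, Message]]:
        """Run every message past every query, then let the message go."""
        for message in stream:
            for name, predicate in self._queries:
                if predicate(message):
                    yield name, message
            # Nothing is appended to a table here, so the engine itself
            # keeps no copy of the message and cannot answer questions
            # about the past.


engine = ContinuousQueryEngine()
engine.register("large_trades", lambda m: m["qty"] >= 1000)
engine.register("acme_only", lambda m: m["symbol"] == "ACME")

# Made-up sample messages standing in for a network feed.
ticks = [
    {"symbol": "ACME", "qty": 50},
    {"symbol": "INIT", "qty": 5000},
    {"symbol": "ACME", "qty": 2000},
]

matches = list(engine.process(iter(ticks)))
for query_name, message in matches:
    print(query_name, message)
```

A query registered after a message has gone by never sees that message, which is the limitation the parent describes.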

  4. Read the article before posting by bigtallmofo · · Score: 5, Insightful

    Before another dozen people post about how in-memory databases have been done before, please read the article. They're specifically not talking about in-memory or on-disk databases. They're reading the data and analyzing it in real time as it flows through the network. For everyone asking how they're going to back such data up, you don't need to back up data that is useless 1 second after it has flowed through your network.

    --
    I'm a big tall mofo.
    1. Re:Read the article before posting by kpharmer · · Score: 4, Insightful

      Right, and this solution has its own limitations within this context. Namely, if you crunch your data in real time rather than reading it from a data store:

      1. if you decide to add a new analytic you have to start with new data - you can't deploy a new analytical component and run it against historical data.

      2. if your machine crashes - it takes all your accumulated analytical data along with it. Maintaining a distribution of activity calculated every 5 minutes over 90 days? Great, but after the server comes back up your data starts all over.

      3. if your analytical component needs to run against a lot of history each time (ex: total number of unique telephone numbers accessed by day, calculate rolling median) then you'll have to maintain that detail data in memory. As you can imagine - you can *easily* identify calculations that will exceed your memory. So, to tune you'll be forced to keep your calculations to relatively recent data only.
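Point 3 can be sketched in a few lines of Python. This is only an illustration under assumed names (`RollingMedian` and `window_size` are hypothetical, not anything from StreamBase): a rolling median needs every value in its window held in memory, so the window has to be capped, and anything older than the cap is gone.

```python
from bisect import bisect_left, insort
from collections import deque


class RollingMedian:
    """Median over the most recent `window_size` values only."""

    def __init__(self, window_size: int) -> None:
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self._size = window_size
        self._window = deque()  # values in arrival order
        self._sorted = []       # the same values, kept sorted

    def add(self, value: float) -> float:
        """Add a value, evict the oldest if over the cap, return the median."""
        self._window.append(value)
        insort(self._sorted, value)
        if len(self._window) > self._size:
            # Memory is bounded by evicting the oldest value; it can no
            # longer influence any later result.
            oldest = self._window.popleft()
            del self._sorted[bisect_left(self._sorted, oldest)]
        n = len(self._sorted)
        mid = n // 2
        if n % 2:
            return float(self._sorted[mid])
        return (self._sorted[mid - 1] + self._sorted[mid]) / 2


rm = RollingMedian(window_size=3)
medians = [rm.add(v) for v in [5, 1, 9, 2, 8]]
print(medians)
```

Memory use grows with `window_size`, so a 90-day window over a busy feed is exactly the kind of calculation that can outgrow RAM.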

      ken