Slashdot Mirror


Streaming a Database in Real Time

Roland Piquepaille writes "Michael Stonebraker is well-known in the database business, and for good reasons. He was the computer science professor behind Ingres and Postgres. Eighteen months ago, he started a new company, StreamBase, with another computer science professor, Stan Zdonik, with the goal of speeding access to relational databases. In 'Data On The Fly,' Forbes.com reports that the company's software, also named StreamBase, reads TCP/IP streams and uses asynchronous messaging. Streaming data without storing it on disk gives them a tremendous speed advantage. The company claims it can process 140,000 messages per second on a $1,500 PC, when its competitors can only deal with 900 messages per second. Too good to be true? This overview contains more details and references."

19 of 194 comments

  1. Seriously, Michael by Anonymous Coward · · Score: 4, Insightful

    How much does Roland Piquepaille pay you to link to his shitty articles?

    It must be a lot, since the pay for play is so obvious.

  2. speed focus by Random+Web+Developer · · Score: 2, Insightful

    If they are so focused on speed, couldn't this be the MySQL killer for web applications that don't need funky features but where concurrency and speed are important?

    --
    Artists against online scams http://www.aa419.org/
    1. Re:speed focus by Aeiri · · Score: 3, Insightful

      One question might be...why write the data directly to a database initially? Why not utilize a faster format, then write to the DB when things have slowed down (i.e. caching)?

      If the server crashes while it's still in "write later" mode, then data will be lost. Since most of the time servers crash BECAUSE of high traffic, this can be kind of bad.
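A minimal sketch of the trade-off described above, assuming a hypothetical `WriteBehindBuffer` class and a plain Python list standing in for the durable database. None of these names come from the article; the sketch only illustrates why rows held in "write later" mode are lost if the process dies before a flush.

```python
class WriteBehindBuffer:
    """Hypothetical write-behind cache: accept writes in memory, persist later."""

    def __init__(self, store):
        self.store = store    # stand-in for the durable database
        self.pending = []     # rows accepted but not yet durable

    def write(self, row):
        # Fast path: the row only reaches memory.
        self.pending.append(row)

    def flush(self):
        # Slow path: persist the whole batch, then forget it.
        self.store.extend(self.pending)
        self.pending.clear()

    def crash(self):
        # Simulate a crash: unflushed rows vanish with the process.
        lost = len(self.pending)
        self.pending.clear()
        return lost


database = []
buf = WriteBehindBuffer(database)
for i in range(3):
    buf.write({"id": i})
buf.flush()               # rows 0..2 are now durable
for i in range(3, 5):
    buf.write({"id": i})
lost = buf.crash()        # rows 3 and 4 were never flushed
```

The window between `write` and `flush` is exactly the exposure the comment warns about, and it is widest when traffic is highest.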

    2. Re:speed focus by jonadab · · Score: 2, Insightful

      > If they are so focused on speed, couldn't this be the MySQL killer
      > for web applications that don't need funky features but where concurrency
      > and speed are important?

      As near as I can make out from the (somewhat nontechnical) article, this
      is not a traditional database in any normal sense; it's more like a query
      engine for streaming data. It doesn't permanently store all the data in
      the stream that's passing through it. What it does store, I take it, is
      query results. So I guess basically you set up your queries ahead of time,
      and the results accumulate as the data flows through.

      This could be useful for some things, but it's not going to kill off any
      of the current database engines because, fundamentally, it doesn't do the
      same thing. Indeed, the article claims that big companies have tried to
      (ab)use Oracle to do what this does, and it didn't work out. Translation:
      the jobs this thing will be taking over are not currently jobs that Oracle
      is performing -- nor, presumably, MySQL either. It isn't made to do what
      they do and compete with them; it's made to do something different, but
      (the company behind it hopes) also useful.
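The "set up your queries ahead of time" idea can be sketched roughly as follows. This is an illustration only, not StreamBase's actual interface: the `StandingQuery` class, the message fields `symbol` and `volume`, and the sample stream are all assumptions made for the example.

```python
from collections import defaultdict


class StandingQuery:
    """Hypothetical continuous query: running total of volume per symbol."""

    def __init__(self):
        self.totals = defaultdict(int)   # only the query result is kept

    def on_message(self, msg):
        # Update the accumulated result; the message itself is not retained.
        self.totals[msg["symbol"]] += msg["volume"]

    def result(self):
        return dict(self.totals)


# Assumed sample stream; in practice messages would arrive over the network.
stream = [
    {"symbol": "IBM", "volume": 100},
    {"symbol": "ORCL", "volume": 50},
    {"symbol": "IBM", "volume": 25},
]

query = StandingQuery()
for msg in stream:
    query.on_message(msg)

totals = query.result()
```

The query is registered before any data arrives, and its state grows with the number of distinct keys rather than with the number of messages seen.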

      --
      Cut that out, or I will ship you to Norilsk in a box.
  3. I call foul by RFC959 · · Score: 3, Insightful
    I call foul. This quote from the article was what got to me:

    "Traditional systems bog down because they first store data on hard drives or in main memory and then query it, Stonebraker says."

    So they manage to do their analysis without even touching main memory? Nifty! What do they do, make it all fit in the L1 data cache? OK, maybe the guy was misquoted - I trust reporters about as far as I can throw them - but the whole thing just smells funny to me. I'm betting that the massive speedup they report is only for carefully selected, pre-groomed data sets. I agree that analyzing data as it comes in rather than storing it up to recrunch later is the smart thing to do, but that insight isn't a breakthrough of the kind the article is spinning this as.
  4. Has nothing to do with relational databases by Wesley+Felter · · Score: 5, Insightful

    If Roland had RTFA, he'd have realized that this StreamBase thing is not a relational database and does not do the job of a traditional relational database. The whole point is that it uses a different architecture to solve problems that don't map well to relational databases.

    1. Re:Has nothing to do with relational databases by Anonymous Coward · · Score: 5, Insightful

      I'm not sure that is an accurate critique of Roland. He likely did RTFA -- he just didn't UTFA.

  5. Re:Duh by Anonymous Coward · · Score: 4, Insightful

    You've possibly misunderstood the point of this software.

    At no time is the data 'stored' in any way. As it's collected (or INSERTed) it passes through a collection of preconfigured SELECT statements, and then disappears. There are no tables full of data, only tables as defined structures for handling incoming and outgoing data.

    You cannot query anything that happened in the past, because the program doesn't remember it.
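As a rough illustration of rows passing through a preconfigured SELECT and then disappearing, here is a sketch built on Python generators. The `select` helper, the `incoming` source, and the field names are hypothetical and chosen for the example; this is not how StreamBase itself is implemented.

```python
def select(stream, columns, where):
    """Yield the projected columns of each row that satisfies `where`.

    Nothing is retained: a row is examined once and then dropped.
    """
    for row in stream:
        if where(row):
            yield {c: row[c] for c in columns}


def incoming():
    # Assumed sample rows; a real source would be a network feed.
    yield {"host": "a", "latency_ms": 12}
    yield {"host": "b", "latency_ms": 340}
    yield {"host": "a", "latency_ms": 510}


source = incoming()

# The "preconfigured SELECT": SELECT host WHERE latency_ms > 300
slow = list(select(source, ["host"], lambda r: r["latency_ms"] > 300))

# A second query over the same source sees nothing: the rows are gone.
replay = list(select(source, ["host"], lambda r: True))
```

The empty `replay` is the point of the comment: once the stream has flowed past, there is no table left to ask about it.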

  6. Read the article before posting by bigtallmofo · · Score: 5, Insightful

    Before another dozen people post about how in-memory databases have been done before, please read the article. They're specifically not talking about in-memory or on-disk databases. They're reading the data and analyzing it in real time as it flows through the network. For everyone asking how they're going to back such data up, you don't need to back up data that is useless 1 second after it has flowed through your network.

    --
    I'm a big tall mofo.
    1. Re:Read the article before posting by kpharmer · · Score: 4, Insightful

      Right, and this solution has its own limitations within this context: namely, if you crunch your data in real time rather than reading it from a data store:

      1. if you decide to add a new analytic you have to start with new data - you can't deploy a new analytical component and run it against historical data.

      2. if your machine crashes - it takes all your accumulated analytical data along with it. Maintaining a distribution of activity calculated every 5 minutes over 90 days? Great, but after the server comes back up your data starts all over.

      3. if your analytical component needs to run against a lot of history each time (ex: total number of unique telephone numbers accessed by day, calculate rolling median) then you'll have to maintain that detail data in memory. As you can imagine - you can *easily* identify calculations that will exceed your memory. So, to tune you'll be forced to keep your calculations to relatively recent data only.
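Point 3 can be made concrete with a small sketch. A rolling median has no compact running summary, so the detail values themselves must stay in memory, and the only tuning knob is how far back the window reaches. The `RollingMedian` class and the sample values are assumptions for illustration.

```python
from collections import deque
from statistics import median


class RollingMedian:
    """Hypothetical rolling median over the most recent `window` values."""

    def __init__(self, window):
        # Memory use is proportional to the window: every value in it
        # has to be kept, because the median cannot be updated from a
        # single running number the way a sum or count can.
        self.values = deque(maxlen=window)

    def add(self, x):
        self.values.append(x)        # the oldest value falls off when full
        return median(self.values)


rm = RollingMedian(window=3)
medians = [rm.add(x) for x in [5, 1, 9, 2, 8]]
```

Widening the window to cover 90 days of per-call detail is where the memory limit described above would start to bite.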

      ken

  7. ACID? by plopez · · Score: 2, Insightful

    How do they deal with the durability aspect of ACID? If the system crashes without any data in a durable data store, the data disappears forever. It sounds more like high-speed data analysis than a true database, which implies longer-term storage.

    --
    putting the 'B' in LGBTQ+
  8. Seems more like MOM than DB by richardoz · · Score: 2, Insightful

    This seems more like Message Oriented Middleware than a Database...

    --
    All the worlds indeed a .sig, and we are mearly players..
  9. Got the wrong end of the stick by t_allardyce · · Score: 2, Insightful

    For a minute there I thought they were trying to store a large database by just forwarding all the little bits of it around the net constantly and then grabbing them when they came back around, to save disk space... but that's a thought!

    This idea really doesn't seem that new, though. It's just real-time DSP on text-based data, with a front end that pretends to be a database.

    --
    This comment does not represent the views or opinions of the user.
  10. Seems kinda silly to me. by boodaman · · Score: 3, Insightful

    OK, I get what they're trying to do, but my question: so what?

    Sooner or later you have to put something somewhere. Let's say you monitor a battalion in battle in realtime. All of these messages are streaming in and being analyzed. Great. But now what? So something triggers an alert, say. Well, what's tracking the status of the alert? Wouldn't you want to track the status of an alert saying "this Humvee is off course"? Wouldn't you want to track whether someone had acknowledged the alert, and what they did about it?

    And don't forget there are liability issues, historical issues, and more. You're a stock trader, and all of these messages are coming in and being analyzed. You get an alert... one of your triggers tripped. You make a trade as a result, only to find out 30 minutes later that the trigger was WRONG and your trade was WRONG and you (or your company) are out $10 million. How do you prove that you made the trade based on the trigger like you were supposed to and not because you f**ked up? The trigger, and the data that caused it to trip, is long gone. What do you do now?

    Eventually something has to be written (stored) somewhere, sometime. I guess I can see the need for summarizing data and only storing what StreamBase says is "important" but how would you know if everything was OK if the actual data driving everything was long gone?
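One way to picture the "something has to be written somewhere" argument is a stream monitor that discards ordinary messages but persists every alert together with the data that tripped it. This is only a sketch under assumptions: the `monitor` function, the price rule, and the tick fields are hypothetical, and an in-memory `io.StringIO` stands in for an append-only audit file.

```python
import io
import json


def monitor(stream, threshold, audit_log):
    """Yield an alert for each tick priced below `threshold`.

    Ordinary ticks are dropped, but each alert is written to the audit
    log along with the tick that caused it, so the decision can be
    reconstructed later.
    """
    for tick in stream:
        if tick["price"] < threshold:
            record = {"rule": f"price < {threshold}", "evidence": tick}
            audit_log.write(json.dumps(record) + "\n")
            yield record


audit_log = io.StringIO()   # stand-in for an append-only file on disk
ticks = [
    {"symbol": "XYZ", "price": 10.5, "seq": 1},
    {"symbol": "XYZ", "price": 9.8, "seq": 2},
]

alerts = list(monitor(ticks, 10.0, audit_log))
logged = [json.loads(line) for line in audit_log.getvalue().splitlines()]
```

Only the one tick that fired the rule is stored; whether that is enough evidence for a dispute is exactly the open question raised above.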

  11. Re:What does it do? by Anonymous Coward · · Score: 1, Insightful

    It sounds to me that the application is to apply a set of rules to data as it comes running into the system. Imagine a database with "triggers" but no tables. Obviously, the rules are all cached in RAM, and they're not persisting the data stream at all (at least on this box, perhaps someplace else).

    That's just a SWAG, but from the article that's what it sounds like to me.

  12. Re:Duh by hanshotfirst · · Score: 2, Insightful

    Sounds more like a messaging queue than a database. Of course, I work with Oracle DBs all day, so I have a rather targeted perspective on the topic. The big dogs have messaging queues and data streaming technology built into the database; is this perhaps a way for it to come to the more "vanilla" MySQL/Postgres world?

    --
    Why, oh why, didn't I take the Blue Pill?
  13. This is old news by Chitlenz · · Score: 2, Insightful

    I remember seeing a RAM-caching scheme for Oracle a few years ago that made the same claims. In actuality Microsoft, for all the love they'll get here, allows you to do this exact thing with a DataSet object within .NET. There are several solutions to this kind of problem, but the .NET way is the one I'll focus on here.

    The CommandBehavior.SequentialAccess flag, passed to a command's ExecuteReader in C#, lets binary objects or other large data 'stream' back and forth in real time with the relational DataSet objects created at app instantiation. Essentially, .NET allows for the same type of action by instantiating a 'database' within the client-side apps by building a schema of sorts, up to and including relational references such as foreign keys. At this point, we have a 'database' in RAM (a DataSet) that can be resynced via ports to any other client or server using the same architecture.

    I do this today to provide a distribution network for doctors who need access from several places to a pool of active patient data. This is a data volume of several terabytes per location, so I assure you that we are discussing the same scale here as the article.

    Incidentally, the TPC benchmarks show 3,210,540 tpmC as the current posted record, for AIX on a Big Blue machine, so their numbers are skewed if not wrong. Most processes, including those using binaries, can be moved into procedures at the back end anyway, making the flow call -> server -> stored_procedure -> return(), with all data living in RAM and sorts happening in 'real time', that is, from a pinned table into another location in memory at the server layer, returning into a DataSet that is kept in RAM on the client.

    I don't really see anything revolutionary about all this. Correct me if I'm mistaken.

    -chitlenz

    --
    Imagination is the silver lining of Intelligence.
  14. no storage = no problem by mshurpik · · Score: 2, Insightful

    This press release says a lot about analyzing streams and nothing about altering them. Most of the weight of a database is in manipulating a permanent record. INSERTs are slow. StreamBase may not have any.

  15. Stop encouraging people to visit by dj42 · · Score: 2, Insightful

    This comment seems fishy: "Visit Roland Piquepaille's Technology Trends (http://www.primidi.com/) to see it for yourself." Why would someone AGAINST primidi.com getting tons of money from per-view /. floods constantly suggest on /. that others should go check it out? Wouldn't he be encouraging people NOT to go and visit?

    --
    We are one consciousness experiencing itself subjectively. Back to you with the weather, Bob!