Streaming a Database in Real Time
Roland Piquepaille writes "Michael Stonebraker is well-known in the database business, and for good reasons. He was the computer science professor behind Ingres and Postgres. Eighteen months ago, he started a new company, StreamBase, with another computer science professor, Stan Zdonik, with the goal of speeding access to relational databases. In 'Data On The Fly,' Forbes.com reports that the company software, also named StreamBase, is reading TCP/IP streams and using asynchronous messaging. Streaming data without storing it on disk gives them a tremendous speed advantage. The company claims it can process 140,000 messages per second on a $1,500 PC, when its competitors can only deal with 900 messages per second. Too good to be true? This overview contains more details and references."
How much Does Roland Piquepaille pay you to link to his shitty articles?
It must be alot since the pay for play is so obvious.
if they are so much focused on speed, couldn't this be the mysql killer for web applications that don't need funky features but where concurrency and speed are important
Artists against online scams http://www.aa419.org/
So they manage to do their analysis without even touching main memory? Nifty! What do they do, make it all fit in the L1 data cache? OK, maybe the guy was misquoted - I trust reporters about as far as I can throw them - but the whole thing just smells funny to me. I'm betting that the massive speedup they report is only for carefully selected, pre-groomed data sets. I agree that analyzing data as it comes in rather than storing it up to recrunch later is the smart thing to do, but that insight isn't a breakthrough of the kind the article is spinning this as.
If Roland had RTFA, he'd have realized that this StreamBase thing is not a relational database and does not do the job of a traditional relational database. The whole point is that it uses a different architecture to solve problems that don't map well to relational databases.
You've possibly misunderstood the point of this software.
.. As it's collected (or INSERTed) it passes through a collection of preconfigured SELECT statements, and then disappears. There are no tables full of data, only tables as defined structures for handling incoming and outgoing data.
At no time is the data 'stored' in any way
You cannot query anything that happened in the past, because the program doesn't remember it.
Before another dozen people post about how in-memory databases have been done before, please read the article. They're specifically not talking about in-memory or on-disk databases. They're reading the data and analyzing it in real time as it flows through the network. For everyone asking how they're going to back such data up, you don't need to back up data that is useless 1 second after it has flowed through your network.
I'm a big tall mofo.
How do they deal with the durability of aspect of ACID? If the system crashes without any data in a durable data store, it dissappears forever. It sounds more like high speed data analysis vs. a true database which implies longer term storage.
putting the 'B' in LGBTQ+
This seems more like Message Oriented Middleware than a Database...
All the worlds indeed a
For a minute there I thought they were trying to store a large database by just forwarding all the little bits of it around the net constantly and then grabbing them when they came back around to save disk space.. but thats a thought!
This idea really doesn't seem that new though? its just real-time DSP on text-based data! with a front-end that pretends to be a database.
This comment does not represent the views or opinions of the user.
OK, I get what they're trying to do, but my question: so what?
Sooner or later you have to put something somewhere. Let's say you monitor a battalion in battle in realtime. All of these messages are streaming in and being analyzed. Great. But now what? So something triggers an alert, say. Well, what's tracking the status of the alert? Wouldn't you want to track the status of an alert saying "this Humvee is off course"? Wouldn't you want to track whether someone had acknowledged the alert, and what they did about it?
And don't forget there are liability issues, historical issues, and more. You're a stock trader, all of these messages are coming and being analyzed. You get an alert...one of your triggers tripped. You make a trade as a result, only to find out 30 minutes later that the trigger was WRONG and your trade was WRONG and you (or your company) is out $10 million. How do you prove that you made the trade based on the trigger like you were supposed to and not because you f**ked up? The trigger, and the data that caused it to trip, is long gone. What do you do now?
Eventually something has to be written (stored) somewhere, sometime. I guess I can see the need for summarizing data and only storing what StreamBase says is "important" but how would you know if everything was OK if the actual data driving everything was long gone?
It sounds to me that the application is to apply a set of rules to data as it comes running into the system. Imagine a database with "triggers" but no tables. Obviously, the rules are all cached in RAM, and they're not persisting the data stream at all (at least on this box, perhaps someplace else).
That's just a SWAG, but from the article that's what it sounds like to me.
Sounds more like a messaging queue than a database. Of course, I work with Oracle DB's all day, so I have a rather targeted perspective/perception on the topic. The big dogs have messaging queues and data streaming technology built into the database, is this perhaps a way for it to come to the more "vanilla" MySQL/postgres world?
Why, oh why, didn't I take the Blue Pill?
I remember seeing a RAM-Cacheing scheme for Oracle a few years ago that had the same claims. In actuality Microsoft, for all the love they'll have here, allows you to do this exact thing in a Dataset object within .NET. There are several solutions to this kind of problem, but the .NET way is the one I'll focus on here.
... data..etc., to 'stream' in a way back and forth in realtime within the relational Dataset objects created at app instantiation. Essentially, .NET allows for the same type of action by instantiating a 'database' within the Client-side apps by building a schema of sorts, up through and including relational refernces such as foreign keys. At this point, we have a 'database' of RAM (dataset) that can now be resynched via ports to any other client or server using the same architecture.
The CommandBehavior.SequentialAccess descendant of the SelectCommand Class in C# can be assigned in a way that allows binary objects, or otherwise
I do this today to provide a distribution network for doctors who need access from several places to a pool of active patient data. This is a data volume of Serveral Terrabytes per location, so I assure you that we are discussing the same scale here as the article.
Consequently, the TPC benchmarks show 3,210,540 TpCM as the current posted record for AIX on a Big Blue machine, so their numbers are skewed if not wrong. Most processes, including those using binaries, can be proceduralized at the back end anyway, thus make call -> server -> stored_procedure ->return (); be the flow, with all data living inside of RAM, and sorts happening in 'real-time', that is from a pinned table into another location in memory at the server layer, returning into a dataset that is kept in RAM on the client.
I don't really see anything revolutionary about all this, correct me if I'm mistaking something?
-chitlenz
Imagination is the silver lining of Intelligence.
This press release says a lot about analyzing streams and nothing about altering them. Most of the weight of a database is in manipulating a permanent record. INSERTS are slow. Streambase may not have any.
This comment seems fishy: "Visit Roland Piquepaille's Technology Trends (http://www.primidi.com/ [primidi.com]) to see it for yourself." Why would someone AGAINST primidi.com getting tons of money from per view /. floods suggest, constantly, on /. that others should go check it out?
Wouldn't he be encouraging people to NOT go and visit?
We are one consciousness experiencing itself subjectively. Back to you with the weather, Bob!