Streaming a Database in Real Time

Posted by michael on Friday January 21, 2005 @11:20AM from the never-query-in-the-same-river-twice dept.

Roland Piquepaille writes "Michael Stonebraker is well-known in the database business, and for good reason. He was the computer science professor behind Ingres and Postgres. Eighteen months ago, he started a new company, StreamBase, with another computer science professor, Stan Zdonik, with the goal of speeding access to relational databases. In 'Data On The Fly,' Forbes.com reports that the company's software, also named StreamBase, reads TCP/IP streams and uses asynchronous messaging. Streaming data without storing it on disk gives them a tremendous speed advantage. The company claims it can process 140,000 messages per second on a $1,500 PC, while its competitors can only deal with 900 messages per second. Too good to be true? This overview contains more details and references."

6 of 194 comments

Has nothing to do with relational databases by Wesley Felter · 2005-01-21 11:30 · Score: 5, Insightful

If Roland had RTFA, he'd have realized that this StreamBase thing is not a relational database and does not do the job of a traditional relational database. The whole point is that it uses a different architecture to solve problems that don't map well to relational databases.
1. Re:Has nothing to do with relational databases by grub · 2005-01-21 11:36 · Score: 5, Funny
  
  Uh oh... You dared to slam Rolly. Prepare for the wrath of Mikey and his infinite mod points.
  
  Lube thy anus.
  
  --
  Trolling is a art,
2. Re:Has nothing to do with relational databases by Anonymous Coward · 2005-01-21 11:59 · Score: 5, Insightful
  
  I'm not sure that is an accurate critique of Roland. He likely did RTFA -- he just didn't UTFA
Read the article before posting by bigtallmofo · 2005-01-21 11:33 · Score: 5, Insightful

Before another dozen people post about how in-memory databases have been done before, please read the article. They're specifically not talking about in-memory or on-disk databases. They're reading the data and analyzing it in real time as it flows through the network. For everyone asking how they're going to back such data up, you don't need to back up data that is useless 1 second after it has flowed through your network.

--
I'm a big tall mofo.
Re:I call foul by ComputerSlicer23 · 2005-01-21 11:50 · Score: 5, Interesting

Hmmm, I guess. My guess is that they have implemented something akin to SQL for data streams. You define a message format. Think of each message as a row in the table. The message format is the table schema.
You have a "standing query", so you can ask things like: what's the rolling average for the last 60 seconds for this ticker name? What's the minimum price for this commodity?
You can ask to correlate things. Store the last 90 minutes worth of transactions on these commodities. Search for these types of patterns.
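To make the "standing query" idea concrete, here is a minimal Python sketch of the rolling-average example. It is only an illustration of the guess above, not how StreamBase works; the class name `RollingAverage`, the `on_message` method, and the (ticker, price, timestamp) message shape are all assumptions made up for this example.

```python
from collections import defaultdict, deque


class RollingAverage:
    """Hypothetical standing query: mean price per ticker over a trailing window."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.events = defaultdict(deque)  # ticker -> deque of (timestamp, price)
        self.totals = defaultdict(float)  # ticker -> sum of prices still in window

    def on_message(self, ticker, price, timestamp):
        """Consume one message and return the updated average for its ticker."""
        queue = self.events[ticker]
        queue.append((timestamp, price))
        self.totals[ticker] += price
        # Evict messages that have aged out of the window. Nothing is ever
        # written to disk; old rows are simply forgotten.
        while queue[0][0] <= timestamp - self.window:
            _, old_price = queue.popleft()
            self.totals[ticker] -= old_price
        return self.totals[ticker] / len(queue)


query = RollingAverage(window_seconds=60.0)
query.on_message("IBM", 100.0, timestamp=0)
query.on_message("IBM", 110.0, timestamp=30)
print(query.on_message("IBM", 120.0, timestamp=61))  # the t=0 row has expired
```

The query keeps only the rows inside the window, so memory use depends on the message rate and the window length rather than on the total history. A running sum in floating point can drift over a long run; a real system would presumably handle that more carefully.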
It sounds like what they have done is build an OLAP cube that builds its dataset on the fly by processing messages coming over a streaming interface.
It's much smarter to do that than to write every last transaction to disk and then query the transactions after the fact. That'd be the natural way to think about it if you used a relational database.
Essentially, it sure sounds like he's written a generalized packet filter, that can compute interesting functions on the data. Think snort, think ethereal, think iptables, think policy routing. Now apply those kinds of technology to "The price of this stock", "the location of that soldier", where those values are embedded in a network packet frame somewhere.
While each single application of this sounds trivial to implement, if he has done it in a generalized way that can keep pace with larger systems, bully for him.
The irony of all this for me is that at a former job, I used to process medical data exactly this way. It sounds like the HL7 interface issues we used to have. You couldn't possibly take a full HL7 stream and process it, so you'd filter it down to just the patients that this department was interested in. Then only process messages about those patients.
Even for those patients, there were rows you weren't interested in and had to filter out. You spent a bunch of time filtering and re-filtering.
We wrote the raw messages to disk and spooled them to ensure we didn't miss messages due to database problems (if the database was down, you had to spool until the database came back up; it was unacceptable to miss patient records for database maintenance).
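The filter-then-spool pattern described above can be sketched roughly as follows. This is a simplified Python illustration, not the system from that job: messages are assumed to be already-parsed dicts with a `patient_id` key (no real HL7 parsing is shown), `store` is a hypothetical callable that raises `ConnectionError` when the database is down, and the spool is a plain JSON-lines file.

```python
import json


def process_stream(messages, patients_of_interest, store, spool_path):
    """Filter messages by patient, deliver them, and spool any that fail."""
    delivered = spooled = 0
    for msg in messages:
        # Drop everything this department does not care about.
        if msg.get("patient_id") not in patients_of_interest:
            continue
        try:
            store(msg)
            delivered += 1
        except ConnectionError:
            # Database is down: append to the on-disk spool instead of losing it.
            with open(spool_path, "a", encoding="utf-8") as spool:
                spool.write(json.dumps(msg) + "\n")
            spooled += 1
    return delivered, spooled


def replay_spool(spool_path, store):
    """Deliver spooled messages once the database is back, then empty the spool."""
    with open(spool_path, "r", encoding="utf-8") as spool:
        pending = [json.loads(line) for line in spool if line.strip()]
    for msg in pending:
        store(msg)  # if this raises, the spool is left intact for a retry
    open(spool_path, "w", encoding="utf-8").close()
    return len(pending)
```

Note that in this sketch spooled messages are delivered after later ones that got through, so anything order-sensitive would need sequence numbers or a stricter "spool everything first" design.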
Kirby
For the record--Taco's response to this by bonch · 2005-01-21 16:43 · Score: 5, Informative

I asked him why so many Roland articles get accepted, and he said he doesn't even look at the submitter's name and that Roland must be submitting good articles.

I then told him about the controversy over it in posters' minds, and he said it was just a "new successful troll meme." Good luck getting through to Slashdot's editors, because clearly Malda does not consider this anything to take seriously.