Slashdot Mirror


Streaming a Database in Real Time

Roland Piquepaille writes "Michael Stonebraker is well-known in the database business, and for good reasons. He was the computer science professor behind Ingres and Postgres. Eighteen months ago, he started a new company, StreamBase, with another computer science professor, Stan Zdonik, with the goal of speeding access to relational databases. In 'Data On The Fly,' Forbes.com reports that the company's software, also named StreamBase, is reading TCP/IP streams and using asynchronous messaging. Streaming data without storing it on disk gives them a tremendous speed advantage. The company claims it can process 140,000 messages per second on a $1,500 PC, when its competitors can only deal with 900 messages per second. Too good to be true? This overview contains more details and references."

16 of 194 comments

  1. Practical Considerations by Anonymous Coward · · Score: 1, Informative

    Streaming data? Data must have some correlation otherwise it's useless. I doubt that all that can be kept in memory alone and so a permanent storage medium (disk, DAT, or holographic cubes) must be used.

    I used to work with a MySQL variant which facilitated queries by using a RAMDisk and an optimized version of Watcom Pascal to enhance query functionality. We made it open source, but last I heard, the last administrator had converted it into an MP3-labelling shareware package.

  2. THE TRUTH ABOUT ROLAND PIQUEPAILLE by Anonymous Coward · · Score: 4, Informative

    Roland Piquepaille and Slashdot: Is there a connection?

    I think most of you are aware of the controversy surrounding regular Slashdot article submitter Roland Piquepaille. For those of you who don't know, please allow me to bring forth all the facts. Roland Piquepaille has an online journal (I refuse to use the word "blog") located at http://www.primidi.com/. It is titled "Roland Piquepaille's Technology Trends". It consists almost entirely of content, both text and pictures, taken from reputable news websites and online technical journals. He does give credit to the other websites, but it wasn't always so. Only after many complaints were raised by the Slashdot readership did he start giving credit where credit was due. However, this is not what the controversy is about.

    Roland Piquepaille's Technology Trends serves online advertisements through a service called Blogads, located at www.blogads.com. Blogads is not your traditional online advertiser; rather than base payments on click-throughs, Blogads pays a flat fee based on the level of traffic your online journal generates. This way Blogads can guarantee that an advertisement on a particular online journal will reach a particular number of users. So advertisements on high traffic online journals are appropriately more expensive to buy, but the advertisement is guaranteed to be seen by a large amount of people. This, in turn, encourages people like Roland Piquepaille to try their best to increase traffic to their journals in order to increase the going rates for advertisements on their web pages. But advertisers do have some flexibility. Blogads serves two classes of advertisements. The premium ad space that is seen at the top of the web page by all viewers is reserved for "Special Advertisers"; it holds only one advertisement. The secondary ad space is located near the bottom half of the page, so that the user must scroll down the window to see it. This space can contain up to four advertisements and is reserved for regular advertisers, or just "Advertisers". Visit Roland Piquepaille's Technology Trends (http://www.primidi.com/) to see it for yourself.

    Before we talk about money, let's talk about the service that Roland Piquepaille provides in his journal. He goes out and looks for interesting articles about new and emerging technologies. He provides a very brief overview of the articles, then copies a few choice paragraphs and the occasional picture from each article and puts them up on his web page. Finally, he adds a minimal amount of original content between the copied-and-pasted text in an effort to make the journal entry coherent and appear to add value to the original articles. Nothing more, nothing less.

    Now let's talk about money. Visit http://www.blogads.com/order_html?adstrip_category=tech&politics= to check the following facts for yourself. As of today, December XX 2004, the going rate for the premium advertisement space on Roland Piquepaille's Technology Trends is $375 for one month. One of the four standard advertisements costs $150 for one month. So, the maximum advertising space brings in $375 x 1 + $150 x 4 = $975 for one month. Obviously not all $975 will go directly to Roland Piquepaille, as Blogads gets a portion of that as a service fee, but he will receive the majority of it. According to the FAQ, Blogads takes 20%. So Roland Piquepaille gets 80% of $975, a maximum of $780 each month. www.primidi.com is hosted by clara.net (look it up at http://www.networksolutions.com/en_US/whois/index.jhtml). Browsing clara.net's hosting solutions, the most expensive hosting service is their Clarahost Advanced (http://www.uk.clara.net/clarahost/advanced.php) priced at £69.99 GBP. This is

  3. Duh by Saint+Stephen · · Score: 1, Informative

    Any of the enterprise databases, given gobs of memory, will end up caching the entire database in memory.

    As long as it's read only, the disk won't be touched.

    A writeable database that doesn't need to be written to disk is not a database, it's called a nonpersistent cache.

  4. Re:speed focus by Unknown+Relic · · Score: 2, Informative

    According to the article what makes Streambase different is that it's able to query new data that is coming in at an extremely fast rate. Instead of writing the new data to disk before a query can be executed against it, the database is able to query it as soon as it is streamed into memory. According to the article the current customers testing the software are financial services companies who need to be able to analyze stock ticker information which comes in at an extremely high rate of speed. The $100,000 to $300,000 per year cost the current customers are paying is also a bit of a deterrent for use in web space.
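
    To make "query it as soon as it is streamed into memory" concrete, here is a minimal sketch of a continuous query: a moving average over a time window of ticks. It is not StreamBase's API or query language; the function name, the tick layout and the window rule are assumptions made up for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Tick = Tuple[float, str, float]  # (timestamp in seconds, symbol, price)


def moving_average(ticks: Iterable[Tick], symbol: str,
                   window_seconds: float) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, average price) for each matching tick as it arrives.

    Only ticks inside the time window are held in memory. Nothing is
    written to disk, and older ticks are simply discarded.
    """
    window: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, sym, price in ticks:
        if sym != symbol:
            continue
        window.append((ts, price))
        total += price
        # Evict ticks that have fallen out of the window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_price = window.popleft()
            total -= old_price
        yield ts, total / len(window)


# Hypothetical feed; a real one would be an endless socket stream.
feed = [
    (0.0, "ACME", 10.0),
    (1.0, "INIT", 99.0),
    (2.0, "ACME", 12.0),
    (6.0, "ACME", 14.0),
]
results = list(moving_average(feed, "ACME", window_seconds=5.0))
```

    The answer is updated per message, so the cost of each result is a few in-memory operations rather than an insert followed by a query.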

  5. A Better Solution by logicnazi · · Score: 3, Informative

    Just to let everyone know this is not the only product or even the first product to do this.

    Another option is EPL server by ispheres. Unlike the product mentioned here, which seems to be just some extra code thrown on top of a database, EPL server is built from the ground up for this sort of application.

    --

    If you liked this thought maybe you would find my blog nice too:

  6. Re:speed focus by epiphani · · Score: 4, Informative

    The idea sounds a lot like the software I develop. We sit on a server-peer network, and process messages - without ever hitting disk. We can query state information out of the network, even though most traffic is dynamic and not stored past initial processing and resending. Two parts to our software, I guess. State data and traffic. Pretty impressive piece of software I think. Maintaining the network state is far more difficult than most people realize. We generally keep around 100 megs of state in RAM, more depending on the traffic levels. My software has been around, in various incarnations, since the 80s.

    It's called IRC.

    --
    .
  7. More Information by adesai9 · · Score: 2, Informative

    DB Group @ Stanford is doing some Stream projects as well. In case anyone is interested in more technical information check out: http://www-db.stanford.edu/stream/

  8. Article text minus the spam by Anonymous Coward · · Score: 2, Informative



    Streaming a Database in Real Time

    Michael Stonebraker is well-known in the database business, and for good reasons. He was the computer science professor behind Ingres and Postgres. Eighteen months ago, he started a new company, StreamBase, with another computer science professor, Stan Zdonik, with the goal of speeding access to relational databases. In "Data On The Fly," Forbes.com reports that the company's software, also named StreamBase, is reading TCP/IP streams and using asynchronous messaging. Streaming data rather than storing it on disk, as other relational database software does, gives them a tremendous speed advantage. The company claims it can process 140,000 messages per second on a $1,500 PC, when its competitors can only deal with 900 messages per second. Too good to be true? Read more...

    Here are some excerpts from the Forbes article.

    "Relational databases are one to two orders of magnitude too slow," says Stonebraker, who is chief technology officer at Streambase, a 25-person outfit based in Lexington, Mass. "Big customers have already tried to use relational databases for streaming data and dismissed them. Those products are non-starters in this market."

    In a recent pilot program, Streambase was able to analyze 140,000 messages per second, while a leading relational database -- Stonebraker won't say which one -- could handle only 900 messages per second. Streambase has 12 customers now testing its software, all of them financial services companies that need to analyze rapid-fire ticker feeds and other streaming data.

    Unlike traditional database programs, Streambase analyzes data without storing it to disk, performing queries on data as it flows. Traditional systems bog down because they first store data on hard drives or in main memory and then query it, Stonebraker says.

    The software, which should be commercially available next month, runs on Linux and Solaris, but a Microsoft version should be available soon.

    The database business is not a cheap one. So how much will this new company charge for largely unproven software?

    Streambase charges customers annual subscriptions for its software, setting prices based on how many CPUs a customer uses to power the software. Typical deals so far have ranged from $100,000 to $300,000 a year, says Barry Morris, Streambase's chief executive.

    In "StreamBase eyes real-time streaming apps," InfoWorld wrote that the prices should be lower.

    The software is available via a subscription model, with pricing in the range of approximately $50,000 per year, Stonebraker said. Subscriptions are sold on a per-CPU basis.

    Who will be the customers for these speedy accesses to their databases? Let's come back to Forbes.com.

    For now Streambase is focusing attention on financial services companies, which hope to do things like track how well traders are performing on a real-time basis, rather than aggregating trades at the end of the day and analyzing them overnight.

    A bigger opportunity involves processing real-time data feeds generated by sensor networks and RFID tags. A military contractor wants to use Streambase to keep track of soldiers and vehicles in the battlefield. A casino in Las Vegas is considering using Streambase to track the performance of individual gamblers.

    In an interview with InfoWorld, Stonebraker gave more details about military applications.

    We did a prototype that dealt with army battalion monitoring. When an army battalion is 30,000 humans and 12,000 vehicles, the army is deadly serious about getting a vital signs monitor on every one of the humans so they can do combat medical triage or [take other actions]. They already have a GPS system in every vehicle, but that didn't keep Jennifer Lynch's convoy from getting lost.

    They want to turn this into a system to watch the position of every vehicle and compare it against where you're supposed to be. They also want to put a sensor on the

  9. Re:speed focus by epiphani · · Score: 3, Informative

    Oh, and 140,000 messages on a $1500 PC sounds a little low actually. We handled 40,000 -sockets- on an AMD Duron 900 MHz. Each socket received a few messages per second, and we were receiving far more from the uplink.

    --
    .
  10. Re:ACID? by ray-auch · · Score: 2, Informative

    Difficult to tell from the vague article, but my guess is they don't, and they throw the data away after analysis. They might map some kind of database schema to the incoming data and provide some form of SQL for querying, but still no real database anywhere.

    So, throw out ACID (if problem domain doesn't require it) and get performance increases, wow! Probably they are now patenting it because no one had thought of that before...

  11. This isn't streaming, this is message queuing... by X · · Score: 4, Informative

    This isn't streaming, it's standard message queuing. Most messaging products allow you to have non-persistent queues and allow you to extract data based on arbitrary queries. There is well over a decade's worth of products for doing this kind of stuff.

    I'm sure this is a great product, but both the submitter and the writer of the story seem to not grok what makes it great.

    --
    sigs are a waste of space
  12. Data IS written to disk/backed-up. by univgeek · · Score: 3, Informative

    It's just that if you start querying AFTER you store it on disk, the I/O makes it much slower. So what you do is pick up some of the information from the flowing data, and some other system behind yours saves the data.

    Every time you get something interesting, you save that on disk too - but separately, into a much smaller db. This way state is also saved, and since state is going to be much smaller than the data, there will be no speed issues.

    Now the clever thing to do would be to link this flowing-state dbms (FSDBMS) to a standard rdbms working from the disk. Then you could verify the information from the FSDBMS, and ensure that things aren't screwed up. Also, based on patterns seen by the rdbms with long term data, new queries could be generated on the FSDBMS, allowing it to generate results from the data on the wire.

    Sounds like it would have applications primarily where response time is at a premium, and long history is not such a large component of the information.

    So in the case of military info, where a HumVee could be in trouble (a situ someone else has mentioned), the FSDBMS would raise the alarm, and some other process would then follow up and ensure that the alarm was taken care of. (The data itself would be backed up for future analysis, such as whether the query was correctly handled).

    Dynamic queries in such a situ could be - get the id of the closest Apache reporting in, or closest loaded bomber en-route to some other target. Then the alarm handling program would re-route the bomber/apache to the humvee for support. While querying the disk database may be time intensive, the FSDBMS would have delivered a sub-optimal FAST solution.

    So imagine the FSDBMS as a filter, giving different bits of information to different people. With the option that you could change the filter on the fly. And the filter could be complex, based on previous history etc., just like a DB query.
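
    The "filter you can change on the fly, with only the interesting bits saved to a much smaller db" idea could be sketched roughly like this. Everything here is hypothetical: the `StreamFilter` class, the `alerts` table and the message fields are invented for the example, and an in-memory SQLite database stands in for the small on-disk store.

```python
import sqlite3
from typing import Callable, Dict, Iterable

Message = Dict[str, str]
Predicate = Callable[[Message], bool]


class StreamFilter:
    """Pass messages through a replaceable predicate; persist only the hits."""

    def __init__(self, predicate: Predicate, db: sqlite3.Connection) -> None:
        self.predicate = predicate
        self.db = db
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS alerts (unit TEXT, status TEXT)"
        )

    def set_predicate(self, predicate: Predicate) -> None:
        # Changing the filter "on the fly": later messages use the new rule.
        self.predicate = predicate

    def feed(self, messages: Iterable[Message]) -> None:
        for msg in messages:
            if self.predicate(msg):
                # Only the interesting subset reaches the small store;
                # everything else flows past without being saved here.
                self.db.execute(
                    "INSERT INTO alerts VALUES (?, ?)",
                    (msg["unit"], msg["status"]),
                )
        self.db.commit()


db = sqlite3.connect(":memory:")  # a file path would make the store durable
flt = StreamFilter(lambda m: m["status"] == "trouble", db)
flt.feed([
    {"unit": "humvee-1", "status": "ok"},
    {"unit": "humvee-2", "status": "trouble"},
])
flt.set_predicate(lambda m: m["status"] in ("trouble", "low-fuel"))
flt.feed([{"unit": "humvee-3", "status": "low-fuel"}])
saved = db.execute("SELECT unit, status FROM alerts ORDER BY unit").fetchall()
```

    The full feed would still be archived by some other system behind this one; the sketch only covers the fast path and the small state store.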

    --
    All bow to his Noodliness!! His Noodle Appendage has touched me!
  13. Combining this with an RDBMS by kiwi_mcd · · Score: 2, Informative

    When I was a project manager at ECONZ http://www.econz.co.nz/ in 1999 I did a high level design for a product similar to this but we merged it with a relational database (Oracle in this instance).

    Other posts are correct that what is talked about here is a message queuing mechanism to some degree. What I had designed and built was what we called an event server.

    Basically how it worked was that you sent what SQL statement you wanted registered and then you got the initial data set back and then any changes to it. Anytime somebody did an UPDATE or INSERT or DELETE statement the results got sent to whoever had registered for it. We sent it through our own message queue software.
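
    A toy version of that register-then-get-changes flow, as a sketch only. The original registered SQL statements against Oracle and pushed results through a message queue; here a Python predicate stands in for the SQL, a dict stands in for the table, and a callback stands in for the queue. All names are made up.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, str]
Event = Tuple[str, Row]  # (operation, row)
Where = Callable[[Row], bool]
Callback = Callable[[Event], None]


class EventServer:
    def __init__(self) -> None:
        self.rows: Dict[int, Row] = {}
        self.subs: List[Tuple[Where, Callback]] = []

    def register(self, where: Where, callback: Callback) -> List[Row]:
        """Subscribe to a query and return the initial matching data set."""
        self.subs.append((where, callback))
        return [dict(r) for r in self.rows.values() if where(r)]

    def _notify(self, op: str, before: Optional[Row],
                after: Optional[Row]) -> None:
        # Deliver the change if the row matched before or after it.
        for where, callback in self.subs:
            if any(r is not None and where(r) for r in (before, after)):
                callback((op, dict(after if after is not None else before)))

    def insert(self, key: int, row: Row) -> None:
        self.rows[key] = row
        self._notify("INSERT", None, row)

    def update(self, key: int, **changes: str) -> None:
        before = dict(self.rows[key])
        self.rows[key].update(changes)
        self._notify("UPDATE", before, self.rows[key])

    def delete(self, key: int) -> None:
        row = self.rows.pop(key)
        self._notify("DELETE", row, None)


server = EventServer()
server.insert(1, {"job": "install", "region": "north"})
server.insert(2, {"job": "repair", "region": "south"})

events: List[Event] = []
initial = server.register(lambda r: r["region"] == "north", events.append)
server.insert(3, {"job": "survey", "region": "north"})
server.update(2, job="replace")  # south region: not delivered
server.delete(1)
```

    Matching every change against every registered query is the naive approach and is likely part of why the real thing was harder to write than expected.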

    This worked very well although not at the speed claimed here and was much more complex to write than we anticipated. It was written in C++ on Linux which was quite revolutionary back then...

    How did it work in practice? The software that we replaced was running on SGI boxes that cost more than $10 million. We built our total hardware solution for less than $1 million (large cost was Sun boxes for Oracle). The response time dropped from minutes to seconds or less. The application was a dispatch system for jobs in the telco area with over 500 users.

  14. Re:speed focus by jedidiah · · Score: 2, Informative

    This is also how Oracle works by default. You can have a database entirely resident in memory just due to the fact that Oracle will try to aggressively cache as much as it can. This is obviously not limited to Oracle or SQLServer.

    What distinguishes RDBMS systems is the fact that their storage is permanent and engineered to perform crash recovery. This means that even a memory resident Oracle database will be doing synchronous writes to its transaction logs. This ensures that any transaction can be regenerated should the whole system take a dive.

    There's a secret switch inside Oracle to turn this all off if you really want to.
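
    For anyone who hasn't seen it, the general shape of "synchronous writes to a transaction log" is roughly the following. This is a generic sketch of the technique, not how Oracle implements it; the `LoggedStore` class and the JSON-lines log format are assumptions for illustration, and a torn final record is not handled.

```python
import json
import os
import tempfile
from typing import Dict


class LoggedStore:
    """In-memory key/value data backed by an append-only log."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.data: Dict[str, str] = {}
        self._replay()
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        # Crash recovery: rebuild the in-memory state from the log.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key: str, value: str) -> None:
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # block until the record is on disk
        self.data[key] = value       # only then apply the change in memory

    def close(self) -> None:
        self.log.close()


log_path = os.path.join(tempfile.mkdtemp(), "txn.log")
store = LoggedStore(log_path)
store.put("acct-1", "100")
store.put("acct-1", "90")
store.close()

recovered = LoggedStore(log_path)  # simulates a restart after a crash
recovered_value = recovered.data["acct-1"]
recovered.close()
```

    The `os.fsync` call on every write is the expensive part, and it is exactly what a system that never persists its stream gets to skip.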

    An RDBMS might not be the right tool for the job. Companies quite often have no business using Oracle or even an RDBMS. This fact is not news.

    --
    A Pirate and a Puritan look the same on a balance sheet.
  15. Stonebraker gave a guest lecture to my class. by wirelessbuzzers · · Score: 2, Informative

    Some financial services company is using this software to check incoming stock feeds for problems. It takes thousands of messages per second, and if certain stocks don't come in at least once in 5 seconds, it counts a miss. For others it's 1 in 30 seconds.

    If a given provider is consistently slow, it sounds a low-level alarm against the provider, not to trust their data because it's slow. Similarly for various markets, and probably other groupings too. It probably does other processing on the data.
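
    The miss-counting rule can be sketched in a few lines. This is one guess at the logic, run over a recorded list of ticks rather than a live feed; the function name, the symbols and the choice to count one miss per overlong gap are all assumptions, not details from the lecture.

```python
from typing import Dict, Iterable, Tuple


def count_misses(ticks: Iterable[Tuple[float, str]],
                 max_gap: Dict[str, float],
                 start_time: float,
                 end_time: float) -> Dict[str, int]:
    """Count, per symbol, the gaps between arrivals that exceed its limit.

    ticks must be in time order as (timestamp in seconds, symbol).
    """
    last_seen = {sym: start_time for sym in max_gap}
    misses = {sym: 0 for sym in max_gap}
    for ts, sym in ticks:
        if sym not in max_gap:
            continue  # symbol is not being monitored
        if ts - last_seen[sym] > max_gap[sym]:
            misses[sym] += 1
        last_seen[sym] = ts
    # A symbol that went quiet before the end of the run also counts.
    for sym, seen in last_seen.items():
        if end_time - seen > max_gap[sym]:
            misses[sym] += 1
    return misses


# Hypothetical symbols: one must arrive every 5 seconds, one every 30.
limits = {"FAST": 5.0, "SLOW": 30.0}
feed = [(1.0, "FAST"), (4.0, "FAST"), (12.0, "FAST"),
        (14.0, "FAST"), (20.0, "SLOW")]
misses = count_misses(feed, limits, start_time=0.0, end_time=40.0)
```

    Only the last arrival time per symbol is kept, which fits the point that this kind of monitoring does not need a lot of history.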

    This data is almost useless within 5 minutes, and it has to be processed very fast. If you change your application, nothing will matter within 5 minutes. If your machine crashes, you have bigger problems, as is generally the case when you want real-time processing. And you don't need a lot of history.

    Streambase is much faster than the company's previous custom-coded C++ program, largely because it has better multithreading and more query optimization. It's designed to cut across multiple layers of a traditional database platform (transport, database, application).

    Of course, Stonebraker could be puffing his product, but it sounds pretty effective to me.

    --
    I hereby place the above post in the public domain.
  16. For the record--Taco's response to this by bonch · · Score: 5, Informative

    I asked him why so many Roland articles get accepted, and he said he doesn't even look at the submitter's name and that Roland must be submitting good articles.

    I then told him about the controversy over it in posters' minds, and he said it was just a "new successful troll meme." Good luck getting through to Slashdot's editors, because clearly Malda does not consider this anything to take seriously.