Streaming a Database in Real Time

Seriously, Michael by Anonymous Coward · 2005-01-21 11:22 · Score: 4, Insightful

How much does Roland Piquepaille pay you to link to his shitty articles?

It must be a lot, since the pay-for-play is so obvious.

Queue dancin' by RenHoek · 2005-01-21 11:22 · Score: 3, Funny

From what I hear, Blizzard should think about hiring this guy ;)

Re:Queue dancin' by xstephx · 2005-01-21 11:24 · Score: 1

these guys.

speed focus by Random+Web+Developer · 2005-01-21 11:22 · Score: 2, Insightful

If they are so focused on speed, couldn't this be the MySQL killer for web applications that don't need funky features but where concurrency and speed are important?

--
Artists against online scams http://www.aa419.org/

Re:speed focus by Unknown+Relic · 2005-01-21 11:32 · Score: 2, Informative

According to the article, what makes Streambase different is that it can query new data that is coming in at an extremely fast rate. Instead of writing the new data to disk before a query can be executed against it, the database queries it as soon as it is streamed into memory. According to the article, the customers currently testing the software are financial services companies that need to analyze stock ticker information, which arrives at an extremely high rate. The $100,000 to $300,000 per year that current customers are paying is also a bit of a deterrent for use in the web space.
Re:speed focus by Anonymous Coward · 2005-01-21 11:33 · Score: 3, Funny

"Streambase charges customers annual subscriptions for its software, setting prices based on how many CPUs a customer uses to power the software. Typical deals so far have ranged from $100,000 to $300,000 a year"

Yeah, this will outright kill MySQL. I'm swapping tomorrow; got any cash to spare?
Re:speed focus by epiphani · 2005-01-21 12:01 · Score: 4, Informative

The idea sounds a lot like the software I develop. We sit on a server-peer network and process messages without ever hitting disk. We can query state information out of the network, even though most traffic is dynamic and not stored past initial processing and resending. Two parts to our software, I guess: state data and traffic. Pretty impressive piece of software, I think. Maintaining the network state is far more difficult than most people realize. We generally keep around 100 megs of state in RAM, more depending on the traffic levels. My software has been around, in various incarnations, since the 80s.

It's called IRC.

--
.
Re:speed focus by epiphani · 2005-01-21 12:04 · Score: 3, Informative

Oh, and 140,000 messages on a $1500 PC sounds a little low, actually. We handled 40,000 -sockets- on an AMD Duron 900 MHz. Each socket received a few messages per second, and we were receiving far more from the uplink.

--
.
Re:speed focus by Anonymous Coward · 2005-01-21 12:19 · Score: 0

Cool.

How do you do relational queries on IRC traffic BTW? I must have missed that - "#select something" I guess...
Re:speed focus by airjrdn · 2005-01-21 12:21 · Score: 3, Interesting

SQL Server table variables and, to a certain extent, derived tables work on the same basic premise...they're in RAM, not on disk.

One question might be...why write the data directly to a database initially? Why not utilize a faster format, then write to the DB when things have slowed down (i.e. caching)?
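A minimal sketch of the "write to the DB when things have slowed down" idea, assuming Python's standard sqlite3 module as a stand-in database; the WriteBehindBuffer class, the ticks table and the batch size are hypothetical. Rows collect in RAM and go to the database in batches, which is exactly the window a later reply warns about: anything still in `pending` when the server dies is gone.

```python
import sqlite3

class WriteBehindBuffer:
    """Hypothetical write-behind cache: rows collect in RAM and are written
    to the database in batches. Anything not yet flushed is lost on a crash."""

    def __init__(self, conn, batch_size=1000):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []

    def add(self, symbol, price):
        self.pending.append((symbol, price))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # One transaction per batch instead of one per row;
        # the connection's context manager commits on success.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO ticks (symbol, price) VALUES (?, ?)", self.pending)
        self.pending.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, price REAL)")
buf = WriteBehindBuffer(conn, batch_size=3)
for i in range(7):
    buf.add("MSFT", 25.0 + i)
# Six rows were flushed in two batches; the seventh exists only in RAM.
```

The batch size is the knob: bigger batches mean fewer, cheaper database round trips, but more data at risk between flushes.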

Admittedly I haven't read the article, but I am familiar with 200+ GB databases, and there are ways to deal with performance with current DB tech.

I do welcome any new competition, but there are ways of querying data in memory already. Heck, put the whole thing on a RAM Drive...how much data can there be for stock tickers?

--

My Tech Posts on Twitter
Re:speed focus by Anonymous Coward · 2005-01-21 12:31 · Score: 0

One question might be...why write the data directly to a database initially?

Do you really want transactions to complete successfully before they're written to disk? That would be very dangerous when the power goes out.
Re:speed focus by brer_rabbit · 2005-01-21 12:47 · Score: 4, Funny

How do you do relational queries on IRC traffic BTW? I must have missed that - "#select something" I guess...
you start with /join, of course...
Re:speed focus by Aeiri · 2005-01-21 13:34 · Score: 3, Insightful

One question might be...why write the data directly to a database initially? Why not utilize a faster format, then write to the DB when things have slowed down (i.e. caching)?

If the server crashes while it's still in "write later mode", then data will be lost. Since most of the time servers crash BECAUSE of high traffic, this can be kind of bad.
Re:speed focus by jedidiah · 2005-01-21 15:27 · Score: 2, Informative

This is also how Oracle works by default. You can have a database entirely resident in memory just due to the fact that Oracle will try to aggressively cache as much as it can. This is obviously not limited to Oracle or SQLServer.

What distinguishes RDBMS systems is the fact that their storage is permanent and engineered to perform crash recovery. This means that even a memory-resident Oracle database will be doing synchronous writes to its transaction logs. This ensures that any transaction can be regenerated should the whole system take a dive.
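A minimal sketch of the log-before-commit idea described above, assuming a toy key/value store rather than anything Oracle actually does internally; the LoggedStore class and its JSON-lines log format are hypothetical. Each change is appended to a redo log and forced to disk with os.fsync before it is applied, so the in-memory state can always be rebuilt by replaying the log.

```python
import json
import os
import tempfile

class LoggedStore:
    """Hypothetical in-memory key/value store protected by a redo log.
    Each change is forced to disk (fsync) before it is applied, so the
    in-memory state can be regenerated by replaying the log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        # Sketch only: a torn final line after a real crash would make
        # json.loads fail; real logs add checksums and sequence numbers.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                self.data[rec["key"]] = rec["value"]

    def put(self, key, value):
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()             # Python's buffer -> OS
        os.fsync(self.log.fileno())  # OS cache -> disk: the synchronous write
        self.data[key] = value       # only now is the change applied

    def close(self):
        self.log.close()

path = os.path.join(tempfile.mkdtemp(), "redo.log")
store = LoggedStore(path)
store.put("acct:1", 100)
store.put("acct:1", 75)
store.close()
recovered = LoggedStore(path)  # simulates a restart after the system takes a dive
```

The fsync on every put is what makes each write durable, and also what makes it slow; the "secret switch" mentioned in the next paragraph is essentially about skipping that cost.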

There's a secret switch inside oracle to turn this all off if you really want to.

An RDBMS might not be the right tool for the job. Companies quite often have no business using Oracle or even an RDBMS. This fact is not news.

--
A Pirate and a Puritan look the same on a balance sheet.
Re:speed focus by anactofgod · 2005-01-21 15:38 · Score: 1

Based on how I imagine this system is architected, the question should be...

"Do you have any cache to spare?"

*grynn*

--

---anactofgod---

"Equal opportunity swindling - *that* is the true test of a sustainable democracy."
Re:speed focus by winterlens · 2005-01-21 15:46 · Score: 1

Sometimes the best ideas leverage old ones. (I think we sometimes call that innovation.)
Re:speed focus by jonadab · 2005-01-21 16:20 · Score: 2, Insightful

> if they are so much focused on speed, couldn't this be the mysql killer
> for web applications that don't need funky features but where concurrency
> and speed are important

As near as I can make out from the (somewhat nontechnical) article, this
is not a traditional database in any normal sense; it's more like a query
engine for streaming data. It doesn't permanently store all the data in
the stream that's passing through it. What it does store, I take it, is
query results. So I guess basically you set up your queries ahead of time,
and the results accumulate as the data flows through.
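To make the "set up your queries ahead of time" reading concrete, here is a minimal sketch under that assumption; StandingQuery and StreamEngine are hypothetical names, not Streambase's API. Each event is pushed through every registered query once and then forgotten; only the accumulated per-symbol totals survive.

```python
from collections import defaultdict

class StandingQuery:
    """Hypothetical continuous query: a filter plus a running per-key sum.
    Only the aggregate is kept; the matching events themselves are dropped."""

    def __init__(self, name, where, key, value):
        self.name = name
        self.where = where
        self.key = key
        self.value = value
        self.results = defaultdict(float)

    def on_event(self, event):
        if self.where(event):
            self.results[self.key(event)] += self.value(event)

class StreamEngine:
    def __init__(self):
        self.queries = []

    def register(self, query):
        self.queries.append(query)

    def push(self, event):
        # Each event visits every registered query once, then is forgotten.
        for q in self.queries:
            q.on_event(event)

engine = StreamEngine()
volume = StandingQuery("big_trade_volume",
                       where=lambda e: e["qty"] >= 1000,
                       key=lambda e: e["symbol"],
                       value=lambda e: e["qty"])
engine.register(volume)
for e in [{"symbol": "IBM", "qty": 500},
          {"symbol": "IBM", "qty": 2000},
          {"symbol": "MSFT", "qty": 1500},
          {"symbol": "IBM", "qty": 1000}]:
    engine.push(e)
```

Memory use depends on the number of distinct keys in the results, not on how many events have flowed through.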

This could be useful for some things, but it's not going to kill off any
of the current database engines because, fundamentally, it doesn't do the
same thing. Indeed, the article claims that big companies have tried to
(ab)use Oracle to do what this does, and it didn't work out. Translation:
the jobs this thing will be taking over are not currently jobs that Oracle
is performing -- nor, presumably, MySQL either. It isn't made to do what
they do and compete with them; it's made to do something different, but
(the company behind it hopes) also useful.

--
Cut that out, or I will ship you to Norilsk in a box.
Re:speed focus by airjrdn · 2005-01-21 18:09 · Score: 1

We've got UPSes with enough juice to power all of our racks for a few hours. Add to that the generator outside that can power them for 3 days, and I think you'll see why I wouldn't worry about that much.

Any IT shop required to keep their products up will have these things in place.

--

My Tech Posts on Twitter
Re:speed focus by airjrdn · 2005-01-21 18:11 · Score: 1

Well, first of all, data isn't necessarily lost. It's fairly easy to bring the data into one server, then push it to the online server. The only server that's going to go down in your scenario is the online server (BECAUSE of high traffic). If it goes down, the caching server isn't necessarily losing anything.

--

My Tech Posts on Twitter
Re:speed focus by Tolookah · 2005-01-21 21:41 · Score: 1

Any IT shop required to keep their products up will have these things in place.

Do we have to remind you of the power one button has over millions of angsty teenage girls? LiveJournal sure knows... even with backups, it's still a big old mess...
Re:speed focus by the_duke_of_hazzard · 2005-01-21 22:55 · Score: 1

Servers only ever go down because of high traffic? I'm *very* interested in the technology you use.
Re:speed focus by airjrdn · 2005-01-22 01:33 · Score: 1

I was *quoting* the parent.

--

My Tech Posts on Twitter
Re:speed focus by Lord+Crc · 2005-01-22 02:25 · Score: 1

One question might be...why write the data directly to a database initially? Why not utilize a faster format, then write to the DB when things have slowed down (i.e. caching)?

From what I understood, that's the point, except you never write to the DB. Instead of reading from the socket, storing the data in a memory buffer, and then filtering it, you filter the data directly as you read it from the socket. No need to store it in memory first!
Re:speed focus by Anonymous Coward · 2005-01-22 04:14 · Score: 0

Sounds like it's basically just a big filter/splitter.

If it uses normal SQL commands, it just precompiles them and applies them to each line of input.

Not too surprising that it's way faster than something that has to store stuff to disk and then look it up.
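A minimal sketch of the filter/splitter idea from the last two replies, assuming a tiny hypothetical WHERE-clause subset (numeric comparisons joined by AND) and comma-separated input lines; compile_where and filter_stream are illustrative names, not any product's API. The clause is compiled into a predicate once, and each line is tested as it is read, with io.StringIO standing in for a socket.

```python
import io
import operator

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "=": operator.eq}

def compile_where(clause):
    """Compile a tiny, hypothetical WHERE subset such as
    'price > 100 AND qty >= 500' into one predicate, once, up front."""
    tests = []
    for cond in clause.split(" AND "):
        field, op, literal = cond.split()
        tests.append((field, OPS[op], float(literal)))

    def predicate(row):
        return all(fn(float(row[f]), lit) for f, fn, lit in tests)
    return predicate

def filter_stream(lines, fields, predicate):
    # Lazily parse and test each line as it arrives; nothing is buffered.
    for line in lines:
        row = dict(zip(fields, line.rstrip("\n").split(",")))
        if predicate(row):
            yield row

# io.StringIO stands in for a socket's file object (e.g. from sock.makefile()).
feed = io.StringIO("IBM,97.5,800\nMSFT,26.1,5000\nGOOG,195.0,600\nGOOG,196.0,100\n")
where = compile_where("price > 100 AND qty >= 500")
hits = list(filter_stream(feed, ["symbol", "price", "qty"], where))
```

Because filter_stream is a generator, a caller can act on each hit immediately instead of waiting for the whole feed.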
Re:speed focus by ahdeoz · 2005-01-23 09:40 · Score: 1

There you go. All they do is add some processing (based on known patterns) before storing. In other words, if you want to know in real time how many hits your website had, your counter would count the hits as they occur instead of counting the hits after being recorded in the log.
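A minimal sketch of the hit counter described above, assuming hit timestamps in seconds that arrive in non-decreasing order; HitCounter is a hypothetical name. The count is maintained as each hit happens, and a deque of recent timestamps also answers "hits in the last minute" without rereading any log.

```python
from collections import deque

class HitCounter:
    """Hypothetical real-time counter: updated as each hit occurs instead of
    being computed later by scanning a log file."""

    def __init__(self, window_seconds=60):
        self.total = 0
        self.window = window_seconds
        self.recent = deque()   # timestamps of hits still inside the window

    def hit(self, ts):
        self.total += 1
        self.recent.append(ts)
        self._expire(ts)

    def hits_last_window(self, now):
        self._expire(now)
        return len(self.recent)

    def _expire(self, now):
        # Drop timestamps that have slid out of the window (now - window, now].
        while self.recent and self.recent[0] <= now - self.window:
            self.recent.popleft()

counter = HitCounter(window_seconds=60)
for t in [0, 10, 30, 65, 70]:
    counter.hit(t)
```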

Practical Considerations by Anonymous Coward · 2005-01-21 11:23 · Score: 1, Informative

Streaming data? Data must have some correlation otherwise it's useless. I doubt that all that can be kept in memory alone and so a permanent storage medium (disk, DAT, or holographic cubes) must be used.

I used to work with a MySQL variant which facilitated queries by using a RAM disk and an optimized version of Watcom Pascal to enhance query functionality. We made it open source, but last I heard, the last administrator had converted it into an MP3-labelling shareware package.

Re:Practical Considerations by yintercept · 2005-01-21 13:43 · Score: 1

I doubt that all that can be kept in memory alone

The dropping cost of memory wipes out your practical concerns. You can have all of the logical correlations that you want in memory. We tend to think that we have to write data to disk to make it organized because today's operating systems and programming languages give us very little direct control of memory, but they give us a great deal of control over what we write to the disk.

If we had more operating systems that gave us direct control of memory, organizing memory would not seem so foreign. With programs like Java the allocation of memory and garbage collection are left a mystery to programmers. So the only time we see correlations is when we output to a disk or screen.
Re:Practical Considerations by Anonymous Coward · 2005-01-21 15:10 · Score: 0

If we had more operating systems that gave us direct control of memory, organizing memory would not seem so foreign. With programs like Java the allocation of memory and garbage collection are left a mystery to programmers.
I'm with you on the first sentence, or at least I thought I was until I read the second. What sort of nonsense is this? One might also say that with filesystems, the layout of data on disk is left a mystery to programmers. What's the point, exactly?
Re:Practical Considerations by corngrower · 2005-01-21 15:47 · Score: 1

The dropping cost of memory wipes out your practical concerns.
You're right there. At about $150/Gig, and not using disk space, that $1500 PC system could possibly be an Athlon 64 with 8 Gig of memory.
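The arithmetic behind that claim, using the $150/GB figure above and the article's $1500 machine: the RAM alone would take most of the budget.

```python
price_per_gb = 150   # dollars per GB, the figure quoted above
ram_gb = 8
budget = 1500        # the "$1500 PC" from the article

ram_cost = price_per_gb * ram_gb   # 8 GB at $150 each
remaining = budget - ram_cost      # what's left for CPU, board, case, etc.
```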
Re:Practical Considerations by yintercept · 2005-01-21 16:13 · Score: 1

My post was an exploration of why someone might think that data written to disk was structured and not think of data in memory as structured.

First, you are correct. With operating systems, the actual layout of the disk is a mystery to the programmer. I use fopen(), but really don't know where the file is.

The layout of the file, however, is completely under my control. I often know the complete logical structure of the data in the file.

In early assembler-type programming languages, the programmer would allocate and deallocate memory and directly access registers and memory locations. Just as with a file written to disk, the programmer would have intimate knowledge of the actual structure of the data stored in memory. An assembly programmer would actually visualize the memory accessed by their program. A C programmer would visualize an array as a large block of memory and would access pieces of that memory with pointers.

The goal of OO languages like Java was to make memory an abstraction. I tell Java to create an array of objects, but I really haven't a clue what that array of objects looks like in memory. Garbage collection is the ultimate abstraction: I stop using an object, and some time after that, Java reclaims the memory. The layers of abstraction between us and the hardware change the way that we think about memory.

If our programming languages gave us direct access to large fields of on and off switches, we would have a completely different impression of the way memory works than we do when we create variables and objects.

This does not mean that Java is bad. It means that I see different ways we could have gone that would give us a different perception of memory. If we were working more directly in creating the structures that exist in memory, then we would see them in the same light as the database file that we wrote to disk.
Re:Practical Considerations by betelgeuse68 · 2005-01-22 03:11 · Score: 1

No offense but you clearly don't understand the problem space of "tick databases". In a nutshell you have data that is constantly flowing in and needs to be indexed real time. With the ultimate goal of being able to satisfy queries on this data in "real time."

A relational database cannot give this to you. If you take the naive approach and try to build such a system with a RDBMS product you wind up with a situation where eventually your system keels over since all it's trying to do is update its indices. Remember, a RDBMS has to keep its indices up to date to satisfy queries in a "reasonable" period of time. Eventually what happens is that the RDBMS winds up thrashing and spends all of its time trying to update indices. Never mind that the tick data is flowing in and completely saturating your ability to assimilate it into your database. The end result is that the system fails.

Some of the exchanges (Chicago Mercantile Exchange) have symbols where sixty transactions per second occur. And this is just for one symbol/instrument, e.g., some contract in the commodities market. When you take the Chicago Mercantile Exchange along with all the other exchanges and various derivatives, e.g., options and single stock futures, this amounts to TENS OF THOUSANDS of symbols/instruments. Most people are used to hearing NASDAQ and NYSE symbols, e.g., MSFT. But the commodity markets are incredibly busy, and it's a space your average person hears very little of. When a hurricane comes in and wipes out the orange crop in Florida, people holding futures contracts on oranges get rich (one example). The Chicago Mercantile Exchange and the Chicago Board of Trade are two places where these kinds of contracts are traded.

Anyway, it doesn't matter WHAT version of MySQL you were using - a MySQL database is not going to allow you to build a tick database.

-M

WWGT by Mikmorg · 2005-01-21 11:24 · Score: 2, Funny

What Would Google Think?

--
Codito, ergo sum.

Re:WWGT by Anonymous Coward · 2005-01-21 11:25 · Score: 0

That you throw as much hardware as you can at it instead of speeding it up software side.

The question isn't what Google would do, but whether what Google does is the right way.

relational database? by Anonymous Coward · 2005-01-21 11:25 · Score: 0

with the goal of speeding access to relational databases.

Oh, so somebody's finally written a truly relational database that's true to relational principles? That treats relvars as sets and allows arbitrary relational expressions and arbitrary declarative constraints? Where can I download it?

Or did you mean "SQL database"?

You mean Trump.com? by Me-The-Person · 2005-01-21 11:26 · Score: 1

I trust forbes.com about as much as I would trust donaldtrump.com! http://www.trump.com/

Re:You mean Trump.com? by glib909 · 2005-01-21 11:59 · Score: 2, Funny

What?? You're fired!

--
Suudsu, that stuff is G-E-W-D.

Finally by Anonymous Coward · 2005-01-21 11:26 · Score: 0

This is a great breakthrough in RDBMS development. Remember that you need real atomic ACID transactions for this thing to work reliably. Watch out for the gotchas. Good luck.

Go Roland by Anonymous Coward · 2005-01-21 11:26 · Score: 0, Interesting

Yet another winning post by Roland Piquepaille!

The guy should write a book. It would be bland, devoid of content, have an ad on every page, but it would quote prolifically from NYT Top 100 Best-sellers list.

Re:Go Roland by mabinogi · 2005-01-22 11:02 · Score: 1

and the depressing thing is, people would buy it.

--
Advanced users are users too!

THE TRUTH ABOUT ROLAND PIQUEPAILLE by Anonymous Coward · 2005-01-21 11:27 · Score: 4, Informative

Roland Piquepaille and Slashdot: Is there a connection?

I think most of you are aware of the controversy surrounding regular Slashdot article submitter Roland Piquepaille. For those of you who don't know, please allow me to bring forth all the facts. Roland Piquepaille has an online journal (I refuse to use the word "blog") located at http://www.primidi.com/. It is titled "Roland Piquepaille's Technology Trends". It consists almost entirely of content, both text and pictures, taken from reputable news websites and online technical journals. He does give credit to the other websites, but it wasn't always so. Only after many complaints were raised by the Slashdot readership did he start giving credit where credit was due. However, this is not what the controversy is about.

Roland Piquepaille's Technology Trends serves online advertisements through a service called Blogads, located at www.blogads.com. Blogads is not your traditional online advertiser; rather than base payments on click-throughs, Blogads pays a flat fee based on the level of traffic your online journal generates. This way Blogads can guarantee that an advertisement on a particular online journal will reach a particular number of users. So advertisements on high traffic online journals are appropriately more expensive to buy, but the advertisement is guaranteed to be seen by a large amount of people. This, in turn, encourages people like Roland Piquepaille to try their best to increase traffic to their journals in order to increase the going rates for advertisements on their web pages. But advertisers do have some flexibility. Blogads serves two classes of advertisements. The premium ad space that is seen at the top of the web page by all viewers is reserved for "Special Advertisers"; it holds only one advertisement. The secondary ad space is located near the bottom half of the page, so that the user must scroll down the window to see it. This space can contain up to four advertisements and is reserved for regular advertisers, or just "Advertisers". Visit Roland Piquepaille's Technology Trends (http://www.primidi.com/) to see it for yourself.

Before we talk about money, let's talk about the service that Roland Piquepaille provides in his journal. He goes out and looks for interesting articles about new and emerging technologies. He provides a very brief overview of the articles, then copies a few choice paragraphs and the occasional picture from each article and puts them up on his web page. Finally, he adds a minimal amount of original content between the copied-and-pasted text in an effort to make the journal entry coherent and appear to add value to the original articles. Nothing more, nothing less.

Now let's talk about money. Visit http://www.blogads.com/order_html?adstrip_category=tech&politics= to check the following facts for yourself. As of today, December XX 2004, the going rate for the premium advertisement space on Roland Piquepaille's Technology Trends is $375 for one month. One of the four standard advertisements costs $150 for one month. So, the maximum advertising space brings in $375 x 1 + $150 x 4 = $975 for one month. Obviously not all $975 will go directly to Roland Piquepaille, as Blogads gets a portion of that as a service fee, but he will receive the majority of it. According to the FAQ, Blogads takes 20%. So Roland Piquepaille gets 80% of $975, a maximum of $780 each month. www.primidi.com is hosted by clara.net (look it up at http://www.networksolutions.com/en_US/whois/index.jhtml). Browsing clara.net's hosting solutions, the most expensive hosting service is their Clarahost Advanced (http://www.uk.clara.net/clarahost/advanced.php) priced at £69.99 GBP. This is
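The revenue figures in the post above can be checked directly; the rates and the 20% cut are the ones quoted in that post, not independently verified.

```python
premium_rate = 375      # dollars/month for the single premium slot
standard_rate = 150     # dollars/month per standard slot
standard_slots = 4
blogads_cut_pct = 20    # Blogads' share, per the FAQ as quoted

gross = premium_rate * 1 + standard_rate * standard_slots
net = gross * (100 - blogads_cut_pct) / 100   # the 80% that reaches the author
```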

Re:THE TRUTH ABOUT ROLAND PIQUEPAILLE by Hard_Code · 2005-01-21 17:21 · Score: 0

Wow, you're right, we can't let this guy with a fishy-sounding French name, "Roland Piquepaille", get away with this scheme! To the pitchforks, men!

--

It's 10 PM. Do you know if you're un-American?
Re:THE TRUTH ABOUT ROLAND PIQUEPAILLE by Anonymous Coward · 2005-01-21 17:58 · Score: 0

The blow jobs?
Re:THE TRUTH ABOUT ROLAND PIQUEPAILLE by acslat3r · 2005-01-21 22:57 · Score: 2, Interesting

Perhaps you should stop worrying about how much he is making off of his online journal and instead put that time into a competitive "online journal" that will net you $1200 a month!! Go for 20 accepted submissions and sit back and watch the cash come in by the truckloads...
Re:THE TRUTH ABOUT ROLAND PIQUEPAILLE by nazsco · 2005-01-22 04:54 · Score: 1

The other side:

Slashdot doesn't even do the job of copy-pasting the article from Wired themselves. They wait for YOU to do that for them.

Then they just sit back and see the truckload of money coming.

So, why the hell are you still reading slashdot?
Go read something original... like a book!
Re:THE TRUTH ABOUT ROLAND PIQUEPAILLE by swhalen · 2005-01-22 13:07 · Score: 1

So what if Roland Piquepaille's web site gets a lot of Slashdot references? I'm glad it does. I started to notice that nearly all of the Slashdot references to his site were for very interesting articles or technologies.

Now I read Roland's site before I read Slashdot because some articles he publishes don't make it into Slashdot. I don't know Roland personally, but from his resume and what he publishes he seems to me to be a very good technologist. His pointers to changes in technology and new breakthroughs in many fields are more carefully selected from a wider field than many other magazines and web sites, and he publishes them at a summary level that's perfect for me.

I don't know if my Slashdot userid (30,377) makes me a real old timer or not, but I've been on Slashdot for quite a long time, and I have to say I read and get something useful from a much higher percentage of Roland's articles than from the average Slashdot submission.

Steve

P.S. Roland is French and his English is one hell of a lot better than my non-existent French, so I'd hope that whoever was complaining about the grammar on his site would give him a break. I appreciate his making the site available in English, and I've never had a single problem understanding what he was saying.
Re:THE TRUTH ABOUT ROLAND PIQUEPAILLE by reassor · 2005-01-23 04:44 · Score: 1

I like that some people collect all the information floating around to provide news to me and others. Only two years ago, I spent half the day surfing for the latest news. Collecting here, picking there, it was a mess. So now somebody helps me by collecting this news: I save a lot of time, and I can read news that is worth reading! I want to ask others: do you really have time to visit Microsoft, Sun, Apple and others daily for the latest? Isn't Slashdot really better at it? You only have to visit one site to get all the news. I don't visit NASA to search for the Hubble mission. One look here and I am informed. So I can live with the fact that Mr. Malda or Mr. Piquepaille earn some money with it.
Re:THE TRUTH ABOUT ROLAND PIQUEPAILLE by FunkyMonkey · 2005-01-23 07:38 · Score: 1

You'll have to explain to me how this is any different than what Slashdot itself is all about - write a blurb about an article, maybe quote it, and link to it. This is not unique, a scam, or even interesting - it's the way things are.

And if, like you say, Roland's journal entries are "lame rehashes of original and insightful technology articles", where can I find a source for fresh and interesting articles? That source used to be Slashdot.

Duh by Saint+Stephen · 2005-01-21 11:27 · Score: 1, Informative

Any of the enterprise databases will, with gobs of memory, end up caching the entire database in memory.

As long as it's read only, the disk won't be touched.

A writeable database that doesn't need to be written to disk is not a database, it's called a nonpersistent cache.

Re:Duh by Anonymous Coward · 2005-01-21 11:33 · Score: 4, Insightful

You've possibly misunderstood the point of this software.

At no time is the data 'stored' in any way. As it's collected (or INSERTed), it passes through a collection of preconfigured SELECT statements, and then disappears. There are no tables full of data, only tables as defined structures for handling incoming and outgoing data.

You cannot query anything that happened in the past, because the program doesn't remember it.
Re:Duh by Anonymous Coward · 2005-01-21 11:55 · Score: 0

No, it is still a database. It's a database that is STORED in nonpersistent cache.
Re:Duh by Anonymous Coward · 2005-01-21 11:56 · Score: 0

Then it doesn't seem that this is a 'database' any more than an RSS feed or the video/audio streaming media that are available now. Just a slightly different format of media that's streamed.
Re:Duh by dubl-u · 2005-01-21 12:20 · Score: 3, Interesting

As others have pointed out, the article is talking about something completely different than what you had in mind. Even so:

Any of the enterprise databases will, with gobs of memory, end up caching the entire database in memory.

That's still much slower than in-memory approaches that don't use a database at all. For apps that are amenable to the stick-it-all-in-RAM approach, serializing all your data access is a performance killer.

A writeable database that doesn't need to be written to disk is not a database, it's called a nonpersistent cache.

Well, there are different ways of guaranteeing reliability than the way databases do it. If you're keeping all your data hot, transaction logs with lazy snapshots may be a better solution than the database's approach, which treats the disk as the master copy and RAM as a place to store temporary copies.
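A minimal sketch of "transaction logs with lazy snapshots", assuming a store of named counters; PrevalentCounterStore and the file names are hypothetical. RAM holds the master copy, every command is fsync'd to a log before it is applied, and a snapshot taken whenever convenient lets the log be truncated; recovery loads the snapshot and replays whatever log remains.

```python
import json
import os
import tempfile

class PrevalentCounterStore:
    """Hypothetical store: RAM is the master copy; disk holds a snapshot
    plus a log of the commands issued since that snapshot."""

    def __init__(self, directory):
        self.snap_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "commands.log")
        self.state = {}
        if os.path.exists(self.snap_path):
            with open(self.snap_path, encoding="utf-8") as f:
                self.state = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    self._apply(json.loads(line))
        self.log = open(self.log_path, "a", encoding="utf-8")

    def _apply(self, cmd):
        self.state[cmd["key"]] = self.state.get(cmd["key"], 0) + cmd["delta"]

    def execute(self, key, delta):
        cmd = {"key": key, "delta": delta}
        self.log.write(json.dumps(cmd) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())   # durable before it touches RAM state
        self._apply(cmd)

    def snapshot(self):
        # Taken lazily (e.g. when traffic is low). Sketch caveat: a crash between
        # the rename and the truncation would replay old commands twice; a real
        # system would tag log entries with sequence numbers.
        tmp = self.snap_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snap_path)   # atomic rename over the old snapshot
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")  # truncate the log

    def close(self):
        self.log.close()

d = tempfile.mkdtemp()
s = PrevalentCounterStore(d)
s.execute("IBM", 100)
s.execute("IBM", -30)
s.snapshot()
s.execute("MSFT", 5)
s.close()
r = PrevalentCounterStore(d)   # recovery: snapshot + replay of the short log
```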
Re:Duh by corngrower · 2005-01-21 13:57 · Score: 1

This sounds pretty much like data-flow computing. Not really new, just a new name. I'm sure it's quite fast if all it does is look at an incoming stream and decide which output stream it goes to. Databases have always been slow compared with in-memory tables. Of course, I'm sure this system doesn't have to deal with all the record locking and synchronization issues that an actual database would.
Re:Duh by hanshotfirst · 2005-01-21 15:19 · Score: 2, Insightful

Sounds more like a messaging queue than a database. Of course, I work with Oracle DBs all day, so I have a rather targeted perspective on the topic. The big dogs have messaging queues and data streaming technology built into the database; is this perhaps a way for it to come to the more "vanilla" MySQL/Postgres world?

--
Why, oh why, didn't I take the Blue Pill?

Storage by Anonymous Coward · 2005-01-21 11:28 · Score: 0

I'm gonna need more RAM!

Re:Storage by Dasein · 2005-01-21 12:34 · Score: 1

I bet that what they are doing is building a set of queries to be executed over the stream so that the only "records" kept in memory are those that match some predicate.

That means that the amount of memory needed is very much dependent on the type of query written. If you are looking for army units that don't have enough gas to complete the objective or are off course, hopefully the number of matching "records" is small.

Speculation but seems likely.

--
You are not a beautiful or unique snowflake -- but you could be if you got off your ass.
Re:Storage by enewhuis · 2005-01-21 17:49 · Score: 1

Perhaps. I once had the shocking idea of inverting the system. Persist a mapping from queries to interested parties. Then as data streams through the system it pattern-matches and sends the hits to the recipients. The trick here is to build an optimal tree that can quickly find matches. The answer is NOT to simply store queries and walk through them on every update/stream message.
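A minimal sketch of that inverted approach, assuming standing queries of the form "notify me when SYMBOL trades above THRESHOLD"; SubscriptionIndex is a hypothetical name, and it swaps in a hash table per symbol plus sorted thresholds searched with bisect for the tree mentioned above. Each message touches only the queries for its own symbol and finds every hit with one binary search.

```python
import bisect
from collections import defaultdict

class SubscriptionIndex:
    """Hypothetical inverted index of standing queries of the form
    'tell <who> when <symbol> trades above <threshold>'. Messages are matched
    against the index instead of walking every stored query."""

    def __init__(self):
        # symbol -> (sorted thresholds, subscribers in the same order)
        self.by_symbol = defaultdict(lambda: ([], []))

    def subscribe(self, who, symbol, threshold):
        thresholds, subs = self.by_symbol[symbol]
        i = bisect.bisect_right(thresholds, threshold)
        thresholds.insert(i, threshold)
        subs.insert(i, who)

    def match(self, symbol, price):
        if symbol not in self.by_symbol:   # avoid creating empty entries
            return []
        thresholds, subs = self.by_symbol[symbol]
        # Every threshold strictly below the price is a hit.
        cut = bisect.bisect_left(thresholds, price)
        return subs[:cut]

idx = SubscriptionIndex()
idx.subscribe("alice", "IBM", 95.0)
idx.subscribe("bob", "IBM", 100.0)
idx.subscribe("carol", "IBM", 90.0)
idx.subscribe("dave", "MSFT", 25.0)
```

list.insert makes subscribing O(n) per symbol; that is fine for a sketch, and a balanced tree would be the natural upgrade if subscriptions churn heavily.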
Re:Storage by Anonymous Coward · 2005-01-21 20:05 · Score: 0

Right. Strange how little is new in monitoring real-time feeds and performing extensive finite-state pattern matching. I recall dedicated PLA-based supercomputers from TRW being used for this by certain government agencies in the early 90s. The cool algorithmic trick for doing this really fast is to tokenize the incoming stream and hash the essential patterns; then you only check the subset of rules that match stream tokens. I think Verity has some IP on this.
Re:Storage by Dasein · 2005-01-22 06:21 · Score: 1

A quick pointer for you: look at Rete networks. The original author is Charles Forgy, 1982 or so.

It's all about fast pattern matching in expert systems, but it is easily applied to other fields.

--
You are not a beautiful or unique snowflake -- but you could be if you got off your ass.
Re:Storage by enewhuis · 2005-01-24 02:00 · Score: 1

Thanks for the pointer! I will research that. I already found an interesting tidbit that discusses the relationship to tuple spaces, something I've used in the past as a basis for some high-performance asynchronous messaging. http://www.artima.com/weblogs/viewpost.jsp?thread=72527

what they do by virtualone · 2005-01-21 11:28 · Score: 0

They don't have a generic database that performs this well.
They just take a specific problem and write a custom-made application that produces new output as soon as enough new data is available.

--
Only morons moderate based on a sig.

Hmm by Anonymous Coward · 2005-01-21 11:28 · Score: 0

I am missing something here ...

What if the whole thing crashes? What happens to the data then, if nothing was stored on the hard drive?

What does it do? by metalhed77 · 2005-01-21 11:29 · Score: 1, Redundant

I'm curious as to exactly what this does. The article is rather vague.

--
Photos.

Re:What does it do? by Anonymous Coward · 2005-01-21 12:46 · Score: 1, Insightful

It sounds to me that the application is to apply a set of rules to data as it comes running into the system. Imagine a database with "triggers" but no tables. Obviously, the rules are all cached in RAM, and they're not persisting the data stream at all (at least on this box, perhaps someplace else).

That's just a SWAG, but from the article that's what it sounds like to me.
Re:What does it do? by metalhed77 · 2005-01-21 16:02 · Score: 1

Ok, that's what I thought. But how this product is being compared to a database is odd. A database provides persistence. This sounds like it needs a constant stream of data.

--
Photos.
Re:What does it do? by Donny+Smith · 2005-01-22 01:27 · Score: 1

>But how this product is being compared to a database is odd. A database provides persistence.

I guess the idea is that you can run SQL queries on those in-memory tables (as opposed to searching memory in some non-standard way).

>This sounds like it needs a constant stream of data.

It doesn't _need_ a constant stream of data; the data streams are there anyway.
It replaces disk-based databases, which are apparently useless for real-time decision support systems that must process huge, constant streams of data.
Traditional databases write stuff to disk files (or their transaction logs) and read it back for analysis later. This new thing eliminates the writes because there is no need for persistence - they just need to make decisions based on current data, which is already there. SQL language support is the key, I would say.

Data Space Transfer Protocol [DSTP]? by mosel-saar-ruwer · 2005-01-21 11:30 · Score: 1

Scientific programming question: Anybody have any experience with the Data Space Transfer Protocol? Also known as the "Data Socket Transfer Protocol"? National Instruments [NI] wrote a DSTP front end into LabVIEW, but if any major vendors have a DSTP back end, I haven't discovered it.

Or does anyone have any experience with any other methods of moving large amounts of [strongly-typed] data across the wire so that it comes to rest in a central repository in some sort of a coherent fashion?

Thanks!

Re:Data Space Transfer Protocol [DSTP]? by prostoalex · 2005-01-21 12:34 · Score: 1

No, Data Space Transfer Protocol is not "also known as" Data Socket Transfer Protocol. DataSocket is a National Instruments server that can reside on your test machine and stream the test data across the Internet. So if you have a measurement test stand on one end and a LabVIEW front end on the other, DataSocket will take care of gluing the two together.
Re:Data Space Transfer Protocol [DSTP]? by johannesg · 2005-01-21 12:39 · Score: 1

I've written a large data handling system for ESA that takes data from a variety of sources (thermocouples, PT100s, PLCs, vacuum gauges, ...) and stores it on a central server. From there the data is transferred on to presentation and control modules. We are geared towards large numbers of channels, fairly slow data updates (once a minute or so, although it will also work at much quicker rates), and large numbers of acquisition, presentation, and control stations.
I've written my own wire protocol + packers and unpackers. I tag every data value with its type (number, time, string, ...) and message position (this I use to selectively leave out values under specific circumstances, i.e. to send partial messages). This arrangement works just fine: the wire format is machine independent, and quick to read and write. The coding overhead for message packing and unpacking is limited to pretty much a single function per message type (to identify the various fields), and conversion from and to wire format is done using two universal conversion routines.
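Here is a minimal Python sketch of the tagged scheme described above. The layout is our own assumption, since the post doesn't give one. Each value is written as a 1-byte message position and a 1-byte type tag, followed by a big-endian payload: float64 for numbers, int64 microseconds since the Unix epoch for times, and length-prefixed UTF-8 for strings. Leaving a position out produces a partial message. The tag values and the function names (`pack_fields`, `unpack_fields`, `pack_reading`) are hypothetical, and times must be timezone-aware.

```python
import struct
from datetime import datetime, timedelta, timezone

TAG_NUMBER, TAG_TIME, TAG_STRING = 1, 2, 3      # hypothetical type tags
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def pack_fields(fields):
    """Universal packer: {position: value} -> bytes. Omitted positions = partial message."""
    out = bytearray()
    for pos, value in sorted(fields.items()):
        if isinstance(value, datetime):
            micros = (value - EPOCH) // timedelta(microseconds=1)  # exact integer
            out += struct.pack("!BBq", pos, TAG_TIME, micros)
        elif isinstance(value, str):
            data = value.encode("utf-8")
            out += struct.pack("!BBI", pos, TAG_STRING, len(data)) + data
        elif isinstance(value, (int, float)):
            out += struct.pack("!BBd", pos, TAG_NUMBER, float(value))
        else:
            raise TypeError(f"unsupported type for field {pos}: {type(value).__name__}")
    return bytes(out)

def unpack_fields(buf):
    """Universal unpacker: bytes -> {position: value}, driven entirely by the type tags."""
    fields, offset = {}, 0
    while offset < len(buf):
        pos, tag = struct.unpack_from("!BB", buf, offset)
        offset += 2
        if tag == TAG_NUMBER:
            (fields[pos],) = struct.unpack_from("!d", buf, offset)
            offset += 8
        elif tag == TAG_TIME:
            (micros,) = struct.unpack_from("!q", buf, offset)
            fields[pos] = EPOCH + timedelta(microseconds=micros)
            offset += 8
        elif tag == TAG_STRING:
            (length,) = struct.unpack_from("!I", buf, offset)
            offset += 4
            fields[pos] = buf[offset:offset + length].decode("utf-8")
            offset += length
        else:
            raise ValueError(f"unknown type tag {tag} at offset {offset - 2}")
    return fields

def pack_reading(channel, value, when=None):
    """One small function per message type: it only assigns field positions."""
    fields = {0: channel, 1: value}
    if when is not None:
        fields[2] = when
    return pack_fields(fields)
```

The network byte order (`!`) is what makes the format machine-independent. Because every value carries its own tag, a receiver can decode a message without knowing its type in advance.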
If you have any questions feel free to ask. Alternatively, we'd love to sell you a copy of the system ;-)
Re:Data Space Transfer Protocol [DSTP]? by speculatrix · 2005-01-21 20:19 · Score: 1

large amounts of strongly typed data?

sounds like some sort of persistent object database... google for CORBA, OMG, EJB (enterprise java beans)

I wonder how this is different from MySQL by flabbergast · 2005-01-21 11:30 · Score: 1

I wonder how this is different from MySQL Cluster an in memory only DB. From my own comparisons of regular MySQL versus MySQL Cluster, I didn't see much of a performance increase. But, I guess it wasn't "streaming" either. I didn't really see too many technical specs for their new DB, but I didn't really look either. I wonder how they handle saving stuff to disk? Or do they not even bother and hope that the generator holds out until the power is restored?

I call foul by RFC959 · 2005-01-21 11:30 · Score: 3, Insightful

I call foul. This quote from the article was what got to me:

Traditional systems bog down because they first store data on hard drives or in main memory and then query it, Stonebraker says.

So they manage to do their analysis without even touching main memory? Nifty! What do they do, make it all fit in the L1 data cache? OK, maybe the guy was misquoted - I trust reporters about as far as I can throw them - but the whole thing just smells funny to me. I'm betting that the massive speedup they report is only for carefully selected, pre-groomed data sets. I agree that analyzing data as it comes in rather than storing it up to recrunch later is the smart thing to do, but that insight isn't a breakthrough of the kind the article is spinning this as.

Re:I call foul by ComputerSlicer23 · 2005-01-21 11:50 · Score: 5, Interesting

Hmmm, I guess. My guess is that they have implemented something akin to SQL for datastreams. You define a message format. Think of each message as a row in the table. The message format is the table schema.
You have a "standing query". So you can ask things, like, what's the rolling average for the last 60 seconds for this ticker name. What's the minimum price for this commodity.
You can ask to correlate things. Store the last 90 minutes worth of transactions on these commodities. Search for these types of patterns.
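A standing query such as "rolling average and minimum over the last 60 seconds, per ticker" can be answered without writing ticks anywhere. The Python sketch below is a guess at the general technique, not StreamBase's actual engine or API. It keeps a time-ordered deque per symbol with a running sum, plus a monotonic deque so the minimum is available in amortized O(1) time. It assumes ticks arrive in timestamp order, and `StandingQuery` and `RollingWindow` are hypothetical names.

```python
from collections import defaultdict, deque

class RollingWindow:
    """Ticks from the last `width` seconds for one symbol (ticks must arrive in time order)."""
    def __init__(self, width):
        self.width = width
        self.ticks = deque()   # (ts, price) in arrival order
        self.total = 0.0       # running sum of prices in the window
        self.mins = deque()    # minimum candidates; prices non-decreasing front to back

    def add(self, ts, price):
        self.ticks.append((ts, price))
        self.total += price
        # An older candidate with a higher price can never be the minimum again.
        while self.mins and self.mins[-1][1] > price:
            self.mins.pop()
        self.mins.append((ts, price))
        # Evict everything at or before the cutoff, keeping (ts - width, ts].
        cutoff = ts - self.width
        while self.ticks and self.ticks[0][0] <= cutoff:
            self.total -= self.ticks.popleft()[1]
        while self.mins and self.mins[0][0] <= cutoff:
            self.mins.popleft()

    def average(self):
        return self.total / len(self.ticks) if self.ticks else None

    def minimum(self):
        return self.mins[0][1] if self.mins else None

class StandingQuery:
    """Per-symbol rolling average and minimum, updated on every tick."""
    def __init__(self, width=60.0):
        self.windows = defaultdict(lambda: RollingWindow(width))

    def on_tick(self, ts, symbol, price):
        window = self.windows[symbol]
        window.add(ts, price)
        return window.average(), window.minimum()
```

Memory is bounded by the number of ticks in the window rather than by the full history. Over very long runs, the running sum can accumulate floating-point error, so a real system might periodically recompute it.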
It sounds like what they have done is build an OLAP cube that builds its dataset on the fly by processing messages coming over a streaming interface.
It's much smarter to do that than to write every last transaction to disk and then query the transactions after the fact. That'd be the natural way to think about it if you used a relational database.
Essentially, it sure sounds like he's written a generalized packet filter, that can compute interesting functions on the data. Think snort, think ethereal, think iptables, think policy routing. Now apply those kinds of technology to "The price of this stock", "the location of that soldier", where those values are embedded in a network packet frame somewhere.
While each single application of this sounds trivial to implement, if he has done it in a generalized way that can keep pace with larger systems, bully for him.
The irony of all this for me is that at a former job, I used to process medical data exactly this way. It sounds like the HL7 interface issues we used to have. You couldn't possibly take a full HL7 stream and process it, so you'd filter it down to just the patients that this department was interested in. Then only process messages about those patients.
Even about those patients, there were rows you weren't interested in and had to filter out. You spent a bunch of time filtering and re-filtering.
We wrote the raw messages to disk and spooled them to ensure we didn't miss messages due to database problems (if the database was down, you had to spool until it came back up; it was unacceptable to miss patient records because of database maintenance).
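The spool-while-the-database-is-down pattern can be sketched roughly like this in Python. It illustrates the idea under assumptions and does not reproduce the actual system. `db_insert` is a hypothetical callable that raises `ConnectionError` when the database is unavailable. Messages are assumed to be JSON-serializable dicts, and the spool is a newline-delimited file. Older spooled messages are always replayed before a new one is inserted, so ordering is preserved.

```python
import json
import os

class SpoolingWriter:
    """Insert messages into a database, spooling to disk while it is unavailable."""
    def __init__(self, db_insert, spool_path):
        self.db_insert = db_insert      # hypothetical: raises ConnectionError when the DB is down
        self.spool_path = spool_path

    def write(self, message):
        # Replay older spooled messages first so ordering is preserved.
        if self._drain_spool():
            try:
                self.db_insert(message)
                return
            except ConnectionError:
                pass
        self._append_to_spool(message)

    def _append_to_spool(self, message):
        with open(self.spool_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")
            f.flush()
            os.fsync(f.fileno())   # make sure the message is on disk before returning

    def _drain_spool(self):
        """Replay spooled messages oldest-first; return True once the spool is empty."""
        if not os.path.exists(self.spool_path):
            return True
        with open(self.spool_path, encoding="utf-8") as f:
            pending = [json.loads(line) for line in f if line.strip()]
        for i, message in enumerate(pending):
            try:
                self.db_insert(message)
            except ConnectionError:
                if i > 0:
                    self._rewrite_spool(pending[i:])  # drop what was already delivered
                return False
        os.remove(self.spool_path)
        return True

    def _rewrite_spool(self, messages):
        tmp_path = self.spool_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            for message in messages:
                f.write(json.dumps(message) + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.spool_path)   # swap in the new spool with a single rename
```

Delivery here is at-least-once. If the process crashes after a replayed insert succeeds but before the spool is removed or rewritten, that message will be replayed again, so the receiving side should tolerate duplicates. Re-reading the whole spool on every write during an outage also favours simplicity over efficiency.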
Kirby
Re:I call foul by DogDude · 2005-01-21 12:01 · Score: 1

I agree... how in the hell is TCP/IP going to be faster than going to memory? This kind of sounds like a cross between "Cold-Fusion-will-change-the-world" hype and "Make-everything-Internet-based" hype to me.

--
I don't respond to AC's.
Re:I call foul by astrojetsonjr · 2005-01-22 04:56 · Score: 1

Nice explanation. We do a similar thing as part of our SCADA monitor system.
Each message processor registers "interest" in a message type or messages from a particular instrument(s). The main message process gets the messages, flings them off to the interested parties and gets them into a queue to be persisted.
The message processor looks at the message and does the analysis (fast, slow, trends, alarms, etc.). Each process keeps the data it needs for longer-term trending, multiple-alarm relationships, etc. They can also publish messages that others may be interested in, so if many processes want the same filter, one filter does it and the result gets sent on.
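Here is a rough Python sketch of the register-interest and fan-out arrangement described above. Everything in it is hypothetical: the `Bus` class, the `type`/`source` message fields, and the in-memory `persist_queue`, which stands in for whatever actually writes to storage. A handler can subscribe by message type or by instrument. A shared filter republishes derived messages, so downstream processes don't each repeat the same filtering.

```python
from collections import defaultdict, deque

class Bus:
    """Dispatch each message to handlers interested in its type or its source."""
    def __init__(self):
        self.by_type = defaultdict(list)
        self.by_source = defaultdict(list)
        self.persist_queue = deque()   # stand-in for the queue a separate persister drains

    def subscribe(self, handler, msg_type=None, source=None):
        # Interest may be registered by message type, by instrument, or both (either matches).
        if msg_type is not None:
            self.by_type[msg_type].append(handler)
        if source is not None:
            self.by_source[source].append(handler)

    def publish(self, msg):
        self.persist_queue.append(msg)   # every message is queued for persistence
        handlers = self.by_type.get(msg["type"], []) + self.by_source.get(msg["source"], [])
        called = set()
        for handler in handlers:
            if id(handler) not in called:   # don't call a handler twice for one message
                called.add(id(handler))
                handler(msg)
```

For example, a single "over-temperature" filter can subscribe to `temperature` messages and republish only the hot ones as `overtemp`. Every alarm process then subscribes to `overtemp` instead of re-checking the raw readings itself.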
It's fast and pretty simple to set up. Nice thing is that it will scale provided you clump processes that are interested in the same messages on the same set of boxes. We do dual network ports, messages flow "downstream" to the people that are interested.

Website not streaming... by iluvgfx · 2005-01-21 11:30 · Score: 0, Offtopic

.. for me. /. effect. Nothing can accommodate /. user visits.

--
...imagine...create...express...share...enjoy...

Re:A Poem for Michael and Roland by Anonymous Coward · 2005-01-21 11:30 · Score: 1, Funny

First comes love,
Then comes marriage,
Then comes a spam-advertisement campaign earning thousands a month at the expense of companies which Roland rips off.

Shit, that didn't rhyme...

Has nothing to do with relational databases by Wesley+Felter · 2005-01-21 11:30 · Score: 5, Insightful

If Roland had RTFA, he'd have realized that this StreamBase thing is not a relational database and does not do the job of a traditional relational database. The whole point is that it uses a different architecture to solve problems that don't map well to relational databases.

Re:Has nothing to do with relational databases by grub · 2005-01-21 11:36 · Score: 5, Funny

Uh oh... You dared to slam Rolly. Prepare for the wrath of Mikey and his infinite mod points.

Lube thy anus.

--
Trolling is a art,
Re:Has nothing to do with relational databases by no_mayl · 2005-01-21 11:40 · Score: 1

And most probably the data streaming through is repeated at regular intervals (e.g. from the military stuff, vehicle coords, gun angles,...).

So, if you need extra info on some result (e.g. look for out-of-place vehicles, followed by: what are the drivers' vitals?), you just run another query on the new data stream and definitely don't look at past data. That is where the "don't store" comes in.
Re:Has nothing to do with relational databases by Anonymous Coward · 2005-01-21 11:59 · Score: 5, Insightful

I'm not sure that is an accurate critique of Roland. He likely did RTFA -- he just didn't UTFA
Re:Has nothing to do with relational databases by rbbs · 2005-01-21 12:18 · Score: 1

At the risk of grossly oversimplifying a very complicated concept:

From a purely theoretical pov, is there any reason not to be able to do this kind of thing relatively easily anyway?
if you're just getting information from streaming data, then surely the analogy would be putting rocks in a river - each rock would represent a set of if/then conditions and the time the water spent passing by the rock would be the time for the system to discretise the data, perform the if loop, then let it pass...
the then diverted river represents your outputs from the algorithms..

i guess the trick would come in the discretisation...but then they do describe their processing in 'transactions per second' so is 'real-time data analysis' not just a function of being able to parse the data at sufficient speed...?

(i know very little about databases, but to my lay-eyes this does just sound like a wrong application of the word 'base when it could in fact be closer to an interface to a selection of looped algorithms which you can dynamically alter in a nice pretty way -)

but then, trying to remember back to my complex systems courses, you do have to be careful in discretising chaotic functions, as they don't play nicely in time...so how do you analyse the data in discrete packets when a key input in their evolution in time may have passed and not been stored...if you have the data to begin with, you have the luxury of being able to choose your approximations and discretisation techniques in advance, but if the data is flowing towards you, how do the rocks in the river know how the river is to be moved if they can't see everything going on around them....which somehow brings us on to supersonic shock waves and the end of my rambling...

if anyone has any idea what i'm trying to go on about, feel free to point my wafflings in the right direction...
Re:Has nothing to do with relational databases by idlake · 2005-01-21 18:08 · Score: 1

If you had read it, you'd know, though, that a lot of people use relational databases for purposes for which they are ill suited.

Nevertheless, streaming databases is a topic that a lot of companies and research groups are working on right now.
Re:Has nothing to do with relational databases by new500 · 2005-01-21 22:31 · Score: 1

He likely did RTFA -- he just didn't UTFA

But he definitely did UTF$
Re:Has nothing to do with relational databases by betelgeuse68 · 2005-01-22 15:46 · Score: 1

YES!!!! I'm laughing at all these "I used X to solve problems like this" where X is some RDBMS. Listen up folks, you can't trivially implement a tick database with such a system. 'Nuff said.

-M

coming soon to a terminal near you: steaming vdo by Anonymous Coward · 2005-01-21 11:32 · Score: 0

creators' planet/population rescue short of funds? (Score:mynuts won, insidious PostBlock devise foiled again)
by Anonymous Coward on Friday January 21, @06:28PM (#11437328)

fortunately, the whole wildly popular initiative/mandate runs on an unlimited supply of newclear power.

also fortunate (deepending on who you are/yOUR motives), is that the daze of the felonious corepirate nazi execrable are #ed/WANing into coolapps, at the (sometimes slow) speed of right.

lookout bullow.

consult with/trust in yOUR creators, disempowering unprecedented evile, & restoring (&/or wiping out) civilizations since/until forever. see you there?

Read the article before posting by bigtallmofo · 2005-01-21 11:33 · Score: 5, Insightful

Before another dozen people post about how in-memory databases have been done before, please read the article. They're specifically not talking about in-memory or on-disk databases. They're reading the data and analyzing it in real time as it flows through the network. For everyone asking how they're going to back such data up, you don't need to back up data that is useless 1 second after it has flowed through your network.

--
I'm a big tall mofo.

Re:Read the article before posting by Anonymous Coward · 2005-01-21 11:49 · Score: 0

For everyone asking how they're going to back such data up, you don't need to back up data that is useless 1 second after it has flowed through your network.
For the applications described in the article I think they would most likely want to store the data. The military tracking application is a good example. Even with a system like this, things will go wrong on the battlefield (or the trading floor, for that matter) and managers/commanders will want to see, in detail, how things went wrong.
Second, it's not immediately clear how this "database" is different from a proprietary server that accepts datastreams from sensors and processes them. How do they generalize custom, real-time analysis? I don't think I'd call this a database at all, it's more like a streaming calculator.
Re:Read the article before posting by Anonymous Coward · 2005-01-21 11:50 · Score: 0

If the data is not persisted somewhere, how does the system recover from a failure or a shutdown? I see this only working where missing something does not affect the results. But in finance, if a live stream of transactions is sent and the receiving server goes down, the sending server must cache them and wait for a receiving server to come back online... and then the sending server goes down. How does it recover the lost data? How does this system persist data?!?
Re:Read the article before posting by Anonymous Coward · 2005-01-21 12:00 · Score: 0

If the data is useless after 1 second, where it is stored is no longer called a database. It's called "history RAM."

The story is a victim of taking quotes out of context: "Relational databases are one to two orders of magnitude too slow" should have been followed by "for this particular application".
Re:Read the article before posting by kpharmer · 2005-01-21 12:05 · Score: 4, Insightful

Right, and this solution has its own limitations within this context: namely that if you crunch your data real time, rather than read it from a data store:

1. if you decide to add a new analytic, you have to start with new data - you can't deploy a new analytical component and run it against historical data.

2. if your machine crashes - it takes all your accumulated analytical data along with it. Maintaining a distribution of activity calculated every 5 minutes over 90 days? Great, but after the server comes back up your data starts all over.

3. if your analytical component needs to run against a lot of history each time (ex: total number of unique telephone numbers accessed by day, calculate rolling median), then you'll have to maintain that detail data in memory. As you can imagine, you can *easily* identify calculations that will exceed your memory. So, to tune, you'll be forced to keep your calculations to relatively recent data only.
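Point 3 is easy to see in code. A running mean needs only a sum and a count, but an exact rolling median has to keep every value in the window. The Python sketch below uses a hypothetical `RollingMedian` class with a count-based window. It holds exactly `window` values in a sorted list, so memory grows linearly with the window you ask for.

```python
import bisect
from collections import deque

class RollingMedian:
    """Exact median of the last `window` values; must retain all of them."""
    def __init__(self, window):
        self.window = window
        self.arrival = deque()     # values in arrival order, for eviction
        self.sorted_vals = []      # the same values kept sorted, for the median

    def add(self, x):
        self.arrival.append(x)
        bisect.insort(self.sorted_vals, x)
        if len(self.arrival) > self.window:
            old = self.arrival.popleft()
            del self.sorted_vals[bisect.bisect_left(self.sorted_vals, old)]
        return self.median()

    def median(self):
        n = len(self.sorted_vals)
        if n == 0:
            return None
        mid = n // 2
        if n % 2:
            return self.sorted_vals[mid]
        return (self.sorted_vals[mid - 1] + self.sorted_vals[mid]) / 2
```

Stretch the window to 90 days of five-minute buckets per key, multiply by a few million keys, and the "just keep it in memory" approach runs out of room, exactly as described above.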

ken
Re:Read the article before posting by CounterZer0 · 2005-01-21 17:43 · Score: 2, Interesting

Why do distributions and such on the live data set? Stream through this system at high speed, and drop the data into a data warehouse whose *entire purpose in life* is to do historical crap.
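Here is a minimal sketch of that split. It assumes SQLite (Python's built-in `sqlite3`) as a stand-in for the warehouse, with a hypothetical `trades` table and `LiveAndWarehouse` class. The real-time answer (running volume per symbol) comes from memory. Meanwhile, the raw rows are buffered and bulk-loaded in batches for later historical queries.

```python
import sqlite3

class LiveAndWarehouse:
    """Answer live questions from memory; batch raw rows into a warehouse table."""
    def __init__(self, conn, batch_size=500):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []
        self.live_volume = {}   # real-time state, never read back from disk
        conn.execute("CREATE TABLE IF NOT EXISTS trades (ts REAL, sym TEXT, px REAL, qty INTEGER)")

    def on_trade(self, ts, sym, px, qty):
        # Real-time path: update in-memory state immediately.
        self.live_volume[sym] = self.live_volume.get(sym, 0) + qty
        # Historical path: buffer rows and bulk-load them into the warehouse.
        self.pending.append((ts, sym, px, qty))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            with self.conn:   # commits the whole batch as one transaction
                self.conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", self.pending)
            self.pending.clear()
```

Batching keeps the write path from slowing the live path. The cost is that rows still in `pending` are lost if the process dies before the next flush, which is the same durability question raised elsewhere in this thread.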

A Better Solution by logicnazi · 2005-01-21 11:37 · Score: 3, Informative

Just to let everyone know this is not the only product or even the first product to do this.

Another option is EPL server by ispheres. Unlike the product mentioned here, which seems to be just some extra code thrown on top of a database, EPL server is built from the ground up for this sort of application.

--

If you liked this thought maybe you would find my blog nice too:

For sensor networks by Anonymous Coward · 2005-01-21 11:41 · Score: 4, Interesting

So this is mostly for sensor networks, where you have hundreds (or thousands) of small, cheap sensors sending data to a nearby controller. The controller doesn't need to store every bit of data it receives; it just calculates some prespecified queries (histograms, running sums, checking for trigger conditions, etc.) on them and might store some small window of data for ad hoc queries. These systems are more similar to dataflow applications than to traditional databases.
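Here is a sketch of such a controller in Python, under assumptions of our own: fixed-width histogram buckets, a single threshold trigger and a count-bounded window. `SensorController` is a hypothetical name. Each reading updates the prespecified aggregates and is otherwise discarded. The one exception is the small `recent` window, which ad hoc queries can scan.

```python
from collections import Counter, deque

class SensorController:
    """Prespecified aggregates over a sensor stream, plus a small ad hoc window."""
    def __init__(self, bucket_width, threshold, window_size):
        self.bucket_width = bucket_width
        self.threshold = threshold
        self.count = 0
        self.total = 0.0                          # running sum
        self.histogram = Counter()                # bucket index -> number of readings
        self.recent = deque(maxlen=window_size)   # oldest readings fall off automatically
        self.triggered = []                       # sensors that crossed the threshold

    def on_reading(self, sensor_id, value):
        self.count += 1
        self.total += value
        self.histogram[int(value // self.bucket_width)] += 1
        self.recent.append((sensor_id, value))
        if value > self.threshold:
            self.triggered.append(sensor_id)

    def mean(self):
        return self.total / self.count if self.count else None

    def ad_hoc(self, predicate):
        """Ad hoc queries can only see what is still in the recent window."""
        return [r for r in self.recent if predicate(r)]
```

The aggregates cover every reading ever seen, but `ad_hoc` only covers the last `window_size`. That split is the dataflow trade-off the post describes.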

Seems similar to his Aurora project... Stonebraker has a history of turning his university research projects into successful startups.

CopyAndPaste, INc by Stanistani · 2005-01-21 11:41 · Score: 0

The Synopsis:

Streaming a Database in Real Time

Michael Stonebraker is well-known in the database business, and for good reasons. He was the computer science professor behind Ingres and Postgres. Eighteen months ago, he started a new company, StreamBase, with another computer science professor, Stan Zdonik, with the goal of speeding access to relational databases. In "Data On The Fly," Forbes.com reports that the company's software, also named StreamBase, is reading TCP/IP streams and using asynchronous messaging. Streaming data instead of storing it on disk, as other relational database software does, gives them a tremendous speed advantage. The company claims it can process 140,000 messages per second on a $1,500 PC, while its competitors can only deal with 900 messages per second. Too good to be true? Read more...

Here are some excerpts from the Forbes article.

"Relational databases are one to two orders of magnitude too slow," says Stonebraker, who is chief technology officer at Streambase, a 25-person outfit based in Lexington, Mass. "Big customers have already tried to use relational databases for streaming data and dismissed them. Those products are non-starters in this market."
In a recent pilot program, Streambase was able to analyze 140,000 messages per second, while a leading relational database -- Stonebraker won't say which one -- could handle only 900 messages per second. Streambase has 12 customers now testing its software, all of them financial services companies that need to analyze rapid-fire ticker feeds and other streaming data.
Unlike traditional database programs, Streambase analyzes data without storing it to disk, performing queries on data as it flows. Traditional systems bog down because they first store data on hard drives or in main memory and then query it, Stonebraker says.
The software, which should be commercially available next month, runs on Linux and Solaris, but a Microsoft version should be available soon.

The database business is not a cheap one. So how much will this new company charge for largely unproven software?

Streambase charges customers annual subscriptions for its software, setting prices based on how many CPUs a customer uses to power the software. Typical deals so far have ranged from $100,000 to $300,000 a year, says Barry Morris, Streambase's chief executive.
In "StreamBase eyes real-time streaming apps," InfoWorld wrote that the prices should be lower.

The software is available via a subscription model, with pricing in the range of approximately $50,000 per year, Stonebraker said. Subscriptions are sold on a per-CPU basis.
Who will be the customers for these speedy accesses to their databases? Let's come back to Forbes.com.

For now Streambase is focusing attention on financial services companies, which hope to do things like track how well traders are performing on a real-time basis, rather than aggregating trades at the end of the day and analyzing them overnight.
A bigger opportunity involves processing real-time data feeds generated by sensor networks and RFID tags. A military contractor wants to use Streambase to keep track of soldiers and vehicles in the battlefield. A casino in Las Vegas is considering using Streambase to track the performance of individual gamblers.
In an interview with InfoWorld, Stonebraker gave more details about military applications.

We did a prototype that dealt with army battalion monitoring. When an army battalion is 30,000 humans and 12,000 vehicles, the army is deadly serious about getting a vital signs monitor on every one of the humans so they can do combat medical triage or [take other actions]. They already have a GPS system in every vehicle, but that didn't keep Jennifer Lynch's convoy from getting lost.
They want to turn this into a system to watch the position of every vehicle and compare it against where you're supposed to be. They also want to put a sensor on the gun turret. Together with position, that allows you to detect crossfire

--
You can't talk about Wikipedia's flaws on Wikipedia

Why don't you ask Google? by ShatteredDream · 2005-01-21 11:42 · Score: 1

Ask away!! See, that wasn't so hard...

--
Click here or a puppy gets stomped!

LOL by Luke727 · 2005-01-21 11:42 · Score: 0, Funny

Roland's name links to his old website, but the link in the summary goes to his new website. A GLITCH IN TEH MATRIX?!

--
If you find this post offensive, don't read it! THINK ABOUT YOUR BREATHING! I am what I am because of how apes behave.

ACID? by plopez · 2005-01-21 11:44 · Score: 2, Insightful

How do they deal with the durability aspect of ACID? If the system crashes without any data in a durable data store, that data disappears forever. It sounds more like high-speed data analysis than a true database, which implies longer-term storage.

--
putting the 'B' in LGBTQ+

Re:ACID? by ray-auch · 2005-01-21 12:19 · Score: 2, Informative

Difficult to tell from the vague article, but my guess is they don't, and they throw the data away after analysis. They might map some kind of database schema to the incoming data and provide some form of SQL for querying, but there is still no real database anywhere.

So, throw out ACID (if problem domain doesn't require it) and get performance increases, wow! Probably they are now patenting it because no one had thought of that before...
Re:ACID? by jschottm · 2005-01-23 20:52 · Score: 1

I don't know how they're handling it, but personally I'd consider having the "streaming" database analysis machine on the same network as a file based server with an ethernet card set to promiscuous mode sniffing the packets aimed at the file server. (With the switch set to route the packets to both machines, of course.) That way you could have multiple file servers (assuming your flow of data was so great that it could bog down a single server) and have the real time server analyzing the incoming flow of data without the client connections ever having to know they were talking to more than one server.

Cyberpunk, Anyone? by tsanth · 2005-01-21 11:45 · Score: 1

This reminds me of cyberpunk-esque network traffic. More specifically, I'm talking about those futures when bandwidth is so cheap that it becomes affordable (even necessary?) to have a constant flow of data coming and going from a datacenter.

Seems to me that something like this would be incredibly useful for that: when the data from a couple seconds ago is now obsolete, you definitely need to be able to parse your queue as fast as you can.

Re:coming soon to a terminal near you: steaming vd by eseiat · 2005-01-21 11:49 · Score: 1

This is without a doubt the most confusing post I have ever read on slashdot. Congratulations on completely failing to convey your thoughts coherently. WTF?

memcached by Anonymous Coward · 2005-01-21 11:49 · Score: 0

This sounds like the same theory as memcached http://www.danga.com/memcached/ . I wonder if their version improves on it?

MDBM sounds a lot cheaper by Anonymous Coward · 2005-01-21 11:51 · Score: 0

Once you have decided you are only reading (what I think they mean by streaming) the data, why use a "database" at all. Shared memory, MDBM, etc etc all will give you high speeds for relaying a snapshot of data.

Typical by Anonymous Coward · 2005-01-21 11:52 · Score: 0

I wonder how this is different from MySQL Cluster an in memory only DB.

I hate it when MySQL fanboys jump into threads like this only to show their ignorance of relational algebra and predicate calculus saying that no one should ever bother with PostgreSQL and ACID-compliance, because MySQL is somehow a "better tool for the job" in the "real world". People, please, I beg you, read this first: [1] [2] [3] [4] [5] [6] [7] before you post yet another misleading plug for your favorite toy. Thank you. A real relational database is more than just a data store with SQL frontend.

Re:Typical by flabbergast · 2005-01-21 16:17 · Score: 1

I'm not a MySQL fanboy; I'm genuinely curious about the technical differences between the two DBs. The only reason I'm using MySQL as a comparison is that I did a comparison between regular MySQL and MySQL Cluster beforehand and noticed no difference in performance. Maybe you just need to simma down now!
Re:Typical by Anonymous Coward · 2005-01-22 04:45 · Score: 0

You are a phewl. MySQL is a 100% ASCII certified ACID compliant database like Oracle. Phucking idiot.
Re:Typical by Anonymous Coward · 2005-01-22 16:06 · Score: 0

Maybe you just need to simma down now!

Fuck you, I don't have to waste my time on this shit so stop acting like a dick. If you are "genuinely curious about the technical differences between the two DBs" then great, start from reading these articles:

PostgreSQL vs MySQL: Which is better? by Ian Gilfillan
MySQL Gotchas by Ian Barwick
MySQL and PostgreSQL Wikipedia articles

Things you should know before you start reading the above articles, to fully understand what they talk about, and indeed before starting to use relational databases, to avoid common mistakes like confusing objects with tuples et cetera:

Relational model, transaction processing,
ACID, atomicity, consistency, isolation, durability,
relational algebra, predicate calculus, set theory.

Those Wikipedia articles are a very good start if you really want to know what databases are all about. First of all, you have to understand that an RDBMS is not an object store. This is the most common mistake. What you get from a database are not objects, but tuples. They don't have an identity and they are not real things that exist; they are information about your data. See this thread for a wonderful explanation.

You must have some minimum knowledge of set theory, predicate calculus and relational algebra to understand it, but once you do, you will have a much better understanding of relational databases and your data, including the importance of ACID features. It is really worth investing a little time now to learn the theory and save a lot of time in the future thanks to a better understanding of those concepts, which can be confusing at first, for they don't map onto the standard OOP model and standard data models very well. Good luck.

I am looking forward to hearing from you about whether reading those articles was helpful. This is the standard introductory material that I recommend to my students, and I'd like to hear the opinion of someone who is not scared that some criticism will cause him trouble. By the way, sorry for my English.

Memory faster than disk, film at 11 by Toby+The+Economist · 2005-01-21 11:53 · Score: 2

> Streaming data without storing it on disk gives
> them a tremendous speed advantage.

There's a reason people generally don't do this, and that's because memory is expensive.

> The company claims it can process 140,000
> messages per second on a $1,500 PC, when its
> competitors can only deal with 900 messages per
> second.

But I bet you its competitors can serve huge web-sites at 900 messages per second, whereas StreamBase can serve fits-in-memory-only web-sites at 140,000 messages per second.

--
Toby

Re:Memory faster than disk, film at 11 by woah · 2005-01-21 12:12 · Score: 1

The article is about analysing real-time data on-the-fly, which is much more efficient than storing it and then analysing it at regular intervals.
The article is not about databases in the conventional sense.

Classifier Systems: the Genetic Algor of streaming by G4from128k · 2005-01-21 11:53 · Score: 2, Interesting

Classifier Systems are a genetic algorithm analog for this type of streaming data/pattern analysis. With classifier systems, a stream of incoming messages interacts with a constantly evolving population of classifier rules and an internally changing pool of working messages to create a stream of outputs. A reward/feedback loop drives adaptation of the rule system to reinforce it when it creates "good" outputs. The entire Classifier System concept is analogous to the mammalian immune system in the way that neural nets are analogous to brains and genetic algorithms are analogous to Darwinian evolution.
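Here is a toy Python sketch of the idea, loosely modelled on Holland-style classifier systems and much simplified. Rule conditions are strings over {0, 1, #}, with `#` as a wildcard, and the strongest matching rule wins. A reward nudges the winner's strength toward the payoff. A crude `evolve` step then replaces the weakest rule with a one-position mutation of the strongest. The internal message pool and bucket-brigade credit assignment described above are left out, and all names and parameters are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str        # over {'0', '1', '#'}; '#' matches either bit
    action: str           # output posted when the rule wins
    strength: float = 10.0

    def matches(self, message):
        return len(message) == len(self.condition) and all(
            c == "#" or c == m for c, m in zip(self.condition, message))

class ClassifierSystem:
    def __init__(self, rules, learning_rate=0.2, seed=0):
        self.rules = rules
        self.lr = learning_rate
        self.rng = random.Random(seed)

    def step(self, message):
        """Return (winning rule, its output), or (None, None) if nothing matches."""
        candidates = [r for r in self.rules if r.matches(message)]
        if not candidates:
            return None, None
        winner = max(candidates, key=lambda r: r.strength)  # ties go to the earlier rule
        return winner, winner.action

    def reward(self, rule, payoff):
        # Move the winner's strength a fraction of the way toward the payoff.
        rule.strength += self.lr * (payoff - rule.strength)

    def evolve(self):
        """Crude GA step: overwrite the weakest rule with a mutated copy of the strongest."""
        best = max(self.rules, key=lambda r: r.strength)
        worst = min(self.rules, key=lambda r: r.strength)
        cond = list(best.condition)
        i = self.rng.randrange(len(cond))
        cond[i] = self.rng.choice([ch for ch in "01#" if ch != cond[i]])
        worst.condition = "".join(cond)
        worst.action = best.action
        worst.strength = best.strength / 2
```

Every incoming message costs one pass over the rule population. This is why the stream-processing speed discussed in the article would matter if you tried to run something like this on a live feed.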

With a high enough stream processing speed (using StreamBase's methods), classifier systems might be useful for AI/adaptive learning scenarios.

--
Two wrongs don't make a right, but three lefts do.

Seems more like MOM than DB by richardoz · 2005-01-21 11:54 · Score: 2, Insightful

This seems more like Message Oriented Middleware than a Database...

--
All the worlds indeed a .sig, and we are mearly players..

Re:Seems more like MOM than DB by Anonymous Coward · 2005-01-21 20:36 · Score: 0

Yup, but the thing is, many financial firms use standard DBs like Oracle or DB2 for applications where they don't really need the history feature of a real DB, and they spend millions to get the greatest performance.

Re:coming soon to a terminal near you: steaming vd by IamTheExpert · 2005-01-21 11:55 · Score: 1

Is this what you feel like after reading this article.... http://www.doyousnap.com/portal/albums/7/7.aspx

why not just use the echo port by mveloso · 2005-01-21 11:57 · Score: 1

why not use the echo port? write data out to an echo port, then tee it off to your echo port. Then you can drink from the never-ending stream of data bouncing between your box and the remote box.

Simple, lots of space, and secure...until a power failure.

Re:why not just use the echo port by Anonymous Coward · 2005-01-21 14:04 · Score: 0

Aaargh...
Man, wtf. More bran muffins... more C. Less high-level bung poodling, okay?

Who runs an echo server these days? Why trust any *netd to do the right thing, and who implements or runs a standalone echo server?

This optimization battle has been fought for years by Anonymous Coward · 2005-01-21 12:01 · Score: 0

But up to now, no database vendor has been bold enough to consider skipping the actual writing to the database to squeeze out that extra overhead. Kudos for thinking outside the box!

Not sure what the selling point is by bushidocoder · 2005-01-21 12:01 · Score: 1

If the article is correct, the only thing that distinguishes this DBMS from more traditional ones is that it doesn't serialize its writes to disk. If that's true, I don't know what the selling point is. Both MS SQL Server and Oracle have the capacity to run a database in commitless mode, in which changes aren't recorded to disk (they can optionally be serialized on a timed interval). The military applications they describe as difficult with traditional DBMSs are already largely implemented today; most of the examples Forbes offered are perfect candidates for read-only databases, which are screaming fast. What makes them different?

More Information by adesai9 · 2005-01-21 12:01 · Score: 2, Informative

DB Group @ Stanford is doing some Stream projects as well. In case anyone is interested in more technical information, check out: http://www-db.stanford.edu/stream/

sed via SQL? by Stephen+Samuel · 2005-01-21 12:02 · Score: 1

It kinda makes me think of what you'd get if you crossed SED with SQL.

--
Free Software: Like love, it grows best when given away.

Article text minus the spam by Anonymous Coward · 2005-01-21 12:03 · Score: 2, Informative

Streaming a Database in Real Time

Michael Stonebraker is well-known in the database business, and for good reasons. He was the computer science professor behind Ingres and Postgres. Eighteen months ago, he started a new company, StreamBase, with another computer science professor, Stan Zdonik, with the goal of speeding access to relational databases. In "Data On The Fly," Forbes.com reports that the company's software, also named StreamBase, reads TCP/IP streams and uses asynchronous messaging. Processing streaming data without first storing it on disk, as other relational database software does, gives it a tremendous speed advantage. The company claims it can process 140,000 messages per second on a $1,500 PC, when its competitors can only deal with 900 messages per second. Too good to be true? Read more...

Here are some excerpts from the Forbes article.

"Relational databases are one to two orders of magnitude too slow," says Stonebraker, who is chief technology officer at Streambase, a 25-person outfit based in Lexington, Mass. "Big customers have already tried to use relational databases for streaming data and dismissed them. Those products are non-starters in this market."

In a recent pilot program, Streambase was able to analyze 140,000 messages per second, while a leading relational database -- Stonebraker won't say which one -- could handle only 900 messages per second. Streambase has 12 customers now testing its software, all of them financial services companies that need to analyze rapid-fire ticker feeds and other streaming data.

Unlike traditional database programs, Streambase analyzes data without storing it to disk, performing queries on data as it flows. Traditional systems bog down because they first store data on hard drives or in main memory and then query it, Stonebraker says.

The software, which should be commercially available next month, runs on Linux and Solaris, but a Microsoft version should be available soon.

The database business is not a cheap one. So how much will this new company charge for largely unproven software?

Streambase charges customers annual subscriptions for its software, setting prices based on how many CPUs a customer uses to power the software. Typical deals so far have ranged from $100,000 to $300,000 a year, says Barry Morris, Streambase's chief executive.

In "StreamBase eyes real-time streaming apps," InfoWorld reported lower prices.

The software is available via a subscription model, with pricing in the range of approximately $50,000 per year, Stonebraker said. Subscriptions are sold on a per-CPU basis.

Who will be the customers for these speedy accesses to their databases? Let's come back to Forbes.com.

For now Streambase is focusing attention on financial services companies, which hope to do things like track how well traders are performing on a real-time basis, rather than aggregating trades at the end of the day and analyzing them overnight.

A bigger opportunity involves processing real-time data feeds generated by sensor networks and RFID tags. A military contractor wants to use Streambase to keep track of soldiers and vehicles in the battlefield. A casino in Las Vegas is considering using Streambase to track the performance of individual gamblers.

In an interview with InfoWorld, Stonebraker gave more details about military applications.

We did a prototype that dealt with army battalion monitoring. When an army battalion is 30,000 humans and 12,000 vehicles, the army is deadly serious about getting a vital signs monitor on every one of the humans so they can do combat medical triage or [take other actions]. They already have a GPS system in every vehicle, but that didn't keep Jessica Lynch's convoy from getting lost.

They want to turn this into a system to watch the position of every vehicle and compare it against where you're supposed to be. They also want to put a sensor on the

Badges? by Anonymous Coward · 2005-01-21 12:05 · Score: 0

Referential integrity? We don't need no stinkin' referential integrity.

a BIG DESIGN FLAW by Anonymous Coward · 2005-01-21 12:14 · Score: 0

The network resources are NOT unlimited; you can have that and smoke it.

Got the wrong end of the stick by t_allardyce · 2005-01-21 12:21 · Score: 2, Insightful

For a minute there I thought they were trying to store a large database by forwarding all the little bits of it around the net constantly and grabbing them when they came back around, to save disk space... but that's a thought!

This idea really doesn't seem that new, though. It's just real-time DSP on text-based data, with a front end that pretends to be a database.

--
This comment does not represent the views or opinions of the user.

The analysis is what will be hard by Ober · 2005-01-21 12:22 · Score: 1

Given that we can get ~50k entries/second from tethereal output parsed via lex/yacc into PostgreSQL on a moderate PC, I would be more amazed at the level of analysis they are providing. Also, the data does tend to matter over time for those transient issues. Add a hash to your parser and you can just aggregate the data to reduce the load on the DB.
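Here is a minimal sketch of the "add a hash to your parser" idea. It assumes the lex/yacc stage already yields (key, value) pairs, such as (source address, bytes). The function name and batch size are hypothetical, and so is the `flush` callback, which stands in for the PostgreSQL insert.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple


def aggregate(records: Iterable[Tuple[str, int]],
              flush: Callable[[Dict[str, int]], None],
              batch_size: int = 10_000) -> None:
    """Sum values per key in a hash table and hand the totals to `flush` once per batch.

    `flush` stands in for whatever writes rows to the database (hypothetical);
    it receives one row per distinct key instead of one row per raw record.
    """
    totals: Counter = Counter()
    for n, (key, value) in enumerate(records, start=1):
        totals[key] += value
        if n % batch_size == 0:
            flush(dict(totals))
            totals.clear()
    if totals:  # write whatever is left from a partial final batch
        flush(dict(totals))


if __name__ == "__main__":
    packets = [("10.0.0.1", 60), ("10.0.0.2", 1500), ("10.0.0.1", 40)]
    aggregate(packets, print, batch_size=2)
```

The DB load then scales with the number of distinct keys per batch rather than with the raw entry rate.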

Seems kinda silly to me. by boodaman · 2005-01-21 12:28 · Score: 3, Insightful

OK, I get what they're trying to do, but my question: so what?

Sooner or later you have to put something somewhere. Let's say you monitor a battalion in battle in realtime. All of these messages are streaming in and being analyzed. Great. But now what? So something triggers an alert, say. Well, what's tracking the status of the alert? Wouldn't you want to track the status of an alert saying "this Humvee is off course"? Wouldn't you want to track whether someone had acknowledged the alert, and what they did about it?

And don't forget there are liability issues, historical issues, and more. You're a stock trader, and all of these messages are coming in and being analyzed. You get an alert...one of your triggers tripped. You make a trade as a result, only to find out 30 minutes later that the trigger was WRONG and your trade was WRONG and you (or your company) are out $10 million. How do you prove that you made the trade based on the trigger like you were supposed to and not because you f**ked up? The trigger, and the data that caused it to trip, are long gone. What do you do now?

Eventually something has to be written (stored) somewhere, sometime. I guess I can see the need for summarizing data and only storing what StreamBase says is "important" but how would you know if everything was OK if the actual data driving everything was long gone?

Re:Seems kinda silly to me. by oliverthered · 2005-01-21 13:12 · Score: 1

'Wouldn't you want to track the status of an alert saying "this Humvee is off course"?'

It depends on your application. Unless you're running a black box, the best course of action would be to relay the message to the driver of the Humvee.

It's also real handy when you get asked to produce the data in court.

--
thank God the internet isn't a human right.
Re:Seems kinda silly to me. by Anonymous Coward · 2005-01-21 19:34 · Score: 0

Um. You store the results, not the incoming data. That's kind of the point. They don't store all the data and run queries on it.

Of course you keep the results. You would also adjust the queries that you are performing based on the results. Tracking things that are off course would be a perfect example. And of course you would have a stored record of suggested or triggered stock buy/sell states.
Re:Seems kinda silly to me. by Anonymous Coward · 2005-01-21 22:03 · Score: 0

As far as I can see, StreamBase is an attempt to isolate real-time processing of data from the traditional database. It is not exactly a replacement for the traditional database (as the article wrongly seems to suggest).

Consider this:
StreamBase (doing the real-time analysis) + traditional database (for other transactional requirements) = classic OLAP system!
Re:Seems kinda silly to me. by Tarwn · 2005-01-22 00:06 · Score: 1

Just because you can find situations where this doesn't fit (not that I agree with your examples) does not mean it is a bad idea. In fact, storage of historical data would not be difficult: simply tie another system to this one and have it store data as it is asynchronously sent. At first this seems a little hokey (why use two systems when one would do the job?), but you could set up a fairly nice alert-based historical system with something like this and a standard DB.
Here's my logic: if this system can handle 100,000 transactions and your database can't, then you use this system as the first line for incoming data. That data is processed and sent back out based on the rules you define. Maybe you have one set of rules that are alerts to other applications and one set that define what data will be stored historically. The historical rules could be used to treat your common database as a real-time historian, only sending changes greater than certain percentages or certain percentage deltas. This condenses your data and, provided you have selected tight enough constraints, only loses minor fluctuations in the value that don't matter down the road. In the meantime your applications on the other side have already received the data asynchronously, which means no polling loops, etc.
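The "real-time historian" rule above stores a point only when it moves by more than some percentage, which is essentially a deadband filter. Here is a minimal sketch, assuming samples arrive as (tag, value) pairs. The function name and the percent threshold are illustrative and do not come from any particular product.

```python
from typing import Dict, Iterable, Iterator, Tuple


def deadband(samples: Iterable[Tuple[str, float]],
             percent: float) -> Iterator[Tuple[str, float]]:
    """Pass a (tag, value) sample on for storage only if it differs from the
    last stored value for that tag by more than `percent` percent."""
    last_stored: Dict[str, float] = {}
    for tag, value in samples:
        prev = last_stored.get(tag)
        # The first sample for a tag is always stored; later ones must clear the deadband.
        if prev is None or abs(value - prev) > abs(prev) * percent / 100.0:
            last_stored[tag] = value
            yield tag, value


if __name__ == "__main__":
    feed = [("temp", 100.0), ("temp", 100.5), ("temp", 101.5), ("temp", 101.6)]
    print(list(deadband(feed, percent=1.0)))  # [('temp', 100.0), ('temp', 101.5)]
```

The filter compares each sample against the last *stored* value rather than the last *seen* one. If it compared against the last seen value, a slow drift made of many small steps could slip through unrecorded.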

Of course, this all seems fairly similar to most of the run-of-the-mill OPC servers out there that don't store historical data, except for the added rules layer. Most OPC servers I have dealt with just let you sign up to receive all the data from a single point in an asynchronous manner. I'm not sure they are very sophisticated about what they choose to send. Of course, several of them handle as many or more transactions per second, so who knows.

Stock trading errors: I would say that you probably would want historical data stored for this, though not for the reason you say. If an f-up does occur, like a trade being made based on the data, then the only reason you would want historical data at that point is to figure out which part of your system broke down. Was it the trader making a bad choice, the remote application calculating something wrong from the passed data, the rules in the database being incorrectly defined, the data collector feeding the data to the database, or, on a way-outside chance, a bug in the database that somehow misrepresented one piece of data but was never noticed when your company tested everything out (and I assume your company would test everything before balancing a $10 million trade on it)?
I don't understand why something _has_ to be written some time. There are plenty of manufacturing systems out there that only deal in real-time data and don't store historical data (I mention this because I worked in that field for a while). Granted, there are a lot of things you can do with historical data that can help make you more money, but if you can make it right now without the historical pieces, then you probably don't know what you're missing and are still raking in the money.

In any case, I agree that historical data can be very important, disagree that its lack in this situation is a fatal flaw, and am again reminded that there are people out there who would not see a problem with your stock example and would probably create an entire system to do exactly that (without seeing the slew of possible problems we mentioned). It all comes down to using the tools that fit the job. This is just one more tool with a variety of uses, which do not include directly replacing a standard historical database. :P

-T

--
Whee signature.

This isn't streaming, this is message queuing... by X · 2005-01-21 12:36 · Score: 4, Informative

This isn't streaming, it's standard message queuing. Most messaging products allow you to have non-persistent queues and to extract data based on arbitrary queries. There is well over a decade's worth of products for doing this kind of stuff.

I'm sure this is a great product, but both the submitter and the writer of the story seem to not grok what makes it great.

--
sigs are a waste of space

MOD PARENT UP! by Anonymous Coward · 2005-01-21 12:43 · Score: 0

Good point and great links.

Re:Classifier Systems: the Genetic Algor of stream by headkase · 2005-01-21 12:49 · Score: 2, Interesting

Check out this diagram of a classifier system. It's taken from The Computational Beauty of Nature. The website isn't really up to date nowadays, but the full source code for everything in the book is available in both Linux and Windows downloads and there's a java applet of all the examples too.
The material covered in the book is also still very relevant, and the book's a joy to read.
You should buy it :^) Not astroturfing just really enjoyed the book myself.

--
Shh.

Been there, done that by Anonymous Coward · 2005-01-21 13:02 · Score: 0

When I was in the satellite ground station business in the '80s, we did this with software/hardware commutator and decommutator pairs. The computer software/hardware of the time could not possibly have kept up with the satellite's data stream, so we built custom processors to separate the various streams (images, telemetry, special sensors) and forward the data for processing. It just goes to show that everything old is new again; bell bottoms are back, too.

It's fascinating for me to realize that the data stream we thought huge is now less than what a FireWire or USB 2.0 connection carries.

Goetz

retarded mods. by Anonymous Coward · 2005-01-21 13:09 · Score: 0

the parent is both +1 funny and reasonably on topic...

Could use a J2EE MessageBean instead by Anonymous Coward · 2005-01-21 13:09 · Score: 0

of this StreamBase nonsense. StreamBase is not standard. Sun put a lot of effort and experience into the J2EE spec.

Re:Could use a J2EE MessageBean instead by Anonymous Coward · 2005-01-21 14:48 · Score: 0

Mod Parent Up, +1, Funny.

True, they could also dip that J2EE MessageBean in a few more chocolatey layers of XML wrappers, so they would be sure to have enough code _and_ data bloat.

Ugh. by mosel-saar-ruwer · 2005-01-21 13:11 · Score: 1

I've written my own wire protocol + packers and unpackers. I tag every data value with its type (number, time, string, ...) and message position (this I use to selectively leave out values under specific circumstances, i.e. to send partial messages). This arrangement works just fine: the wire format is machine independent, and quick to read and write. The coding overhead for message packing and unpacking is limited to pretty much a single function per message type (to identify the various fields), and conversion from and to wire format is done using two universal conversion routines.

This is precisely what I feared: You had to write the whole thing from the ground up.

If there's a quality vendor out there that's got any kind of pre-packaged data transfer protocol for moving strongly-typed data, I haven't found it yet.
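The hand-rolled wire format quoted above tags every value with its type and its position in the message, so partial messages simply omit fields. Here is one plausible layout for that scheme, sketched with Python's standard `struct` module. The byte layout and the tag values are assumptions made for this sketch: a 2-byte field count, then for each field a 1-byte position and a 1-byte type tag, then the payload. Times could be carried as float seconds since the epoch, which this sketch also assumes.

```python
import struct
from typing import Dict, Union

Value = Union[int, float, str]
# One-byte type tags (assumed values, chosen for this sketch only).
TAG_INT, TAG_FLOAT, TAG_STR = 1, 2, 3


def pack(fields: Dict[int, Value]) -> bytes:
    """Encode {field_position: value}; positions missing from the dict are left out,
    which is how a partial message is sent. Network (big-endian) byte order throughout."""
    out = bytearray(struct.pack(">H", len(fields)))            # field count
    for pos, value in sorted(fields.items()):
        if isinstance(value, int):
            out += struct.pack(">BBq", pos, TAG_INT, value)    # position, tag, int64
        elif isinstance(value, float):
            out += struct.pack(">BBd", pos, TAG_FLOAT, value)  # position, tag, float64
        elif isinstance(value, str):
            data = value.encode("utf-8")
            out += struct.pack(">BBI", pos, TAG_STR, len(data)) + data  # length-prefixed
        else:
            raise TypeError(f"field {pos}: unsupported type {type(value).__name__}")
    return bytes(out)


def unpack(buf: bytes) -> Dict[int, Value]:
    """Decode a buffer produced by pack() back into {field_position: value}."""
    (count,) = struct.unpack_from(">H", buf, 0)
    offset = 2
    fields: Dict[int, Value] = {}
    for _ in range(count):
        pos, tag = struct.unpack_from(">BB", buf, offset)
        offset += 2
        if tag == TAG_INT:
            fields[pos] = struct.unpack_from(">q", buf, offset)[0]
            offset += 8
        elif tag == TAG_FLOAT:
            fields[pos] = struct.unpack_from(">d", buf, offset)[0]
            offset += 8
        elif tag == TAG_STR:
            (length,) = struct.unpack_from(">I", buf, offset)
            offset += 4
            fields[pos] = buf[offset:offset + length].decode("utf-8")
            offset += length
        else:
            raise ValueError(f"unknown type tag {tag} for field {pos}")
    return fields


if __name__ == "__main__":
    msg = {0: 42, 2: 3.5, 5: "IBM"}   # field 1 omitted: a partial message
    print(unpack(pack(msg)))
```

As the poster describes, all the per-type logic lives in one encoder and one decoder. Each message type then only has to map its fields to positions.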

Re:Ugh. by johannesg · 2005-01-21 20:40 · Score: 1

This is precisely what I feared: You had to write the whole thing from the ground up.
Yeah. I suppose I could have used Corba, but now that I have the basic infrastructure in place there isn't really any advantage to doing so since the effort involved in remote function calls is now as small as it will ever get.
Besides, I can think of at least one major (multi-million euro) software package that is considered almost too slow to be useable precisely because it is attempting to use Corba to shift serious amounts of data. Maybe I shouldn't lay the blame on Corba here, since any form of synchronous communication will be slow and Corba _can_ be asynchronous as far as I'm aware, but unfortunately that is not what this package does.
What sort of data rate are you looking at? And, if you don't mind me nosing around, why the requirement for strong data typing in the message system?
And since I'm nosy anyway, I'm kinda curious what problem you are dealing with here. But maybe that's something better discussed in normal mail ;-)

Semantics. by mosel-saar-ruwer · 2005-01-21 13:20 · Score: 1

No, Data Space Transfer Protocol is not "also known as" Data Socket Transfer Protocol.

First of all, Grossman's group at UIC tends to call it Data Space Transfer Protocol. On the other hand, the promotional and marketing material at National Instruments tends to call it Data Socket Transfer Protocol.

Second, there seems to be some confusion as to what is meant by a backend. I want some sort of a server [something traditional, like Oracle/DB2/SQLServer, or something a little new-fangled, like Objectivity/Caché/Poet/ObjectStore] to serve as the backend, receive all that data, and store it in some kind of a coherent "database".

kx / kdb / q by darrellsilver · 2005-01-21 13:28 · Score: 0

kx has been doing this for ~4 years in the financial space. There's also at least one company selling software based on it in other industries...

kx is on its second major version and legitimately handles 1m-100m records per second and historical/real time databases in memory and on disk of unlimited size using standard linux/solaris/win hardware, filesystems, etc.

q, the latest version, is the k language with SQL syntax embedded.

www.kx.com

--

I am a sig.

My RTOS will do more than 1400 messages/sec by flyingrobots · 2005-01-21 13:39 · Score: 2, Interesting

But the idea of a query engine in front of those messages is interesting.

Yet, then what is LabView? We've been processing live real-time data streams for years.

I still don't get the scope of it. On one hand it seems to be a lot of the same. This idea that they need this type of software to process data from remote sensors doesn't click. I process data from remote sensors in real time all the time (no pun intended). There is no need to store it in a DBMS and then query it in order for the data to be useful. For historical reasons, yes, but it's never necessary.

Who cares whether it runs on a $1500 PC ? by murr · 2005-01-21 13:42 · Score: 2, Interesting

... if the software costs $300K

I'd call it event processing or filtering. by Anonymous Coward · 2005-01-21 14:16 · Score: 0

Part of Tivoli does that with a Prolog derivative, storing events in a database (Oracle), and neither did much for speed. You could crash Tivoli with a large burst of events. Our group was supposed to write an event adaptor to preprocess the events and filter them down so Tivoli didn't choke, but our company went bust first.

Anyway if you're talking realtime event processing, you only care about a relatively narrow window of events, so you don't really need a database as events will age out too rapidly to bother storing.
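Here is a sketch of that narrow-window idea. It assumes events carry a timestamp in seconds and arrive in order. The class name and the 60-second width are illustrative. Only the window is kept in memory, so nothing has to be written to a database.

```python
from collections import deque
from typing import Deque, Tuple


class EventWindow:
    """Holds only the events from the last `width` seconds; anything older ages out.

    Assumes events arrive in non-decreasing timestamp order.
    """

    def __init__(self, width: float) -> None:
        self.width = width
        self.events: Deque[Tuple[float, str]] = deque()

    def add(self, timestamp: float, event: str) -> None:
        self.events.append((timestamp, event))
        # Drop everything that has fallen out of the window relative to the newest event.
        while self.events and self.events[0][0] <= timestamp - self.width:
            self.events.popleft()

    def count(self, event: str) -> int:
        return sum(1 for _, e in self.events if e == event)


if __name__ == "__main__":
    w = EventWindow(60.0)
    for ts, ev in [(0.0, "link_down"), (30.0, "link_down"), (70.0, "link_down")]:
        w.add(ts, ev)
    # A filter in front of the event console might forward only bursts, e.g. count >= 3.
    print(w.count("link_down"))  # 2: the event at t=0 has aged out
```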

Stweaming Datavases by Anonymous Coward · 2005-01-21 14:25 · Score: 0

Michael Palin and Bwian want some of this.

Are there databases of porn? by xXunderdogXx · 2005-01-21 14:39 · Score: 0

n/t

Data IS written to disk/backed-up. by univgeek · 2005-01-21 15:05 · Score: 3, Informative

It's just that if you start querying AFTER you store it on disk, the I/O makes it much slower. So what you do is pick up some of the information from the flowing data, and some other system behind yours saves the data.

Every time you get something interesting, you save that on disk too, but separately, into a much smaller DB. This way state is also saved, and since state is going to be much smaller than the data, there will be no speed issues.

Now the clever thing to do would be to link this flowing-state dbms (FSDBMS) to a standard rdbms working from the disk. Then you could verify the information from the FSDBMS, and ensure that things aren't screwed up. Also, based on patterns seen by the rdbms with long term data, new queries could be generated on the FSDBMS, allowing it to generate results from the data on the wire.

Sounds like it would have applications primarily where response time is at a premium, and long history is not such a large component of the information.

So in the case of military info, where a Humvee could be in trouble (a situation someone else has mentioned), the FSDBMS would raise the alarm, and some other process would then follow up and ensure that the alarm was taken care of. (The data itself would be backed up for future analysis, such as whether the query was correctly handled.)

Dynamic queries in such a situation could be: get the ID of the closest Apache reporting in, or the closest loaded bomber en route to some other target. Then the alarm-handling program would re-route the bomber/Apache to the Humvee for support. While querying the disk database may be time-intensive, the FSDBMS would have delivered a sub-optimal but FAST solution.

So imagine the FSDBMS as a filter, giving different bits of information to different people. With the option that you could change the filter on the fly. And the filter could be complex, based on previous history etc., just like a DB query.

--
All bow to his Noodliness!! His Noodle Appendage has touched me!

Combining this with an RDBMS by kiwi_mcd · 2005-01-21 15:22 · Score: 2, Informative

When I was a project manager at ECONZ http://www.econz.co.nz/ in 1999 I did a high level design for a product similar to this but we merged it with a relational database (Oracle in this instance).

Other posts are correct that what is talked about here is a message queuing mechanism to some degree. What I had designed and built was what we called an event server.

Basically how it worked was that you sent what SQL statement you wanted registered and then you got the initial data set back and then any changes to it. Anytime somebody did an UPDATE or INSERT or DELETE statement the results got sent to whoever had registered for it. We sent it through our own message queue software.

This worked very well although not at the speed claimed here and was much more complex to write than we anticipated. It was written in C++ on Linux which was quite revolutionary back then...

How did it work in practice? The software that we replaced was running on SGI boxes that cost more than $10 million. We built our total hardware solution for less than $1 million (the large cost was Sun boxes for Oracle). The response time dropped from minutes to seconds or less. The application was a dispatch system for jobs in the telco area with over 500 users.
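The event-server registration flow described in this comment works like this: send a query, get the initial data set, then receive every relevant change. Here is a toy sketch of that flow. It assumes a Python predicate stands in for the registered SQL statement and a callback stands in for the message queue. All names are hypothetical, and this is not how the original Oracle/C++ system was built. In this sketch, an UPDATE that moves a row out of a subscriber's result set is reported to that subscriber as a DELETE.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]           # stands in for a registered SQL WHERE clause
Callback = Callable[[Tuple[str, Row]], None]


class EventServer:
    """Toy in-memory table with standing queries: register once, get the initial
    result set, then receive INSERT/UPDATE/DELETE notifications as rows change."""

    def __init__(self) -> None:
        self.rows: Dict[int, Row] = {}
        self.subscriptions: List[Tuple[Predicate, Callback]] = []

    def register(self, predicate: Predicate, callback: Callback) -> List[Row]:
        self.subscriptions.append((predicate, callback))
        return [dict(r) for r in self.rows.values() if predicate(r)]  # initial data set

    def _notify(self, old: Optional[Row], new: Optional[Row]) -> None:
        for predicate, callback in self.subscriptions:
            was_in = old is not None and predicate(old)
            now_in = new is not None and predicate(new)
            if was_in and now_in:
                callback(("UPDATE", dict(new)))
            elif now_in:
                callback(("INSERT", dict(new)))   # row entered this query's result set
            elif was_in:
                callback(("DELETE", dict(old)))   # row left this query's result set

    def insert(self, key: int, row: Row) -> None:
        self.rows[key] = dict(row)
        self._notify(None, self.rows[key])

    def update(self, key: int, **changes: object) -> None:
        old = dict(self.rows[key])
        self.rows[key].update(changes)
        self._notify(old, self.rows[key])

    def delete(self, key: int) -> None:
        self._notify(self.rows.pop(key), None)


if __name__ == "__main__":
    srv = EventServer()
    srv.insert(1, {"job": "A", "status": "open"})
    print(srv.register(lambda r: r["status"] == "open", print))
    srv.update(1, status="closed")  # prints a DELETE: job A left the "open jobs" query
```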

This is old news by Chitlenz · 2005-01-21 16:09 · Score: 2, Insightful

I remember seeing a RAM-caching scheme for Oracle a few years ago that made the same claims. In actuality Microsoft, for all the love they'll get here, allows you to do this exact thing with a DataSet object within .NET. There are several solutions to this kind of problem, but the .NET way is the one I'll focus on here.

The CommandBehavior.SequentialAccess option, used when executing the SelectCommand in C#, can be applied in a way that allows binary objects, or other data, to 'stream' back and forth in real time within the relational DataSet objects created at app instantiation. Essentially, .NET allows the same type of action by instantiating a 'database' within the client-side apps, building a schema of sorts up through and including relational references such as foreign keys. At this point, we have a 'database' in RAM (the DataSet) that can be resynched via ports to any other client or server using the same architecture.

I do this today to provide a distribution network for doctors who need access from several places to a pool of active patient data. This is a data volume of several terabytes per location, so I assure you that we are discussing the same scale here as the article.

Incidentally, the TPC benchmarks show 3,210,540 tpmC as the current posted record for AIX on a Big Blue machine, so their numbers are skewed if not wrong. Most processes, including those using binaries, can be proceduralized at the back end anyway, so the flow becomes call -> server -> stored_procedure -> return(), with all data living in RAM and sorts happening in 'real time', that is, from a pinned table into another location in memory at the server layer, returning into a dataset that is kept in RAM on the client.

I don't really see anything revolutionary about all this. Correct me if I'm mistaken?

-chitlenz

--
Imagination is the silver lining of Intelligence.

How do you like them apples? by donethat · 2005-01-21 16:31 · Score: 1

Forgive my ignorance, but last time I checked "database" implied "persistence" of some sort. It's great that it can **process** 140,000 messages per second, but how many can it **store**?! Show me something that can store 140,000 items per second and I'll be duly impressed. Until then, let's compare apples with apples and keep everybody honest.

Re:How do you like them apples? by enewhuis · 2005-01-21 17:54 · Score: 1

Persistence is only necessary for as long as something is useful to persist. When trading futures one might devise a trading system that only cares about data that has been collected over the past 20 minutes. Yet the amount of data might surpass several gig over a moving 20 minute window. And the kinds of queries executed over the data may indeed be determined after the data has been persisted. Mechanical automated "Black Box" trading systems are doing this already today.
Re:How do you like them apples? by donethat · 2005-01-22 01:40 · Score: 1

I wasn't saying the thing is useless. Just that they shouldn't compare it to a database and claim it's 156 times faster than a "competitor", a "leading database" which will remain anonymous. That's not only technically incorrect, but it's intellectually dishonest and grossly unfair. I'm not commenting on the technical merits of the software, which is pretty good actually. I don't think it's fair however for Stonebraker to say that "relational databases are one to two orders of magnitude too slow," and then proceed to compare apples and oranges to make himself look good.
Re:How do you like them apples? by rly2000 · 2005-01-22 04:16 · Score: 1

I actually worked closely with Stan Zdonik and Mike Stonebraker on Aurora, the research project that preceded StreamBase, before I graduated from Brown.

The reason StreamBase is considered a database is that it uses traditional database querying mechanisms to process the data. It isn't meant to store data for later retrieval; it only needs to store "windows" of streams that certain queries, such as joins, require.

Consider it a traditional database turned upside down. Whereas traditional databases process queries in realtime and store data, a streaming database processes data in realtime and stores the queries.
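To make the "windows for joins" point concrete, here is a minimal sliding-window join in Python. It is a generic textbook-style sketch, not StreamBase's or Aurora's implementation or API. It assumes both streams arrive in global timestamp order and join on equal keys. The join condition lives permanently in code, while each tuple is kept only as long as it can still match something. That is the "stores the queries, processes the data" inversion described above.

```python
from collections import deque
from typing import Deque, List, Tuple

Entry = Tuple[float, str, object]  # (timestamp, join key, payload)


class WindowJoin:
    """Symmetric sliding-window equi-join over two streams. The 'query' (key equality
    within `window` seconds) is fixed; data flows through and only the windows are stored."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.left: Deque[Entry] = deque()
        self.right: Deque[Entry] = deque()

    def _expire(self, buf: Deque[Entry], now: float) -> None:
        while buf and buf[0][0] < now - self.window:
            buf.popleft()

    def push(self, side: str, ts: float, key: str, payload: object) -> List[Tuple[object, object]]:
        """Add one tuple from `side` ("left" or "right"); return the (left, right) pairs it completes."""
        if side not in ("left", "right"):
            raise ValueError(f"side must be 'left' or 'right', got {side!r}")
        mine, other = (self.left, self.right) if side == "left" else (self.right, self.left)
        self._expire(mine, ts)
        self._expire(other, ts)
        matches = [(payload, p) if side == "left" else (p, payload)
                   for _, k, p in other if k == key]
        mine.append((ts, key, payload))
        return matches


if __name__ == "__main__":
    j = WindowJoin(window=5.0)
    j.push("left", 0.0, "IBM", "trade1")
    print(j.push("right", 2.0, "IBM", "quote1"))  # [('trade1', 'quote1')]
```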
Re:How do you like them apples? by enewhuis · 2005-01-24 02:04 · Score: 1

Yes I concur. What I'd like to see are possibly some standards by which these new "databases" are measured both in terms of capabilities and performance thereof. www.vhayu.com www.kx.com www.streambase.com ...

Stonebraker gave a guest lecture to my class. by wirelessbuzzers · 2005-01-21 16:32 · Score: 2, Informative

A financial services company is using this software to check incoming stock feeds for problems. It takes thousands of messages per second, and if certain stocks don't come in at least once every 5 seconds, it counts a miss. For others it's once every 30 seconds.

If a given provider is consistently slow, it sounds a low-level alarm against the provider, not to trust their data because it's slow. Similarly for various markets, and probably other groupings too. It probably does other processing on the data.
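Here is a rough sketch of that feed check. It assumes ticks carry a timestamp in seconds and that each symbol belongs to one provider. The 5- and 30-second gaps come from the comment. The class, the miss-counting policy and the alarm threshold are illustrative assumptions, not what the company actually runs.

```python
from collections import Counter
from typing import Dict, List


class FeedMonitor:
    """Counts a miss against a symbol's provider whenever the symbol goes longer than
    its allowed gap without an update, and reports providers with too many misses."""

    def __init__(self, max_gap: Dict[str, float], provider_of: Dict[str, str],
                 alarm_threshold: int = 3, start: float = 0.0) -> None:
        self.max_gap = max_gap                  # seconds allowed between updates, per symbol
        self.provider_of = provider_of          # symbol -> data provider
        self.alarm_threshold = alarm_threshold  # misses before a provider is flagged (assumed policy)
        self.last_seen: Dict[str, float] = {s: start for s in max_gap}
        self.misses: Counter = Counter()

    def on_tick(self, symbol: str, ts: float) -> None:
        self.last_seen[symbol] = ts

    def check(self, now: float) -> List[str]:
        """Call periodically. Returns providers whose miss count has reached the threshold."""
        for symbol, gap in self.max_gap.items():
            if now - self.last_seen[symbol] > gap:
                self.misses[self.provider_of[symbol]] += 1
                self.last_seen[symbol] = now  # restart the clock: at most one miss per gap interval
        return sorted(p for p, n in self.misses.items() if n >= self.alarm_threshold)


if __name__ == "__main__":
    m = FeedMonitor({"IBM": 5.0, "XYZ": 30.0}, {"IBM": "feedA", "XYZ": "feedB"}, alarm_threshold=2)
    m.on_tick("IBM", 1.0)
    m.on_tick("XYZ", 1.0)
    for t in (4.0, 7.0, 13.0):
        print(t, m.check(t))
```

Only the last-seen time per symbol and a counter per provider are kept, which fits the comment's point that little history is needed.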

This data is almost useless within 5 minutes, and it has to be processed very fast. If you change your application, nothing will matter within 5 minutes. If your machine crashes, you have bigger problems, as is generally the case when you want real-time processing. And you don't need a lot of history.

Streambase is much faster than the company's previous custom-coded C++ program, largely because it has better multithreading and more query optimization. It's designed to cut across multiple layers of a traditional database platform (transport, database, application).

Of course, Stonebraker could be puffing his product, but it sounds pretty effective to me.

--
I hereby place the above post in the public domain.

Re:Stonebraker gave a guest lecture to my class. by flyingrobots · 2005-01-22 02:08 · Score: 1

Still...what you have here is real-time DSP work being done on the data.

The point I would make here is: it is just as critical that a real-time controller of a vehicle respond to alerts and events as it is for a financial institution. The parallels are close.

In real-time control of a system that could kill someone (nuclear reactor, autopilot, etc.), the system has to react to bad events in milliseconds or less (a much tighter timeframe than 5 minutes).

I suppose the neat factor with this system would be the fact that it can intercept data on its way to a database. But this isn't all that new either. Many real-time applications process data in real time on its way to a database as well.

The talk about being able to respond to events and such is nothing more than domain logic (i.e. what to do with the data).

I do realtime systems where I am processing 10,000 messages a second. I can understand the issues with multithreading and maybe streambase cleans some of that up, but I think this is real-time data processing in a new package with a lot of clever marketing.

Don't get me wrong, I think the clever marketing is just as important and good as any technical innovation. But I don't see any technical innovation here...
Re:Stonebraker gave a guest lecture to my class. by wirelessbuzzers · 2005-01-22 18:00 · Score: 1

As I understand it, the innovation is being able to express this domain logic in the form of SQL-like queries as well as C++ or whatever code. This makes it a lot easier to write.

--
I hereby place the above post in the public domain.

For the record--Taco's response to this by bonch · 2005-01-21 16:43 · Score: 5, Informative

I asked him why so many Roland articles get accepted, and he said he doesn't even look at the submitter's name and that Roland must be submitting good articles.

I then told him about the controversy over it in posters' minds, and he said it was just a "new successful troll meme." Good luck getting through to Slashdot's editors, because clearly Malda does not consider this anything to take seriously.

Re:For the record--Taco's response to this by Fnkmaster · 2005-01-22 01:55 · Score: 0

Malda is always like that. He clearly has no concept of how to manage his editors or ensure quality in his publication. Sure, *he* may not look at the names of submitters, but clearly Michael does, and posts an outrageous amount of Roland Piquepaille click-whoring crap.

no storage = no problem by mshurpik · 2005-01-21 17:02 · Score: 2, Insightful

This press release says a lot about analyzing streams and nothing about altering them. Most of the weight of a database is in manipulating a permanent record. INSERTs are slow, and StreamBase may not have any.

This is an old concept. by enewhuis · 2005-01-21 17:11 · Score: 3, Interesting

My first reaction is: he is late to the game. Check out www.kx.com. They've already done this. And this kind of thing has been used for years to analyze stock and commodities trading data in real time, as the trades occur. I've deployed several systems that are essentially streaming databases like this. Or did I miss something here?

whats the value of speed by jbplou · 2005-01-21 17:48 · Score: 1

An RDBMS does so much more: availability and redundancy. What happens in a power outage? If you have several gigs in memory, your UPS had better be able to stay up long enough for everything to be backed up to disk. Primary memory is not the place for important datastores, unless you're trying to lose your job. Poof: a power outage, or a software patch leaks some memory, and you're screwed. RAID 10 Oracle, or all my critical data sitting on a $100 DIMM? I'll choose the slow Oracle.

Stop encouraging people to visit by dj42 · 2005-01-21 18:18 · Score: 2, Insightful

This comment seems fishy: "Visit Roland Piquepaille's Technology Trends (http://www.primidi.com/ [primidi.com]) to see it for yourself." Why would someone who is AGAINST primidi.com getting tons of money from pay-per-view /. floods constantly suggest, on /., that others should go check it out? Wouldn't he be encouraging people NOT to go and visit?

--
We are one consciousness experiencing itself subjectively. Back to you with the weather, Bob!

Re:Stop encouraging people to visit by Tim+C · 2005-01-21 22:28 · Score: 0

So you can verify the information for yourself, perhaps?

If nothing else, the poster asserts that the blog carries ads from blogads.com; verifying that is most easily done by going to it and checking to see where the ads are served from.

--
It's official. Most of you are morons.

If you are interesting in technology like this... by mbtmbt · 2005-01-21 19:05 · Score: 1

...namely, distributed systems that process large volumes of data in real time, please send your resume to mbt@alum.mit.edu. I'm a co-founder of a stealth-mode startup that builds a bleeding-edge platform similar to the one described in the article. We are funded, have a great management team, and are located in the San Francisco Bay Area. If there is a match between your background and our needs, we will contact you.

Re:If you are interesting in technology like this. by Anonymous Coward · 2005-01-21 20:06 · Score: 0

Cool! I am in!

Re:If you are interesting in technology like this. by lsh123 · 2005-01-21 20:09 · Score: 1

Good, glad to know there are more companies working on this. I heard Stonebraker's previous company was not very successful.

Re:coming soon to a terminal near you: steaming vd by Anonymous Coward · 2005-01-21 21:33 · Score: 0

By Ronald Piquepalli & Michael... "the most confusing post"? So you haven't read any others by them?

Mods: The truth about bonch/rd_syringe/OverlyCrGuy by Anonymous Coward · 2005-01-22 03:48 · Score: 0

Moderators: Please note that "bonch" is a known fanatical sycophant whose obnoxious offtopic rants are legendary here on Slashdot. It doesn't matter what the topic is, he'll find a way to squeeze in some pointless Microsoft shilling. While nobody expects us to love Microsoft in any way, his particularly tepid style of calling anyone he replies to a "troll" or "liar" because he happens to disagree with whatever they're saying is well documented and should not be rewarded. If anything, bonch is the type of person who should not be part of the open source/free software community. He is anathema to all that is good about free software.

I'm posting this so that you (the moderator) have some context to consider bonch and not mod him up whenever he posts his filler preformatted rants about installing Windows or whatever that unfortunately get him karma every single time and allow him to continue posting his trademark toxic crap (read on) day in and day out. You may consider this a troll - I consider it community service. And I ain't kidding.

If you're a /. subscriber, I invite you to look through some of his posting history. I guarantee that you'll be hard pressed to find someone that is more "out there" than bonch. You'll also probably notice he's got quite an AC following. Don't just read his posts, make sure you go through the replies.

For example, in this recent post bonch not only calls the OP a troll but attempts to "tell it like it is" while making some vague argument about "MS". Yes, if you're confused, you're not alone. The reply (modded +0) proceeds to simply destroy his bogus argument. You will notice he did not reply. This is what some people call "drive-by advocacy". A sort of I'll just leave you with my thoughts here and move on to the next flamebait kind of deal. In fact, he almost never replies because he knows that his fanatical arguments simply do not hold up to any sort of discussion. It's not that he's chosen the wrong cause - he's just going at it in a completely wrong way.

More? Just read through this post and the subsequent replies. I guess this stands on its own.

More? Bad spelling in astounding conspiracy theories, more offtopic FUD and uninformed "I'm right, look at me" rants, promptly proven wrong. Worse even, bonch wants to be Bill Gates, apparently (that first one is a winner). I mean, really. You think?

FUD, FUD, FUD, FUD, offtopic FUD, and more FUD. This guy is like the Monty Python SPAM skit, but with FUD and more FUD instead of canned meat. Amazed yet? Don't forget that KDE and Gnome make you dumb, and it's all a Slashdot conspiracy. How low do you want to go? Maybe as low as this?

The infamous Slashdot Front Page Troll? Nuclear fireballs? It goes on and on and on and on and on and on and on (troll?). Like the energizer bunny. Or take these two, which stretch the definition of weird.

It's up to you. We can get rid of this guy and make Slashdot a better place. I don't know about you, but I'd rather take the trolls and crapflooders over people like "bonch" any day. And I sure as hell don't want to be categorized along with him. This is not how you advocate free software, period.

TRUTH ABOUT ROLAND by 198348726583297634 · 2005-01-22 04:48 · Score: 1

He doesn't exist. The editors are pocketing the $600 ad revenue themselves. He almost never posts, and when he does there is very little stylistic denotation in his writing. It's bland, so it would not be difficult for anyone to step into the role. He has posted some self-portraits to his journal, but that doesn't mean anything; they could have come from anywhere. The whole thing is just a sleazy ploy to take a few extra bucks home each month.

At it again Bonch? by Anonymous Coward · 2005-01-22 17:39 · Score: 0

Hatred of Roland and more bitching about Slashdot. Bonch, don't you have anything better to do with your time? Your history does not show it. Let's go back in time and look at some of the M$ love fest, apologizing and Slashdot insulting from Bonch:

Blames the user for MyDoom, which distributed itself through Kazaa.
Begging for free software goodies to be ported to M$'s junk.
"Slashdot discussion--the Internet king of groupthink and propaganda." More insults; you wonder why he reads Slashdot at all, other than to cause trouble.
Here he is bitching over being blacklisted for his behavior. Of course, he was on the infamous troll post.
"Slashdot is a bunch of kooks complaining about stuff." His way of excusing the use of M$ garbage in voting machines that were both impossible to verify and easy to manipulate.

All of the above was found by looking at two pages of google results for bonch slashdot. More than half of the results were like those.

Well, that's enough fun for me for now. Thanks for playing, Bonch. I hope your account is deleted soon. Until then, I think I'll save this post and put it wherever you show up.

Great speed by foksoft · 2005-01-23 20:45 · Score: 1

Does anybody know how many records (packets) Cisco switches can process per second?
This database seems to be nothing more than a filtering machine that calculates some statistics from the data it receives. And that is nothing new. If you are able to receive more than 140,000 packets a second, then you have a faster database than the one announced. :-)
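The "filtering machine which calculates some statistics" described above can be sketched in a few lines. This is a minimal illustration under the assumption that the machine drops records failing a predicate and keeps running per-key aggregates (count, mean, maximum) for the rest; `FilterStats`, `RunningStats`, and `on_record` are hypothetical names, not taken from the article or any product.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class RunningStats:
    # Aggregates updated in place; no individual records are stored.
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    def add(self, x: float) -> None:
        self.count += 1
        self.total += x
        self.maximum = max(self.maximum, x)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


class FilterStats:
    """Drops records that fail `keep` and keeps per-key stats for the rest.
    Memory grows with the number of distinct keys, not with the number of records."""

    def __init__(self, keep: Callable[[str, float], bool]) -> None:
        self.keep = keep
        self.stats: dict[str, RunningStats] = {}

    def on_record(self, key: str, value: float) -> None:
        if self.keep(key, value):
            self.stats.setdefault(key, RunningStats()).add(value)
```

Each record costs a predicate call and a dictionary update, which is why this kind of per-packet filtering and counting is also what network gear does at line rate.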

kdb by Anonymous Coward · 2005-01-24 20:01 · Score: 0

http://www.kx.com/index.php

Big boys use it.

http://kx.com/k4/e/tpcd.txt


194 comments