Streaming a Database in Real Time
Roland Piquepaille writes "Michael Stonebraker is well-known in the database business, and for good reasons. He was the computer science professor behind Ingres and Postgres. Eighteen months ago, he started a new company, StreamBase, with another computer science professor, Stan Zdonik, with the goal of speeding access to relational databases. In 'Data On The Fly,' Forbes.com reports that the company software, also named StreamBase, is reading TCP/IP streams and using asynchronous messaging. Streaming data without storing it on disk gives them a tremendous speed advantage. The company claims it can process 140,000 messages per second on a $1,500 PC, when its competitors can only deal with 900 messages per second. Too good to be true? This overview contains more details and references."
How much does Roland Piquepaille pay you to link to his shitty articles?
It must be a lot, since the pay for play is so obvious.
From what I hear, Blizzard should think about hiring this guy ;)
If they are so focused on speed, couldn't this be the MySQL killer for web applications that don't need funky features but where concurrency and speed are important?
Artists against online scams http://www.aa419.org/
Streaming data? Data must have some correlation otherwise it's useless. I doubt that all that can be kept in memory alone and so a permanent storage medium (disk, DAT, or holographic cubes) must be used.
I used to work with a MySQL variant which facilitated queries by using a RAMDisk and an optimized version of Watcom Pascal to enhance query functionality. We made it open source, but last I heard, the last administrator had converted it into an MP3-labelling shareware package.
What Would Google Think?
Codito, ergo sum.
with the goal of speeding access to relational databases.
Oh, so somebody's finally written a truly relational database that's true to relational principles? That treats relvars as sets and allows arbitrary relational expressions and arbitrary declarative constraints? Where can I download it?
Or did you mean "SQL database"?
I trust forbes.com about as much as I would trust, donaldtrump.com! http://www.trump.com/
This is a great breakthrough in RDBMS development. Remember that you need real atomic ACID transactions for this thing to work reliably. Watch out for the gotchas. Good luck.
Yet another winning post by Roland Piquepaille!
The guy should write a book. It would be bland, devoid of content, have an ad on every page, but it would quote prolifically from NYT Top 100 Best-sellers list.
Roland Piquepaille and Slashdot: Is there a connection?
I think most of you are aware of the controversy surrounding regular Slashdot article submitter Roland Piquepaille. For those of you who don't know, please allow me to bring forth all the facts. Roland Piquepaille has an online journal (I refuse to use the word "blog") located at http://www.primidi.com/. It is titled "Roland Piquepaille's Technology Trends". It consists almost entirely of content, both text and pictures, taken from reputable news websites and online technical journals. He does give credit to the other websites, but it wasn't always so. Only after many complaints were raised by the Slashdot readership did he start giving credit where credit was due. However, this is not what the controversy is about.
Roland Piquepaille's Technology Trends serves online advertisements through a service called Blogads, located at www.blogads.com. Blogads is not your traditional online advertiser; rather than base payments on click-throughs, Blogads pays a flat fee based on the level of traffic your online journal generates. This way Blogads can guarantee that an advertisement on a particular online journal will reach a particular number of users. So advertisements on high traffic online journals are appropriately more expensive to buy, but the advertisement is guaranteed to be seen by a large amount of people. This, in turn, encourages people like Roland Piquepaille to try their best to increase traffic to their journals in order to increase the going rates for advertisements on their web pages. But advertisers do have some flexibility. Blogads serves two classes of advertisements. The premium ad space that is seen at the top of the web page by all viewers is reserved for "Special Advertisers"; it holds only one advertisement. The secondary ad space is located near the bottom half of the page, so that the user must scroll down the window to see it. This space can contain up to four advertisements and is reserved for regular advertisers, or just "Advertisers". Visit Roland Piquepaille's Technology Trends (http://www.primidi.com/) to see it for yourself.
Before we talk about money, let's talk about the service that Roland Piquepaille provides in his journal. He goes out and looks for interesting articles about new and emerging technologies. He provides a very brief overview of the articles, then copies a few choice paragraphs and the occasional picture from each article and puts them up on his web page. Finally, he adds a minimal amount of original content between the copied-and-pasted text in an effort to make the journal entry coherent and appear to add value to the original articles. Nothing more, nothing less.
Now let's talk about money. Visit http://www.blogads.com/order_html?adstrip_category=tech&politics= to check the following facts for yourself. As of today, December XX 2004, the going rate for the premium advertisement space on Roland Piquepaille's Technology Trends is $375 for one month. One of the four standard advertisements costs $150 for one month. So, the maximum advertising space brings in $375 x 1 + $150 x 4 = $975 for one month. Obviously not all $975 will go directly to Roland Piquepaille, as Blogads gets a portion of that as a service fee, but he will receive the majority of it. According to the FAQ, Blogads takes 20%. So Roland Piquepaille gets 80% of $975, a maximum of $780 each month. www.primidi.com is hosted by clara.net (look it up at http://www.networksolutions.com/en_US/whois/index.jhtml). Browsing clara.net's hosting solutions, the most expensive hosting service is their Clarahost Advanced (http://www.uk.clara.net/clarahost/advanced.php) priced at £69.99 GBP. This is
Any of the enterprise databases will, given gobs of memory, end up caching the entire database in memory.
As long as it's read only, the disk won't be touched.
A writeable database that doesn't need to be written to disk is not a database, it's called a nonpersistent cache.
I gonna need more RAM!
They don't have a generic database which performs this well.
They just take a specific problem and write a custom-made application which produces new output as soon as enough new data is available.
Only morons moderate based on a sig.
I am missing something here ...
What if the whole thing crashes - what happens to the data then if nothing was stored on the harddrive ?
I'm curious as to exactly what this does. The article is rather vague.
Photos.
Scientific programming question: Anybody have any experience with the Data Space Transfer Protocol? Also known as the "Data Socket Transfer Protocol"? National Instruments [NI] wrote a DSTP front end into LabVIEW, but if any major vendors have a DSTP back end, I haven't discovered it.
Or does anyone have any experience with any other methods of moving large amounts of [strongly-typed] data across the wire so that it comes to rest in a central repository in some sort of a coherent fashion?
Thanks!
I wonder how this is different from MySQL Cluster, an in-memory-only DB. From my own comparisons of regular MySQL versus MySQL Cluster, I didn't see much of a performance increase. But I guess it wasn't "streaming" either. I didn't really see too many technical specs for their new DB, but I didn't really look either. I wonder how they handle saving stuff to disk? Or do they not even bother and hope that the generator holds out until the power is restored?
So they manage to do their analysis without even touching main memory? Nifty! What do they do, make it all fit in the L1 data cache? OK, maybe the guy was misquoted - I trust reporters about as far as I can throw them - but the whole thing just smells funny to me. I'm betting that the massive speedup they report is only for carefully selected, pre-groomed data sets. I agree that analyzing data as it comes in rather than storing it up to recrunch later is the smart thing to do, but that insight isn't a breakthrough of the kind the article is spinning this as.
.. for me. /. effect. Nothing can accommodate /. user visits.
...imagine...create...express...share...enjoy...
First comes love,
Then comes marriage,
Then comes a spam-advertisement campaign earning thousands a month at the expense of companies which Roland rips off.
Shit, that didn't rhyme...
If Roland had RTFA, he'd have realized that this StreamBase thing is not a relational database and does not do the job of a traditional relational database. The whole point is that it uses a different architecture to solve problems that don't map well to relational databases.
creators' planet/population rescue short of funds? (Score:mynuts won, insidious PostBlock devise foiled again)
by Anonymous Coward on Friday January 21, @06:28PM (#11437328)
fortunately, the whole wildly popular initiative/mandate runs on an unlimited supply of newclear power.
also fortunate (deepending on who you are/yOUR motives), is that the daze of the felonious corepirate nazi execrable are #ed/WANing into coolapps, at the (sometimes slow) speed of right.
lookout bullow.
consult with/trust in yOUR creators, disempowering unprecedented evile, & restoring (&/or wiping out) civilizations since/until forever. see you there?
Before another dozen people post about how in-memory databases have been done before, please read the article. They're specifically not talking about in-memory or on-disk databases. They're reading the data and analyzing it in real time as it flows through the network. For everyone asking how they're going to back such data up, you don't need to back up data that is useless 1 second after it has flowed through your network.
I'm a big tall mofo.
Just to let everyone know this is not the only product or even the first product to do this.
Another option is EPL server by ispheres. Unlike the product mentioned here, which seems to be just some extra code thrown on top of a database, EPL server is built from the ground up for this sort of application.
If you liked this thought maybe you would find my blog nice too:
So this is mostly for sensor networks, where you have hundreds (or thousands) of small, cheap sensors sending data to a nearby controller. The controller doesn't need to store every bit of data it receives; it just calculates some prespecified queries (histograms, running sums, checking for trigger conditions, etc.) on them and might store some small window of data for ad hoc queries. These systems are more similar to dataflow applications than traditional databases.
Seems similar to his Aurora project. Stonebraker has a history of turning his university research projects into successful startups.
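To make the controller idea concrete, here is a rough sketch of what "prespecified queries" over a stream might look like. This is not StreamBase's API or Aurora's; the class name, the bucket width of 10, and the threshold are all made up for illustration. The point is only that each reading updates the aggregates once and is then forgotten, apart from a small window.

```python
from collections import Counter, deque

class SensorStream:
    """Hypothetical controller: updates aggregates per reading, stores almost nothing."""

    def __init__(self, window_size=5, alarm_threshold=100.0):
        self.window = deque(maxlen=window_size)  # small window kept for ad hoc queries
        self.running_sum = 0.0
        self.count = 0
        self.histogram = Counter()               # counts per bucket, bucket width 10 (assumed)
        self.alarm_threshold = alarm_threshold

    def on_reading(self, sensor_id, value):
        """Update every aggregate once; return an alert string or None."""
        self.running_sum += value
        self.count += 1
        self.histogram[int(value // 10) * 10] += 1
        self.window.append((sensor_id, value))
        if value > self.alarm_threshold:         # trigger condition
            return f"ALERT {sensor_id}: {value}"
        return None

stream = SensorStream(window_size=3, alarm_threshold=50.0)
alerts = [stream.on_reading(f"s{i}", v) for i, v in enumerate([12.0, 47.5, 63.0, 8.0])]
```

After four readings only the last three remain in the window, while the sum, count and histogram still reflect all four.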
The Synopsis:
Streaming a Database in Real Time
Michael Stonebraker is well-known in the database business, and for good reasons. He was the computer science professor behind Ingres and Postgres. Eighteen months ago, he started a new company, StreamBase, with another computer science professor, Stan Zdonik, with the goal of speeding access to relational databases. In "Data On The Fly," Forbes.com reports that the company software, also named StreamBase, is reading TCP/IP streams and using asynchronous messaging. Streaming data without storing it on disk, as other relational database software does, gives them a tremendous speed advantage. The company claims it can process 140,000 messages per second on a $1,500 PC, when its competitors can only deal with 900 messages per second. Too good to be true? Read more...
Here are some excerpts from the Forbes article.
"Relational databases are one to two orders of magnitude too slow," says Stonebraker, who is chief technology officer at Streambase, a 25-person outfit based in Lexington, Mass. "Big customers have already tried to use relational databases for streaming data and dismissed them. Those products are non-starters in this market."
In a recent pilot program, Streambase was able to analyze 140,000 messages per second, while a leading relational database -- Stonebraker won't say which one -- could handle only 900 messages per second. Streambase has 12 customers now testing its software, all of them financial services companies that need to analyze rapid-fire ticker feeds and other streaming data.
Unlike traditional database programs, Streambase analyzes data without storing it to disk, performing queries on data as it flows. Traditional systems bog down because they first store data on hard drives or in main memory and then query it, Stonebraker says.
The software, which should be commercially available next month, runs on Linux and Solaris, but a Microsoft version should be available soon.
The database business is not a cheap one. So how much will this new company charge for its largely unproven software?
Streambase charges customers annual subscriptions for its software, setting prices based on how many CPUs a customer uses to power the software. Typical deals so far have ranged from $100,000 to $300,000 a year, says Barry Morris, Streambase's chief executive.
In "StreamBase eyes real-time streaming apps," InfoWorld wrote that the prices should be lower.
The software is available via a subscription model, with pricing in the range of approximately $50,000 per year, Stonebraker said. Subscriptions are sold on a per-CPU basis.
Who will be the customers for these speedy accesses to their databases? Let's come back to Forbes.com.
For now Streambase is focusing attention on financial services companies, which hope to do things like track how well traders are performing on a real-time basis, rather than aggregating trades at the end of the day and analyzing them overnight.
A bigger opportunity involves processing real-time data feeds generated by sensor networks and RFID tags. A military contractor wants to use Streambase to keep track of soldiers and vehicles in the battlefield. A casino in Las Vegas is considering using Streambase to track the performance of individual gamblers.
In an interview with InfoWorld, Stonebraker gave more details about military applications.
We did a prototype that dealt with army battalion monitoring. When an army battalion is 30,000 humans and 12,000 vehicles, the army is deadly serious about getting a vital signs monitor on every one of the humans so they can do combat medical triage or [take other actions]. They already have a GPS system in every vehicle, but that didn't keep Jennifer Lynch's convoy from getting lost.
They want to turn this into a system to watch the position of every vehicle and compare it against where you're supposed to be. They also want to put a sensor on the gun turret. Together with position, that allows you to detect crossfire
You can't talk about Wikipedia's flaws on Wikipedia
Ask away!! See, that wasn't so hard...
Click here or a puppy gets stomped!
Roland's name links to his old website, but the link in the summary goes to his new website. A GLITCH IN TEH MATRIX?!
If you find this post offensive, don't read it! THINK ABOUT YOUR BREATHING! I am what I am because of how apes behave.
How do they deal with the durability aspect of ACID? If the system crashes without any data in a durable data store, it disappears forever. It sounds more like high-speed data analysis than a true database, which implies longer-term storage.
putting the 'B' in LGBTQ+
This reminds me of cyberpunk-esque network traffic. More specifically, I'm talking about those futures when bandwidth is so cheap that it becomes affordable (even necessary?) to have a constant flow of data coming and going from a datacenter.
Seems to me that something like this would be incredibly useful for that: when the data from a couple seconds ago is now obsolete, you definitely need to be able to parse your queue as fast as you can.
This is without a doubt the most confusing post I have ever read on slashdot. Congratulations on completely failing to convey your thoughts coherently. WTF?
This sounds like the same theory as memcached http://www.danga.com/memcached/ . I wonder if their version improves on it?
Once you have decided you are only reading (what I think they mean by streaming) the data, why use a "database" at all. Shared memory, MDBM, etc etc all will give you high speeds for relaying a snapshot of data.
I wonder how this is different from MySQL Cluster an in memory only DB.
I hate it when MySQL fanboys jump into threads like this only to show their ignorance of relational algebra and predicate calculus saying that no one should ever bother with PostgreSQL and ACID-compliance, because MySQL is somehow a "better tool for the job" in the "real world". People, please, I beg you, read this first: [1] [2] [3] [4] [5] [6] [7] before you post yet another misleading plug for your favorite toy. Thank you. A real relational database is more than just a data store with SQL frontend.
> Streaming data without storing it on disk gives
> them a tremendous speed advantage.
There's a reason people generally don't do this, and that's because memory is expensive.
> The company claims it can process 140,000
> messages per second on a $1,500 PC, when its
> competitors can only deal with 900 messages per
> second.
But I bet you its competitors can serve huge web-sites at 900 messages per second, whereas StreamBase can serve fits-in-memory-only web-sites at 140,000 messages per second.
--
Toby
Classifier Systems are a genetic algorithm analog for this type of streaming data/pattern analysis. With classifier systems a stream of incoming messages interacts with a constantly evolving population of classifier rules and an internally changing pool of working messages to create a stream of outputs. A reward/feedback loop drives adaption of the rule system to reinforce when it creates "good" outputs. The entire Classifier System concept is analogous to the mammalian immune system in the way that neural nets are analogous to brains and genetic algorithms are analogous to Darwinian evolution.
With a high enough stream processing speed (using StreamBase's methods), classifier systems might be useful for AI/adaptive learning scenarios.
Two wrongs don't make a right, but three lefts do.
This seems more like Message Oriented Middleware than a Database...
All the worlds indeed a
Is this what you feel like after reading this article.... http://www.doyousnap.com/portal/albums/7/7.aspx
why not use the echo port? write data out to an echo port, then tee it off to your echo port. Then you can drink from the never-ending stream of data bouncing between your box and the remote box.
Simple, lots of space, and secure...until a power failure.
But up to now, no database vendor has been bold enough to consider skipping the actual writing to the database to squeeze out that extra overhead. Kudos for thinking outside the box!
If the article is correct, the only thing that distinguishes this DBMS from more traditional ones is that it doesn't serialize its writes to the disk. If that's true, I don't know what the selling point is. Both MS SQL Server and Oracle have the capacity to run a database in commitless mode, in which changes aren't recorded to the disk (they can optionally be serialized on a timed interval). The military applications they talk about being difficult with traditional DBMSs are already largely implemented today; most of the examples Forbes offered are perfect candidates for read-only databases, which are screaming fast. What makes them different?
The DB Group @ Stanford is doing some stream projects as well. In case anyone is interested in more technical information, check out: http://www-db.stanford.edu/stream/
It kinda makes me think of what you'd get if you crossed SED with SQL.
Free Software: Like love, it grows best when given away.
Referential integrity? We ain't need no stinkin' referential integrity.
The network resources are NOT unlimited, you can have that and smoke it.
For a minute there I thought they were trying to store a large database by just forwarding all the little bits of it around the net constantly and then grabbing them when they came back around to save disk space... but that's a thought!
This idea really doesn't seem that new, though. It's just real-time DSP on text-based data, with a front end that pretends to be a database.
This comment does not represent the views or opinions of the user.
Given that we are able to get ~50k entries / second with tethereal output parsed via lex/yacc -> PostgreSQL on a moderate PC, I would be more amazed at what level of analysis they are providing. Also, the data does tend to have some importance over time for those transient issues. Add a hash to your parser and you can just aggregate the data to reduce the load on the DB.
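For anyone wondering what "add a hash to your parser" might look like, here is a minimal sketch. It is not the poster's lex/yacc pipeline: the tuple layout (src, dst, bytes), the `flows` table, and the function names are assumptions for illustration, and sqlite3 stands in for PostgreSQL so the example is self-contained.

```python
import sqlite3
from collections import defaultdict

def aggregate(packets):
    """Collapse parsed packets into one entry per (src, dst) key."""
    totals = defaultdict(lambda: [0, 0])   # key -> [packet count, byte count]
    for src, dst, nbytes in packets:
        entry = totals[(src, dst)]
        entry[0] += 1
        entry[1] += nbytes
    return totals

def flush(conn, totals):
    """Write one row per key instead of one row per packet."""
    conn.executemany(
        "INSERT INTO flows (src, dst, packets, bytes) VALUES (?, ?, ?, ?)",
        [(src, dst, n, b) for (src, dst), (n, b) in totals.items()],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")         # stand-in for the real database
conn.execute("CREATE TABLE flows (src TEXT, dst TEXT, packets INTEGER, bytes INTEGER)")
packets = [("10.0.0.1", "10.0.0.2", 60), ("10.0.0.1", "10.0.0.2", 1500),
           ("10.0.0.3", "10.0.0.2", 40)]
flush(conn, aggregate(packets))
rows = conn.execute("SELECT src, dst, packets, bytes FROM flows ORDER BY src").fetchall()
```

Three packets become two inserts here; on real traffic with many packets per flow the reduction would be much larger, at the cost of losing per-packet detail.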
OK, I get what they're trying to do, but my question: so what?
Sooner or later you have to put something somewhere. Let's say you monitor a battalion in battle in realtime. All of these messages are streaming in and being analyzed. Great. But now what? So something triggers an alert, say. Well, what's tracking the status of the alert? Wouldn't you want to track the status of an alert saying "this Humvee is off course"? Wouldn't you want to track whether someone had acknowledged the alert, and what they did about it?
And don't forget there are liability issues, historical issues, and more. You're a stock trader, all of these messages are coming and being analyzed. You get an alert...one of your triggers tripped. You make a trade as a result, only to find out 30 minutes later that the trigger was WRONG and your trade was WRONG and you (or your company) is out $10 million. How do you prove that you made the trade based on the trigger like you were supposed to and not because you f**ked up? The trigger, and the data that caused it to trip, is long gone. What do you do now?
Eventually something has to be written (stored) somewhere, sometime. I guess I can see the need for summarizing data and only storing what StreamBase says is "important" but how would you know if everything was OK if the actual data driving everything was long gone?
This isn't streaming, it's standard message queuing. Most messaging products allow you to have non-persistent queues and allow you to extract data based on arbitrary queries. There are well over a decade's worth of products for doing this kind of stuff.
I'm sure this is a great product, but both the submitter and the writer of the story seem to not grok what makes it great.
sigs are a waste of space
Good point and great links.
Check out this diagram of a classifier system. It's taken from The Computational Beauty of Nature. The website isn't really up to date nowadays, but the full source code for everything in the book is available in both Linux and Windows downloads and there's a java applet of all the examples too. :^) Not astroturfing just really enjoyed the book myself.
The material covered in the book is also still very relevant and the book's a joy to read.
You should buy it
Shh.
When I was in the satellite ground station business in the '80s, we did this with software/hardware commutator and decommutator pairs. The computer software/hardware of the time could not possibly have kept up with the satellite's data stream, so we built custom processors to separate the various streams (images, telemetry, special sensors) and forward the data for processing. It just goes to show that everything old is new again; bell bottoms are back, too.
It's fascinating for me to realize that the data stream we thought huge is now less than a Firewire or USB 2.0 connection.
Goetz
the parent is both +1 funny and reasonably on topic...
of this StreamBase nonsense. StreamBase is not standard. Sun put a lot of effort and experience in the J2EE spec.
I've written my own wire protocol + packers and unpackers. I tag every data value with its type (number, time, string,
This is precisely what I feared: You had to write the whole thing from the ground up.
If there's a quality vendor out there that's got any kind of pre-packaged data transfer protocol for moving strongly-typed data, I haven't found it yet.
No, Data Space Transfer Protocol is not "also known as" Data Socket Transfer Protocol.
First of all, Grossman's group at UIC tends to call it Data Space Transfer Protocol. On the other hand, the promotional and marketing material at National Instruments tends to call it Data Socket Transfer Protocol.
Second, there seems to be some confusion as to what is meant by a backend. I want some sort of a server [something traditional, like Oracle/DB2/SQLServer, or something a little new-fangled, like Objectivity/Caché/Poet/ObjectStore] to serve as the backend, receive all that data, and store it in some kind of a coherent "database".
kx has been doing this for ~4 years in the financial space. There's also at least one company selling software based on it in other industries...
kx is on its second major version and legitimately handles 1m-100m records per second and historical/real time databases in memory and on disk of unlimited size using standard linux/solaris/win hardware, filesystems, etc.
q, the latest version, is the k language w/ sql syntax embedded.
www.kx.com
I am a sig.
But the idea of a query engine in front of those messages is interesting.
Yet, then what is LabView? We've been processing live real-time data streams for years.
I still don't get the scope of it. It seems on one hand to be a lot of the same. This idea that they need this type of software to process data from remote sensors doesn't click. I process data from remote sensors in real time all the time (no pun intended). There is no need to store it in a DBMS and then query it in order for the data to be useful. For historical reasons, yes, but it's never necessary.
... if the software costs $300K
Anyway if you're talking realtime event processing, you only care about a relatively narrow window of events, so you don't really need a database as events will age out too rapidly to bother storing.
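A narrow window where events age out is easy to sketch. The class below is purely illustrative (the name `TimeWindow` and the two-second age limit are made up), and it assumes events arrive in timestamp order.

```python
from collections import deque

class TimeWindow:
    """Keep only events newer than max_age seconds; older ones are dropped."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.events = deque()   # (timestamp, payload), oldest first

    def add(self, timestamp, payload):
        self.events.append((timestamp, payload))
        self._expire(timestamp)

    def _expire(self, now):
        # Assumes in-order arrival, so expired events are always at the front.
        while self.events and now - self.events[0][0] > self.max_age:
            self.events.popleft()

    def payloads(self):
        return [p for _, p in self.events]

w = TimeWindow(max_age=2.0)
for t, tick in [(0.0, "A"), (1.0, "B"), (2.5, "C"), (4.0, "D")]:
    w.add(t, tick)
```

By the time "D" arrives at t=4.0, "A" and "B" are more than two seconds old and have been discarded without ever touching a disk.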
Michael Palin and Bwian want some of this.
n/t
It's just that if you start querying AFTER you store it on disk, the I/O makes it much slower. So what you do is pick up some of the information from the flowing data, and some other system behind yours saves the data.
Every time you get something interesting, you save that on disk too, but separately, into a much smaller DB. This way state is also saved, and since state is going to be much smaller than the data, there will be no speed issues.
Now the clever thing to do would be to link this flowing-state dbms (FSDBMS) to a standard rdbms working from the disk. Then you could verify the information from the FSDBMS, and ensure that things aren't screwed up. Also, based on patterns seen by the rdbms with long term data, new queries could be generated on the FSDBMS, allowing it to generate results from the data on the wire.
Sounds like it would have applications primarily where response time is at a premium, and long history is not such a large component of the information.
So in the case of military info, where a HumVee could be in trouble (a situ someone else has mentioned), the FSDBMS would raise the alarm, and some other process would then follow up and ensure that the alarm was taken care of. (The data itself would be backed up for future analysis, such as whether the query was correctly handled.)
Dynamic queries in such a situ could be - get the id of the closest Apache reporting in, or closest loaded bomber en-route to some other target. Then the alarm handling program would re-route the bomber/apache to the humvee for support. While querying the disk database may be time intensive, the FSDBMS would have delivered a sub-optimal FAST solution.
So imagine the FSDBMS as a filter, giving different bits of information to different people. With the option that you could change the filter on the fly. And the filter could be complex, based on previous history etc., just like a DB query.
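A rough sketch of that filter idea in Python, purely as an illustration: `StreamFilter`, the record fields, and the unit names are all hypothetical, and the `matches` list stands in for the small separate db of interesting state.

```python
class StreamFilter:
    """Pass each record through a predicate; keep only the matches."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.matches = []  # stand-in for the small on-disk "state" db

    def set_predicate(self, predicate):
        # Swap the query while the stream keeps flowing.
        self.predicate = predicate

    def feed(self, record):
        if self.predicate(record):
            self.matches.append(record)
            return True
        return False


stream = [
    {"unit": "humvee-1", "status": "ok"},
    {"unit": "humvee-2", "status": "distress"},
    {"unit": "apache-7", "status": "ok"},
    {"unit": "apache-9", "status": "ok"},
]

f = StreamFilter(lambda r: r["status"] == "distress")
for rec in stream[:2]:
    f.feed(rec)
# Change the filter on the fly: now look for Apaches reporting in.
f.set_predicate(lambda r: r["unit"].startswith("apache"))
for rec in stream[2:]:
    f.feed(rec)
matched_units = [r["unit"] for r in f.matches]
```

The full stream is never stored by the filter itself; only the records that matched whichever query was active at the time are kept.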
All bow to his Noodliness!! His Noodle Appendage has touched me!
When I was a project manager at ECONZ http://www.econz.co.nz/ in 1999 I did a high level design for a product similar to this but we merged it with a relational database (Oracle in this instance).
Other posts are correct that what is talked about here is a message queuing mechanism to some degree. What I had designed and built was what we called an event server.
Basically how it worked was that you sent what SQL statement you wanted registered and then you got the initial data set back and then any changes to it. Anytime somebody did an UPDATE or INSERT or DELETE statement the results got sent to whoever had registered for it. We sent it through our own message queue software.
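For anyone who wants to picture the register-then-get-changes flow, here is a toy version in Python on top of the standard `sqlite3` module. It is only a sketch under big assumptions: it naively re-runs every registered query after each write and diffs the result sets, which is surely not how the real C++/Oracle system worked, and the `EventServer` class and `jobs` table are invented for the example.

```python
import sqlite3


class EventServer:
    """Toy event server: register a query, get the initial rows, then diffs."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.subs = []  # each entry: [query, callback, last_result_set]

    def register(self, query, callback):
        rows = set(self.db.execute(query).fetchall())
        self.subs.append([query, callback, rows])
        return rows  # the initial data set

    def execute(self, statement, params=()):
        self.db.execute(statement, params)
        self.db.commit()
        # Naive change detection: re-run each registered query and diff.
        for sub in self.subs:
            query, callback, before = sub
            after = set(self.db.execute(query).fetchall())
            if after != before:
                callback(after - before, before - after)  # (added, removed)
                sub[2] = after


server = EventServer()
server.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
server.execute("INSERT INTO jobs VALUES (1, 'open')")

changes = []
initial = server.register(
    "SELECT id, status FROM jobs WHERE status = 'open'",
    lambda added, removed: changes.append((added, removed)),
)
server.execute("INSERT INTO jobs VALUES (2, 'open')")
server.execute("UPDATE jobs SET status = 'done' WHERE id = 1")
```

Re-running every query on every write obviously won't scale to 500 dispatch users, which hints at why the real thing was harder to write than expected.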
This worked very well although not at the speed claimed here and was much more complex to write than we anticipated. It was written in C++ on Linux which was quite revolutionary back then...
How did it work in practice? The software that we replaced was running on SGI boxes that cost more than $10 million. We built our total hardware solution for less than $1 million (large cost was Sun boxes for Oracle). The response time dropped from minutes to seconds or less. The application was a dispatch system for jobs in the telco area with over 500 users.
I remember seeing a RAM-caching scheme for Oracle a few years ago that had the same claims. In actuality Microsoft, for all the love they'll have here, allows you to do this exact thing in a Dataset object within .NET. There are several solutions to this kind of problem, but the .NET way is the one I'll focus on here.
... data..etc., to 'stream' in a way back and forth in realtime within the relational Dataset objects created at app instantiation. Essentially, .NET allows for the same type of action by instantiating a 'database' within the client-side apps by building a schema of sorts, up through and including relational references such as foreign keys. At this point, we have a 'database' of RAM (dataset) that can now be resynched via ports to any other client or server using the same architecture.
The CommandBehavior.SequentialAccess descendant of the SelectCommand Class in C# can be assigned in a way that allows binary objects, or otherwise
I do this today to provide a distribution network for doctors who need access from several places to a pool of active patient data. This is a data volume of several terabytes per location, so I assure you that we are discussing the same scale here as the article.
Consequently, the TPC benchmarks show 3,210,540 tpmC as the current posted record for AIX on a Big Blue machine, so their numbers are skewed if not wrong. Most processes, including those using binaries, can be proceduralized at the back end anyway, thus make call -> server -> stored_procedure ->return (); be the flow, with all data living inside of RAM, and sorts happening in 'real-time', that is from a pinned table into another location in memory at the server layer, returning into a dataset that is kept in RAM on the client.
I don't really see anything revolutionary about all this. Correct me if I'm mistaken.
-chitlenz
Imagination is the silver lining of Intelligence.
Forgive my ignorance, but last time I checked "database" implied "persistence" of some sort. It's great that it can **process** 140,000 messages per second, but how many can it **store**?! Show me something that can store 140,000 items per second and I'll be duly impressed. Until then, let's compare apples with apples and keep everybody honest.
Some financial company is using this software to check incoming stock feeds for problems. It takes thousands of messages per second, and if certain stocks don't come in at least once in 5 seconds, it counts a miss. For others it's 1 in 30 seconds.
If a given provider is consistently slow, it sounds a low-level alarm against the provider, not to trust their data because it's slow. Similarly for various markets, and probably other groupings too. It probably does other processing on the data.
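The miss-counting logic described above could look something like this in Python. Treat it as a guess at the shape of the thing, not the actual StreamBase application: the `FeedMonitor` class, the provider names, and the "three misses raises the alarm" threshold are all assumptions added for the example.

```python
class FeedMonitor:
    """Count a miss whenever a symbol has been silent longer than allowed."""

    def __init__(self, max_gap, alarm_after=3):
        self.max_gap = max_gap          # symbol -> allowed seconds between ticks
        self.alarm_after = alarm_after  # misses before a provider is flagged
        self.last_seen = {}             # symbol -> (timestamp, provider)
        self.misses = {}                # provider -> miss count

    def on_tick(self, symbol, provider, timestamp):
        self.last_seen[symbol] = (timestamp, provider)

    def check(self, now):
        # One miss for every symbol that is overdue at time `now`.
        for symbol, (seen, provider) in self.last_seen.items():
            if now - seen > self.max_gap[symbol]:
                self.misses[provider] = self.misses.get(provider, 0) + 1

    def slow_providers(self):
        return sorted(p for p, n in self.misses.items() if n >= self.alarm_after)


# "IBM" must tick every 5 seconds, "XYZ" only every 30 (hypothetical symbols).
monitor = FeedMonitor({"IBM": 5, "XYZ": 30})
monitor.on_tick("IBM", "feed-a", 0)
monitor.on_tick("XYZ", "feed-b", 0)
for now in (6, 12, 18):
    monitor.check(now)
slow = monitor.slow_providers()
```

All the state is a couple of small dictionaries, which fits the point that this kind of data is useless within minutes and doesn't need a lot of history.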
This data is almost useless within 5 minutes, and it has to be processed very fast. If you change your application, nothing will matter within 5 minutes. If your machine crashes, you have bigger problems, as is generally the case when you want real-time processing. And you don't need a lot of history.
Streambase is much faster than the company's previous custom-coded C++ program, largely because it has better multithreading and more query optimization. It's designed to cut across multiple layers of a traditional database platform (transport, database, application).
Of course, Stonebraker could be puffing his product, but it sounds pretty effective to me.
I hereby place the above post in the public domain.
I asked him why so many Roland articles get accepted, and he said he doesn't even look at the submitter's name and that Roland must be submitting good articles.
I then told him about the controversy over it in posters' minds, and he said it was just a "new successful troll meme." Good luck getting through to Slashdot's editors, because clearly Malda does not consider this anything to take seriously.
This press release says a lot about analyzing streams and nothing about altering them. Most of the weight of a database is in manipulating a permanent record. INSERTS are slow. Streambase may not have any.
My first reaction is: He is late in the game. Check out www.kx.com. They've already done this. And this kind of thing has been used for years to analyze real-time stock and commodities trading data as the trades occur in real-time. I've deployed several systems that are essentially streaming databases like this. Or did I miss something here?
An RDBMS does so much more: availability and redundancy. What happens in a power outage? If you have several gigs in memory, your UPS had better be able to stand up long enough for everything to be backed up to disk. Primary memory is not the place for important datastores, unless you're trying to lose your job. Poof: a power outage, or a software patch leaks some memory, and you're screwed. RAID 10 Oracle, or all my critical data sitting on a $100 DIMM? I'll choose the slow Oracle.
This comment seems fishy: "Visit Roland Piquepaille's Technology Trends (http://www.primidi.com/ [primidi.com]) to see it for yourself." Why would someone AGAINST primidi.com getting tons of money from per view /. floods suggest, constantly, on /. that others should go check it out?
Wouldn't he be encouraging people to NOT go and visit?
We are one consciousness experiencing itself subjectively. Back to you with the weather, Bob!
...namely, distributed systems that process large volumes of data in real-time, please send your resume to mbt@alum.mit.edu. I'm a co-founder of a stealth-mode startup which builds a bleeding edge platform similar to the one described in the article. We are funded, have a great management team, and are located in the San Francisco Bay Area. If there is a match between your background and our needs, we will contact you.
Cool! I am in!
Good, glad to know there are more companies working on this. I heard Stonebraker's previous company was not very successful.
By Ronald Piquepalli & Michael... "the most confusing post"? So you haven't read any others by them?
Moderators: Please note that "bonch" is a known fanatical sycophant whose obnoxious offtopic rants are legend here on Slashdot. It doesn't matter what the topic is, he'll find a way to scrape in some pointless Microsoft shilling. While nobody expects us to love Microsoft in any way, his particularly tepid style of calling anyone he replies to "troll" or "liar" because he happens to disagree with whatever they're saying is well documented and should not be rewarded. If anything, bonch is the type of person that should not be part of the open source/free software community. He is an anathema to all that is good about free software.
I'm posting this so that you (the moderator) have some context to consider bonch and not mod him up whenever he posts his filler preformatted rants about installing Windows or whatever that unfortunately get him karma every single time and allow him to continue posting his trademark toxic crap (read on) day in and day out. You may consider this a troll - I consider it community service. And I ain't kidding.
If you're a /. subscriber, I invite you to look through some of his posting history. I guarantee that you'll be hard pressed to find someone that is more "out there" than bonch. You'll also probably notice he's got quite an AC following. Don't just read his posts, make sure you go through the replies.
For example, in this recent post bonch not only calls the OP a troll but attempts to "tell it like it is" while making some vague argument about "MS". Yes, if you're confused, you're not alone. The reply (modded +0) proceeds to simply destroy his bogus argument. You will notice he did not reply. This is what some people call "drive-by advocacy". A sort of I'll just leave you with my thoughts here and move on to the next flamebait kind of deal. In fact, he almost never replies because he knows that his fanatical arguments simply do not hold up to any sort of discussion. It's not that he's chosen the wrong cause - he's just going at it in a completely wrong way.
More? Just read through this post and the subsequent replies. I guess this stands on its own.
More? Bad spelling in astounding conspiracy theories, more offtopic FUD and uninformed "I'm right, look at me" rants, promptly proven wrong. Worse even, bonch wants to be Bill Gates, apparently (that first one is a winner). I mean, really. You think?
FUD, FUD, FUD, FUD, offtopic FUD, and more FUD. This guy is like the Monty Python SPAM skit, but with FUD and more FUD instead of canned meat. Amazed yet? Don't forget that KDE and Gnome make you dumb, and it's all a Slashdot conspiracy. How low do you want to go? Maybe as low as this?
The infamous Slashdot Front Page Troll? Nuclear fireballs? It goes on and on and on and on and on and on and on (troll?). Like the energizer bunny. Or take these two, which stretch the definition of weird.
It's up to you. We can get rid of this guy and make Slashdot a better place. I don't know about you, but I'd rather take the trolls and crapflooders over people like "bonch" any day. And I sure as hell don't want to be categorized along with him. This is not how you advocate free software, period.
He doesn't exist. The editors are pocketing the $600 ad revenue themselves. He almost never posts, and when he does there is very little stylistic denotation in his writing--it's bland, so it would not be difficult for anyone to step into the role. He has posted some self-portraits to his journal, but that doesn't mean anything; they could be got from anywhere. The whole thing is just a sleazy ploy to take a few extra bucks home each month.
All of the above was found by looking at two pages of google results for bonch slashdot. More than half of the results were like those.
Well, that's enough fun for me for now. Thanks for playing, Bonch. I hope your account is deleted soon. Until then, I think I'll save this post and put it wherever you show up.
Does anybody know how many records (packets) Cisco switches can process per second? :-)
This database seems to be nothing more than a filtering machine which calculates some statistics from the data it receives. And that is nothing new. If you are able to receive more than 140,000 packets per second, then you have a faster database than the one announced.
http://www.kx.com/index.php
Big boys use it.
http://kx.com/k4/e/tpcd.txt