Database Bigwigs Lead Stealthy Open Source Startup

Posted by ryuzaki0 on Wednesday February 14, 2007 @09:17AM from the hope-it-isn't-vaporcorp dept.

BobB writes "Michael Stonebraker, who cooked up the Ingres and Postgres database management systems, is back with a stealthy startup called Vertica. And it's not just him: he has recruited former Oracle bigwigs Ray Lane and Jerry Held to give the company a boost before its software leaves beta testing. The promise: a Linux-based system that handles queries 100 times faster than traditional relational database management systems."

Partners by stoolpigeon · 2007-02-14 09:22 · Score: 5, Informative

The article mentions that Red Hat and HP are listed among their partners. I'm not surprised by Red Hat or Informatica (another partner, though they aren't mentioned in the article), but I was a little surprised by HP, since they have been trying to get the word out about their own data warehousing and BI stuff. I wonder what that indicates about how they regard this new player.

Also interesting is the Wikipedia article on Michael Stonebraker, if you aren't already familiar with him.

--
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
Best of luck by 140Mandak262Jamuna · 2007-02-14 09:40 · Score: 5, Insightful

I don't want to rain on their parade. But typically whenever people start with a spec like "100 times better than what the incumbents can do", they assume the incumbents will continue to perform at current levels while the newcomers take years to develop and mature their new technology. In the real world, the traditional methods improve too, and unless the newcomers can maintain a 100x lead continually, the new technology flops.
What happened to Gallium Arsenide replacing silicon? What happened to solid state memory completely replacing magnetic disks? The technology field is littered with such fiascos.

--
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Re:Column oriented? by georgewilliamherbert · 2007-02-14 09:47 · Score: 5, Insightful

A column oriented relational database? I'd like some more details on how that works.

Column oriented is easy. Imagine a database as a set of tables, each of which has rows of data records, in organized columns (column 1 = "User name", column 2 = "User ID", column 3 = "Favorite slashdot admin", etc).
Normal row-oriented databases store records which have a row of the data: "User name", "User ID", "Favorite slashdot admin" for user row #12345.
Column oriented databases store records which have a column of the data: "User name" for user rows 1-100,000; "User ID" for user rows 1-100,000; etc.
Updates are faster with row-oriented: you access the last record file and append something, or access an intermediate record file and update one "row" across.
Searches are faster with column-oriented: you access the record file for "Favorite slashdot admin" and look for entries which say "Phred", and then output the list of rows of data which match. Instead of going through the whole database top to bottom for the search, you just search on the one column. If you have 100 columns of data, then you look through 1/100th of the total data in the search. To pull data out, you then have to look at all the column files and index in the right number of records, but that goes relatively quickly.
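To make the two layouts concrete, here is a minimal Python sketch. It is a toy in-memory model only: the table contents and the names `row_store`, `column_store`, `search_column_store`, and `fetch_row` are made up for illustration and are not how any real column store lays out its files.

```python
# Toy illustration only: three rows of a hypothetical "users" table,
# stored two different ways.
rows = [
    ("alice", 101, "Phred"),
    ("bob", 102, "Taco"),
    ("carol", 103, "Phred"),
]

# Row-oriented: each record holds every field of one row.
row_store = list(rows)

# Column-oriented: each "record file" holds one field for all rows.
COLUMNS = ("user_name", "user_id", "favorite_admin")
column_store = {
    name: [r[pos] for r in rows] for pos, name in enumerate(COLUMNS)
}


def search_row_store(store, value):
    """Scan every full record and test the third field."""
    return [i for i, record in enumerate(store) if record[2] == value]


def search_column_store(store, column, value):
    """Scan only the one column being searched."""
    return [i for i, v in enumerate(store[column]) if v == value]


def fetch_row(store, i):
    """Rebuild a full row by indexing into every column at position i."""
    return tuple(store[name][i] for name in COLUMNS)


matches = search_column_store(column_store, "favorite_admin", "Phred")
result = [fetch_row(column_store, i) for i in matches]
```

Both searches return the same row positions; the difference is that the column search only ever touches the `favorite_admin` list, while the row search walks every field of every record.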
Indexes are useful, but column-oriented is more efficient in some ways. You don't have to maintain the indexes, and can just automatically search any column without having indexed it, in a reasonably efficient manner.
Column-oriented also lets you compress the data on the fly efficiently: all the records are the same data type (string, integer, date, whatever) and lists of same data types compress well, and uncompress typically far faster than you can pull them off disk, so you can just automatically do it for all the data and save both space and time...
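One simple scheme that shows why a single-type column compresses well is run-length encoding. The sketch below is an assumption chosen for illustration, not a claim about which encoding any particular product uses; the column contents and the helper names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse runs of equal adjacent values into [value, run_length] pairs."""
    return [[value, len(list(run))] for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand [value, run_length] pairs back into the original column."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# Hypothetical "Favorite slashdot admin" column, sorted so equal values are adjacent.
favorite_admin = ["Phred"] * 4 + ["Taco"] * 3 + ["Zonk"] * 2
encoded = rle_encode(favorite_admin)
```

Here nine stored values shrink to three pairs. The same trick does little for a row-oriented record, where neighbouring fields are of different types and rarely repeat.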
You're bound to get some strange looks... by Anonymous Coward · 2007-02-14 09:51 · Score: 5, Funny

during the transition when you tell people your business runs on LAVA-LAMP technology.
Speculation by cartman · 2007-02-14 09:55 · Score: 5, Informative

I noticed that Stonebraker is the company founder. Stonebraker has contributed extensively to database research over the years.
He's known for advocating the "shared-nothing" approach to parallel databases. The shared-nothing approach means that nodes in the parallel database don't attempt memory or cache synchronization, and each node has its own commodity disk array. In a shared-nothing parallel database, the data is "partitioned" across servers. So, for example, rows with id's 1-10 would be on the first server, 11-20 on the second server, etc. Executing the SQL query "select * from table where id < 1000" would send requests to multiple commodity servers and then aggregate the results. The optimizer is modified to take into account network bandwidth and latency, etc.
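A rough Python sketch of that partition-and-aggregate idea follows. It runs in a single process, so the "servers" are just objects; the class names `Node` and `Coordinator` and the ten-ids-per-node ranges are assumptions for illustration, and a real optimizer would also weigh network bandwidth and latency.

```python
class Node:
    """One shared-nothing server: it owns its rows and shares no memory or disk."""

    def __init__(self, low, high):
        self.low, self.high = low, high  # inclusive id range owned by this node
        self.rows = {}                   # id -> row, local to this node

    def insert(self, row_id, row):
        self.rows[row_id] = row

    def select_id_less_than(self, limit):
        # Each node evaluates the predicate against its own partition only.
        return [(i, r) for i, r in sorted(self.rows.items()) if i < limit]


class Coordinator:
    """Routes inserts to the owning node and merges per-node query results."""

    def __init__(self, nodes):
        self.nodes = nodes

    def node_for(self, row_id):
        for node in self.nodes:
            if node.low <= row_id <= node.high:
                return node
        raise KeyError(row_id)

    def insert(self, row_id, row):
        self.node_for(row_id).insert(row_id, row)

    def select_id_less_than(self, limit):
        # Skip nodes whose range cannot match, query the rest, merge the results.
        results = []
        for node in self.nodes:
            if node.low < limit:
                results.extend(node.select_id_less_than(limit))
        return sorted(results)


nodes = [Node(1, 10), Node(11, 20), Node(21, 30)]
db = Coordinator(nodes)
for i in range(1, 31):
    db.insert(i, f"row-{i}")

hits = db.select_id_less_than(15)
```

With this data the query `id < 15` is answered by the first two nodes only; the third node is never asked because its range starts at 21.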
My guess on what they're doing: they're working on a shared-nothing parallel RDBMS with an in-memory client similar to Oracle TimesTen.
There are a few drawbacks to the shared-nothing approach: 1) the RDBMS software is more difficult to implement; 2) since the data is partitioned, any transaction that updates tuples on more than one database node requires a two-phase distributed commit, which is much more expensive; and 3) some queries are more expensive because they require transmitting large amounts of data over the network rather than a memory bus, and in rare cases that network overhead cannot be eliminated by the optimizer.
The advantage, of course, is linear scalability by adding commodity hardware. No more need for $3M+ boxes.
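For readers unfamiliar with the two-phase distributed commit mentioned in drawback 2, here is a bare-bones Python sketch of the textbook protocol. It leaves out logging, timeouts, and crash recovery; the `Participant` class and its `can_commit` switch are hypothetical stand-ins for real database nodes.

```python
class Participant:
    """One database node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # hypothetical switch to simulate a failure
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the local changes can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every node committed, False if the transaction aborted."""
    # Phase 1: ask every node to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2: unanimous yes, so tell every node to commit.
        for p in participants:
            p.commit()
        return True
    # Phase 2: at least one no vote, so roll back everywhere.
    for p in participants:
        p.abort()
    return False


good = [Participant("node1"), Participant("node2")]
bad = [Participant("node1"), Participant("node2", can_commit=False)]
ok = two_phase_commit(good)
failed = two_phase_commit(bad)
```

The extra expense comes from the two rounds of messages to every node involved, where a single-node transaction needs none.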
Re:buzzword enabled by c0nst · 2007-02-14 09:59 · Score: 5, Informative

Here you go:
Stonebraker, Mike; et al. (2005). C-Store: A Column-oriented DBMS (PDF). Proceedings of the 31st VLDB Conference.
From the paper:
Among the many differences in its design are: storage of data by column rather than by row, careful coding and packing of objects into storage including main memory during query processing, storing an overlapping collection of column-oriented projections, rather than the current fare of tables and indexes, a non-traditional implementation of transactions which includes high availability and snapshot isolation for read-only transactions, and the extensive use of bitmap indexes to complement B-tree structures
:-)
Re:buzzword enabled by Jherek+Carnelian · 2007-02-14 10:39 · Score: 5, Funny

"grid-enabled, column-oriented relational database management system"
What does that mean?

Uh, a spreadsheet?
Re:buzzword enabled by perfczar · 2007-02-14 10:54 · Score: 5, Informative

Buzzwords, yes, but they have a little bit of meaning left. Grid-enabled means that it works in a "shared nothing" environment: you can use a networked cluster of commodity computers if one isn't enough to hold the data, and so on. This is in contrast to using one big huge box (big computer, big storage array, or whatever). Of course many databases are similarly grid-enabled. Column-oriented means that data is stored on disk by column; this makes it fast to process queries that read a subset of columns but touch lots of rows, as is typical in data warehouse applications. This is a key architectural difference among databases; Oracle, DB2, etc., are "row stores", while Sybase IQ, Vertica, etc. are "column stores". Note: I work for Vertica Systems.
Re:Given that... by perfczar · 2007-02-14 11:46 · Score: 5, Informative

Here are a few of the technical reasons one might choose Vertica over Monet; I'll not get into business issues.

Vertica is designed for large amounts of data, and is optimized for disk-based systems. Monet does benchmarks against TPC-H Scale Factor 5 (30 million records, an amount which would fit in main memory) running on Postgres; Vertica does TPC-H Scale Factor 1000 (6 billion records) against commercial row stores tuned by people who do such work for a living.
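The record counts quoted above line up with a quick back-of-the-envelope calculation, sketched here in Python. The figure of roughly 6 million rows per unit of scale factor is an assumption taken from the size of the largest TPC-H table, and the names `ROWS_PER_SCALE_FACTOR` and `approx_rows` are made up for this example.

```python
# Assumption: about 6 million rows in the largest TPC-H table per unit of scale factor.
ROWS_PER_SCALE_FACTOR = 6_000_000


def approx_rows(scale_factor):
    """Approximate record count for a given TPC-H scale factor."""
    return ROWS_PER_SCALE_FACTOR * scale_factor


sf5_rows = approx_rows(5)        # the Scale Factor 5 case
sf1000_rows = approx_rows(1000)  # the Scale Factor 1000 case
ratio = sf1000_rows // sf5_rows
```

Under that assumption the two benchmarks differ by a factor of 200 in data volume.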

Vertica runs on multi-node clusters, allowing the cluster to grow as the amount of data grows, while Monet doesn't scale to multiple machines.

There are numerous differences in the transaction systems, update architecture, tolerance of hardware failure, and so on, that make Vertica better suited to the enterprise DW market.

Note: I work for Vertica