Database Bigwigs Lead Stealthy Open Source Startup

← Back to Stories (view on slashdot.org)

Database Bigwigs Lead Stealthy Open Source Startup

Posted by ryuzaki0 on Wednesday February 14, 2007 @09:17AM from the hope-it-isn't-vaporcorp dept.

BobB writes "Michael Stonebraker, who cooked up the Ingres and Postgres database management systems, is back with a stealthy startup called Vertica. And not just him, he has recruited former Oracle bigwigs Ray Lane and Jerry Held to give the company a boost before its software leaves beta testing. The promise — a Linux-based system that handles queries 100 times faster than traditional relational database management systems."

2 of 187 comments (clear)

Min score:

Reason:

Sort:

Best of luck by 140Mandak262Jamuna · 2007-02-14 09:40 · Score: 5, Insightful

I dont want to rain in their parade. But typically whenever people start with a spec like "100 times better than what they can do", they assume they will continue to perform at current levels while these people take years to develop and mature their new technology. In the real world, the traditional methods too improve and unless they can maintain a 100x lead continually the new technology flops.
What happened to Gallium Arsenide replacing silicon? What happened to solid state memory completely repalcing magnetic disks? Technology field is littered with such fiascos.

--
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Re:Column oriented? by georgewilliamherbert · 2007-02-14 09:47 · Score: 5, Insightful

A column oriented relational database? I'd like some more details on how that works.

Column oriented is easy. Imagine a database as a set of tables, each of which has rows of data records, in organized columns (column 1 = "User name", column 2 = "User ID", column 3 = "Favorite slashdot admin", etc).
Normal row-oriented databases store records which have a row of the data: "User name", "User ID", "Favorite slashdot admin" for user row #12345.
Column oriented databases store records which have a column of the data: "User name" for user rows 1-100,000; "User ID" for user rows 1-100,000; etc.
Updates are faster with row-oriented: you access the last record file and append something, or access an intermediate record file and update one "row" across.
Searches are faster with column-oriented: you access the record file for "Favorite slashdot admin" and look for entries which say "Phred", and then output the list of rows of data which match. Instead of going through the whole database top to bottom for the search, you just search on the one column. If you have 100 columns of data, then you look through 1/100th of the total data in the search. To pull data out, you then have to look at all the column files and index in the right number of records, but that goes relatively quickly.
Indexes are useful, but column-oriented is more efficient in some ways. You don't have to maintain the indexes, and can just automatically search any column without having indexed it, in a reasonably efficient manner.
Column-oriented also lets you compress the data on the fly efficiently: all the records are the same data type (string, integer, date, whatever) and lists of same data types compress well, and uncompress typically far faster than you can pull them off disk, so you can just automatically do it for all the data and save both speed and time...