Zvents Releases Open Source Cluster Database Based on Google
An anonymous reader writes "Local search engine company Zvents has released an open source distributed data storage system based on Google's released design specs. 'The new software, Hypertable, is designed to scale to 1000 nodes, all commodity PCs [...] The Google database design on which Hypertable is based, Bigtable, attracted a lot of developer buzz and a "Best Paper" award from the USENIX Association for "Bigtable: A Distributed Storage System for Structured Data," a 2006 publication from nine Google researchers including Fay Chang, Jeffrey Dean, and Sanjay Ghemawat. Google's Bigtable uses the company's in-house Google File System for storage.'"
I've been interested in this question for the last few years: how much do people value the ability to use a relational language and transactional consistency? Or, for most of these uses, are these things just historical artifacts?
This is a classic column-oriented DBMS, a la Sybase. You use these for data warehousing, since they are optimized for read queries rather than transactions: stuff like Google search queries. It also lets you quickly build cubes of data across a timeline, since you have data in columns instead of rows.
I.e.:
a,b,c,d,e; 1,2,3,4,5; a,b,c,d,e;
instead of:
a, 1, a;
b, 2, b;
c, 3, c;
d, 4, d;
e, 5, e;
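To make the two layouts concrete, here is a minimal Python sketch. It is only a toy model of the idea, not how Hypertable or Bigtable actually store data, and the names `rows` and `to_columns` are made up for illustration.

```python
# Hypothetical illustration of row vs. column layout; not Hypertable code.

# Row layout: each tuple is one complete record.
rows = [
    ("a", 1, "a"),
    ("b", 2, "b"),
    ("c", 3, "c"),
    ("d", 4, "d"),
    ("e", 5, "e"),
]


def to_columns(rows):
    """Transpose a list of row tuples into a list of column lists."""
    return [list(col) for col in zip(*rows)]


# Column layout: each list holds one field for every record.
columns = to_columns(rows)

# A read-heavy query such as "sum the second field" only has to scan
# one list in the column layout, instead of visiting every record.
total = sum(columns[1])
```

The read side is where the column layout pays off: an aggregate over one field never touches the other fields.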
A cube using the time dimension would look like:
01:01:01; a,b,c,d,e; 1,2,3,4,5; a,b,c,d,e;
01:01:02; a,b,c,d,e; 1,2,6,4,5; a,b,c,d,e;
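As a rough sketch of the time-dimension idea, the two snapshots above can be modeled as a dictionary keyed by timestamp, with each value being the column lists at that moment. The names `cube` and `changed_positions` are hypothetical, and a real store would not keep full copies of every column per timestamp.

```python
# Hypothetical snapshots keyed by timestamp, mirroring the two lines above.
cube = {
    "01:01:01": [["a", "b", "c", "d", "e"], [1, 2, 3, 4, 5], ["a", "b", "c", "d", "e"]],
    "01:01:02": [["a", "b", "c", "d", "e"], [1, 2, 6, 4, 5], ["a", "b", "c", "d", "e"]],
}


def changed_positions(cube, t0, t1, col):
    """Return (index, old, new) for each value in column `col`
    that differs between timestamps t0 and t1."""
    before, after = cube[t0][col], cube[t1][col]
    return [(i, x, y) for i, (x, y) in enumerate(zip(before, after)) if x != y]


# Compare the numeric column (index 1) across the two timestamps.
changes = changed_positions(cube, "01:01:01", "01:01:02", 1)
```

Because each column is already a single list per timestamp, comparing one field across time is a straight walk over two lists.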
It's pretty difficult to do the same thing with a row-based DBMS. However, you can see that doing an insert is going to be costly. This looks like a pretty good try; I know there were some other projects that were going to try to replicate what BigTable does. And after hearing that IBM story the other day about one computer running the entire internet, I started thinking about Google.
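On the point about inserts being costly in a column layout, here is a toy Python comparison. It simply counts how many separate lists a single insert touches, on the assumption that each list stands in for a separately stored row file or column file. The function names are hypothetical, and real systems batch and buffer writes, so treat this as an illustration of the shape of the cost rather than a measurement.

```python
# Toy model: count how many separate containers one insert has to touch.
# Assumption: each list stands in for a separately stored structure.

def insert_row_store(rows, record):
    """Row layout: the whole record is appended in one place."""
    rows.append(tuple(record))
    return 1  # one container touched


def insert_column_store(columns, record):
    """Column layout: each field is appended to a different column."""
    for col, value in zip(columns, record):
        col.append(value)
    return len(columns)  # one container touched per column


rows = [("a", 1, "a"), ("b", 2, "b")]
columns = [["a", "b"], [1, 2], ["a", "b"]]

row_writes = insert_row_store(rows, ("c", 3, "c"))
col_writes = insert_column_store(columns, ("c", 3, "c"))
```

With three fields, the row layout touches one container per insert and the column layout touches three, and the gap grows with the number of columns.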
More interesting is their distributed file system, which is what makes this really work well.
Cool! Amazing Toys.