Zvents Releases Open Source Cluster Database Based on Google

Posted by ScuttleMonkey on Friday February 8, 2008 @11:19AM from the surprised-it-took-this-long dept.

An anonymous reader writes "Local search engine company Zvents has released an open source distributed data storage system based on Google's released design specs. 'The new software, Hypertable, is designed to scale to 1000 nodes, all commodity PCs [...] The Google database design on which Hypertable is based, Bigtable, attracted a lot of developer buzz and a "Best Paper" award from the USENIX Association for "Bigtable: A Distributed Storage System for Structured Data," a 2006 publication from nine Google researchers including Fay Chang, Jeffrey Dean, and Sanjay Ghemawat. Google's Bigtable uses the company's in-house Google File System for storage.'"

how useful is DHT? by convolvatron · 2008-02-08 11:42 · Score: 3, Insightful

i've been interested in this question for the last few years. how much do people value the ability to use a relational language and transactional consistency, or for most of these uses are these things just historical artifacts?
1. Re:how useful is DHT? by moderatorrater · 2008-02-08 11:51 · Score: 4, Interesting
  
  It's useful for ridiculously large data sets, like the entire internet. I know that medium-sized stores (Overstock, etc.) use a relational database, and anything with less data than that is probably going to use a relational database. However, for extremely large data sets and certain repetitive, non-dependent loops (such as, say, looping through every website for a search), this can be useful. At least for now, relational databases are more useful overall, but tools like this have their place, and as data sets grow faster than real computational power, they'll be used more and more.
2. Re:how useful is DHT? by ShieldW0lf · 2008-02-08 12:00 · Score: 4, Insightful
  
  i've been interested in this question for the last few years. how much do people value the ability to use a relational language and transactional consistency, or for most of these uses are these things just historical artifacts?
  
  In the 7 years I've been working in the industry, I've never delivered a single project that I would trust to a non-ACID database. Ever. And I doubt I ever will. If you want something that will generate some marketing material at high speed, and if it fails, who cares, well, use MySQL. If you want to do something that can handle a million pithy comments and if some of them get lost in the shuffle, who cares, well, that's fine too. Use whatever serves fast. If you're running Google, and it doesn't matter if a node drops out because there is no "right" answer to get wrong in the first place as long as you spit out a bunch of links, well, these sorts of non-resilient systems are fine.
  
  Personally, I've never done projects like that. In my projects, if the data isn't perfect always and forever, it's worse than if it had never been written. Its very existence is a liability, because people will rely on it when they shouldn't, for things that can't get by with "close".
  
  So yes. Transactional consistency and a solid relational model are pretty much mandatory, and not going anywhere soon. The idea that they might be replaced by technology such as this is laughable.
  
  --
  -1 Uncomfortable Truth
Column Orientated DBMS by inKubus · 2008-02-08 11:48 · Score: 5, Informative

This is a classic column-orientated DBMS, à la Sybase. You use these for data warehousing since they are optimized for read queries and not transactions. Stuff like Google search queries. It also allows you to quickly build cubes of data across a timeline, since you have data in columns instead of rows.

IE:

a,b,c,d,e; 1,2,3,4,5; a,b,c,d,e;

instead of:

a, 1, a;
b, 2, b;
c, 3, c;
d, 4, d;
e, 5, e;

A cube using the time dimension would look like:

01:01:01; a,b,c,d,e; 1,2,3,4,5; a,b,c,d,e;
01:01:02; a,b,c,d,e; 1,2,6,4,5; a,b,c,d,e;

It's pretty difficult to do the same thing with a row-based DBMS. However, you can see that doing an insert is going to be costly. This looks like a pretty good try. I know there were some other projects going to try to replicate what BigTable does. And after hearing that IBM story the other day about one computer running the entire internet, I started thinking about Google.
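
The row and column layouts and the time cube above can be sketched in a few lines of Python. This is only an illustration, assuming plain in-memory lists; the names (`to_columns`, `column_sum`, `insert_row`, `series`) are made up for the example and are not Hypertable or Bigtable APIs, and real column stores do far more (compression, on-disk files, distribution).

```python
# Hypothetical illustration only: not how Hypertable or Bigtable store data.

# Row layout: one tuple per record.
rows = [
    ("a", 1, "a"),
    ("b", 2, "b"),
    ("c", 3, "c"),
    ("d", 4, "d"),
    ("e", 5, "e"),
]


def to_columns(rows):
    """Transpose row tuples into one list per column."""
    return [list(col) for col in zip(*rows)]


def column_sum(columns, index):
    """Aggregate one column; only that column's list is read."""
    return sum(columns[index])


def insert_row(columns, row):
    """Insert one record; every column list has to be touched."""
    for col, value in zip(columns, row):
        col.append(value)


def series(cube, col_index, position):
    """Follow one cell through time across the snapshots in the cube."""
    return [cube[t][col_index][position] for t in sorted(cube)]


# Column layout: a,b,c,d,e; 1,2,3,4,5; a,b,c,d,e;
columns = to_columns(rows)

# Time cube: one column-layout snapshot per timestamp.
cube = {"01:01:01": to_columns(rows)}
later = [list(col) for col in cube["01:01:01"]]  # copy the earlier snapshot
later[1][2] = 6                                  # third value changes 3 -> 6
cube["01:01:02"] = later
```

Calling `column_sum(columns, 1)` reads a single list, while `insert_row` has to append to all three, which is the read-versus-insert trade-off described above.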

More interesting is their distributed file system, which is what makes this really work well.

--
Cool! Amazing Toys.