Slashdot Mirror


Zvents Releases Open Source Cluster Database Based on Google

An anonymous reader writes "Local search engine company, Zvents, has released an open source distributed data storage system based on Google's released design specs. 'The new software, Hypertable, is designed to scale to 1000 nodes, all commodity PCs [...] The Google database design on which Hypertable is based, Bigtable, attracted a lot of developer buzz and a "Best Paper" award from the USENIX Association for "Bigtable: A Distributed Storage System for Structured Data" a 2006 publication from nine Google researchers including Fay Chang, Jeffrey Dean, and Sanjay Ghemawat. Google's Bigtable uses the company's in-house Google File System for storage.'"

14 of 87 comments (clear)

  1. Kitten Nipples by milsoRgen · · Score: 2, Insightful

    ..designed to scale to 1000 nodes, all commodity PCs... I'm just curious if anyone has had any experiance with these types of systems using commodity PCs, how is performance and does how well does it scale as you increase the amount of nodes?
    --
    I'm sick of following my dreams. I'm just going to ask where they're goin' and hook up with 'em later.
    1. Re:Kitten Nipples by Idiot+with+a+gun · · Score: 2, Insightful

      IIRC, Paypal and Google both use commodity PCs in clusters like this. Google uses something similar to above (duh), and Paypal uses a 3 tiered, multi-PC setup (Database, caching layer, and application side layers, respectively).

    2. Re:Kitten Nipples by allenw · · Score: 2, Informative

      So, Hypertable runs on top of Hadoop. We don't use Hypertable (or HBase) so I can't commen on those. I can share some of our experiences with Hadoop though. I think it is safe to say that it scales quite well for the vast majority of people who need it. Let's deep dive for a bit...

      Hadoop keeps all of its file system metadata in memory on a machine called the name node. This includes information about block placement and which files are allocated which blocks. Therefore, the big crunch we've seen is the total amount of memory available to the JVM's heap. With a 16G machine (with ~14.5G heap) for the name node and ~2000 machines acting as data nodes, we're scaling to somewhere between 12-18 million or so files [it's been a while since I've looked... :) ]. Our capacity should be around 5PB or so, but keep in mind that large chunks of that are taken up by block replication and that, for the most part, that number is tied into the physical size of the drives in use. ... and, yes, this is all on commodity machines that you can get from pretty much anyone.

      We're working on making it scale better, of course. But we've come a long way in a really short time. [We've doubled capacity in less than... six months? Something like that.]

  2. how useful is DHT? by convolvatron · · Score: 3, Insightful

    i've been interested in this question for the last few years. how much do people value the ability to use a relational language and transactional consistency, or for most of these uses are these things just historical artifacts?

    1. Re:how useful is DHT? by moderatorrater · · Score: 4, Interesting

      It's useful for ridiculously large data sets, like the entire internet. I know that medium sized stores (overstock, etc) use a relational database, and anything with less data than that is probably going to use a relational database. However, for extremely large data sets and certain repetitive, non-dependent loops (such as, say, looping through every website for a search), this can be useful. At least for now, relational databases are more useful overall, but tools like this have their place, and as data sets grow faster than real computational power, they'll be used more and more.

    2. Re:how useful is DHT? by ShieldW0lf · · Score: 4, Insightful

      i've been interested in this question for the last few years. how much do people value the ability to use a relational language and transactional consistency, or for most of these uses are these things just historical artifacts?

      In the 7 years I've been working in the industry, I've never delivered a single project that I would trust to a non-ACID database. Ever. And I doubt I ever will. If you want something that will generate some marketing material at high speed, and if it fails, who cares, well, use MySQL. If you want to do something that can handle a million pithy comments and if some of them get lost in the shuffle, who cares, well, that's fine too. Use whatever serves fast. If you're running Google, and it doesn't matter if a node drops out because there is no "right" answer to get wrong in the first place as long as you spit out a bunch of links, well, these sorts of non-resilient systems are fine.

      Personally, I've never done projects like that. In my projects, if the data isn't perfect always and forever, it's worse than if it had never been written. It's very existence is a liability, because people will rely on it when they shouldn't, for things that can't get by with "close".

      So yes. Transactional consistency and a solid relational model are pretty much mandatory, and not going anywhere soon. The idea that they might be replaced by technology such as this is laughable.

      --
      -1 Uncomfortable Truth
    3. Re:how useful is DHT? by nguy · · Score: 2, Informative

      So yes. Transactional consistency and a solid relational model are pretty much mandatory, and not going anywhere soon. The idea that they might be replaced by technology such as this is laughable.

      Relational databases don't implement the relational model correctly anyway. As for transactional consistency, you can get that on top of many different kinds of stores (including file systems); relational databases have no monopoly on that.

  3. Column Orientated DBMS by inKubus · · Score: 5, Informative

    This is a classic column-orientated DBMS, ala Sybase. You use these for data warehousing since they are optimized for read queries and not transactions. Stuff like Google search queries. It also allows you to quickly build cubes of data across a timeline, since you have data in columns instead of rows.

    IE:

    a,b,c,d,e; 1,2,3,4,5,6; a,b,c,d,e;

    instead of:

    a, 1, a;
    b, 2, b;
    c, 3, c;
    d, 4, d;
    e, 5, e;

    A cube using the time dimension would look like:

    01:01:01; a,b,c,d,e; 1,2,3,4,5; a,b,c,d,e;
    01:01:02; a,b,c,d,e; 1,2,6,4,5; a,b,c,d,e;

    It's pretty difficult to do the same thing with row-based DBMS. However, you can see that doing an insert is going to be costly.. This looks like a pretty good try, I know there were some other projects going to try to replicate what BigTable does. And after hearing that IBM story the other day about one computer running the entire internet, I started thinking about Google.

    More interesting is their distributed file system, which is what makes this really work well.

    --
    Cool! Amazing Toys.
    1. Re:Column Orientated DBMS by SystematicPsycho · · Score: 2, Informative

      You can do all those things in Orace -

      http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/dimen.htm#i1006266

      Distributed filesystem - Oracle RAC (Real Application Clusters) fits the bill.

      --
      Analytic & algebraic topology of locally Euclidean meterization of infinitely differentiable Riemmanian manifold
  4. I don't think so... by warrax_666 · · Score: 2, Funny

    A cube using the time dimension looks more like this.

    --
    HAND.
  5. Re:You mean, like really, this time by drinkypoo · · Score: 2, Informative

    Really, this time, a full fucking beowulf cluster (that runs linux!) is available to /.ers. No. Fucking. Way.

    What?

    There is no particular piece of software that defines a cluster as a Beowulf. Commonly used parallel processing libraries include MPI (Message Passing Interface) and PVM (Parallel Virtual Machine). Both of these permit the programmer to divide a task among a group of networked computers, and recollect the results of processing.

    "Beowulf (computing)." Wikipedia, The Free Encyclopedia. 28 Jan 2008, 12:25 UTC. Wikimedia Foundation, Inc. 9 Feb 2008 <http://en.wikipedia.org/w/index.php?title=Beowulf_%28computing%29&oldid=187453674>.

    Wikipedia lists no less than eight Linux distributions designed specifically for building Beowulf clusters.

    Using OpenMosix, a single-system-image cluster can be created by booting cluster nodes with LiveCDs and with very little configuration. It's even been done with Xboxes, although they have very poor performance per watt consumed by modern standards.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  6. Google 'Forms' by webword · · Score: 2, Informative

    I think Google Forms is more interesting. (Based on Google Spreadsheets.)

  7. Wheel: reinvented by stonecypher · · Score: 2, Insightful

    Mnesia has been able to handle things far in excess of the numbers cited, and with far better control of placement, for more than a decade. So has KDB. Also Coral8. This wouldn't even be on the map if people didn't start drooling the second they heard "based on Google." When they find out it's unstable and in alpha?

    Yawn.

    --
    StoneCypher is Full of BS
    1. Re:Wheel: reinvented by stonecypher · · Score: 2, Informative

      Mnesia is mostly a DHT for key-value pair lookups while hypertabe/bigtable support efficient primary key sorted range scans.
      Pretty much every database on earth has key sorted ranges. Please be less of a noob. Go look up ondex_match_object.

      For concurrent read/write/update, Mnesia requires explicit locking
      No, it doesn't. It offers explicit locking, because it's been proven for decades that without it, you cannot have hard realtime queries, something that mnesia wanted to offer. You don't have to use that stuff at all; it's just there in case you need it.

      Hypertable/bigtable doesn't need explicit locking for that, consistency and isolation is achieved through data versioning.
      Every distributed ACID database, by definition, uses a local sixth form implementation under the hood; there is no other way to provide query durability. All given examples do this.

      The most interesting feature here is time/history versioning for all the data, and efficient compression for such data.
      Yeah, I remember the first time I read about that too, back when I was learning Oracle in the early 90s. Just because it's interesting to you doesn't mean it isn't implemented in other databases already.

      Hypertable/bigtable is mostly for online analytics and storage of many versions of the entire web, Mnesia was built to support real-time lookups and data management for telecom apps tightly coupled with Erlang.
      The real purpose of mnesia is distributed durability, so you're aware. They actually discuss it in the documentation which you obviously have never read. That said, I don't really care what the software is for: Mnesia can handle bigger datasets over more nodes in realtime, offers the user better control of which data is on which node, and has a much more flexible locational querying system. There's nothing you can do in Bigtable that you can't do in Mnesia, and the reverse is most certainly not true.

      So, unless SQL is particularly important to you, this is a useless project. There's a reason Google's moving to Erlang so fast - they're discovering that a lot of the tools they've half-assed reinvented in Python already exist in Erlang in far more flexible fashions. This is nothing more than another map/reduce fiasco - a first generation solution to a problem that the internet adores because it's never seen any solution to the problem, but something which has been far better addressed in real industry for thirty or so years. If google would just quit stealing people from Microsoft, who makes application and system software, and start stealing people from AT&T and Ericsson, who make hard realtime system software, they'd find they wouldn't have to spend so much time poorly re-walking what's already been pathed.

      If Google would just buy Bluetail already, things would start changing for the better, fast.

      If you say Mnesia is a wheel, Hypertable/Bigtable would be a floating system for a hovercraft or maglev train.
      Metaphors are only useful when they elucidate something specific. Mnesia is radically more powerful than hypertable; I suggest you spend less time at the altar and more at the library. Or, to put it in terms that apparently you will understand, you just tried to rub in my face how much more powerful your Geo is than my Technodrome.

      You have done such a spectacularly poor job of making your case that all I can imagine as your reason to say something like that is:
      1. You think mnesia doesn't have indices
      2. You think Mnesia is manually locked
      3. You think Mnesia isn't versionned
      4. You think Bigtable can handle more physical storage than Mnesia
      Of those, not only are you wrong on every count, but only the last is in any way something that someone who knows even the basics about distributed databases would even begin to consider. Doesn't support indices? Are you nuts? You really think there's a database that can't sort its contents?

      Unbelievable.
      --
      StoneCypher is Full of BS