vicaya · Slashdot Mirror

Re:Oblig ReiserFS Joke on Hans Reiser Guilty of First Degree Murder · 2008-04-28 20:38 · Score: 2, Funny

He tried to roll back his commit but ended up leaving more history in the logs.

Re:Google 'Forms' on Zvents Releases Open Source Cluster Database Based on Google · 2008-02-09 16:36 · Score: 1

Do you know Google Forms are built upon Bigtable?

Re:Wheel: reinvented on Zvents Releases Open Source Cluster Database Based on Google · 2008-02-09 16:33 · Score: 1

Citations needed.

All my cited data can be found at http://research.google.com/pubs/papers.html

Tell me how to store petabytes in a 4GB table, given that Erlang dets uses 32-bit file offsets?

Storing petabytes of small key-value pairs on a DHT a la Mnesia is trivial. Sustained, ordered on-disk scanning at hundreds of MB/s per core is not.

Re:Column Orientated DBMS on Zvents Releases Open Source Cluster Database Based on Google · 2008-02-09 16:16 · Score: 1

The Google system group guys are definitely not kids. Neither are the Hypertable developers, who have actually built and deployed web-scale search engines themselves. How many people on Slashdot can say that? We're well aware of the literature in the area (RTFP to find out) and are continuously learning from peers. Both Bigtable and Hypertable build upon previous solutions to solve real-world web-scale data problems. Many algorithms used in Bigtable/Hypertable appeared in the literature only in the late 90s or later. Claiming the technology is 30 years old is like claiming a new car technology is hundreds of years old because horse carriages existed back then.

The locality group in Hypertable/Bigtable lets you control the I/O performance tradeoff between column and row layout without changing the query interface. There are some expensive commercial solutions that claim auto-tuning in that regard. However, it is simply a waste of resources in many cases.

The scale of actual Bigtable deployment is unprecedented.

Re:Column Orientated DBMS on Zvents Releases Open Source Cluster Database Based on Google · 2008-02-09 15:43 · Score: 1

RTFP?

Re:Wheel: reinvented on Zvents Releases Open Source Cluster Database Based on Google · 2008-02-09 13:43 · Score: 1

Thanks for playing :) Let's see:

There's a reason Google's moving to Erlang so fast - they're discovering that a lot of the tools they've half-assed reinvented in Python already exist in Erlang in far more flexible fashions. This is nothing more than another map/reduce fiasco - a first generation solution to a problem that the internet adores because it's never seen any solution to the problem, but something which has been far better addressed in real industry for thirty or so years. If google would just quit stealing people from Microsoft, who makes application and system software, and start stealing people from AT&T and Ericsson, who make hard realtime system software, they'd find they wouldn't have to spend so much time poorly re-walking what's already been pathed

Profound ignorance! Google is evaluating ejabberd/Erlang for messaging services/frameworks. And you claim they're moving to Erlang!? Do you know what percentage of Google's code is written in Python? Your criticism of map-reduce is so laughable, it's not worth arguing.

Mnesia can handle bigger datasets over more nodes in realtime, offers the user better control of which data is on which node, and has a much more flexible locational querying system. There's nothing you can do in Bigtable that you can't do in Mnesia, and the reverse is most certainly not true.

According to the Mnesia FAQ, Mnesia is mostly an in-memory DB and ill suited for large on-disk data. The largest table size is 4GB. The largest number of records in a table I've seen so far is on the order of 100 million, which is not even the same order of magnitude as Hypertable/Bigtable, which are designed to routinely hold trillions of records and petabytes of data in ONE table. Fragmented tables in Mnesia are an afterthought and an ugly hack.

You have done such a spectacularly poor job of making your case that all I can imagine as your reason to say something like that is:

1. You think mnesia doesn't have indices

2. You think Mnesia is manually locked

3. You think Mnesia isn't versionned

4. You think Bigtable can handle more physical storage than Mnesia

Sorry, none of the assumptions you listed (that I had these notions) is true. By versioning I meant explicit version control beyond the purpose of MVCC. Hypertable/Bigtable lets you explicitly keep n versions of the data for historical analysis with a simple clause in the schema. By scanning, I meant efficient on-disk scanning for data too large to fit into memory. Indices are useless for that (think about the seeks). Do you know Bigtable/Hypertable doesn't typically use separate indices? Do you know what a locality group is for? Can Mnesia compress data to 1/10th of the original size at 100MB/s per core? Erlang is too slow to implement any compression algorithm worth its salt. The foreign language interface marshaling overhead per invocation in Erlang is unbelievable.
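
To make the "keep n versions" point concrete, here is a toy in-memory sketch in Python. It is not Hypertable or Bigtable code; the class name `VersionedTable` and the `max_versions` parameter are made up for illustration, standing in for whatever the schema clause configures.

```python
from collections import defaultdict


class VersionedTable:
    """Toy table keeping at most max_versions timestamped values per cell.

    Hypothetical illustration only; not the Hypertable/Bigtable API.
    """

    def __init__(self, max_versions):
        self.max_versions = max_versions
        # (row, column) -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, row, column, timestamp, value):
        versions = self.cells[(row, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        # Drop everything beyond the retention limit (oldest versions).
        del versions[self.max_versions:]

    def get(self, row, column, as_of=None):
        """Return the newest value, or the newest one with timestamp <= as_of."""
        for timestamp, value in self.cells.get((row, column), []):
            if as_of is None or timestamp <= as_of:
                return value
        return None

    def history(self, row, column):
        """Return the retained (timestamp, value) pairs, newest first."""
        return list(self.cells.get((row, column), []))
```

Writers only ever append new versions, and readers pick a timestamp to read at, which is why no explicit lock shows up anywhere in the sketch.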

Of those, not only are you wrong on every count, but only the last is in any way something that someone who knows even the basics about distributed databases would even begin to consider. Doesn't support indices? Are you nuts? You really think there's a database that can't sort its contents?

Did I say Mnesia doesn't support indices? You're so blind that you're projecting ignorance.

Unbelievable

Indeed. I was not even criticizing Mnesia; it's a fine distributed in-memory-oriented DB, great for its purpose. But driving an imaginary technodrome while laughing at a real bullet train is one of the funniest things I've read so far on Slashdot.

Re:Don't forget HBase on Zvents Releases Open Source Cluster Database Based on Google · 2008-02-08 18:05 · Score: 1

Hypertable can run on a variety of DFSs that support a global namespace. Currently it can run on HDFS, KFS (a GFS clone from Kosmix) and any DFS with a POSIX-compliant mount point, including GlusterFS, Lustre, Parallel NFS, GPFS etc. An S3 DfsBroker can be made easily as well.

Besides DFS flexibility and not being Java, Hypertable supports access groups (locality groups in Bigtable), unlike HBase, where you have to resort to column family hacks for read performance tuning. Hypertable also has more block compression options (quicklz for fastest compression, lzo for fastest decompression, and bmz (similar to the BMDiff and Zippy mentioned in the Bigtable paper) for fast multiversion data compression/decompression). More compression options can be added trivially.

The communication/IO layer of Hypertable is fully asynchronous, to achieve throughput not yet possible with HBase.

(Full disclosure - I'm a Hypertable developer :)

Re:Wheel: reinvented on Zvents Releases Open Source Cluster Database Based on Google · 2008-02-08 16:16 · Score: 1

Sorry, you couldn't be more wrong. Mnesia, KDB, Coral8 and Hypertable/Bigtable are completely different beasts for different purposes. Mnesia is mostly a DHT for key-value pair lookups, while Hypertable/Bigtable support efficient primary-key-sorted range scans. For concurrent read/write/update, Mnesia requires explicit locking. Hypertable/Bigtable doesn't need explicit locking for that; consistency and isolation are achieved through data versioning. The most interesting feature here is time/history versioning for all the data, and efficient compression for such data.

Hypertable/Bigtable is mostly for online analytics and storage of many versions of the entire web; Mnesia was built to support real-time lookups and data management for telecom apps tightly coupled with Erlang.

If you say Mnesia is a wheel, Hypertable/Bigtable would be a floating system for a hovercraft or maglev train.
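
For anyone unsure what a primary-key-sorted range scan buys you over a hash lookup, a minimal Python sketch follows. The `SortedStore` class and the example row keys are invented for illustration and have nothing to do with the real on-disk format; the only point is that keeping keys in sorted order turns "everything under this prefix" into one contiguous slice, which a hash-partitioned store cannot do without touching every key.

```python
import bisect


class SortedStore:
    """Toy key-value store that keeps keys in sorted order (illustration only)."""

    def __init__(self):
        self.keys = []    # row keys, always kept sorted
        self.values = {}  # row key -> value

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, end)
        for key in self.keys[lo:hi]:
            yield key, self.values[key]
```

A scan such as `store.scan("com.example/", "com.example0")` walks only the rows under that prefix ("0" is the character right after "/" in ASCII), in order.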

Re:Column Orientated DBMS on Zvents Releases Open Source Cluster Database Based on Google · 2008-02-08 15:34 · Score: 1

Sorry, it's column-oriented but not really "classic". It supports the ID part of ACID very well (full ACID is possible but not implemented), due to its built-in data versioning. It's optimized for both reads (random or sorted sequential scan) and very high random write rates, as you don't have to lock anything. The scalability and fault tolerance part is not classic either. BTW, Google search queries do NOT use Bigtable. It's used for storing all the crawl data and the input/output for their map-reduce framework to build the search indices that serve the real queries. It's also used for serving processed data to Google Reader, Maps, Analytics etc.

Re:how useful is DHT? on Zvents Releases Open Source Cluster Database Based on Google · 2008-02-08 15:20 · Score: 1

Hypertable is not a DHT. A DHT is mostly useful for large numbers of relatively small key-value pairs. Hypertable, like Bigtable, uses a metadata table to track the tablets (ranges in Hypertable terms). Tables automatically split as they grow. A master server assigns the split tablets/ranges to appropriate tablet/range servers. Hypertable can be made to support transactions, as it has built-in versioning of data. As a result, Hypertable/Bigtable is more versatile than a DHT.
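
A rough Python sketch of the metadata-table idea, under loud assumptions: `RangeMap`, the row-count split threshold, and the round-robin server assignment are all hypothetical simplifications (a real master would weigh load and data size, and the metadata itself lives in a table). It only shows how a sorted list of range start keys locates the owning server and how a range splits once it grows too large.

```python
import bisect


class RangeMap:
    """Toy metadata table mapping row keys to ranges and ranges to servers."""

    def __init__(self, servers, max_rows_per_range):
        self.servers = servers
        self.max_rows = max_rows_per_range
        self.starts = [""]         # sorted start key of each range
        self.rows = [[]]           # sorted row keys held by each range
        self.owner = [servers[0]]  # server assigned to each range
        self.next_server = 1       # round-robin cursor (an assumption)

    def locate(self, key):
        """Index of the range whose start key is the largest one <= key."""
        return bisect.bisect_right(self.starts, key) - 1

    def server_for(self, key):
        return self.owner[self.locate(key)]

    def insert(self, key):
        i = self.locate(key)
        rows = self.rows[i]
        pos = bisect.bisect_left(rows, key)
        if pos < len(rows) and rows[pos] == key:
            return  # row already present
        rows.insert(pos, key)
        if len(rows) > self.max_rows:
            self._split(i)

    def _split(self, i):
        """Split range i in the middle and hand the upper half to another server."""
        rows = self.rows[i]
        mid = len(rows) // 2
        upper = rows[mid:]
        self.rows[i] = rows[:mid]
        self.starts.insert(i + 1, upper[0])
        self.rows.insert(i + 1, upper)
        self.owner.insert(i + 1, self.servers[self.next_server % len(self.servers)])
        self.next_server += 1
```

Unlike a DHT, neighbouring keys stay in the same range after a split, so sorted scans keep working across range boundaries.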

Re:Irritating. on Schneier Mulls Psychology of Security · 2007-02-07 11:17 · Score: 1

It irritates me to no end when somebody just brands something they don't understand as "theory" or "theoretical", just like the ID folks who like to call evolution a "theory". This time, it's neither "theoretical" nor psychology. It has much to do with the "experimental": biophysics, biochemistry, physiology, neurology and neuroscience in general. May I recommend that you read "Principles of Neural Science" by Kandel et al., which is often assigned as a textbook for undergraduate and graduate neuroscience and neurobiology courses. The book attempts to at least introduce every aspect of the modern understanding of the biology of the brain.

Re:This reminds me of Ghost in the Shell. on VMware, XenSource Join Forces For Linux · 2006-08-12 09:56 · Score: 1

Sorry pal, you misspelled f**k. Processes can't have children until they f**k.

But it had to be said: on Use Google Earth To Track Santa · 2005-12-25 13:34 · Score: 1

imagine a beowulf cluster of santa claus'!

Yahoo never said that it would on Yahoo Helps Jail Chinese Writer · 2005-09-07 04:53 · Score: 1

"do no evil".

The Search Engine Size Game on Yahoo Passes Google in Total Items Searched · 2005-08-08 20:22 · Score: 3, Informative

For popular search terms (queries with millions of hits), index size doesn't matter much. Yahoo, Google, Ask, MSN etc. all produce pretty similar results (that tend to favor established sites/pages). For rare terms or combinations, which contribute to the Long Tail of web search, index size is very important. Both Yahoo and Google report estimated (often inflated) hits for popular terms and exact numbers for rare terms, which still include dups. You need to go to the last result page to find out the exact non-dup number, which can sometimes be smaller than the reported hit count by a factor of 10. Let's see how the new Yahoo fares against Google with a few queries I picked randomly:

"Acid Brass" stockport - yahoo:20 google:24
"anetan district" - yahoo:17 google:15
"chunder blunder" - yahoo:25 google:27
"information theoretical death" - yahoo:45 google:46
kliningan juru - yahoo:27 google:47
"phylogenetic organisms" - yahoo:5 google:10
zibelthiurdos thrace - yahoo:9 google:4

Yahoo used to consistently underperform Google on rare terms; it seems they have indeed caught up. But it has NOT really exceeded Google in terms of useful size (Yahoo has more dups). Still, it's a worthy engineering effort. Congrats!

No site should ever store passwords on Cisco Warns of Stolen Web Site Passwords · 2005-08-03 15:25 · Score: 3, Insightful

It's appalling that a major company's website (a major tech company with security product offerings in this case!) would store passwords in cleartext. Passwords (even usernames) should always be stored as strong one-way hashes like SHA-1, so that even if they're stolen, they're close to useless.
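
A minimal sketch of what "store a one-way hash, not the password" can look like in Python. Note the swap: instead of the bare SHA-1 named above, this uses the standard library's salted, iterated PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`), which is an assumption on my part about a more conservative choice. The function names and the iteration count are illustrative, not a recommendation for any particular site.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, iterations, digest); only these are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected_digest):
    """Recompute the hash for a login attempt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(digest, expected_digest)
```

With a per-password salt, two users who pick the same password still end up with different stored digests, so a stolen table cannot be attacked with one precomputed lookup.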

I'd rather make love but on 100 Years of Special Relativity · 2005-06-30 06:22 · Score: 1

make: target not found. Stop.

Someday? maybe not in your life time. on MD5 To Be Considered Harmful Someday · 2004-12-07 22:01 · Score: 1

If you read the original paper, you'll find that it's simply not true that you could easily find a collision for a given hash. Wang et al.'s contribution is a more efficient algorithm than the canonical birthday attack for finding random collisions in a family of hash functions (SHA-* and MD5 are in the same family), while this guy's contribution seems to be only confusion. For the attack to be useful, you'd need to find an easy way to create a hash collision for a given file.

28bc8c78881b2f89bbeab4f9bb8fbeda is the md5 of "clue". I can bet my hash farm and a hash-laced brownie that you'll never find anything other than "clue" that hashes to the same md5, at least not "someday" in your lifetime.
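
The gap between "find any two inputs that collide" and "find a second input matching a given hash" can be felt on a toy scale. The Python sketch below truncates MD5 to a couple of bytes so both searches finish quickly; the truncation, the counter-based messages, and the function names are assumptions for illustration, and this is plain brute force, not Wang et al.'s technique.

```python
import hashlib
from itertools import count


def truncated_md5(data, nbytes):
    """First nbytes of the MD5 digest: a deliberately weakened toy hash."""
    return hashlib.md5(data).digest()[:nbytes]


def find_collision(nbytes):
    """Birthday search: any two distinct messages with the same truncated digest.

    Returns (message_a, message_b, attempts).
    """
    seen = {}
    for i in count():
        msg = str(i).encode()
        tag = truncated_md5(msg, nbytes)
        if tag in seen:
            return seen[tag], msg, i + 1
        seen[tag] = msg


def find_second_preimage(target, nbytes):
    """Brute force: a message other than target with target's truncated digest.

    Returns (message, attempts).
    """
    goal = truncated_md5(target, nbytes)
    for i in count():
        msg = str(i).encode()
        if msg != target and truncated_md5(msg, nbytes) == goal:
            return msg, i + 1
```

For an n-bit hash the birthday search needs on the order of 2^(n/2) attempts while the second search needs on the order of 2^n. At 16 bits that is hundreds versus tens of thousands; at MD5's full 128 bits the second number is what the bet above is counting on.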

Comments · 18