Is It Time For NoSQL 2.0?
New submitter rescrv writes "Key-value stores (like Cassandra, Redis and DynamoDB) have been replacing traditional databases in many demanding web applications (e.g. Twitter, Google, Facebook, LinkedIn, and others). But for the most part, the differences between existing NoSQL systems come down to the choice of well-studied implementation techniques; in particular, they all provide a similar API that achieves high performance and scalability by limiting applications to simple operations like GET and PUT.
HyperDex, a new key-value store developed at Cornell, stands out in the NoSQL spectrum with its unique design. HyperDex employs a unique multi-dimensional hash function to enable efficient search operations; that is, objects may be retrieved without using the key under which they are stored. Other systems employ indexing techniques to enable search, or enumerate all objects in the system. In contrast, HyperDex's design enables applications to retrieve search results directly from servers in the system. The results are impressive. Preliminary benchmark results on the project website show that HyperDex provides significant performance improvements over Cassandra and MongoDB. With its unique design and impressive performance, it seems fitting to ask: Is HyperDex the start of NoSQL 2.0?"
Er...
Um...
Yeah...
I use Riak in production and, while it is notably slow, I appreciate its fault tolerance. I wonder whether HyperDex is just as resilient. Also, Riak 1.1 just came out, finally adding management and diagnostics. Riak's secondary indexing may not be as fast as HyperDex, but, depending on the design, "crashproof-ness" can come out the winner.
From the paper:
Efficient lookup of fully-specified objects is critical to object insertion and deletion performance, and requires a deterministic object to node mapping. Much like in ring-based key-value stores[3, 17, 38], HyperDex maps both object coordinates and nodes to the same hyperspace. Specifically, HyperDex tessellates the hyperspace into a grid of N regions in space. Zones, which we previously defined to be mutually exclusive regions belonging to one or more nodes, are created by assigning nodes to each of the tessellated regions to be responsible for all objects which hash to a coordinate within the region. The zone mapping is disseminated to all clients which may operate directly on the mapping without any routing between server nodes.
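To make the quoted mapping concrete, here is a minimal Python sketch. It is not HyperDex's code: the three-attribute schema, SHA-256 as the per-axis hash, the even 4×4×4 grid and the round-robin zone map are all hypothetical choices for illustration. A fully specified object lands in exactly one region; a search on a subset of attributes only has to visit the regions that agree on those attributes.

```python
import hashlib
from itertools import product

# Hypothetical schema: each attribute is one axis of the hyperspace.
ATTRIBUTES = ("first", "last", "phone")
BUCKETS_PER_AXIS = 4  # assumption: every axis is cut into 4 equal intervals

def axis_index(value: str) -> int:
    """Hash one attribute value to its interval along that attribute's axis."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS_PER_AXIS

def region_of(obj: dict) -> tuple:
    """A fully specified object maps to exactly one grid region."""
    return tuple(axis_index(obj[attr]) for attr in ATTRIBUTES)

def regions_for_search(criteria: dict) -> list:
    """A partial search maps to every region its matches could live in.

    Specified attributes pin their axis to a single interval; unspecified
    attributes leave the whole axis open.
    """
    axes = [
        [axis_index(criteria[attr])] if attr in criteria else range(BUCKETS_PER_AXIS)
        for attr in ATTRIBUTES
    ]
    return list(product(*axes))

# Hypothetical zone map (region -> node), assigned round-robin over 8 nodes.
# Clients keep a copy and contact the owning node directly, with no routing.
ALL_REGIONS = list(product(range(BUCKETS_PER_AXIS), repeat=len(ATTRIBUTES)))
ZONE_MAP = {region: f"node-{i % 8}" for i, region in enumerate(ALL_REGIONS)}

def nodes_for_search(criteria: dict) -> set:
    """The set of nodes a client would have to ask for this search."""
    return {ZONE_MAP[r] for r in regions_for_search(criteria)}
```

With 4 intervals per axis there are 64 regions: a search that fixes only `last` touches 16 of them, fixing two attributes narrows it to 4, and a fully specified object hits exactly 1.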
This sounds like the old Berkeley DB/Sleepycat software.
Key/Value pairs instead of relational stuff. Worked with a product years ago that was built on Berkeley -- offered some pretty useful features that simply didn't map to object-relational stuff.
For some applications, you really do need something that works a little differently than an RDB ... however, there's still loads of things I can't imagine trying to do without one.
Choice is good in technology.
Lost at C:>. Found at C.
Er...
Um...
Why not learn both, and use whichever's strengths suit the application the best?
You do not have a moral or legal right to do absolutely anything you want.
http://www.youtube.com/watch?v=URJeuxI7kHo
is the best introduction to this subject I've seen. Until someone can explain the pros of hyperdex with a funny video featuring cute animals I'm sticking with technology that's been tested more thoroughly.
http://rareformnewmedia.com/
The hashing system is pretty neat. The idea that you could get at records without their specific key via search criterion is astounding.
In the future more advanced hashing systems will allow NoSQL databases to extract a set of records all containing a similar subset of data without keys at all!
Of course we'd need a name for the sections that are matching. Perhaps "Columns"; yeah, then each result returned could be called a "Row", makes sense. I bet you could then create even more complex matching patterns for multiple "Columns" against each record in the data set. If only there were a language to describe the queries we're sending to the servers... Oh! Server Query Language!
I can't wait to use SQL with NoSQL 3.0!
Many of the key-value pair DBs supply a Perl library that let you tie a Perl hash (%Variable) to the DB directly, giving you persistent hashes.
Makes database storage virtually a native feature of the language. Anybody who uses Perl is probably already a hash buff, so it is a win-win if you and your app already use Perl.
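Python has no `tie`, but its standard `shelve` module is a close analog (swapped in here only because the examples in this thread use Python): a dict-like object persisted to a dbm file. The page-counter functions are a hypothetical example of the CGI-style use case described above.

```python
import shelve

def record_hit(db_path: str, page: str) -> int:
    """Bump a per-page counter kept in a persistent, dict-like shelf."""
    # shelve.open creates the backing dbm file if it does not exist yet.
    with shelve.open(db_path) as hits:
        hits[page] = hits.get(page, 0) + 1
        return hits[page]

def all_hits(db_path: str) -> dict:
    """Copy the whole shelf into an ordinary in-memory dict."""
    with shelve.open(db_path) as hits:
        return dict(hits)
```

The counters survive between calls (and between CGI processes) because every `with` block reopens the same file; note that plain `shelve` gives you no locking for concurrent writers.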
Disclaimer: I run a 10-year-old web "app" (Perl/CGI/Apache), so I'm a bit biased. But the thing is rock-solid, so I'm not going to be too apologetic.
"Flame away, I wear asbestos underwear"
So, at what point do we all admit that a NoSQL database is basically a glorified file-system over a network and start calling it a file-system again?
Use NoNoSQL like me! Otherwise known as SQL.
Isn't that what XML is for? XML files are also compatible across systems.
This is a type of index, not a type of database. See locality-sensitive hashing. It's an efficient way to find keys which are "near" the search key in some sense.
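For readers who haven't met LSH, here is a minimal Python sketch of one common family, random-hyperplane hashing for cosine similarity, chosen here for illustration (the comment doesn't name a specific scheme). Each bit records which side of a random hyperplane a vector falls on, so two vectors separated by angle θ agree on each bit with probability 1 - θ/π, and near vectors tend to land in the same bucket.

```python
import random
from collections import defaultdict

def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> list:
    """Random hyperplanes through the origin, one per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes) -> int:
    """Pack one bit per hyperplane: 1 if vec lies on its non-negative side."""
    bits = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

def build_index(items: dict, planes) -> dict:
    """Group keys into buckets by signature; near vectors tend to share one."""
    buckets = defaultdict(list)
    for key, vec in items.items():
        buckets[signature(vec, planes)].append(key)
    return buckets

def candidates(query, buckets, planes) -> list:
    """Keys in the query's bucket: candidates to check exactly, not a final answer."""
    return buckets.get(signature(query, planes), [])
```

In practice several independent signatures (or bands of one long signature) are used, so that a near neighbour missed by one bucket is caught by another.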
Such a mechanism could be provided in a key/value store or an SQL database. It's even possible to do it on top of an SQL database. It's more powerful in a database that can do joins, because you can ask questions with several approximate keys.
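As a sketch of doing it on top of an SQL database, here is a version using Python's built-in `sqlite3`. It assumes signatures have already been computed (for example by random-hyperplane hashing); the table layout, the 16-bit signature width and the 4-bit banding are hypothetical choices. Each signature is split into bands, stored one row per band, and a query returns every key that shares at least one band.

```python
import sqlite3

BAND_BITS = 4   # assumption: 16-bit signatures cut into 4 bands of 4 bits
N_BITS = 16

def bands(sig: int) -> list:
    """Split a signature into (band_number, band_value) pairs, low bits first."""
    mask = (1 << BAND_BITS) - 1
    return [(b, (sig >> (b * BAND_BITS)) & mask) for b in range(N_BITS // BAND_BITS)]

def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE lsh (band INTEGER, value INTEGER, key TEXT)")
    # The composite index turns each band lookup into an ordinary index seek.
    conn.execute("CREATE INDEX lsh_band_value ON lsh (band, value)")

def insert(conn: sqlite3.Connection, key: str, sig: int) -> None:
    conn.executemany(
        "INSERT INTO lsh (band, value, key) VALUES (?, ?, ?)",
        [(b, v, key) for b, v in bands(sig)],
    )

def near(conn: sqlite3.Connection, sig: int) -> set:
    """Keys that share at least one band with sig (the LSH candidate set)."""
    pairs = bands(sig)
    # Only "?" placeholders are interpolated; the values go in as parameters.
    where = " OR ".join("(band = ? AND value = ?)" for _ in pairs)
    params = [x for pair in pairs for x in pair]
    rows = conn.execute(f"SELECT DISTINCT key FROM lsh WHERE {where}", params)
    return {row[0] for row in rows}
```

Joining this table against other tables (or against itself for a second approximate key) is what gives the extra query power the comment mentions; the sketch stops at a single key.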
This is an area of active research. Many machine-learning algorithms are scaled up with locality-sensitive hashing so they can work on big data.
Who needs to be afraid of ACTA, SOPA and PIPA if his database is gone after a reboot?
This is retarded like everything I read on Slashdot. I come here for the retarded bullshit.
http://www.youtube.com/watch?v=URJeuxI7kHo
You know what's better than NoSQL? Modern distributed relational SQL implementations, like Clustrix http://www.clustrix.com/ and VoltDB http://en.wikipedia.org/wiki/VoltDB, to name two.
Because NoSQL wasn't hipster enough.
Kill all hipsters.
until I noticed that there seems to be a single point of failure in this system. From the site:
The HyperDex coordinator maintains the "hyperspace." This involves making sure that servers are up, detecting failed or slow nodes, taking them out of the system, and replacing them where necessary. The coordinator maintains a critical data structure, the hyperspace map, that establishes the mapping between the hyperspace and servers. Clients use this map to locate the servers they need to contact, while servers use it to perform object propagation and replication to achieve the application's desired goals.
How can people call a system "fault tolerant" and "distributed" when it might as well be running off a single box?
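To see why the commenter worries, here is a toy Python model of the duties the quoted text assigns to the coordinator: tracking heartbeats, dropping silent servers, reassigning their regions, and bumping a map version that clients can compare against. The heartbeat timeout, spare pool and version counter are hypothetical simplifications, not HyperDex's protocol. If this object lives in one process, it is itself the single point of failure; tolerating its loss would mean replicating the coordinator's state, which the sketch does not attempt.

```python
from dataclasses import dataclass, field

@dataclass
class Coordinator:
    """Toy coordinator: tracks heartbeats and publishes a versioned region map."""
    timeout: float
    last_seen: dict = field(default_factory=dict)  # server -> last heartbeat time
    regions: dict = field(default_factory=dict)    # region -> owning server
    version: int = 0                               # bumped on every map change

    def heartbeat(self, server: str, now: float) -> None:
        self.last_seen[server] = now

    def check(self, now: float, spares: list) -> list:
        """Drop servers silent for longer than timeout and reassign their regions."""
        failed = [s for s, t in self.last_seen.items() if now - t > self.timeout]
        for s in failed:
            del self.last_seen[s]
        for region, owner in self.regions.items():
            # Naive policy: everything moves to the first spare; with no spares,
            # the region keeps pointing at the dead server.
            if owner in failed and spares:
                self.regions[region] = spares[0]
        if failed:
            self.version += 1  # clients holding an older version refetch the map
        return failed
```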
Just append your keys together with a joining symbol. The programmer should be able to produce his own hash in however many dimensions, better than anything predefined in the database code.
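A minimal Python sketch of the suggested approach, assuming an ordered key-value store (a purely hash-partitioned store could only do exact lookups on such keys). The separator and the store class are hypothetical. It also shows the limit of the trick: a composite key answers queries on its leading attributes, but not on an arbitrary subset of them, which is the case the summary says hyperspace hashing targets.

```python
from bisect import bisect_left, insort

SEP = "\x1f"  # assumption: a separator character that never appears inside a value

def composite_key(*parts: str) -> str:
    """Join attribute values into one key, most significant attribute first."""
    return SEP.join(parts)

class SortedStore:
    """Toy ordered key-value store: a sorted key list plus a dict of values."""

    def __init__(self) -> None:
        self._keys = []
        self._values = {}

    def put(self, key: str, value) -> None:
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def prefix_scan(self, *leading: str) -> list:
        """Values whose key starts with the given leading attributes.

        Only leading attributes can be matched this way; searching on the
        last attribute alone would require scanning every key.
        """
        prefix = composite_key(*leading) + SEP
        i = bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append(self._values[self._keys[i]])
            i += 1
        return out
```

Appending the separator to the prefix keeps a search for "Smith" from also matching "Smithers".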
NoSQL2.0...
Read "NoSequel... the sequel"
So, if the Sequel to NoSQL does exist, then it is itself a paradox.
Facebook, Google and friends wouldn't need such databases if they respected privacy; solve the privacy issues and MyISAM will be enough for everyone. And for the marketing, just send pregnancy coupons to everyone; you'll get 'em.
In using Oracle RDBMS, I see that for queries over very large data sets, a hash join causes lots of disk activity (lots of paging, going to swap). Though hash functions are fast, that performance comes from scanning through a hash table that's fully mapped in memory. Once your hash table gets too big for the available memory, you start using disk space (unindexed, sequential full reads). Isn't this a bottleneck in a distributed database that relies on hash functions? Wouldn't you want a distributed DB based on a distributed version of a B-Tree descendant (B+Tree, B*Tree, B**Tree) that would use memory AND storage and scale out beyond the available memory on all your nodes? Not only that, but you'd likely get better performance on range scans. Just thinking...
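On the range-scan point, a small in-memory Python illustration: a sorted list stands in for the ordered leaf level of a B+Tree and a dict stands in for a hash table (both are simplifications of the on-disk structures). The sorted structure locates the start of the range in O(log n) and then reads only the matching keys; the hash table keeps no key order, so it has to examine all n entries.

```python
from bisect import bisect_left, bisect_right

def range_scan_sorted(sorted_keys: list, lo, hi) -> list:
    """Keys in [lo, hi] via two binary searches and one contiguous slice."""
    return sorted_keys[bisect_left(sorted_keys, lo):bisect_right(sorted_keys, hi)]

def range_scan_hashed(table: dict, lo, hi) -> list:
    """Hash order carries no key order, so every entry must be examined."""
    return sorted(k for k in table if lo <= k <= hi)
```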
Whenever a /. headline asks a question, the answer is always No.
Which, iirc, is SORT of how modern filesystems work (barring IBM's DB/2 driven one), but that's GREAT FOR READS (not so great for writes iirc). ISAM/VSAM also use hashes.
* Lastly/iirc: Some filesystems are even based on the ISAM/VSAM design too, but ones like NTFS use binary search pattern methods.
APK
P.S.=> The use of Hashes also makes it different than most RDBMS (those based on SQL usage), because they're based on binary trees iirc as well...
... apk
"Key-value stores have been supplementing traditional databases..."
Well, the most successful NoSQL product in history is the Windows Registry, which is like one giant Java properties file. Not sure if that's an argument for or against NoSQL.
Hand it over to Mozilla. We could then have NoSQL 5.0 by the end of summer.
Why does a new product operating in the very same space as other key-value stores warrant an increment of the buzzword version number?
Mumps was NoSql before NoSql was cool: MUMPS and NoSql
Disclaimer: my only interaction with MUMPS has been via thedailywtf: A Case of the MUMPS
-- The Genesis project? What's that?
I didn't RTFA but are they trying to reinvent IMS?
Try it! Library of Babel
How about consistency? Does this database even support the notion of transactions?
If Pandora's box is destined to be opened, *I* want to be the one to open it.
For C, C++, and Python only???
If it's Web Scale I'll use it.
Cassandra which is optimized for low-latency writes.
See, it's all about which tradeoffs you choose. Whodathunkit?
(Not that I'm advocating a NoSQL-is-fast mindset, mind you. You have to tailor your choices to the situation at hand.)
CouchDB has native map-reduce indexing of arbitrary fields of the stored data. Doesn't appear to be anything new here in that regard.
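For comparison, a Python sketch that mimics the shape of a CouchDB view: a map function emits (key, value) pairs from whatever field it chooses, the emitted rows are kept sorted by key, and an optional reduce aggregates per key. Real CouchDB views are typically JavaScript functions stored in design documents and are maintained incrementally by the server; the function names below are hypothetical and none of this is CouchDB's API.

```python
from collections import defaultdict

def build_view(docs: dict, map_fn) -> list:
    """Run map_fn over every document; keep emitted rows sorted by (key, doc id)."""
    rows = []
    for doc_id, doc in docs.items():
        for key, value in map_fn(doc):
            rows.append((key, doc_id, value))
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows

def reduce_by_key(rows: list, reduce_fn) -> dict:
    """Apply reduce_fn to the list of values emitted under each key."""
    grouped = defaultdict(list)
    for key, _doc_id, value in rows:
        grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}

def by_city(doc: dict):
    """Example map function: index an arbitrary field, skipping docs without it."""
    if "city" in doc:
        yield doc["city"], 1
```

Because the rows are sorted by key, both exact-key and key-range lookups against the view are cheap, which is the kind of "search by a field other than the document key" the comment is pointing at.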
Be careful. People in masks cannot be trusted.
http://www.mongodb-is-web-scale.com/
Oracle Coherence (formerly Tangosol) is a distributed cache with queries too. It has existed for many years and does exactly this (and more).
I don't think this is anything new.
Against