Slashdot Mirror


Is It Time For NoSQL 2.0?

New submitter rescrv writes "Key-value stores (like Cassandra, Redis and DynamoDB) have been replacing traditional databases in many demanding web applications (e.g. Twitter, Google, Facebook, LinkedIn, and others). But for the most part, the differences between existing NoSQL systems come down to the choice of well-studied implementation techniques; in particular, they all provide a similar API that achieves high performance and scalability by limiting applications to simple operations like GET and PUT. HyperDex, a new key-value store developed at Cornell, stands out in the NoSQL spectrum with its unique design. HyperDex employs a unique multi-dimensional hash function to enable efficient search operations: that is, objects may be retrieved without using the key (PDF) under which they are stored. Other systems employ indexing techniques to enable search, or enumerate all objects in the system. In contrast, HyperDex's design enables applications to retrieve search results directly from servers in the system. The results are impressive. Preliminary benchmark results on the project website show that HyperDex provides significant performance improvements over Cassandra and MongoDB. With its unique design and impressive performance, it seems fitting to ask: Is HyperDex the start of NoSQL 2.0?"

164 comments

  1. No SQL, Know SQL by PortHaven · · Score: 1, Funny

    Er...

    Um...

    Yeah...

    1. Re:No SQL, Know SQL by Anonymous Coward · · Score: 0

      Web 2.0 running on the cloud serving up HTML5 over HTTP2.0 with a NoSQL2.0 backend.... Dec 2012 really is the end.

    2. Re:No SQL, Know SQL by Sulphur · · Score: 1

      Er...

      Um...

      Yeah...

      SQL is pronounced sequel. NoSQL 2.0 is the sequel to NoSQL 1.0?

  2. No mention of Riak? by platypusfriend · · Score: 1

    I use Riak in production and, while it is notably slow, I appreciate its fault-tolerance. I wonder if HyperDex is just as resilient? Also, Riak 1.1 just came out, finally adding management and diagnostics. Secondary indexing may not be as fast as HyperDex, but, depending on the design, "crashproof-ness" can come out the winner.

    1. Re:No mention of Riak? by Anonymous Coward · · Score: 1

      Seems to be part of the focus of the research:

      Per the abstract:

      "Additionally, HyperDex achieves high performance
      for simple get/put operations compared to current
      state-of-the-art key-value stores, with stronger fault-tolerance
      and comparable scalability properties."

      From the intro:

      HyperDex ensures that all data in the
      system is replicated to provide a desired level of fault-tolerance.
      The system employs a value-dependent replication
      technique to maintain the desired level of fault
      tolerance while providing strong consistency guarantees.
      Value-dependent replication ensures a desired number of
      copies, provides flexibility in assigning replica nodes to
      different regions in hyperspace, and enables replica sets
      to be selected from non-fate-sharing nodes. Unlike fixed
      replication techniques, the replica sets are dynamically
      constructed on the fly depending on the contents of the
      object fields as well as the previous contents of the object,
      enabling efficient search operations and strong consistency
      semantics even in the presence of concurrent updates
      and crash failures.

      emph. mine. Not only do they claim better fault tolerance overall, but also -- to me at least -- this paragraph makes it sound like you can scale your fault tolerance based upon the region of hyperspace (i.e. type of data), which could be very useful if you're in a situation where data has different levels of importance.

    2. Re:No mention of Riak? by viperidaenz · · Score: 1

      Hyperspace?

  3. How It Works by Anonymous Coward · · Score: 1

    From the paper:

    Efficient lookup of fully-specified objects is critical to
    object insertion and deletion performance, and requires
    a deterministic object to node mapping. Much like in
    ring-based key-value stores[3, 17, 38], HyperDex maps
    both object coordinates and nodes to the same hyperspace.
    Specifically, HyperDex tessellates the hyperspace
    into a grid of N regions in space. Zones, which we previously
    defined to be mutually exclusive regions belonging
    to one or more nodes, are created by assigning nodes
    to each of the tessellated regions to be responsible for
    all objects which hash to a coordinate within the region.
    The zone mapping is disseminated to all clients which
    may operate directly on the mapping without any routing
    between server nodes.
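
    A rough sketch of how I read that passage, in Python. This is not HyperDex's code: the attribute names, the grid resolution, the SHA-1 hash and the round-robin node assignment are all assumptions made up for illustration. The point is only that a fully specified object lands in exactly one region, while a search that fixes some attributes narrows the set of regions (and so nodes) that have to be contacted.

```python
import hashlib
from itertools import product

CELLS_PER_AXIS = 4                        # hypothetical grid resolution per axis
ATTRIBUTES = ("first", "last", "phone")   # hypothetical object schema, one axis each
NODES = ["node-a", "node-b", "node-c"]    # hypothetical servers


def axis_cell(value: str) -> int:
    """Hash one attribute value to a cell index along its axis."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % CELLS_PER_AXIS


def region_of(obj: dict) -> tuple:
    """A fully specified object hashes to exactly one grid region."""
    return tuple(axis_cell(obj[name]) for name in ATTRIBUTES)


def candidate_regions(query: dict) -> list:
    """A partial query pins some axes; unspecified axes range over all cells."""
    axes = [
        [axis_cell(query[name])] if name in query else list(range(CELLS_PER_AXIS))
        for name in ATTRIBUTES
    ]
    return list(product(*axes))


# Zone mapping: every region is assigned to a node (round-robin here, as an
# assumption). Clients hold this map, so they can route requests themselves.
ZONE_MAP = {
    region: NODES[i % len(NODES)]
    for i, region in enumerate(
        product(range(CELLS_PER_AXIS), repeat=len(ATTRIBUTES))
    )
}


def node_for(obj: dict) -> str:
    """Look up the node responsible for a fully specified object."""
    return ZONE_MAP[region_of(obj)]
```

    With three axes of four cells each there are 64 regions; a search on "last" alone only has to look at 16 of them, and the object being searched for is guaranteed to sit in one of those.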

  4. Berkeley DB? by gstoddart · · Score: 4, Insightful

    This sounds like the old Berkeley DB/Sleepy Cat software.

    Key/Value pairs instead of relational stuff. Worked with a product years ago that was built on Berkeley -- offered some pretty useful features that simply didn't map to object-relational stuff.

    For some applications, you really do need something that works a little differently than an RDB ... however, there's still loads of things I can't imagine trying to do without one.

    Choice is good in technology.

    --
    Lost at C:>. Found at C.
    1. Re:Berkeley DB? by Anonymous Coward · · Score: 0

      Yes the NoSQL options are quite similar to BerkeleyDB.

      The main difference is that there is a network-enabled server running on-top of your DB, and instead of putting all your data in one file on one machine you can have many servers handling requests from thousands of clients.

      But other than that - yes - it's almost exactly the same.

    2. Re:Berkeley DB? by Anonymous Coward · · Score: 0

      however, there's still loads of things I can't imagine trying to do without one.

      Do you have any experience doing functional programming? If not, I would pick up a book on SML or Haskell or Scala or -- the paradigm is fundamentally different from OO (programming's equivalent of RDB) and maps very well to key-value data stores. More efficient implementations also tend to be the more natural implementations as well.

    3. Re:Berkeley DB? by unholy1 · · Score: 5, Informative
    4. Re:Berkeley DB? by oneiros27 · · Score: 1

      Oh ... so it's more like an LDAP server, instead.

      Wasn't OpenLDAP written on top of BDB?

      (Although I don't know how well OpenLDAP handled replication -- the 'many servers' part ... I've only administered the Netscape/SunOne LDAP servers ... which are also key/value stores)

      --
      Build it, and they will come^Hplain.
    5. Re:Berkeley DB? by moderatorrater · · Score: 1

      I don't know how well OpenLDAP handled replication

      That's half of the point of NoSQL, or at least Mongo. The point is to have very large data sets that can be accessed quickly and reliably (but not necessarily consistently). Mongo does that in two ways: by simplifying the data store significantly and by providing fast and easy replication and sharding. It's usually as simple as designating which group the server belongs to and then letting mongo take care of the rest.

    6. Re:Berkeley DB? by jd · · Score: 1

      There are permutations of the concepts in database theory that have not been tried, at least not that I know of, but NoSQL and this NoSQL 2.0 idea are not amongst them.

      A short, and not comprehensive, list of underlying database structures
      Another short list, describing a few others
      A short list of some file organization methods (ie: access methods)

      A truly novel combination of existing structures and access methods would be worthy of a new name. A truly novel form of structure or file organization - assuming it is actually useful - is worthy of not just a new name but a new name writ large on the front page. A repackaged version of an existing combination might want a new marketing name, but it should be recognized that that is all it is - a name to sell by.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    7. Re:Berkeley DB? by medcalf · · Score: 1

      Dead on. And I'm currently building an ecommerce site on openldap. It's way better than it used to be. In particular, I'd never use it in the past because slurpd stank. Now that that's gone their replication is fast and solid. And yeah, NoSQL is basically a poor reimplementation of well tuned LDAP.

      --
      -- Two men say they're Jesus. One of them must be wrong. - Dire Straits
    8. Re:Berkeley DB? by element-o.p. · · Score: 1

      Although I don't know how well OpenLDAP handled replication -- the 'many servers' part ...

      OpenLDAP handled replication in two different ways. Older OpenLDAP servers used a separate daemon ("slurpd") to handle replication. IME, it worked pretty well. New OpenLDAP servers...well, it's pretty much just voodoo*, but it seems to work, too <shrug>

      *Okay, it's not really voodoo, but I haven't spent the time to figure it all out yet. I believe it's more a network of peers than the older master/slave server configuration, but I don't completely understand all the details of how they communicate updates with each other.

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    9. Re:Berkeley DB? by WaffleMonster · · Score: 1

      Dead on. And I'm currently building an ecommerce site on openldap. It's way better than it used to be. In particular, I'd never use it in the past because slurpd stank. Now that that's gone their replication is fast and solid. And yeah, NoSQL is basically a poor reimplementation of well tuned LDAP.

      OpenLDAP is not the directory server you seek.. Switch to 389 you will.

    10. Re:Berkeley DB? by medcalf · · Score: 1

      If you'd asked me before I started this project, I'd have said Sun. OpenLDAP used to be ... not as useful as other solutions, let's just say. But really, recent versions have made some significant improvements in performance and reliability. I can't be nearly as dismissive of it as I used to be.

      --
      -- Two men say they're Jesus. One of them must be wrong. - Dire Straits
  5. Why not both? by Sarten-X · · Score: 3, Interesting

    Er...

    Um...

    Why not learn both, and use whichever's strengths suit the application the best?

    --
    You do not have a moral or legal right to do absolutely anything you want.
    1. Re:Why not both? by Anonymous Coward · · Score: 0

      And don't forget that there are multi-dimensional array database engines (so called "NoSQL") out there that also support SQL, so you don't even have to choose between the two if you need or want both.

    2. Re:Why not both? by LostCluster · · Score: 2, Insightful

      Can somebody explain how this NoSQL stuff works? It's a database without SQL, so what replaces it? Is this just the difference between BASIC and C being expanded?

    3. Re:Why not both? by Sarten-X · · Score: 5, Informative

      NoSQL is a terrible misnomer, in that the difference is far more than just "doesn't use SQL", and there are NoSQL systems that do actually support SQL. It's really just referring to data storage systems that aren't based on relations. That change in paradigm has its advantages (speed (in some cases), scalability, and flexibility) and disadvantages (speed (in some cases), lack of consistency, less restriction on bad programming). Of course, each NoSQL system tries to mitigate the disadvantages, and each RDBMS tries to prove itself better than all of NoSQL's advantages. It's a big fun party involving lots of mud-slinging.

      Most NoSQL systems I've worked with are distributed hash tables, in a basic sense. Each value has a key, and that key determines where it's stored on a cluster. Values are not tied to any other values, so things like "foreign-key relations" are silly in a discussion of NoSQL. Rather, the algorithm to retrieve the data does all of the processing to connect data, using massive parallelization across a cluster to handle huge amounts of data at once.
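
      A toy version of "the key determines where it's stored", in Python. The Cluster class, the node names and the MD5-modulo placement are all invented for this sketch; real systems generally use consistent hashing or something similar so that adding a node doesn't reshuffle everything.

```python
import hashlib


class Cluster:
    """Toy distributed hash table: each key lives on exactly one node."""

    def __init__(self, node_names):
        self.names = sorted(node_names)
        # Each "node" is just an in-memory dict in this sketch.
        self.nodes = {name: {} for name in self.names}

    def owner(self, key: str) -> str:
        """Hash the key and pick the node responsible for it."""
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return self.names[int.from_bytes(digest[:4], "big") % len(self.names)]

    def put(self, key: str, value) -> None:
        self.nodes[self.owner(key)][key] = value

    def get(self, key: str):
        # Only the owning node is consulted; no other node is asked.
        return self.nodes[self.owner(key)].get(key)
```

      Note that nothing in put() or get() knows about any other key, which is why foreign-key style relations have no natural home here.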

      With a traditional RDBMS, the application must fit its data to the schema completely before any data can be stored. This, of course, means that all data in the database can be assumed to be complete. You won't find references that don't exist, which makes queries straightforward.

      With NoSQL, the database is treated as a more flexible bucket. Data is dumped in with a key, with little concern for fitting the design of the application's model. This, of course, means a bit more planning at design time, but the data can be arranged to better fit whatever it actually represents. Some details are present, and some aren't, but that's okay. The retrieval algorithm (typically a MapReduce program) should check for the existence of whatever data it needs, and handle errors accordingly. Those MapReduce programs are far more complicated than a simple SQL query, but the database's backend is conceptually simpler as an abstract key/value store. Key/value stores have been around for decades, and studied extensively. They can be made more fault-tolerant and scalable than RDBMS shards, but lack the support for large set-based comparisons.
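
      To make the "check for the existence of whatever data it needs" part concrete, here is a single-process imitation of a map/reduce job in Python. The records, field names and function names are made up for the example; a real job would run the map and reduce phases in parallel across the cluster.

```python
from collections import defaultdict

# Hypothetical records: some have a "city" field, some don't.
records = {
    "u1": {"name": "Ann", "city": "Ithaca"},
    "u2": {"name": "Bob"},
    "u3": {"name": "Cy", "city": "Ithaca"},
    "u4": {"name": "Di", "city": "Boston"},
}


def map_phase(key, value):
    """Emit (city, 1) for each record, skipping records without a city."""
    city = value.get("city")
    if city is None:
        return  # the field is optional, so its absence is not an error
    yield city, 1


def reduce_phase(city, counts):
    """Sum the per-record counts for one city."""
    return city, sum(counts)


def run(all_records):
    """Group the map output by key, then reduce each group."""
    grouped = defaultdict(list)
    for key, value in all_records.items():
        for out_key, out_value in map_phase(key, value):
            grouped[out_key].append(out_value)
    return dict(reduce_phase(city, counts) for city, counts in grouped.items())
```

      The equivalent SQL would be a one-line GROUP BY, which is the trade-off being described: more code in the query, less structure demanded of the data.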

      The comparison to the BASIC-vs-C battles is appropriate. Both BASIC and C serve their purposes well (education and system programming, respectively), but neither should be used where the other is better suited. NoSQL and RDBMSs also both have their places.

      --
      You do not have a moral or legal right to do absolutely anything you want.
    4. Re:Why not both? by Anonymous Coward · · Score: 0

      Because in the black-and-white duality that is contemporary geekspace "There can be only One" and the others are, you know, evil and stupid or at least very bad.
      Kinda silly, yes.

    5. Re:Why not both? by w_dragon · · Score: 5, Funny

      Thank you for a concise summary of the difference between the 2 systems. I must say though that such informative and level-headed comparisons have no place in a slashdot discussion ;)

    6. Re:Why not both? by larry+bagina · · Score: 2
      SQL = Structured Query Language. NoSQL = key/value stores. With SQL, you have a query, the database parses, plans, and executes it. With NoSQL, you have a key (string, number, etc). The database hashes it and finds the previously stored value.

      It's dbm or perl tied hashes, updated for the cloud.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    7. Re:Why not both? by Anonymous Coward · · Score: 2, Insightful

      Perhaps I'm just in an old state-of-mind, but what good is data without relations? I don't mean that as a gripe about the system, just, how would one ever pull members of a group, or messages belonging to a user, etc? I guess I don't understand how that's more efficient.

    8. Re:Why not both? by justforgetme · · Score: 5, Funny

      (speed (in some cases), scalability, and flexibility) and disadvantages (speed (in some cases), lack of consistency, less restriction on bad programming).

      You have a background in Lisp right?

      --
      -- no sig today
    9. Re:Why not both? by LostCluster · · Score: 2

      So that's good at finding the record if you already know the key, but there's no help in finding a record if you don't know the key, or getting a count of records with the same attribute attached... SQL for the win.

    10. Re:Why not both? by Anonymous Coward · · Score: 0

      SQL = Structured Query Language. NoSQL = key/value stores. With SQL, you have a query, the database parses, plans, and executes it. With NoSQL, you have a key (string, number, etc). The database hashes it and finds the previously stored value.

      It's dbm or perl tied hashes, updated for the cloud.

      This is not correct. Yes, some NoSQL stores work like this, but some don't; look at MongoDB.

      NoSQL = "not only SQL": stores that do not follow some of the traditional RDBMS conventions. They can (but don't have to) follow ACID, they can (but don't have to) use SQL, etc.

      Simply put, these stores give up some RDBMS features in order to gain something else, like speed, replication, flexible schema, ...

    11. Re:Why not both? by jellomizer · · Score: 1

      You broke the first rule of Slashdot.
      Thou shalt not embrace two different methodologies.

      You shall stick with one methodology until forced to change. When you do change you must embrace this with all your heart.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    12. Re:Why not both? by Massacrifice · · Score: 2

      I guess he's saying that with NoSQL the relations are done at application level rather than database level. You still have the equivalent of schema and queries, but they are managed by the code, not the DB engine.
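
      Something like this, I'd guess, for the "messages belonging to a user" case asked about above. It's a Python sketch over a plain dict standing in for the key/value store; the key naming scheme and the function names are assumptions of mine, not any particular product's API.

```python
# A plain dict stands in for the key/value store.
store = {}


def add_user(user_id, name):
    """Store the user and an (initially empty) list of their message ids."""
    store[f"user:{user_id}"] = {"name": name}
    store[f"user:{user_id}:messages"] = []


def add_message(msg_id, user_id, text):
    """Store the message and maintain the user-to-messages link by hand."""
    store[f"message:{msg_id}"] = {"author": user_id, "text": text}
    store[f"user:{user_id}:messages"].append(msg_id)


def messages_for(user_id):
    """The application performs the 'join': follow ids, skip dangling ones."""
    ids = store.get(f"user:{user_id}:messages", [])
    return [
        store[f"message:{m}"]["text"]
        for m in ids
        if f"message:{m}" in store
    ]
```

      The schema is still there; it just lives in the key names and in add_message() instead of in the DB engine, and nothing stops buggy code from leaving a dangling id behind.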

      --
      -- Home is where you eat your heart out.
    13. Re:Why not both? by Sarten-X · · Score: 4, Funny

      (not (know (I) (meaning (you))))

      --
      You do not have a moral or legal right to do absolutely anything you want.
    14. Re:Why not both? by gman003 · · Score: 2

      Seriously? No Lisp (or lisp-like) hacker (in the classical, "neat hack" sense (as opposed to the Faux-News "hackers stealing all your identities!" (or even the related but distinct "haxxor"))) worth his salt would be caught dead using only two nested parens. Real LISP Hackers (see previous nested comments) use at least ((2 * n)^(log n)) nested parentheses.

    15. Re:Why not both? by Anonymous Coward · · Score: 4, Informative

      So that's good at finding the record if you already know the key, but there's no help in finding a record if you don't know the key

      Most NoSQL databases have indexes, and the indexes can be searched to find the key(s) you need. As an example, straight from the MongoDB examples:

      db.things.find({colors : {$ne : "red"}})
      {"_id": ObjectId("4dc9acea045bbf04348f9691"), "colors": ["blue","black"]}

      In other words "find all the objects which do not contain 'red' in the field 'colors'". The ObjectId that is returned happens to be the key.

    16. Re:Why not both? by Wee · · Score: 2

      So that's good at finding the record if you already know the key, but there's no help in finding a record if you don't know the key, or getting a count of records with the same attribute attached... SQL for the win.

      This isn't totally true. In MongoDB, for example, you don't even really have to think about the "primary key" for every document. Many times I don't know it or even care to. If you want to look up customers by name, you'd index the last_name and first_name fields and then do your query like so:

      db.users.find({last_name : 'Cluster', first_name : 'Lost'})

      Since there's a compound index on those two keys, the key/values being looked up are those. That will return everything in that document which matches that name. A count is done by replacing the find() method call with the count() method call.

      You get a lot of flexibility. Let's say that for the above some users had an avatar. Then for those who have one, I just save it with their stuff. If not, no big deal. But I never have to go back and add a "column", I just save a new document that happens to have an 'avatar' key and there it is right alongside records, and it doesn't matter if some documents have an avatar or not. In fact, I could store binary data, then a shopping list and then a bunch of key/value pairs in the same collection (a collection is analogous to a table).
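
      If it helps, here is the same idea with no database at all: a Python list of dicts standing in for a collection. The find() and count() helpers are hypothetical stand-ins written for this sketch, not MongoDB's implementation, and they do a linear scan rather than using an index.

```python
# A "collection" of documents; only one of them happens to have an avatar.
users = [
    {"last_name": "Cluster", "first_name": "Lost"},
    {"last_name": "Cluster", "first_name": "Lost", "avatar": "lost.png"},
    {"last_name": "Wee", "first_name": "B"},
]


def find(collection, **criteria):
    """Return documents whose fields match every given criterion."""
    return [
        doc
        for doc in collection
        if all(field in doc and doc[field] == value
               for field, value in criteria.items())
    ]


def count(collection, **criteria):
    """Same matching rules as find(), but return only the number of hits."""
    return len(find(collection, **criteria))
```

      Documents without an 'avatar' key are simply skipped by a query on that key; nobody had to add a column first.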

      It does take a mental shift, but definitely has its uses. And like everything, right tool for the job. SQL for the win in some cases, not in others.

      -B

      --

      Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.

    17. Re:Why not both? by K.+S.+Kyosuke · · Score: 1

      Can somebody explain how this NoSQL stuff works? It's a database without SQL, so what replaces it? Is this just the difference between BASIC and C being expanded?

      The basic distinguishing feature of RDBMS systems is that they are data-centric, so to speak. The DB engine takes care of the metadata, access methods, query plan preparation ("SQL compilation") and optimization, transactions, backups, etc. Additionally, the RDBMS can (or is at least supposed to, if it's a proper RDBMS, see Codd's original work) allow you to separate the design of the physical layout of the data from the conceptual design of the data model ("this column is going to be accessed quite a lot, so pretend it's logically in the same table but actually store it physically on this fast new SSD drive, and oh, use this pair of functions to compress it and decompress it during storage"). None of the previous models (network model DBs, hierarchical model DBs etc.) supported this level of abstraction; they were just a thin API over the physical access methods, and I doubt that many of the modern "NoSQL" databases do today.

      The net result is that a relational database system allows you to engineer a somewhat universal data model that, in addition to the original application, can be repurposed with minimized risk of project failure. ("We didn't anticipate the need for reverse pointers from items to orders, now we'll have to convert all our data" stuff etc.) So the abstraction costs you a bit in the performance area, compared to specialized storage, but 1) not much, because set-based data models are quite amenable to optimizations, and 2) that specialized hyper-fast data storage you developed for your app is likely to be useless for the next app the management is going to ask you to deploy anyway, so why bother. If you have 5 TB of company data, you don't want to convert them for each new app you're going to run over it.

      If your app is really specialized (Google search engine, a fast caching server for rendered web pages, a DNA database, a real-time trading system...) and performance-hungry, then by all means, go for it and write a specialized data store. It won't be a data-centric store, but an application-centric one instead.

      --
      Ezekiel 23:20
    18. Re:Why not both? by jd · · Score: 5, Informative

      NoSQL 1.0 is usually not much more than a hash-accessed flat-table database. GDBM, QDBM and BerkeleyDB are all hash-accessed flat-table databases. The refinements mentioned as being added to NoSQL databases (such as searchable indexes) are simply sequential indexes that associate some indexed parameters with the hash value.
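
      As a concrete picture of that last sentence, here is a small Python sketch: a dict plays the hash-accessed table, and a sorted list of (field value, key) pairs plays the sequential index. The IndexedStore class and its method names are invented for illustration and do not describe any specific product.

```python
import bisect


class IndexedStore:
    """Hash-accessed records plus one sorted secondary index on a field."""

    def __init__(self, field):
        self.field = field
        self.primary = {}   # key -> record: the hash-accessed flat table
        self.index = []     # sorted (field value, key) pairs: the index

    def put(self, key, record):
        old = self.primary.get(key)
        if old is not None and self.field in old:
            # Drop the stale index entry before overwriting the record.
            entry = (old[self.field], key)
            pos = bisect.bisect_left(self.index, entry)
            if pos < len(self.index) and self.index[pos] == entry:
                del self.index[pos]
        self.primary[key] = record
        if self.field in record:
            bisect.insort(self.index, (record[self.field], key))

    def get(self, key):
        """Direct hash access by key."""
        return self.primary.get(key)

    def find(self, value):
        """Scan the index for a value, then fetch each record by its key."""
        # (value,) sorts just before every (value, key) pair.
        pos = bisect.bisect_left(self.index, (value,))
        found = []
        while pos < len(self.index) and self.index[pos][0] == value:
            found.append(self.primary[self.index[pos][1]])
            pos += 1
        return found
```

      The search never touches the records directly: it walks the index to recover keys and then does ordinary key lookups, which is all the "searchable index" refinement amounts to in this sketch.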

      NoSQL generally works by you pushing an item into the database and getting one or more hash values back. You want the item back, you give the database the hash values and you get the item. Object-oriented and object-based NoSQL both work by allowing objects to point to other objects. This gives you inheritance. (Basically you have a hash value that points to another record, where the structure of that other record is fixed rather than chosen at run-time via a join statement.)

      Basically, database theory describes all the various forms of database you can have: flat-file, hierarchical, network, relational, object/relational, semi-structured, associative, entity-attribute-value, transactional and star (aka data warehouse). A description of some of these can be found here.

      This describes how the data is actually laid out, but does NOT necessarily describe how the data is accessed.

      Database theory also describes the following underlying methods of accessing data: sequential, indexed, hash. Any combination of these is permitted, so you can have an index that points into sections of a database that are then searched sequentially for example. Or you can have indexes that point to other indexes that in turn point to a hash value. And so on.

      SQL is just a meta-language that allows you to apply a restricted form of set theory on the underlying access methods. There were arguments at the time SQL appeared that it should allow all of set theory - and those arguments still go on, with some SQL alternatives using actual set theory notation as opposed to SQL notation.

      NoSQL, in some cases, is just direct access to hash tables for directly accessing items. In other cases, it's a lightweight abstraction layer.

      In the example advertised in the summary, an object is referenced through a set of indexes. If you have a partial set of indexes, you reference multiple objects but they will be related in some way. There is nothing X.0 about it, it's just a NoSQL database that uses a network database topology rather than a flat-file topology. It is nothing new.

      I recognize that marketspeak is what sells things, that calling the systems by what they actually are would not be nearly as impressive to managers. Managers do not, as a rule, read Slashdot. Geeks and Nerds read Slashdot. Geeks and Nerds know Database Theory. (Well, if they don't, they damn well should -- either that, or they can use Google to look the terms up.) The two additions to database theory in the past 30 years have been the Object-Relational and Object-Oriented models.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    19. Re:Why not both? by Grishnakh · · Score: 2

      Not all data needs to be related. Look at the history of databases: relational databases are fairly new actually, as they only came around in the 1970s. Before that were hierarchical databases, such as IBM's IMS, which was used on the Apollo program to track all the millions of parts used in that project. Wikipedia has a nice article about it here:
      http://en.wikipedia.org/wiki/IBM_Information_Management_System
      Even though it was first used in the 60s, it's still in widespread use now, and every time you use an ATM machine, you're probably triggering a transaction on an IMS database.

    20. Re:Why not both? by Anonymous Coward · · Score: 1

      Not all data needs to be related.

      Relation as in the mathematical term, not as in relationships. Relations exist at the conceptual level in the model; most RDBMSs use table-like structures but they are not required to. It is actually one of the strengths of the relational model to strictly decouple conceptual, logical and physical levels. On paper....

    21. Re:Why not both? by Frnknstn · · Score: 4, Informative

      A hierarchy IS a relationship. In hierarchical databases, child segments and parent segments were the main kind of relationship used.

      All relational databases did was allow the relationships to be more freely defined.

      Further to that, a key / value pair is also a relationship, in that the key symbolically represents the data. That's why it is correct to call them NoSQL databases: They forgo the complexity of a general query language. In doing so, they also lose the ability to inherently store anything except the most basic relationship: the key / value lookup.

      --
      If it's in you sig, it's in your post.
    22. Re:Why not both? by Anonymous Coward · · Score: 0

      So it's like a RDF triple store, but with pairs instead of triplets?

    23. Re:Why not both? by shentino · · Score: 4, Funny

      Yeth

    24. Re:Why not both? by lysdexia · · Score: 5, Funny

      I concur! Shocking really. Gentlemen! Seize that miscreant's pants!

    25. Re:Why not both? by lysdexia · · Score: 1

      Sthenthino winth the thread. Congratulathonths.

    26. Re:Why not both? by shutdown+-p+now · · Score: 2

      Yes, but where's the requisite car analogy? This is Slashdot, after all.

    27. Re:Why not both? by jonadab · · Score: 1

      Yes. (I have been known to (program in lisp (specifically, elisp (the variant used for customizing Emacs (the world's most feature-complete text editor (and the most feature-complete program of any kind for that matter)))) (from time to time (Why do you ask? (It's not some lame joke about parentheses is it? (Because, that would just be dumb.))))).)

      --
      Cut that out, or I will ship you to Norilsk in a box.
    28. Re:Why not both? by Anonymous Coward · · Score: 0

      Data is dumped in with a key, with little concern for fitting the design of the application's model.

      You probably meant that there is no need for the applications to accommodate to a common schema. Otherwise we have just a bunch of useless silos.

    29. Re:Why not both? by Anonymous Coward · · Score: 0

      Actually, you're dead wrong.

      Database theory, that is, the various models, describes how the data is accessed.

      How it's laid out is irrelevant at this level. That's defined by the implementation, or specifically the optimizer; sequential, indexed, hash.

      There are three problems with _existing_ relational databases.

      One, they're mapping the model directly to disk. So if what you'd need (performance wise) is to have an order of order-items, you need 2 tables, and that's gonna be slow when you need both, whereas an order physically containing the items would be much faster. Neither option precludes querying order-items, and it's up to the DB to sort it out, but because relational databases map the model to the disk, which they don't have to, they're much slower for the one use-case you need (and faster for the many you don't). NoSQL databases, on the other hand, are going to execute the use case that was coded much faster, because you laid it out, but they're gonna be dreadful for anything else. Just like a SQL DB would be with only key-blob tables.

      Second, they're not mapping to the layer above. SQL is an abysmal language that forces you to repeat yourself for anything not completely trivial. Partly because of that (and also because SQL generally sucks) it's much easier to just get the data in and munge it yourself than figure out how to tell the DB to do it. And I'll pass on empty string being the same as null and other idiocies (a pet peeve of mine is the inability to share a constraint, say, unique key, or index -between- tables).

      Third, they're dreadful at refactoring. Ooops, my bad, should have a different table structure. Have to take the whole thing down while we run scripts that will take ages, then, because otherwise running queries will die horribly. Any halfway decent language can update the model while running, be it by way of dynamic libraries, class loaders or whatever. But not DBs.

      Given all that, it's no surprise developers just went f___ this. They might be sorry a few years down the road. But in the meantime they're getting things done, which they wouldn't with your traditional relational DB. And, to reiterate, that's not because the relational model is bad, it's because all the implementations _suck_.

    30. Re:Why not both? by billcopc · · Score: 1

      So it's basically key/value where the value is a serialized freeform array ? So then, if there is no structural integrity at the DB level, it has to be implemented in the application logic ? Doesn't that merely displace the performance bottleneck from the DB to the application ?

      Perhaps I'm not getting the point, but I'd much rather have DB-enforced structural integrity, than have to write all those checks and balances myself for every single app. Computing time is cheap. Development time, not so much.

      --
      -Billco, Fnarg.com
    31. Re:Why not both? by Sulphur · · Score: 1

      Yeth

      Thats a myth. I used to have a myth, but I lossst it.

    32. Re:Why not both? by Anonymous Coward · · Score: 0

      vi > emacs

    33. Re:Why not both? by afabbro · · Score: 2

      There's another piece to the definition. The traditional RDBMS (Oracle, DB2, SQL Server, MySQL, PostgreSQL) is designed to give 100% consistent results. All other design goals are sacrificed so that two people asking the DB the same question at the same time will get the same answer, and no one can make a modification and someone else gets an answer that is not 100% up to date. NoSQL trades consistency for flexibility/simpler scalability.

      If you post something on a social network at 1:00 and your friend in a different timezone looks at 1:00:10 and doesn't see the update, no big deal. If one person authorizes $500 on your credit card at 1:00 and consumes your limit and someone else tries to authorize $300 at 1:00:10 and it goes through because the DBMS isn't giving consistent answers, that's a problem.
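
      The credit card case above can be sketched in a few lines. This is a toy model, not any real DBMS: the Replica class, the authorize function and the 500 limit are all made up for illustration, and "replication" is just copying one number.

```python
# Hypothetical sketch: two copies of one card's authorized total.
# Replica, authorize, run and LIMIT are illustrative names, not a real API.
LIMIT = 500


class Replica:
    def __init__(self):
        self.authorized = 0  # total authorized so far, as seen by this copy


def authorize(read_from, write_to, amount):
    """Approve if the copy we READ from still shows enough credit."""
    if read_from.authorized + amount > LIMIT:
        return False
    write_to.authorized += amount
    return True


def run(replicate_before_second_read):
    primary, replica = Replica(), Replica()
    # 1:00:00 - the $500 authorization is checked and recorded on the primary.
    results = [authorize(primary, primary, 500)]
    if replicate_before_second_read:
        replica.authorized = primary.authorized  # replica has caught up
    # 1:00:10 - the $300 authorization is checked against the replica.
    results.append(authorize(replica, primary, 300))
    return results, primary.authorized


eventual = run(replicate_before_second_read=False)
consistent = run(replicate_before_second_read=True)
```

      With the lagging replica both authorizations are approved and the card ends up at 800 against a 500 limit; once the replica is in sync the second one is refused.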

      They're just systems with two different design goals. Some DBs will let you restrict this - SQL Server 2008R2 has a "replicate to nodes" setup that tries to stay 100% in sync but doesn't guarantee it. OTOH, something like Oracle RAC is always 100% in sync because there's only one set of datafiles shared by everyone.

      I said "simpler scalability" because the idea that SQL-based RDBMS systems can't scale to any length is ridiculous - all credit card processing, bank transactions, airline reservations, Amazon orders, etc. all flow through very traditional RDBMS of one of the flavors I mentioned. However, it's a lot more complicated than scaling NoSQL. Getting that guarantee of consistency is not easy once you outgrow a single server, need 100% (not 99.999%) uptime, etc..

      NoSQL is also better for large document storage. Traditional RDBMS has LOBs but they're a later add-on and it somewhat shows. Want to store a few gigabytes of LOBs in your DB? Sure. Want to store terabytes of big LOBs and use your DB as a transactional filesystem? It can be done, but it won't be pretty.

      All in all, it depends on what you're trying to do.

      --
      Advice: on VPS providers
    34. Re:Why not both? by afabbro · · Score: 1

      This isn't totally true. In MongoDB, for example, you don't even really have to think about the "primary key" for every document. Many times I don't know it or even care to. If you want to look up customers by name, you'd index the last_name and first_name fields and then do your query like so: db.users.find({last_name : 'Cluster', first_name : 'Lost'})

      An excellent example.

      I think of the NoSQL world as "get a document/piece of data by an indexed data column". It works very well for that. SQL is better for "correlate and compute summation on these data with these sets of conditions".
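
      A rough way to picture the two access patterns, using nothing but Python dicts. The data, the name_index structure and both function names are assumptions for illustration; a real store maintains its indexes for you.

```python
from collections import defaultdict

# Hypothetical documents keyed by a primary key nobody has to remember.
users = {
    1: {"last_name": "Cluster", "first_name": "Lost", "orders": 3},
    2: {"last_name": "Cluster", "first_name": "Found", "orders": 5},
    3: {"last_name": "Node", "first_name": "Lost", "orders": 2},
}

# Secondary index: (last_name, first_name) -> list of primary keys.
name_index = defaultdict(list)
for pk, doc in users.items():
    name_index[(doc["last_name"], doc["first_name"])].append(pk)


def find_by_name(last_name, first_name):
    """'Get a document by an indexed column': one index probe, no scan."""
    return [users[pk] for pk in name_index.get((last_name, first_name), [])]


def orders_by_last_name():
    """'Correlate and compute summation': touches every document."""
    totals = defaultdict(int)
    for doc in users.values():
        totals[doc["last_name"]] += doc["orders"]
    return dict(totals)


lost_cluster = find_by_name("Cluster", "Lost")
totals = orders_by_last_name()
```

      The first function is the kind of lookup the quoted find() call performs; the second is the kind of question that is a one-line GROUP BY in SQL.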

      --
      Advice: on VPS providers
    35. Re:Why not both? by LongearedBat · · Score: 2

      Think of RDBMS as a van and NoSQL as a dune buggy.

      Each has its advantages. Travelling by buggy might be speedier, but transporting a dozen people by van might be more efficient (different forms of faster). This is because a van has greater scalability than a buggy. Also, a van is more flexible in that it can transport a variety of cargo/passengers. The high speeds in a buggy going off road can also be a disadvantage, compared to the relative safety of driving on roads following the many traffic rules. That said, a buggy isn't restricted to driving on roads as is a van.

      Disclaimer: I might have got some of the analogies the wrong way around, 'cos I'm not as clued up about NoSQL as GP.

      Satisfied now? ;)

    36. Re:Why not both? by fusiongyro · · Score: 1

      All relational databases did was allow the relationships to be more freely defined.

      That, and had an underlying theory of data storage, management and querying based on math. Apart from all that, yeah.

    37. Re:Why not both? by justforgetme · · Score: 1

      not to worry javascript is worse:
      (( sarten-x.com.constructor == (new CHTTPException(404)).constructor ) ? panic : relax)()

      --
      -- no sig today
    38. Re:Why not both? by WaffleMonster · · Score: 1

      There's another piece to the definition. The traditional RDBMS (Oracle, DB2, SQL Server, MySQL, PostgreSQL) is designed to give 100% consistent results. All other design goals are sacrificed so that two people asking the DB the same question at the same time will get the same answer, and no one can make a
      modification and someone else gets an answer that is not 100% up to date.

      This is incorrect. Oracle and MySQL use MVCC for all reads by default. SQL Server is the only one in your list that blocks readers for data where write locks have been issued unless SI or uncommitted reads are enabled for the query ( CHOICE). Oracle does not even offer a serialized reads option.

      If one person authorizes $500 on your credit card at 1:00 and consumes your limit and someone else tries to authorize $300 at 1:00:10 and it goes through because the DBMS isn't giving consistent answers, that's a problem.

      Changes are consistent...answers are NOT.

      NoSQL trades consistency for flexibility/simpler scalability.

      You can make consistency tradeoffs with most RDBMS systems as well.

      Want to store terabytes of big LOBs and use your DB as a transactional filesystem? It can be done, but it won't be pretty.

      Why not?

    39. Re:Why not both? by Frnknstn · · Score: 1

      Yes, and no...

      No, because the fact that relational databases are based off a first-order logic model doesn't mean that a RDBMS 'does' anything that another system couldn't do.

      Also, don't forget that no current RDBMS completely implements the relational database model as originally defined.

      --
      If it's in you sig, it's in your post.
    40. Re:Why not both? by Anonymous Coward · · Score: 0

      One, they're mapping the model directly to disk.

      Several commercial vendors offer systems "mapped" to random access storage if that's what turns you on.

      SQL is an abysmal language that forces you to repeat yourself for anything not completely trivial.

      All this repetition..if only there were views, functions, ctes, procedures.

      it's much easier to just get the data in and munge it yourself than figure out how to tell the DB to do it

      If you knew how to read a manual you would not be such a waste of your employers time and money.

      And I'll pass on empty string being the same as null and other idiocies

      I heard that one thing about Oracle one time. Now I think I'll complain bout it cause otherwise I have nothing substantive to say.

      Third, they're dreadful at refactoring. Ooops, my

      YOU are dreadful at refactoring.

      bad, should have a different table structure. Have to take the whole thing down while we run scripts that will take ages, then, because otherwise running queries will die horribly. Any halfway decent language can update the model while running, be it by way of dynamic libraries, class loaders or whatever. But not DBs.

      Garbage in garbage out. This is what happens when idiots try their hand at schema design.

      Given all that, it's no surprise developers just went f___ this. They might be sorry a few years down the road. But in the meantime they're getting things done, which they wouldn't with your traditional relational DB. And, to reiterate, that's not because the relational model is bad, it's because all the implementations _suck_.

      Your an idiot.

    41. Re:Why not both? by Terrasque · · Score: 4, Interesting

      I've just gotten my NoSQL feet wet by playing around a weekend with python + mongodb. I am pretty used to SQL, and generally had the same thinking you (and most other SQL people) had.

      But, many people liked it, so I figured I should at least have a look at it. I made a small webapp for tracking my movies, with query to imdb and with users. I was surprised to see that most of the problems I anticipated weren't problems at all, and things mostly just worked naturally. For a quick get-started intro to python + mongodb : Part 1 and Part 2. If you've got the spare time and some interest, poking around with it is a great little weekend project.

      Anyway, back to your question. MongoDB stores data in a format very similar to JSON (technically BSON, a JSON superset), if you're familiar with that. Unordered key->value and ordered lists. For the python driver, it translates the data to and from native python dict/list structures. I started with three fields; filename, added and imdb. The imdb field was more or less the raw data from imdb (json format, decoded to python native and encoded to mongodb's BSON format again.)

      Later on I added option for users to mark movies as favorites and seen (by adding two new fields to movie list, "seenby" and "favoriteof" - both lists - these were added to a movie entry the first time someone marked one as seen or favorite). To add a new user I just did movie["seenby"].append(user_id) and movies.save(movie)

      When I wanted to query the db, I created a data structure of what I wanted, and sent that to the server. The server would then return all documents that matched that example structure. So, to find the entry for file "/bla/test.mp4" I would do movies.find( {'filepath': '/bla/test.mp4'} ).

      For finding by imdb Title value : {'imdb.Title': '300'}. For finding all favorites by user: {"favoriteof": user_id} (yes, it would handle the list of users as you'd expect, and find all entries whose "favoriteof" list had the user in it. It would also of course skip all entries without that field).

      mongodb also supports some special keywords for searching. Let's say I have a list of 3 users, and want to have all movies that any of them have favorited. {"favoriteof" : {"$in": users} } would fix that - for movies that all of them have as favorite, {"favoriteof" : {"$all": users} }. Sorting was done using sort( field_n_direction_list )
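
      For anyone who wants to see the matching rules above without installing anything, here is a small pure-Python imitation. It is only a sketch of the behaviour described in this post (dotted keys, list membership, "$in", "$all"), not MongoDB's actual matcher, and find, matches, field_matches and get_path are made-up helper names.

```python
def get_path(doc, dotted):
    """Follow a dotted key such as 'imdb.Title'; return (found, value)."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return False, None
        current = current[part]
    return True, current


def field_matches(value, cond):
    """Compare one stored value (scalar or list) against one condition."""
    values = value if isinstance(value, list) else [value]
    if isinstance(cond, dict) and "$in" in cond:
        return any(v in cond["$in"] for v in values)
    if isinstance(cond, dict) and "$all" in cond:
        return all(wanted in values for wanted in cond["$all"])
    # Plain value: equality for scalars, membership for stored lists.
    return cond in values


def matches(doc, query):
    for key, cond in query.items():
        found, value = get_path(doc, key)
        # Documents without the field are skipped, as described above.
        if not found or not field_matches(value, cond):
            return False
    return True


def find(docs, query):
    return [doc for doc in docs if matches(doc, query)]


movies = [
    {"filepath": "/bla/test.mp4", "imdb": {"Title": "300"}, "favoriteof": [1, 2]},
    {"filepath": "/bla/other.mp4", "imdb": {"Title": "Alien"}, "favoriteof": [2]},
    {"filepath": "/bla/new.mp4", "imdb": {"Title": "Brazil"}},
]

by_title = find(movies, {"imdb.Title": "300"})
favorites_of_2 = find(movies, {"favoriteof": 2})
any_of = find(movies, {"favoriteof": {"$in": [1, 3]}})
all_of = find(movies, {"favoriteof": {"$all": [1, 2]}})
```

      The third movie has no "favoriteof" field at all and simply never shows up in the favorite queries.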

      You have a full list of modifiers here. And all could of course be combined to quickly and easily create powerful queries. And you of course have options for indexes. You might notice that you do lose something compared to normal SQL here: if you wanted both movie and user info, you'd have to make two queries (well, from what I've understood), so highly relational data is not well suited to this. Also, you don't have the type constraints any more.

      In the app I also wanted to list all movie genres (I did one preprocessing of the imdb data, splitting up the comma separated genres string to a list of genres) and number of times each genre was used. This led me to mapreduce, which was the thing I both anticipated most, and feared most. Well, I kinda chickened out, since the pymongo doc had an excellent example which was exactly what I wanted doing, but I did get a look at it at least :) And it was fast enough to not make a noticeable dent in load time for a few hundred movie entries.
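
      The genre count is a nice first map/reduce because the shape is so simple: map emits (genre, 1) for every genre on every movie, reduce adds up the ones. Below is a plain-Python sketch of that shape. It is not the pymongo example mentioned above; map_reduce, map_movie, reduce_counts and split_genres are names invented here, and the three movies are dummy data.

```python
from collections import defaultdict


def split_genres(raw):
    """Preprocessing step: 'Action, Drama' -> ['Action', 'Drama']."""
    return [g.strip() for g in raw.split(",") if g.strip()]


def map_movie(movie):
    """Map: emit one (genre, 1) pair per genre on the movie."""
    for genre in movie.get("genres", []):
        yield genre, 1


def reduce_counts(genre, counts):
    """Reduce: collapse all values emitted for one genre into a total."""
    return sum(counts)


def map_reduce(docs, mapper, reducer):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


movies = [
    {"title": "A", "genres": split_genres("Action, Drama")},
    {"title": "B", "genres": split_genres("Drama")},
    {"title": "C", "genres": split_genres("Action, Comedy, Drama")},
]

genre_counts = map_reduce(movies, map_movie, reduce_counts)
```

      On a real server the map and reduce steps run next to the data rather than in the client, but the two functions keep the same roles.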

      *Cough* well, that was a long post.. I hope it helped you at least a bit in answering your question, and maybe inspire you to take a closer look at it when you get some spare time. I've only used it over a weekend, so I've probably just scratched the surface, and I probably have missed some neat features or horrible gotchas here and there.

      --
      It's The Golden Rule: "He who has the gold makes the rules."
    42. Re:Why not both? by geekopus · · Score: 1

      I've not used a NoSQL system (meaning I'm *perfectly qualified to speak*! :-) ), but I would assume that there are abstraction libraries that can apply ORM type mapping on these things.

      My co-workers think I'm nuts, but I have for years said that I would only use stored procedures, triggers, functions, what-have-you when I absolutely have no other choice. The reason is that it smears application logic into your database, which most of the time means that all of the fancy, gee-whiz tools you use to write, maintain, version and otherwise manage your code are nearly useless. It's also more difficult to scale that code (you can make copies for sure, but if you've ever had to do change management on a sizable Oracle cluster, for example, it can be painful).

      Scaling application logic across cheap hardware is also easier than scaling your database.

      So, for me, assuming that I have access to an abstraction layer, I can't see a downside (apart from strict ACID compliance) to a NoSQL system.

      My $0.02. And I've been called a crotchety old man before, so if you disagree with me you wouldn't be the first. :-)

    43. Re:Why not both? by Stele · · Score: 1

      A myth once bit my sisther....

    44. Re:Why not both? by jonadab · · Score: 1

      Oh, really?

      nathan@donalbain:~$ ls -l `which vi` `which emacs`
      lrwxrwxrwx 1 root root 23 Mar 24 2011 /usr/bin/emacs -> /etc/alternatives/emacs
      lrwxrwxrwx 1 root root 20 Mar 24 2011 /usr/bin/vi -> /etc/alternatives/vi
      nathan@donalbain:~$ ls -l /etc/alternatives/emacs /etc/alternatives/vi
      lrwxrwxrwx 1 root root 18 Mar 24 2011 /etc/alternatives/emacs -> /usr/bin/emacs23-x
      lrwxrwxrwx 1 root root 17 Mar 24 2011 /etc/alternatives/vi -> /usr/bin/vim.tiny
      nathan@donalbain:~$ ls -l /usr/bin/emacs23-x /usr/bin/vim.tiny
      -rwxr-xr-x 1 root root 6583560 Dec 11 2010 /usr/bin/emacs23-x
      -rwxr-xr-x 1 root root 632884 Jul 11 2010 /usr/bin/vim.tiny
      nathan@donalbain:~$

      Huh. Whaddaya know. vi wouldn't be bigger than Emacs even
      if you added a zero, and that's just the small portion that's written
      in C. Most of Emacs is written in lisp.

      Does vi have a built-in mail and news reader? Does it come with a
      web browser, a spreadsheet, and a TeX-editing mode that can display
      what the results will look like in real time? Can the action that occurs
      when each individual key is pressed be fully customized and scripted
      by the user on a per-file-type basis and take syntactic context into
      account when deciding what to do? Does vi include its own shell,
      vshell, so that the output of in-editor scripted functions can be passed
      as command-line arguments to system commands and vice versa?
      Can you play zork and nethack in vi? Does vi ship with an Emacs
      mode just in case anyone should happen to want that for any reason?

      No, it does not. vi users don't have any idea what the phrase
      "feature-complete" means. If you were building a house, it wouldn't
      have a bat cave or a waterpark in the basement or a helipad on the
      roof or an amusement park on the thirty-fourth floor or an industrial
      scale organic chem lab in the north wing. Heck, the vi house
      probably wouldn't even be capable of space flight.

      --
      Cut that out, or I will ship you to Norilsk in a box.
    45. Re:Why not both? by jd · · Score: 1

      No, I'm absolutely right.

      Ok, these "problems" of which you speaketh:

      1) No, only some map the model direct to disk. Oh, and except on one or two very primitive databases, views aren't mapped into a physical form on disk. MySQL only does so when you tell it to use a storage engine that allows you to map the model that way AND you configure it to. In the example you gave, you could use 1 table and 1 view (or, indeed, 1 table and 1 table-returning function) for almost every relational database out there. More sophisticated databases will let you generate an index from a function and order the table through an index. You still have the view/function, but you don't have to access anything through it for your example.

      NoSQL databases can use any underlying structure (relational, hierarchical, flat file, object-oriented, etc), although they are more often coded as flat file. From Wikipedia:

      Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface.

      No, you don't lay them out. You have absolutely bugger all idea where Memcached or MongoDB has put your data - it could be in any file on any disk in any computer in the database structure.

      2) SQL is nothing more than a wrapper. Not all existing relational databases use SQL. And those that do - well, which SQL are you referring to? ANSI? T-SQL? Some other kind?

      SQL isn't, however, restricted to relational databases. Ingres is not a relational database, it is a star database. Ingres uses SQL. Well, at least as one option.

      3) I've never had any problems refactoring a database live. But, then, I use design tools. Which prevent me from needing to refactor in the first place by making it easy to design things correctly to begin with. That's the great thing about being a software engineer/computer scientist/database admin/system admin -- I know all about the RASDIT methodology and know how to apply it in everyday activities. I pity the poor sods who try to code blind and have to do any refactoring at all. A good design is so easy to create, so easy to implement and a wondrous thing to maintain.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    46. Re:Why not both? by Wee · · Score: 1

      So it's basically key/value where the value is a serialized freeform array ? So then, if there is no structural integrity at the DB level, it has to be implemented in the application logic ? Doesn't that merely displace the performance bottleneck from the DB to the application ?

      By not imposing any structure on what is being stored, performance is very, very good. And yeah, it's up to the application to put and get what it needs properly. The real win is that if your data isn't "relational" meaning that you don't need to correlate anything in a normalized sort of way, it's very very fast. And the storage model is different too. It works well for things that are event based, for instance.

      Computing time is cheap. Development time, not so much.

      Actually, at scale, the exact opposite is true. I worked next to a guy at Google who won a five-day paid vacation to Hawaii for making "most" plain searches 0.09 seconds faster, thereby freeing up a considerable amount of hardware. There was always a hardware crunch at Google. Don't know if there still is or not, though. I suspect so.

      -B

      --

      Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.

    47. Re:Why not both? by billcopc · · Score: 1

      At scale, sure. I am not Google. The biggest clusters I manage consist of maybe a dozen machines. If I have the choice between spending a month trying to optimize throughput, or adding an extra node, I'll do the latter because:

      - 150+ hours of dev costs about the same as a decked out DB/web server
      - I'd rather spend that month on billable work, or getting ahead of the game
      - YAY more toys!

      Despite that, I can see how there's a tipping point, dependent on the relation between traffic and revenue. I'm not exactly running an ad-supported business here, if my switch LEDs are burning out, it's because I'm making money.

      --
      -Billco, Fnarg.com
    48. Re:Why not both? by Anonymous Coward · · Score: 0

      Not only didn't you bother to read my post, but you actually don't have a clue what you're talking about!

      1) How do views help with my scenario? I'm talking about retrieving an order and its items in two disk lookups, one in the index to get to the record, one for the record itself. No can do with SQL databases. Because they directly map tables to disk - it's all new and shiny in Oracle on Exadata that you can split a table the columnar way and still access it as if it wasn't, so new and shiny in fact that you can't use the feature on non-exadata boxes. So you have to go to the index, read the order record, then to another index, then read the items. Views are completely and utterly irrelevant.

      If you store the data in a non-relational DB (or memcached), then you do get the option of storing the whole thing, order and items, in one location. But then you can't access the items at all, short of reading ALL the orders. That was my point.

      Oh, and, yes, you do know what memcached is going to do with your map entries. If you don't, you have no business talking about it. It's open source, for chrissake.

      2) That's nice, I was saying that SQL doesn't map to the layers ABOVE. You know? C, Java, erlang? That sort of thing? Who cares that non-relational DBs expose SQL as an interface? It's shit anyway. And part of the reason why it's shit is that, as you noted, it doesn't really exist in the first place.

      3) Show me a DB where you can, in one transaction and without stopping service, rename a table, create a new table, define a view as a union over both, and direct inserts into the view to the new table (possibly with rewriting). I'm waiting.
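
      For reference, the sequence of steps listed in 3) looks roughly like this when written against SQLite through Python's sqlite3 module. Treat it as a sketch of the steps only: SQLite is an embedded single-file engine, so this says nothing about doing the same on a busy multi-user server "without stopping service", and the orders tables, the priority column and the trigger name are all invented for the example.

```python
import sqlite3

# isolation_level=None puts the module in autocommit mode, so the
# BEGIN/COMMIT below are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, note TEXT)")
conn.execute("INSERT INTO orders (id, note) VALUES (1, 'old row')")

conn.execute("BEGIN")
# 1. Rename the existing table out of the way.
conn.execute("ALTER TABLE orders RENAME TO orders_old")
# 2. Create the new table with the changed structure.
conn.execute(
    "CREATE TABLE orders_new ("
    "id INTEGER PRIMARY KEY, note TEXT, priority INTEGER DEFAULT 0)"
)
# 3. Define a view under the old name as a union over both tables.
conn.execute(
    """
    CREATE VIEW orders AS
        SELECT id, note, 0 AS priority FROM orders_old
        UNION ALL
        SELECT id, note, priority FROM orders_new
    """
)
# 4. Direct inserts on the view to the new table.
conn.execute(
    """
    CREATE TRIGGER orders_insert INSTEAD OF INSERT ON orders
    BEGIN
        INSERT INTO orders_new (id, note, priority)
        VALUES (NEW.id, NEW.note, NEW.priority);
    END
    """
)
conn.execute("COMMIT")

# Existing code keeps using the name "orders" for reads and inserts.
conn.execute("INSERT INTO orders (id, note, priority) VALUES (2, 'new row', 5)")
rows = conn.execute("SELECT id, note, priority FROM orders ORDER BY id").fetchall()
new_rows = conn.execute("SELECT id FROM orders_new").fetchall()
```

      Whether the equivalent works on a given server product, under load and with long-running queries in flight, is exactly the question being argued here and is not settled by this toy.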

      As for "I'm working with good people using good tools", whatever dude. Back in the real world we have changing business needs, changing requirements and mistakes.

    49. Re:Why not both? by Anonymous Coward · · Score: 0

      That's a really good overview, and I've been working in Python lately so it's particularly helpful in figuring out how things might look.

      Thanks for taking the time to respond, you answered a lot of basic questions with those little examples.

    50. Re:Why not both? by JasterBobaMereel · · Score: 1

      SQL is like a Sherman Tank, it is safe, robust and can go surprisingly quickly, but can be overwhelmed by hordes of lightly armed soldiers

      NoSQL is more like a cluster of bots, each is low powered and very simple and so cannot do anything very complex, but they can cope with many things at once ..

      --
      Puteulanus fenestra mortis
    51. Re:Why not both? by JasterBobaMereel · · Score: 1

      NoSQL is a terrible name

      SQL = Structured Query Language, but people usually use it to mean an RDBMS (Relational DB) Because most use SQL

      NoSQL = Not SQL, but actually means not an RDBMS - usually a hash store (or similar)

      Some NoSQL DB's use SQL to query the hash store, some RDBMS's don't use SQL ...

      --
      Puteulanus fenestra mortis
  6. wake me in a few years by joss · · Score: 5, Funny

    http://www.youtube.com/watch?v=URJeuxI7kHo

    is the best introduction to this subject I've seen. Until someone can explain the pros of hyperdex with a funny video featuring cute animals I'm sticking with technology that's been tested more thoroughly.

    --
    http://rareformnewmedia.com/
    1. Re:wake me in a few years by Sarten-X · · Score: 4, Informative

      Decisions based on cute animals and straw-man arguments without any facts... You must be a manager!

      --
      You do not have a moral or legal right to do absolutely anything you want.
    2. Re:wake me in a few years by c0d3g33k · · Score: 2

      It's funny because it's true.

    3. Re:wake me in a few years by royallthefourth · · Score: 1

      Dealing with managers and coding up job security are certainly linked skills.

    4. Re:wake me in a few years by w_dragon · · Score: 1

      I thought managers wanted pretty graphs and meaningless statistics? Maybe he's an executive?

  7. Wow! That's some neat Progress! by VortexCortex · · Score: 5, Funny

    The hashing system is pretty neat. The idea that you could get at records without their specific key via search criterion is astounding.

    In the future more advanced hashing systems will allow NoSQL databases to extract a set of records all containing a similar subset of data without keys at all!

    Of course we'd need a name for the sections that are matching. Perhaps "Columns", yeah, then each result returned could be called a "Row", makes sense. I bet you could then create even more complex matching patterns for multiple "Columns" against each record in the data-set. If only there was a language to describe the query we're sending to the servers... Oh! Server Query Language!

    I can't wait to use SQL with NoSQL 3.0!

    1. Re:Wow! That's some neat Progress! by Sarten-X · · Score: 4, Insightful

      And we'd still be able to have the cluster support, scalability, lax schema, and MapReduce algorithms NoSQL currently provides, right? Sometimes those aspects are vital to the application design, and key to the system's overall performance.

      --
      You do not have a moral or legal right to do absolutely anything you want.
    2. Re:Wow! That's some neat Progress! by Anonymous Coward · · Score: 0

      Structured Query Language

    3. Re:Wow! That's some neat Progress! by sourcerror · · Score: 1

      Last time I checked Postgres had cluster support. I don't know what you mean by lax schema, or whether I really want it.

    4. Re:Wow! That's some neat Progress! by Anonymous Coward · · Score: 0

      Why, each time I read something similar to "cluster support, scalability, lax schema", it parses as "Can't be bothered to backup, can't be bothered to optimize, can't be bothered to analyze"?

    5. Re:Wow! That's some neat Progress! by interval1066 · · Score: 1

      I can't wait to use SQL with NoSQL 3.0!

      Or to paraphrase Vortex: "I can't wait to build a hammer with a hypersonic toothbrush!"

      --
      Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
    6. Re:Wow! That's some neat Progress! by interval1066 · · Score: 1

      I think the big deal w/NoSQL is scalability. SQL cannot beat NoSQL in this arena. Not even close.

      --
      Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
    7. Re:Wow! That's some neat Progress! by rubycodez · · Score: 1

      bwahahahaha, you're kidding right? because sql dbms like db2 or oracle dbms can scale bigger than google's search engine db?

    8. Re:Wow! That's some neat Progress! by Sarten-X · · Score: 4, Insightful

      [citation needed], and preferably one that actually covers NoSQL as it's intended for use.

      Last time I checked thoroughly (2009), most RDBMSs (MS SQL Server included) could scale across an arbitrarily-large cluster, but for every doubling of the cluster's power, the costs would be around 300% to 400%. When you get to the point of needing billions of rows per table (and yes, there are applications out there that need that, even at relatively small startups), those outpacing costs become prohibitive.

      The lax schema isn't about not knowing what you're doing, but about acknowledging that you won't know everything about the data you'll receive. Back when I did server programming, the mantra was "be strict in what you provide, and lax in what you accept". This is that principle applied to databases. Maybe the website you're crawling doesn't have a title, or its address is obviously dynamic. Maybe the medical record's patient has seven different insurance providers. Maybe the passport holder legally doesn't have a surname. When you design a schema for a strict database like an RDBMS, you make certain assumptions about the data you'll get. Those assumptions lead to performance increases if they're accurate, and failure if they're wrong.

      MapReduce is the key to performance without assumptions, at lower cost. By moving processing to the data, and replicating the data to multiple nodes, network transfer is reduced greatly. The MapReduce programs are designed to operate on any amount of data they are presented with, so each node in the cluster contributes its available resources, and since the data is spread evenly, most "queries" will be partially processed by every node. Contrast that with RDBMS sharding, where certain servers handle certain shards, and the massive parallelism of the cluster isn't used. Some servers will sit idle while others do all of the work. Note that the parallelism applies generally, to all MapReduce algorithms. This means that you do not need to make as many assumptions about your queries ahead of time, like expecting to only look up a customer by name or phone number (and therefore indexing those).
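
      As a rough illustration of the paragraph above: spread the records evenly over some nodes, let every node compute a partial answer from only its own records, then merge the partials. This is a single-process toy, with round-robin placement, no replication and no real parallelism; partition, node_map and cluster_count are hypothetical names, and the customer records are dummy data.

```python
from collections import Counter


def partition(records, node_count):
    """Spread records evenly (round-robin) across the nodes."""
    nodes = [[] for _ in range(node_count)]
    for i, record in enumerate(records):
        nodes[i % node_count].append(record)
    return nodes


def node_map(local_records, key_fn):
    """Runs on one node, against only the records stored there."""
    return Counter(key_fn(r) for r in local_records)


def cluster_count(nodes, key_fn):
    """Merge the partial counts from every node into one answer."""
    total = Counter()
    for local in nodes:  # on a real cluster these would run in parallel
        total += node_map(local, key_fn)
    return dict(total)


customers = [
    {"name": "Ann", "city": "Oslo"},
    {"name": "Bob", "city": "Rome"},
    {"name": "Cid", "city": "Oslo"},
    {"name": "Dee", "city": "Lima"},
    {"name": "Eve", "city": "Rome"},
    {"name": "Abe", "city": "Oslo"},
]

nodes = partition(customers, 3)
by_city = cluster_count(nodes, lambda r: r["city"])
# A question nobody planned an index for works the same way.
by_initial = cluster_count(nodes, lambda r: r["name"][0])
```

      Every node contributes to both questions, and neither needed an index defined ahead of time, which is the point being made about assumptions.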

      NoSQL isn't just "not using SQL". It's a different storage paradigm, which comes with its own advantages and disadvantages.

      --
      You do not have a moral or legal right to do absolutely anything you want.
    9. Re:Wow! That's some neat Progress! by interval1066 · · Score: 1

      Uh, nope. Not kidding, Cassandra

      --
      Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
    10. Re:Wow! That's some neat Progress! by Sarten-X · · Score: 0

      It's because you can't be bothered to learn something different.

      --
      You do not have a moral or legal right to do absolutely anything you want.
    11. Re:Wow! That's some neat Progress! by rubycodez · · Score: 1

      oops, my brain read your post as "problem with nosql is scalability", the exact opposite of your meaning. guess I should switch my neurons to no-sql.

    12. Re:Wow! That's some neat Progress! by Marillion · · Score: 1

      Google? Yeah, they clearly didn't have a clue what they're doing when they invented MapReduce.

      Facetiousness aside, the highly structured storage with tables and columns that a Codd style relational database provides is a better fit for most problems than most of the key/value pair (KVP) databases out there. There is too much R&D invested in that technology to just ignore. Using KVP puts more work on the programmer to organize the data. Storing serialized tuples in JSON or XML or whatever is en vogue at the time isn't structured storage.

      Map Reduce is a technique to "pre-cook" summary data. Google uses its massive farm of computation nodes to precompute facts about their data. You can not effectively create a single Relational DB instance with a thousand machines - at least not without breaking more than a few well established conventions.

      TL;DR - Use the right tool for the right problem.

      --
      This is a boring sig
    13. Re:Wow! That's some neat Progress! by sexconker · · Score: 1, Insightful

      300 - 400%? Lol you're doing it wrong.
      Billions of rows? So what? Easily handled by SQL.

    14. Re:Wow! That's some neat Progress! by binary+paladin · · Score: 3, Funny

      Another brilliant post on Slashdot!

      The quality of responses around here improves every day.

    15. Re:Wow! That's some neat Progress! by Anonymous Coward · · Score: 1

      300 - 400%? Lol you're doing it wrong.
      Billions of rows? So what? Easily handled even by a single SQL instance

      FTFY

      Really, most people around here are completely clueless. Billions of rows is simply a matter of proper indexing and having enough storage bandwidth. Problems come first with increasing concurrent connections. But even in that case it's completely solvable without any NoSQL thingamajig. Data fragmentation, some replication and data-dependent routing techniques have been around for ages.

      I wonder what these sheep think MS has behind the millions of users using Hotmail/Live/Skydrive. Oh, yes, I guess most think they are secretly using some NoSQL engine on Linux. They must be the same that routinely post about MS using Linux web servers...

    16. Re:Wow! That's some neat Progress! by serviscope_minor · · Score: 2

      The lax schema isn't about not knowing what you're doing,

      It seems to me that laxness and strictness of schemas is very much like static or dynamic typing. With static typing, certain classes of errors simply cannot happen, but you can design yourself into a corner. With dynamic typing, smooshing things around is a bit easier, but you can get runtime errors if you don't design it properly. Personally, I prefer static typing.

      Maybe the medical record's patient has seven different insurance providers. Maybe the passport holder legally doesn't have a surname. When you design a schema for a strict database like an RDBMS, you make certain assumptions about the data you'll get. Those assumptions lead to performance increases if they're accurate, and failure if they're wrong.

      I'm really not convinced by your examples. Your processing is going to make those assumptions instead of making them in your schema, since some code somewhere has to know about the number of surnames. Depending on how the assumptions are made, you'll either output a record with the surname of (null), get a NullPointerException (or something similar) or a whole host of different possible errors.

      The lax schema in that case will simply move errors from guaranteed failure at data entry time to some random error at some random point in the future.
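
      A minimal Python sketch of that trade-off. Every name here (`strict_insert`, `lax_insert`, `mailing_label`, the passport fields) is hypothetical and only illustrates the argument; it is not any particular database's API. The strict path rejects the record at entry time, while the lax path accepts it and the failure surfaces later in unrelated code.

```python
def strict_insert(table, record):
    """Strict schema: reject a record without a surname at entry time."""
    if not isinstance(record.get("surname"), str):
        raise ValueError("surname is required")
    table.append({"given_name": record["given_name"],
                  "surname": record["surname"]})


def lax_insert(store, key, record):
    """Lax schema: accept whatever arrives."""
    store[key] = dict(record)


def mailing_label(record):
    """Downstream code that silently assumes every record has a surname."""
    return record["given_name"] + " " + record["surname"].upper()
```

      With a record such as `{"given_name": "Teller"}`, `strict_insert` raises immediately, whereas `lax_insert` succeeds and `mailing_label` raises a `KeyError` whenever it happens to run.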

      It seems that the time to use NoSQL is when your data isn't well represented by a relational model, or (if you are working on a truly immense scale) you have to have the performance.

      --
      SJW n. One who posts facts.
    17. Re:Wow! That's some neat Progress! by Anonymous Coward · · Score: 0

      Don't make this claim. It's not true in all cases. I've proven it. NoSQL is good for mostly-read scenarios. It's TERRIBLE for writes. This blind "NoSQL is super fast" crap is why I've seen some terrible application designs where they tried to replicate mostly changing data to multiple NoSQL servers. It's just stupid.

      There are cases for NoSQL in some types of apps, but sometimes a good memcached would do just as well, with permanence in a real database.

    18. Re:Wow! That's some neat Progress! by Anonymous Coward · · Score: 1

      I'd take MS SQL server any day of the week for large data sets that change. Most Map Reduce implementations are great for searching with static data. It doesn't scale with writes.

      You act like NULL columns don't exist. You don't have to write all fields to a database. At some point in the chain you have to interpret the data.. makes sense to store it in a logical way from the beginning rather than trying to guess later AFTER getting a bunch of crap from map reduce.

    19. Re:Wow! That's some neat Progress! by Anonymous Coward · · Score: 0

      > Perhaps "Columns", yeah, then each result returned could be called a "Row", makes sense.

      They're called "Attributes" and "Tuples" you insensitive clod!

    20. Re:Wow! That's some neat Progress! by Anonymous Coward · · Score: 0

      Google certainly doesn't know shit about scalable databases. Should have used MS SQL instead, idiots. Bet Bing uses it, right?

    21. Re:Wow! That's some neat Progress! by element-o.p. · · Score: 1

      If you spend 20 minutes to RTFM and think before you start shoveling data at the server, SQL wins every time.

      Um, no. There are times when all you need is a simple data store. For that, SQL is overkill and a key-value hash is perfect.

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    22. Re:Wow! That's some neat Progress! by Anonymous Coward · · Score: 0

      I know, it's because you're a complete idiot.

    23. Re:Wow! That's some neat Progress! by afabbro · · Score: 4, Informative

      300 - 400%? Lol you're doing it wrong. Billions of rows? So what? Easily handled by SQL.

      CERN has a database with trillions of rows in a traditional Oracle RDBMS. I saw a presentation on it at Oracle OpenWorld this year by a guy from CERN.

      Yahoo also has trillion-row databases, on PostgreSQL.

      --
      Advice: on VPS providers
    24. Re:Wow! That's some neat Progress! by Anonymous Coward · · Score: 0

      Stop talking sense. Just stop it. Nosql scales like sql can't. It does. It really does. You just switch it on and it scales.

  8. Great for Perl aficionado... by Spectre · · Score: 5, Interesting

    Many of the key-value pair DBs supply a Perl library that lets you tie a Perl hash (%Variable) to the DB directly, giving you persistent hashes.

    Makes database storage virtually a native feature of the language. Anybody who uses Perl is probably already a hash buff, so it is a win-win if you and your app already use Perl.
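
    For readers outside Perl, a rough analog in Python: the standard-library `shelve` module exposes a dbm-backed file through the ordinary dict interface. This is a sketch of the same idea, not Perl's `tie`; the `bump_counter` helper and the file name are made up for illustration.

```python
import os
import shelve
import tempfile


def bump_counter(path, key):
    """Increment a counter stored on disk and return its new value."""
    with shelve.open(path) as db:      # behaves like a dict, persists to disk
        db[key] = db.get(key, 0) + 1
        return db[key]


if __name__ == "__main__":
    # Hypothetical location; a real app would use a fixed path.
    path = os.path.join(tempfile.mkdtemp(), "counters")
    bump_counter(path, "hits")
    print(bump_counter(path, "hits"))  # second call sees the stored value
```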

    Disclaimer: I run a 10yo web "app" (Perl/CGI/Apache), so I'm a bit biased. But, the thing is rock-solid, so I'm not going to be too apologetic.

    --
    "Flame away, I wear asbestos underwear"
    1. Re:Great for Perl aficionado... by interval1066 · · Score: 1

      Many of the key-value pair DBs supply a Perl library...

      Did somebody say MongoDB + PERL?

      --
      Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
    2. Re:Great for Perl aficionado... by idontgno · · Score: 1

      Anybody who uses Perl is probably already a hash buff,

      Not just hash, in truth; hallucinogens and sedatives are also widely popular with Perl-heads.

      --
      Welcome to the Panopticon. Used to be a prison, now it's your home.
  9. Branding by slasho81 · · Score: 2

    So, at what point do we all admit that a NoSQL database is basically a glorified file-system over a network and start calling it a file-system again?

    1. Re:Branding by vlm · · Score: 2

      Some nosql "db" support 256 bit keys and everyone knows filesystems can only support 8.3 filenames, so at 8 characters of 7 bit ascii that's only something like 56 bits. If only microsoft had a filesystem supporting longer filenames... maybe next decade.

      (note I'm intentionally avoiding the idea of a 256 directory deep filesystem, each directory containing a subdirectory 0 or 1, because that is just ... illness)

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    2. Re:Branding by slasho81 · · Score: 1

      I didn't say NoSQL dbs aren't great file-systems, but they are file-systems nonetheless...

    3. Re:Branding by donscarletti · · Score: 4, Insightful

      It's great branding.

      Previously, I was developing MMO backend software that used MySQL for data storage. The fit to the model was completely inappropriate, there were just no applications of the relational model, since we were just checking in and out large blobs of data, not actually performing read/update transactions. Storing records (persistent game entities) as files in a directory would have worked far better than forcing that stuff into a relational DB. But customers know that Databases are what professionals use, so we did it anyway. Clients can buy it, realise they need the flat files and turn them on after benchmarking, we get the sale, they get a good product in the end, win win, but a bit of wasted effort.

      Now NoSQL is what professionals use, relational DBs can be used for what they are good at and NoSQL gives us marketing hype for doing certain things in the right way that could have been done using filesystems all along. I couldn't be happier. Furthermore we get this nice application level distributed data store with map-reduce stuff built in if we can be bothered using it.

      Here's what most geeks don't get about marketing: it's not just about being smarter than the other guy, you've got to be smarter than him and make him give you his money. Money is good, it buys freedom and power and if branding makes sure that you have more of this freedom and power than the fool who falls for it, then the world will be a better place.

      --
      When Argumentum ad Hominem falls short, try Argumentum ad Matrem
    4. Re:Branding by slasho81 · · Score: 2

      You're proving my point, which is that NoSQL is a marketing distinction, not a technical one. The problem with this is any technical discussion about the subject is biased and therefore flawed.

    5. Re:Branding by donscarletti · · Score: 2

      Well, no, I'm arguing your point as best I can, this stuff is too murky to "prove" anything concretely, but you're welcome.

      A technical discussion is not at all biased by marketing. What's most efficient is most efficient, what is most stable is most stable, what can be implemented the fastest can be implemented the fastest no matter what the marketing concerns regarding who wants to buy it. But still, the "best" solution involves many factors, the technical factors are extremely important, but you've still got to persuade someone to buy it before it has any bearing on the product's suitability.

      What you need to understand is that these CTOs are way past their prime. 80% of their decisions are based on "judgement", which is how we say "prejudice" while still implying respect. If you say "local filesystem" to a tried and tested DB man, he's going to think "half arsed" no matter how well you've implemented it and whatever benchmarks you show him will just prove in his mind that they would be better if it was done using either his pet style in the case of a proud former developer or "best of breed data management solution" in the case of a guy out of touch with his roots. Show him something he knows and vaguely trusts, he will tend to just assume that the model fits and it works as well as other use cases.

      Middleware is great, because you deal directly with your opposite number on the other side. You think to yourself "what makes me stupid?" and chances are, he's stupid in exactly the same way. Think "what vague half-truth have I read on Slashdot today?" and chances are, he's read the same article and believed it, hook, line and sinker because he's not a specialist in that exact niche. This is what makes marketing brilliant, being self-aware beats being right every time. I'm a developer by the way, I just have an interest in marketing because of my smugness and contempt of human intelect.

      --
      When Argumentum ad Hominem falls short, try Argumentum ad Matrem
    6. Re:Branding by Terrasque · · Score: 1

      I just have an interest in marketing because of my smugness and contempt of human intelect.

      Please say you did that on purpose :D

      --
      It's The Golden Rule: "He who has the gold makes the rules."
    7. Re:Branding by Anonymous Coward · · Score: 0

      So is Active Directory a NoSQL?

  10. Use NoNoSQL like me! by Anonymous Coward · · Score: 0

    Use NoNoSQL like me! Otherwise known as SQL.

  11. Keys and values? by Anonymous Coward · · Score: 0

    Isn't that what XML is for? XML files are also compatible across systems.

    1. Re:Keys and values? by Korin43 · · Score: 3, Informative

      Isn't that what XML is for? XML files are also compatible across systems.

      XML is more useful for transferring data between systems. For storing data it kind of sucks, since there are no indexes (not the kind we need for fast lookups anyway) and it's extremely verbose.

    2. Re:Keys and values? by rubycodez · · Score: 1

      there are indexed xml retrieval systems with their own query languages to boot. Oracle XML DB is one (built on top of its sql dbms)

    3. Re:Keys and values? by Korin43 · · Score: 1

      There's a difference between a database that returns XML and an XML file.

    4. Re:Keys and values? by shutdown+-p+now · · Score: 1

      But they don't store the data as XML. They usually decompose it down to Infoset, and then store that in some relational fashion with indexes and stuff; and reconstitute XML when returning results of a query.

    5. Re:Keys and values? by BoberFett · · Score: 1

      I don't know about Oracle, but in my experience XML databases built on top of RDBMSes (I'm looking at you Microsoft) suck. XML data is often highly unstructured, and at least in the case of SQL Server, tries to force unstructured XML into a structure and ends up doing it poorly.

    6. Re:Keys and values? by rubycodez · · Score: 1

      the topic was storage and querying of xml, plenty of products do that, some without involving an sql database at all

    7. Re:Keys and values? by rubycodez · · Score: 1

      eh, all XML is structured, hierarchically.

    8. Re:Keys and values? by smellotron · · Score: 1

      They usually decompose it down to Infoset, and then store that in some relational fashion with indexes and stuff; and reconstitute XML when returning results of a query.

      All of that processing and reconstitution really destroys the nutritional value, and excess compression contributes to high 0x80 levels. You really should be mindful about the data that you are putting into your program.

    9. Re:Keys and values? by shutdown+-p+now · · Score: 1

      Yes, they always screw up the carefully balanced amounts of different kinds of character entities in the process.

    10. Re:Keys and values? by Korin43 · · Score: 1

      I'm kind of confused about your reply, but to clarify -- if you want fast random access, XML is a terrible format. If you want to transfer data between two systems, XML can be excellent. The example of Oracle being able to return XML data just confirms what I'm saying -- the data is stored in Oracle's binary format, and *transferred to you* as XML.

      You seem to be thinking I'm claiming there's something wrong with XML, but all I'm saying is that XML files are not designed to be databases.

    11. Re:Keys and values? by BoberFett · · Score: 1

      Which does not denote structure.

  12. Locally sensitive hashing by Animats · · Score: 5, Informative

    This is a type of index, not a type of database. See locality-sensitive hashing. It's an efficient way to find keys which are "near" the search key in some sense.

    Such a mechanism could be provided in a key/value store or an SQL database. It's even possible to do it on top of an SQL database. It's more powerful in a database that can do joins, because you can ask questions with several approximate keys.

    This is an area of active research. Many machine-learning algorithms are scaled up by locality-sensitive hashing, so they can work on big data.
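
    A minimal sketch of one common locality-sensitive scheme, random-hyperplane hashing, where vectors pointing in similar directions tend to land in the same bucket. This illustrates the general technique only; it is not taken from HyperDex, and the function names, the number of bits, and the seed are all assumptions made for the example.

```python
import random


def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random hyperplanes (as normal vectors) in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * w for v, w in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group item names by their LSH key."""
    buckets = {}
    for name, vec in vectors.items():
        buckets.setdefault(lsh_key(vec, hyperplanes), []).append(name)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Items sharing the query's bucket; a real system would re-rank these."""
    return buckets.get(lsh_key(query, hyperplanes), [])
```

    Lookups touch one bucket instead of every stored vector, at the cost of occasionally missing a near neighbor that fell on the other side of a plane.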

    1. Re:Locally sensitive hashing by Anonymous Coward · · Score: 0

      Why would you link to a useless paper from a bunch of Microsoft cultists? FSU is not a good school, and their worship of broken crap from Microsoft just makes them even more useless. Are you one of them? Is that why you're spamming that garbage?

    2. Re:Locally sensitive hashing by el33thack3r · · Score: 1

      Actually, what we do is not at all related to indexing, nor is it related to locality-sensitive hashing (LSH). Hyperspace hashing, the central technique that underlies HyperDex, is a data placement technique akin to consistent hashing.

  13. LOL by Anonymous Coward · · Score: 0

    Who needs to be afraid of ACTA, SOPA and PIPA if his database is gone after a reboot?

    This is retarded like everything I read on Slashdot. I come here for the retarded bullshit.

    http://www.youtube.com/watch?v=URJeuxI7kHo

  14. The days are numbered. by Anonymous Coward · · Score: 0

    You know what's better than NoSQL? - modern distributed relational SQL implementations like Clustrix http://www.clustrix.com/ and Volt http://en.wikipedia.org/wiki/VoltDB to mention two.

  15. NoSQL 2.0 by pankkake · · Score: 1

    Because NoSQL wasn't hipster enough.

    --
    Kill all hipsters.
    1. Re:NoSQL 2.0 by Sez+Zero · · Score: 4, Funny

      Yeah, all the hipsters will be calling it "No2SQL", which is way more righteous.

    2. Re:NoSQL 2.0 by dreemernj · · Score: 1

      Why would you say that? That's actually going to be a thing now.

      --
      1 (short ton / firkin) = 89.1432354 slugs / keg
    3. Re:NoSQL 2.0 by flimflammer · · Score: 1

      Oh my god. Do you know what you've just done?!

    4. Re:NoSQL 2.0 by Anonymous Coward · · Score: 0

      NO! Stop the lemmings madness!

    5. Re:NoSQL 2.0 by Anonymous Coward · · Score: 0

      Yeah, all the hipsters will be calling it "No2SQL", which is way more righteous.

      Nobody respond and the marketeers will never find this... wait! gak!

  16. i was all excited... by Anonymous Coward · · Score: 1

    until I noticed that there seems to be a single point of failure in this system. from the site:

    The HyperDex coordinator maintains the "hyperspace." This involves making sure that servers are up, detecting failed or slow nodes, taking them out of the system, and replacing them where necessary. The coordinator maintains a critical data structure, the hyperspace map, that establishes the mapping between the hyperspace and servers. Clients use this map to locate the servers they need to contact, while servers use it to perform object propagation and replication to achieve the application's desired goals.

    How can people call a system "fault tolerant" and "distributed" when it might as well be running off a single box?

    1. Re:i was all excited... by rescrv · · Score: 2

      Although the coordinator is logically centralized, we've got a version in the works that uses Paxos (a consensus algorithm) to distribute the coordinator as well. For more information check out http://openreplica.org/

  17. multidimensional hash, no point by Anonymous Coward · · Score: 0

    Just append your keys together with a joining symbol. The programmer should be able to produce his own hash in however many dimensions, better than anything predefined in the database code.

    1. Re:multidimensional hash, no point by rescrv · · Score: 1

      When keys are concatenated with a joining symbol, objects can only be retrieved when one possesses all of the joining keys. Hyperspace hashing allows object retrieval when only a subset of the attributes is available.
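
      A toy Python sketch of that difference, based only on the description in this thread and not on HyperDex's actual code. The schema, the grid resolution, and the function names are all hypothetical. Each attribute is hashed onto its own axis, so an object lands in one grid cell; a search that knows only some attributes pins those axes and enumerates the rest, instead of needing the full concatenated key.

```python
import itertools
import zlib

ATTRIBUTES = ("first", "last", "phone")   # hypothetical schema
CELLS_PER_AXIS = 4                        # hypothetical grid resolution


def axis_coordinate(value):
    """Deterministically hash one attribute value onto its axis."""
    return zlib.crc32(str(value).encode("utf-8")) % CELLS_PER_AXIS


def cell_for(obj):
    """The single grid cell (one coordinate per attribute) that stores obj."""
    return tuple(axis_coordinate(obj[a]) for a in ATTRIBUTES)


def cells_for_search(known):
    """Cells that could hold objects matching a partial set of attributes."""
    axes = [
        [axis_coordinate(known[a])] if a in known else range(CELLS_PER_AXIS)
        for a in ATTRIBUTES
    ]
    return set(itertools.product(*axes))
```

      In this toy grid of 64 cells, knowing one attribute narrows a search to 16 cells and knowing two narrows it to 4; in a real deployment the cells would be mapped onto servers.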

  18. Wait... What? by Anonymous Coward · · Score: 0

    NoSQL2.0...
    Read "NoSequel... the sequel"
    So, if the Sequel to NoSQL does exist, then it is itself a paradox.

  19. NoSQL by M0j0_j0j0 · · Score: 1

    Facebook, Google and friends wouldn't need such databases if they respected privacy. Solve the privacy issues and a MyISAM will be enough for everyone. And for the marketing, just send pregnancy coupons to everyone, you'll get 'em.

  20. Why hashing? by dtoader · · Score: 1

    In using Oracle RDBMS, I see that for very large data set queries, using a hash join causes lots of disk activity (lots of paging, going to swap). Though hash functions are fast, this performance comes from scanning through a hash table that's fully mapped in memory. Once your hash table gets too big for the available memory, you start using disk space (unindexed, sequential full reads). Isn't this a bottleneck in a distributed database that relies on hash functions? Wouldn't you want to have a distributed DB based on a distributed version of a B-Tree descendant (B+Tree, B*Tree, B**Tree) that would use memory AND storage and scale out more than just the available memory on all your nodes? Not only that, but you'd likely have better performance on range scans. Just thinking...
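
    The range-scan point can be sketched in a few lines of Python. A sorted list stands in for an ordered index such as a B+Tree and a dict stands in for a hash index; both function names are made up for illustration, and nothing here reflects how Oracle or HyperDex actually store data.

```python
import bisect


def range_scan_sorted(sorted_keys, low, high):
    """Ordered index: two binary searches, then one contiguous slice."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]


def range_scan_hashed(hash_index, low, high):
    """Hash index: keys have no order, so every key must be examined."""
    return sorted(k for k in hash_index if low <= k <= high)
```

    Both return the same keys, but the ordered version touches only the matching slice while the hashed version walks the whole index.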

    1. Re:Why hashing? by Anonymous Coward · · Score: 0

      I don't think you're thinking how you should think about this.

      Just a thought.

    2. Re:Why hashing? by Anonymous Coward · · Score: 0

      You know what PGA_AGGREGATE_TARGET and HASH_AREA_SIZE parameters are for, not...

  21. No. by eternaldoctorwho · · Score: 2

    Whenever a /. headline asks a question, the answer is always No.

    1. Re:No. by Luyseyal · · Score: 1

      I thought it was always a superposition of: yes, no, maybe, and my favorite yesnomaybe.

      -l

      /glad those tags went away.

      --
      Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!
  22. Reminds me of ISAM/VSAM based DB's in a way by Anonymous Coward · · Score: 0

    Which, iirc, is SORT of how modern filesystems work (barring IBM's DB/2 driven one), but that's GREAT FOR READS (not so great for writes iirc). ISAM/VSAM also use hashes.

    * Lastly/iirc: Some filesystems are even based on the ISAM/VSAM design too, but ones like NTFS use binary search pattern methods.

    APK

    P.S.=> The use of Hashes also makes it different than most RDBMS (those based on SQL usage), because they're based on binary trees iirc as well...

    ... apk

  23. FTFY by Anonymous Coward · · Score: 0

    "Key-value stores have been supplementing traditional databases..."

  24. Most successful NoSQL in history: Windows Registry by Anonymous Coward · · Score: 0

    Well, the most successful NoSQL product in history is the Windows Registry, which is like one giant Java properties file. Not sure if that's an argument for or against NoSQL.

  25. Only 2.0? by alphred · · Score: 2

    Hand it over to Mozilla. We could then have NoSQL 5.0 by the end of summer.

    1. Re:Only 2.0? by kangsterizer · · Score: 3, Funny

      Or NoSQL 16 if that was Google. What a great joke.

  26. Is it time for the NoIP Internet? by WaffleMonster · · Score: 1

    Why does a new product operating in the very same space as other key-value stores warrant an increment of the buzzword version number?

  27. RE: MUMPS by chooks · · Score: 1

    Mumps was NoSql before NoSql was cool: MUMPS and NoSql

    Disclaimer: my only interaction with MUMPS has been via thedailywtf: A Case of the MUMPS

    --
    -- The Genesis project? What's that?
  28. IMS? by Toshito · · Score: 1

    I didn't RTFA but are they trying to reinvent IMS?

    --
    Try it! Library of Babel
  29. Consistency? by StripedCow · · Score: 1

    How about consistency? Does this database even support the notion of transactions?

    --
    If Pandora's box is destined to be opened, *I* want to be the one to open it.
  30. Client language bindings by Anonymous Coward · · Score: 0

    For C, C++, and Python only???

  31. But is it Web Scale? by Anonymous Coward · · Score: 0

    If it's Web Scale I'll use it.

  32. Except if you use by Anonymous Coward · · Score: 0

    Cassandra which is optimized for low-latency writes.

    See, it's all about which tradeoffs you choose. Whodathunkit?

    (Not that I'm advocating a NoSQL-is-fast mindset, mind you. You have to tailor your choices to the situation at hand.)

  33. Indexes in NoSQL is not new by snookums · · Score: 1

    CouchDB has native map-reduce indexing of arbitrary fields of the stored data. Doesn't appear to be anything new here in that regard.

    --
    Be careful. People in masks cannot be trusted.
  34. Coherence by Baki · · Score: 1

    Oracle Coherence (formerly Tangosol) is a distributed cache with queries too. It has existed for many years and does exactly this (and more).
    I don't think this is anything new.

  35. Re:Most successful NoSQL in history: Windows Regis by zootie · · Score: 1

    Against