Making Sense of the NoSQL Standouts
snydeq writes "InfoWorld's Peter Wayner provides an overview of the more compelling NoSQL data stores on offer today in hopes of helping IT pros get started experimenting with these powerful tools. From Cassandra, to MongoDB, to Neo4J, each appears geared for a particular set of application types, providing DBAs with a wealth of opportunity for experimentation, and a measure of confusion in finding the right tool for their environment. 'There are great advantages to this Babelization if the needs of your project fit the abilities of one of the new databases. If they line up well, the performance boosts can be incredible because the project developers aren't striving to build one Dreadnought to solve every problem,' Wayner writes. 'The experimentation is also fun because the designers don't feel compelled to make sure their data store is a drop-in replacement that speaks SQL like a native.'"
From the Anything-Better-Without-Oracle department.
"providing DBAs with a wealth of opportunity for experimentation": they need to start targeting the right audience, which is developers.
DBA : We should use mongoDB ...
DEV : Bend over
Don't bother reading this fluff. Wikipedia offers a better overview: http://en.wikipedia.org/wiki/NoSQL. Oh I forgot, this is Slashdot, no one here reads the articles :).
If you view it as a SQL replacement, then yes, utter garbage. But if you take it for what it is, then no.
The problem is there is a fad surrounding NoSQL, and young, ignorant, inexperienced developers think RDBMSs are for old farts who refuse to get with the times, rather than viewing NoSQL as a different tool for solving a different problem. If you want/need ACID properties, you go with SQL. If you don't, NoSQL may be appropriate.
As Michael Hunger points out in the comment on the article, it seems like the article author Wayner did almost no research on the Neo4j graph database. Some of his points are flat-out incorrect.
This discussion is likely to lean towards "OMG NoSQL IS SO RETARDED!". So let me just say that if you don't care about NoSQL, then fine. If MySQL/Postgres/Oracle/MS-SQL fit your needs, then fine.
That doesn't mean "NoSQL" databases are useless.
I've had exposure to both MongoDB and CouchDB so far. CouchDB is the newest experience, as part of a Chef installation. Yes, it is a very immature product, and yes it has a long way to go, but it's very simple to configure and it does its job with very few resources. I don't personally have a need for CouchDB myself, but I can see why people use it for certain specific needs (i.e. I can understand why Chef uses it).
MongoDB is a little marvel for certain applications. In my current and previous jobs we've used MongoDB for Syslog collection and SMTP mail logging. MongoDB is excellent for this sort of thing: each log entry is a single entry in the collection, the data is NOT relational in any interesting way and the insertion rate is far beyond anything a traditional relational database engine could manage on the same hardware at the same resource utilisation. Even better you can write some quite clever Map/Reduce functions on top that allow you to do some amazingly deep inspections of the log data, so you can produce on-demand data as well as graph out long term trends.
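To make the Map/Reduce part concrete, here is a minimal sketch in plain Python of the kind of aggregation described above. It does not talk to MongoDB at all: the log entries, field names, and the `map_reduce` helper are made up for illustration, and a real deployment would run the equivalent functions inside the database.

```python
from collections import defaultdict

# Hypothetical syslog entries; in MongoDB each one would be a single
# document in a collection, with no relations between them.
log_entries = [
    {"host": "mail1", "facility": "mail", "severity": "info", "msg": "delivered"},
    {"host": "mail1", "facility": "mail", "severity": "err", "msg": "bounce"},
    {"host": "web1", "facility": "daemon", "severity": "err", "msg": "timeout"},
    {"host": "mail1", "facility": "mail", "severity": "err", "msg": "deferred"},
]


def map_entry(entry):
    """Map step: emit one (key, value) pair per log entry."""
    yield (entry["host"], entry["severity"]), 1


def reduce_counts(key, values):
    """Reduce step: combine every value emitted for the same key."""
    return sum(values)


def map_reduce(entries, mapper, reducer):
    """Group the mapper output by key, then reduce each group."""
    grouped = defaultdict(list)
    for entry in entries:
        for key, value in mapper(entry):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


counts = map_reduce(log_entries, map_entry, reduce_counts)
print(counts)
```

Swapping the key in `map_entry` (for example, to the hour of a timestamp field) is how you would go from per-host error counts to long-term trend data.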
NoSQL is NOT a replacement for traditional SQL databases, but it sure is useful for stuff where SQL databases struggle.
Read Nati Shalom's blog for an interesting article (http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach.html) about how to implement an application using an In-Memory Data Grid as a front for the data and for real-time or near-real-time analytics. The data can be persisted to a SQL or NoSQL database of your choice, depending on what best suits your application's needs.
sigo ergo sum
Key-value store
Key-value stores allow the application to store its data in a schema-less way. The data could be stored in a datatype of a programming language or an object. Because of this, there is no need for a fixed data model. This is generally of interest to friendless sperglords only.[16] The following types exist:
Crowdsourcing at its finest.. Although, I suppose the comment is accurate?
Sure, some solutions are faster than MySQL out of the box by skipping much of the language parsing and stuff that any SQL solution has to do. But that's not to say that they are actually more efficient at key retrieval.
For example, one developer found that the best no-sql solution was.... MySQL, which excels at simple key retrieval. He was able to best MemCached by a factor of almost 2.
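For anyone wondering what "simple key retrieval" on a relational engine looks like, here is a rough sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table and function names (`kv`, `put`, `get`) are invented for the example. It says nothing about performance; it only shows that a primary-key lookup is a perfectly ordinary key-value interface.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for
# this sketch; the comment above is about MySQL, not SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")


def put(key, value):
    """Insert or overwrite the value stored under key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )


def get(key, default=None):
    """Look a value up by primary key; return default if it is absent."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else default


put("user:42", "alice")
put("user:42", "alice2")  # overwrites the earlier value
print(get("user:42"))
print(get("user:99", "missing"))
```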
Use the right tool for the job.
I have no problem with your religion until you decide it's reason to deprive others of the truth.
first off, you have to really really understand your dataset before committing to either an sql or no-sql solution. this is because the main theoretical difference, as i see it, is that in sql, one basically generates a result set and the "game" is to find a particular record (or records) within that result set, whereas with nosql, you basically already have your "object" (or key) and the "game" is to find what the object connects to. a subtle yet extremely important difference.
im towards the end of an 8 month project that i started with mysql, and switched to mongodb about a month in. why? because we are dealing with facebook data, and with facebook data, you start with the "id" (profile id) and nothing else. so it made sense to use nosql, or else the end result would have been implementing a hash table in mysql.
one totally amazing aspect of mongodb is the embedded documents. i use them to create an embedded "connection" for each key, and i can query an object (hash)'s connections and figure out what relates to what. it is extremely powerful. the key is deciding what is a connection and what is an object. for example, a user is an object, a community is an object, but a user's role within that community is a connection. so you can group your objects (or "documents") into "collections" that have connections. in facebook's case, a page is an object but a "like" is a connection.
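a rough sketch of the object-vs-connection idea, using plain python dicts instead of a real mongodb collection. all ids, field names and the two helper functions are hypothetical, made up just to show the shape of the data: each document carries its own embedded list of connections.

```python
# Hypothetical "users" collection: each document embeds its connections.
users = {
    "u1": {
        "_id": "u1",
        "name": "Ann",
        "connections": [
            {"kind": "role", "target": "community:c1", "role": "admin"},
            {"kind": "like", "target": "page:p1"},
        ],
    },
    "u2": {
        "_id": "u2",
        "name": "Bob",
        "connections": [
            {"kind": "like", "target": "page:p1"},
            {"kind": "like", "target": "page:p2"},
        ],
    },
}


def connections_of(doc, kind=None):
    """Return a document's embedded connections, optionally filtered by kind."""
    return [
        c for c in doc.get("connections", [])
        if kind is None or c["kind"] == kind
    ]


def who_connects_to(collection, target, kind=None):
    """Return the ids of all documents with a connection to target."""
    return sorted(
        doc["_id"]
        for doc in collection.values()
        if any(c["target"] == target for c in connections_of(doc, kind))
    )


print(connections_of(users["u1"], kind="role"))
print(who_connects_to(users, "page:p1", kind="like"))
```

starting from a known id, following connections is just a lookup plus a list walk, which is the access pattern described above.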
but, really, dont use nosql just because its cool (its not even a new idea, is it!?). its certainly a really neat and novel way to program a database, but could be your downfall if you dont understand your data set first.
Yeah, the problem is that you want and need ACID, even if you don't know what it means. Very, very rarely, you may find yourself in a situation where availability demands are too great for systems with the ACID property, and then you should consider using one of these non-relational systems. The problem from where I'm sitting, is that too many young, ignorant, inexperienced developers think that their shitty little website needs to be prepared for handling millions of hits per second, and jump to two conclusions: one, that the problem is their database (and not the way they're using it), and two, that ACID should be thrown out the window to fix it.
All other things being equal, you are much more likely to be implicitly depending on ACIDity than in a situation where demand is great enough that choosing NoSQL is worth the trouble you're going to get into.
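As an illustration of what "implicitly depending on ACIDity" can mean, here is a small sketch of atomicity using Python's built-in sqlite3 module. The accounts table, the names, and the `transfer` helper are all invented for the example. The point is only that a half-finished update gets rolled back for you, which is easy to take for granted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint if src is overdrawn,
            # which raises IntegrityError and undoes the credit above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer("alice", "bob", 150)  # more than alice has
print(ok, balances())
```

Without the transaction, the failed transfer would leave bob credited and alice untouched, and the application would have to detect and repair that itself.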
So, Netflix won't work on my Roku. Get "internal services error" messages. Google gets me to this two month long thread. Been going on since mid June and still isn't fixed. There is some problem with the Netflix "instant queue"; looks like the server has a cache that is out of date somehow. Can be fixed by altering or deleting entries from a web browser. Problem pops up with several different clients. Thinking to myself; this is a caching problem in the Netflix web services stack; probably some multi-tier coherency problem and reckless programming. Things like NoSQL come to mind; Digg and Twitter learned the hard way in public too.
Then this story appears. More muddled thinking about databases. I decided to make the effort to see if my blue sky guesswork about Netflix and their screwups have any basis in fact. Result of Google query #1 ("netflix nosql"):
This is Yury Izrailevsky, Director of Cloud and Systems Infrastructure here at Netflix. As Netflix moved into the cloud, we needed to find the appropriate mechanisms to persist and query data within our highly distributed infrastructure ... move beyond the constraints of the traditional relational model ... high availability ... trumps strong consistency ... we have found ourselves braving the new frontier of NoSQL distributed databases.
22 weeks from that blog post to first damage.
This is just the sort of unthinking buzzword driven nonsense I have come to associate with all things NoSQL, the technology of celebrity wannabe PHBs. The results speak for themselves.
We want to jump on the NoSQL ship. I won't bore you with all of the details but briefly put SQL databases and tables are too restrictive for our work. Unfortunately because there are SO many NoSQL solutions, and none of them are backed by big names nobody here has the balls to sign off on one. Unfortunately, and ironically, NoSQL's biggest downside is the lack of cross compatibility. Once you make that call you're stuck with it good or bad.
The other issue, is that because all of these solutions are relatively young the toolsets simply don't exist for many of them. No libraries, backup solutions, third party support, etc. I wish we'd see someone like Microsoft, Oracle, IBM, or any big name roll out some kind of complete solution (in particular XML compatible). I know a few big Cloud solutions exist but again we come back to being locked into a solution.
First you need to learn something useful, like understanding a normal database: PostgreSQL, SQLite, DB2 or whatever your heart desires (not MySQL, that's just not right). Only once you really understand the normal databases and you understand your requirements can you make a statement by going 'nosql' something. Otherwise it is most likely counterproductive for most scenarios; you are not all FBs out there.
You can't handle the truth.
I need to dump millions of lines of syslog output to a structured datastore. I don't give a toss about ACID: I just need to know that the write succeeded. A NoSQL like MongoDB does the job brilliantly.
Syllable : It's an Operating System
I was a Notes programmer a decade ago... (wow...) I went to a talk on CouchDB and it all seemed strangely familiar.
Basically Lotus Notes is a NoSQL database with an email and calendar program attached. Of course anything was better than "LotusScript", but I can see why this stuff is very appealing. I think some of the CouchDB developers are former Notes developers who are now involved in the NoSQL movement.
Exactly, assuming that everybody needs ACID for every datastore is not a valid assumption and if you think it is, you have a very limited imagination.
If your data is worth something outside of this single application, you need relational and ACID. Syslog records might not qualify as being worth something. I heard flat files work fine for those.
One of the many reasons that programmers I know are adopting these technologies is that it breaks the back of the in-house DBA. Often there are a few in-house DBAs with certifications up the wazoo who squeeze themselves into every project that has to store data (all projects). But somehow their word becomes the final word. Getting a table added to a schema can take days or even weeks and might not be approved at all. Suddenly with MongoDB or whatever the DBA has no possible input. One can make all kinds of arguments for and against relational systems and how valuable a DBA is to the long-term health of a datastore, but from many developers' / project managers' perspective a modern DBA often acts as a brick wall to on time, on budget.
anybody use this? I believe it falls under this category... we're about to get a large system that uses it, the developers say it's not too bad, but I've used ADO.NET since 1.1.
I think what GGP is saying is that it is reasonable to assume that ACID is needed by default, with proof required that it is not the case. Which makes sense for the same reason why assuming that something can be written in some high-level programming language makes sense, before deciding that, no, it really has to be in C.
Interesting article.
This is a funny Q&A session on Mongo DB which raises a good point.
Slashdot needs Geekcode | Can anyone recommend any good SCIFI? My tastes: Foundation, Startide Rising, CITY, Ringworld,
If you are curious about the benefits of using MongoDB there is a good explanation here.
but what about that third pillar? the quality thing?
The article didn't cover Amazon SimpleDB (http://aws.amazon.com/simpledb/). SimpleDB is part of Amazon AWS, so it's cloud-only. However, if you're planning to deploy on AWS anyway, it makes for a formidable option.
www.clarke.ca
Really - why do they all have retarded names?
They should have called it fuck-you-SQL instead of noSQL
No mention of RavenDB http://ravendb.net/ or does it not fall under the NoSQL category?
We use Cassandra for all the user management and virtual file system storage at ClubCompy. It is blazing fast compared to SQL for both reads and writes, and it is very scalable. I've had a node of my storage cluster go down and the whole system stayed up with no data loss, and it could repair itself once I brought the downed node back up.
Coding to Cassandra is pretty challenging: you have to do all of your data modeling in code or use the new CQL to access the cluster. I wrote about my experiences recently, where I have started using Google's Protocol Buffers to give me more flexibility in how I store my data and describe my column families: Coding to Apache Cassandra with Google's Protocol Buffers
Dave
DO NOT use PICK. I've been using it for 3 years, and the kindest thing I can say about it is that it is a cool idea implemented by an ugly hack. Library & inter-communications options just suck.
probably wouldn't fly in the linux kernel!
"And, NoSQL sounds like something done long ago by 'Big Blue' (IBM)" (ISAM).
E.G.-> ISAM uses hash tables rather than B-Trees (as relational DB's do), & "Lo and Behold", so do NoSQL based databases!
Now, IF I am "off" here, please... feel free to correct me!
(However, imo @ least? Well... NoSQL DB engines don't really sound "all that new" to me, & sounds like a "return to yesteryear" in ISAM methods, or a variation thereof mostly!)
APK
P.S.=> BOTH types of DB engines (relational, or ISAM (or this "new" NoSQL stuff that sounds an AWFUL LOT like ISAM to myself @ least)) have their places... so, use the one that fits your data-processing requirements model(s) best - "right tool for the job" type thinking...
... apk
Penalty -- use of "sales" and "think" in the same sentence. 15 yards!
Using such a DB can be a two edged sword. Especially when wielded by bored CTOs who have nothing to do but try new tech without "sweating the details."
The key thing I have taken away from the experience of using such a DB is that, typically, software architects will start migrating or building new functionality in earnest, only to succumb to a relational schema in the end.
Except the schema is backed by a non-relational database now. Which causes very, very high amounts of pain.
This is not to say correct use of NoSQL DBs is not possible. I just have yet to see it.