Making Sense of the NoSQL Standouts
snydeq writes "InfoWorld's Peter Wayner provides an overview of the more compelling NoSQL data stores on offer today in hopes of helping IT pros get started experimenting with these powerful tools. From Cassandra, to MongoDB, to Neo4J, each appears geared for a particular set of application types, providing DBAs with a wealth of opportunity for experimentation, and a measure of confusion in finding the right tool for their environment. 'There are great advantages to this Babelization if the needs of your project fit the abilities of one of the new databases. If they line up well, the performance boosts can be incredible because the project developers aren't striving to build one Dreadnought to solve every problem,' Wayner writes. 'The experimentation is also fun because the designers don't feel compelled to make sure their data store is a drop-in replacement that speaks SQL like a native.'"
Don't bother reading this fluff. Wikipedia offers a better overview: http://en.wikipedia.org/wiki/NoSQL. Oh, I forgot, this is Slashdot, no one here reads the articles :).
More typically, it goes:
Dev: We should use MongoDB.
DBA: THE END IS UPON US!!! The Beast and his armies shall rise from the Pit and make war against God!!! ZALGO!!! HE COMES!!!
To understand recursion, you must first understand recursion.
If you view it as a SQL replacement, then yes, utter garbage. But if you take it for what it is, then no.
The problem is there is a fad surrounding NoSQL, and young, ignorant, inexperienced developers think RDBMSs are for old farts who refuse to get with the times, rather than viewing NoSQL as a different tool for solving a different problem. If you want/need ACID properties, you go with SQL. If you don't, NoSQL may be appropriate.
This discussion is likely to lean towards "OMG NoSQL IS SO RETARDED!". So let me just say that if you don't care about NoSQL, then fine. If MySQL/Postgres/Oracle/MS-SQL fit your needs, then fine.
That doesn't mean "NoSQL" databases are useless.
I've had exposure to both MongoDB and CouchDB so far. CouchDB is the newest experience, as part of a Chef installation. Yes, it is a very immature product, and yes, it has a long way to go, but it's very simple to configure and it does its job with very few resources. I don't have a need for CouchDB myself, but I can see why people use it for certain specific needs (i.e. I can understand why Chef uses it).
MongoDB is a little marvel for certain applications. In my current and previous jobs we've used MongoDB for Syslog collection and SMTP mail logging. MongoDB is excellent for this sort of thing: each log entry is a single entry in the collection, the data is NOT relational in any interesting way and the insertion rate is far beyond anything a traditional relational database engine could manage on the same hardware at the same resource utilisation. Even better you can write some quite clever Map/Reduce functions on top that allow you to do some amazingly deep inspections of the log data, so you can produce on-demand data as well as graph out long term trends.
NoSQL is NOT a replacement for traditional SQL databases, but it sure is useful for stuff where SQL databases struggle.
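The Map/Reduce idea mentioned for the log data above can be sketched in plain Python. This is only an illustration of the pattern, not MongoDB's API: the log entries, the field names (host, severity, msg) and the helper functions are all made up for the example.

```python
from collections import defaultdict

# Hypothetical log entries; in a document store each one would be a single document.
LOG_ENTRIES = [
    {"host": "mail1", "severity": "err", "msg": "connection refused"},
    {"host": "mail1", "severity": "info", "msg": "queue flushed"},
    {"host": "web1", "severity": "err", "msg": "disk full"},
    {"host": "mail1", "severity": "err", "msg": "timeout"},
]


def map_entry(entry):
    """Map step: emit one (key, value) pair per log entry."""
    yield (entry["host"], entry["severity"]), 1


def reduce_counts(values):
    """Reduce step: combine every value emitted for one key."""
    return sum(values)


def map_reduce(entries):
    """Group the mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for entry in entries:
        for key, value in map_entry(entry):
            grouped[key].append(value)
    return {key: reduce_counts(values) for key, values in grouped.items()}


counts = map_reduce(LOG_ENTRIES)
```

Here counts maps each (host, severity) pair to the number of matching entries, which is the kind of on-demand summary the comment describes.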
actually:
Dev: We should use MongoDB.
DBA: BWAHAHAHAHAHAHA!!! NO !!! Oracle. get used to it or GTFO!
What ? Me, worry ?
Read Nati Shalom's blog for an interesting article (http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach.html) about how to implement an application using an In-Memory Data Grid as a front for the data and for real-time or near-real-time analytics. The data can be persisted to a SQL or NoSQL database of your choice, depending on what best suits your application's needs.
sigo ergo sum
Key-value store
Key-value stores allow the application to store its data in a schema-less way. The data could be stored in a datatype of a programming language or an object. Because of this, there is no need for a fixed data model. This is generally of interest to friendless sperglords only.[16] The following types exist:
Crowdsourcing at its finest.. Although, I suppose the comment is accurate?
Sure, some solutions are faster than MySQL out of the box by skipping much of the language parsing and stuff that any SQL solution has to do. But that's not to say that they are actually more efficient at key retrieval.
For example, one developer found that the best no-sql solution was.... MySQL, which excels at simple key retrieval. He was able to best MemCached by a factor of almost 2.
Use the right tool for the job.
I have no problem with your religion until you decide it's reason to deprive others of the truth.
first off, you have to really really understand your dataset before committing to either an sql or no-sql solution. this is because the main theoretical difference, as i see it, is that in sql, one basically generates a result set and the "game" is to find a particular record (or records) within that result set, whereas with nosql, you basically already have your "object" (or key) and the "game" is to find what the object connects to. a subtle yet extremely important difference.
i'm towards the end of an 8 month project that i started with mysql, and switched to mongodb about a month in. why? because we are dealing with facebook data, and with facebook data, you start with the "id" (profile id) and nothing else. so it made sense to use nosql, or else the end result would have been implementing a hash table in mysql.
one totally amazing aspect of mongodb is the embedded documents. i use them to create an embedded "connection" for each key, and i can query an object (hash)'s connections and figure out what relates to what. it is extremely powerful. the key is deciding what is a connection and what is an object. for example, a user is an object, a community is an object, but a user's role within that community is a connection. so you can group your objects (or "documents") into "collections" that have connections. in facebook's case, a page is an object but a "like" is a connection.
but, really, don't use nosql just because it's cool (it's not even a new idea, is it!?). it's certainly a really neat and novel way to program a database, but it could be your downfall if you don't understand your data set first.
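A rough sketch of the object/connection split described above, using plain Python dicts in place of a real document store. The ids, field names and both helper functions are hypothetical; the point is only that each object embeds its own connections and the lookup starts from a known key.

```python
# Hypothetical documents keyed by id; each object embeds a list of its connections.
documents = {
    "user:1": {
        "type": "user",
        "name": "alice",
        "connections": [
            {"to": "community:9", "kind": "role", "role": "admin"},
            {"to": "page:5", "kind": "like"},
        ],
    },
    "user:2": {
        "type": "user",
        "name": "bob",
        "connections": [{"to": "page:5", "kind": "like"}],
    },
    "page:5": {"type": "page", "name": "NoSQL fans", "connections": []},
    "community:9": {"type": "community", "name": "DBAs", "connections": []},
}


def connections_of(doc_id, kind=None):
    """Start from a known key and list what that object connects to."""
    doc = documents[doc_id]
    return [c for c in doc["connections"] if kind is None or c["kind"] == kind]


def connected_from(target_id, kind=None):
    """Find every object holding a connection that points at target_id."""
    return sorted(
        doc_id
        for doc_id, doc in documents.items()
        if any(
            c["to"] == target_id and (kind is None or c["kind"] == kind)
            for c in doc["connections"]
        )
    )


alice_likes = connections_of("user:1", kind="like")
page_fans = connected_from("page:5", kind="like")
```

The first helper is the cheap direction (you already hold the key); the second has to scan every document, which is the kind of trade-off worth understanding before picking the data model.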
Yeah, the problem is that you want and need ACID, even if you don't know what it means. Very, very rarely, you may find yourself in a situation where availability demands are too great for systems with the ACID property, and then you should consider using one of these non-relational systems. The problem from where I'm sitting, is that too many young, ignorant, inexperienced developers think that their shitty little website needs to be prepared for handling millions of hits per second, and jump to two conclusions: one, that the problem is their database (and not the way they're using it), and two, that ACID should be thrown out the window to fix it.
All other things being equal, you are much more likely to be implicitly depending on ACIDity than in a situation where demand is great enough that choosing NoSQL is worth the trouble you're going to get into.
We want to jump on the NoSQL ship. I won't bore you with all of the details but briefly put SQL databases and tables are too restrictive for our work. Unfortunately because there are SO many NoSQL solutions, and none of them are backed by big names nobody here has the balls to sign off on one. Unfortunately, and ironically, NoSQL's biggest downside is the lack of cross compatibility. Once you make that call you're stuck with it good or bad.
The other issue, is that because all of these solutions are relatively young the toolsets simply don't exist for many of them. No libraries, backup solutions, third party support, etc. I wish we'd see someone like Microsoft, Oracle, IBM, or any big name roll out some kind of complete solution (in particular XML compatible). I know a few big Cloud solutions exist but again we come back to being locked into a solution.
First you need to learn something useful, like understanding a normal database such as PostgreSQL, SQLite, DB2 or whatever your heart desires (not MySQL, that's just not right). Only once you really understand the normal databases and you understand your requirements can you make a statement by going 'nosql'; otherwise it is most likely counterproductive in most scenarios. You are not all FBs out there.
You can't handle the truth.
I need to dump millions of lines of syslog output to a structured datastore. I don't give a toss about ACID: I just need to know that the write succeeded. A NoSQL like MongoDB does the job brilliantly.
Syllable : It's an Operating System
No, it should go...
DEV: We should use MongoDB
DBA: Really? Here, have a nice big frosty glass of shut the fuck up. Now go back to your toy scripting languages and leave the data to those of us who actually understand data storage.
That should be the end of the discussion right then and there. The problem with these script kiddies is that 99.5% of them don't fucking have a clue about data. They are the ones who still embed SQL statements, log in credentials and the like in their php/python/rails/whatever.scripting.language.is.popular.this.week code. They have never even heard of stored procedures and views and wouldn't know a constraint from a hole in the ground. Sadly, it is not really their fault. MySQL ruined many a dev because it was so utterly primitive for so many versions that they never had to take the time to learn a proper database like Postgres, Oracle, DB2, MS-SQL which would have forced them to actually learn about data storage and retrieval.
MongoDB is one of those fine databases that have managed to turn simple into complex, e.g.:
Hey KID! Yeah you, get the fuck off my lawn!
I was a notes programmer a decade ago... (wow...) I went to a talk on CouchDB and It all seemed strangely familiar.
Basically, Lotus Notes is a NoSQL database with an email and calendar program attached. Of course anything was better than "LotusScript", but I can see why this stuff is very appealing. I think some of the CouchDB developers are former Notes developers who are now involved in the NoSQL movement.
If your data is worth something out of this single application, you need relational and ACID. Syslog records might not qualify as being worth something. I heard flat file works fine for those.
One of the many reasons that programmers I know are adopting these technologies is that it breaks the back of the in-house DBA. Often there are a few in-house DBAs with certifications up the wazoo who squeeze themselves into every project that has to store data (all projects). But somehow their word becomes the final word. Getting a table added to a schema can take days or even weeks and might not be approved at all. Suddenly, with MongoDB or whatever, the DBA has no possible input. One can make all kinds of arguments for and against relational systems and how valuable a DBA is to the long-term health of a datastore, but from many a developer's or project manager's perspective a modern DBA often acts as a brick wall to on time, on budget.
I think what GGP is saying is that it is reasonable to assume that ACID is needed by default, with proof required that it is not the case. Which makes sense for the same reason why assuming that something can be written in some high-level programming language makes sense, before deciding that, no, it really has to be in C.
Interesting article.
This is a funny Q&A session on Mongo DB which raises a good point.
Slashdot needs Geekcode | Can anyone recommend any good SCIFI? My tastes: Foundation, Startide Rising, CITY, Ringworld,
If you are curious about the benefits of using MongoDB there is a good explanation here.
but what about that third pillar? the quality thing?
The article didn't cover Amazon SimpleDB (http://aws.amazon.com/simpledb/). SimpleDB is part of Amazon AWS, so it's cloud-only. However, if you're planning to deploy on AWS anyway, it makes for a formidable option.
www.clarke.ca
No mention of RavenDB http://ravendb.net/ or does it not fall under the NoSQL category?
We use Cassandra for all the user management and virtual file system storage at ClubCompy. It is so blazing fast compared to SQL for both reads and writes, and it is very scalable. I've had a node of my storage cluster go down and the whole system stayed up with no data loss, and it can repair itself once I bring the downed node back up.
Coding to Cassandra is pretty challenging: you have to do all of your data modeling in code or use the new CQL to access the cluster. I wrote about my experiences recently, where I have started using Google's Protocol Buffers to give me more flexibility in how I store my data and describe my column families: Coding to Apache Cassandra with Google's Protocol Buffers
Dave
Ok, I'm bracing for a crayon comment or some flaming, but I'm one of those script kiddies trying to move onwards and upwards.
I've read a lot about data, but the stuff I've found is all over the place. Can you point me in a good direction to start understanding data better?
An important change for education.
DO NOT use PICK. I've been using it for 3 years, and the kindest thing I can say about it is that it is a cool idea implemented by an ugly hack. Library & inter-communications options just suck.
No, it should go...
DEV: We should use MongoDB
DBA: Really? Here, have a nice big frosty glass of shut the fuck up. Now go back to your toy scripting languages and leave the data to those of us who actually understand data storage.
That should be the end of the discussion right then and there. The problem with these script kiddies is that 99.5% of them don't fucking have a clue about data. They are the ones who still embed SQL statements, log in credentials and the like in their php/python/rails/whatever.scripting.language.is.popular.this.week code.
Congrats. You're the reason we get devs storing images in databases.
Either you have to educate your developers on what is appropriate to go into a relational database, or you need to get out of the way. Your attitude is exactly the reason NoSQL is picking up steam. I'm not a dev, but I've done dev work; nor am I a DBA, but I've done DBA work. And I can tell you, DBAs are often folks running around with a hammer: everything looks like a nail.
Devs, on the other hand, are looking for a solution, and thinking like devs: I'll build the solution to my problem! Of course, they usually end up reimplementing stuff other people have done.
If devs understood how full RDBMSs worked, database use would drop like a stone. If DBAs took the time to understand requirements, database use would drop like a stone. NoSQL makes a _huge_ amount of sense. While you maintain your "script kiddies" attitude, the rest of the world will happily glide past you.
RDBMSs are 90% misused, and a massive waste of money. NoSQL is an overreaction to that fact. Sometime in the future people will swing back to the middle and realize that files in directories are a surprisingly good way of storing data -- and each will have its place.
insert into users values('bob','123 Main Street','Springfield','NY');
I want to punch you in the head for not specifying the columns you're inserting into!
The MongoDB record looks no more complex to me than the insert statement. In fact, the MongoDB record looks more readable, but what do I know, I'm probably one of the "script kiddies" you like to so disparage. I like to have my column names next to the data that actually goes into them, rather than some mess like insert into users (username, street, city, state) values('bob','123 Main Street','Springfield','NY'); that the true equivalent SQL would have been. By the way, I wonder why SQL uses such a syntax, when the SQL UPDATE statement is much more readable; an update statement would look not much different from the MongoDB record, with equals signs instead of colons, and a few keywords added.
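For what it's worth, most SQL client libraries let you keep names next to values through named parameters. A minimal sketch with Python's built-in sqlite3 module, assuming a users table with username, street, city and state columns (the table layout is an assumption made for this example, not something the thread specifies):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table layout for the example.
conn.execute(
    "CREATE TABLE users (username TEXT, street TEXT, city TEXT, state TEXT)"
)

# The record keeps each column name beside its value, much like a document.
record = {
    "username": "bob",
    "street": "123 Main Street",
    "city": "Springfield",
    "state": "NY",
}

# Named placeholders (:name) are filled from the dict keys.
conn.execute(
    "INSERT INTO users (username, street, city, state) "
    "VALUES (:username, :street, :city, :state)",
    record,
)

row = conn.execute(
    "SELECT username, street, city, state FROM users"
).fetchone()
```

The statement is still more verbose than the document form, but the explicit column list means a later schema change cannot silently shift values into the wrong columns.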
As is always in the world of software, there are some jobs for which NoSQL is in fact a very good idea, and others for which relational databases are better. If the fine folks at Google thought as you did and believed a traditional RDBMS was the only tool they could use then I doubt that Google would have grown to the size it has. They knew and understood that their problem did not map well into the concept of a standard relational database and acted accordingly. Of course, you also need to recognize when such an approach is warranted, as more often than not you'd be better off using a real RDBMS, and it would not be wise to shift to NoSQL databases just because you're driven by buzzword compliance.
Qu'on me donne six lignes écrites de la main du plus honnête homme, j'y trouverai de quoi le faire pendre.
The proper audience is BOTH.
Devs doing data structures is generally less than optimal (Disks have to spin? Just buy faster ones); DBAs doing logic flows is generally bad (This is the optimal data structure, so let's just change the business logic a bit). Both working together will build a much better application because it broadens the number of concepts that can be taken into account.
This sig is the express property of someone.
probably wouldn't fly in the linux kernel!
MongoDB is one of those fine databases that have managed to turn simple into complex, e.g.:
Is that what MongoDB code looks like? Looks perfectly readable to me. Cleaner and more structured than the SQL version. More verbose, yes, but highly usable, unlike SQL which always requires a couple of layers of abstraction and conversion and mapping in order to make it usable.
You might have just converted a SQL user to MongoDB.
I have hardly seen any C / C++ developers complaining about database. Java/Ruby/Python/whatever on the other hand are just pussies.
C/C++ developers are used to cumbersome and arcane rituals. Ruby and Python (Java less so, but still more than C/C++) are supposed to make programming faster and more natural. A more natural way to store and access data makes a lot of sense there. You could call them pussies, but you could also say they're more focused on the goal itself rather than the arcane stuff around it.
NHibernate and ADO.NET are tools for interacting with a data store, this article is about the data stores themselves, not the tools of interaction.
NHibernate is an ORM, more of a competitor to Microsoft's Entity Framework than to ADO.NET.
For me, bulk inserts only seem to work well in MySQL, not in any other DB system. (I'm probably doing something wrong; I know little about databases. I just want my data stored.)
At least they have a name, rather than merely a generic description, like MS SQL Server.
You don't understand the benefits of Mongo DB.
Please add a column to your users table. In your SQL DB you probably need to convert the table and hold a global lock while doing so.
In Mongo DB, on the other hand, you just add whatever you want to include:
{
"username" : "bob", "address" : { "street" : "123 Main Street", "city" : "Springfield", "state" : "NY" }, "an" : "new", "column" : "with some new data"
}
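To make the comparison concrete, here is a small sketch of both approaches. The document side uses plain Python dicts rather than a real MongoDB client, and the relational side uses SQLite only because it ships with Python; the nickname field is made up for the example. Whether an ALTER TABLE blocks other sessions depends on the engine, and this sketch says nothing about that.

```python
import sqlite3

# Document style: records are plain dicts, so one record can carry extra fields.
users = [
    {"username": "alice", "address": {"city": "Springfield", "state": "NY"}},
    {
        "username": "bob",
        "address": {"street": "123 Main Street", "city": "Springfield", "state": "NY"},
        "nickname": "bobby",  # new field, no migration step
    },
]
# Readers must cope with the field being absent on older records.
nicknames = [u.get("nickname") for u in users]

# Relational style: the new column is declared once for the whole table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'Springfield')")
conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")
conn.execute(
    "INSERT INTO users (username, city, nickname) "
    "VALUES ('bob', 'Springfield', 'bobby')"
)
rows = conn.execute(
    "SELECT username, nickname FROM users ORDER BY username"
).fetchall()
```

In both cases the older record ends up without a nickname (None in Python, NULL in SQL); the difference is where the knowledge about the field lives, in the application code or in the schema.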
Thanks for the laughs mate!
But yes, it is really a matter of using your tools correctly...
I once forced myself to take a day off because I saw an Intranet DB 20GB in size with just 20k rows in a total of 14 tables...
The 'hotshot' C++ programmer who wrote the scripts for the DB had not only told the app to upload every image to a table field, but also to do it multiple times for the same image instead of keeping some relational record, taxing the Intranet server with many, many DB retrieves and PHP overhead (because obviously that's the way you deliver an image, right???)!
The effect of this? Just a week after the app was delivered, its lookups slowed down to a crawl and I got called in because 'Your Intranet has gone slow'. Obviously the 'hotshot' was just a contractor, had taken off for newer heights and was (of course) unreachable. After logging in to the DB server I noticed the blob types in every single table and went home to laugh my buttocks off.
Long story short I just expanded the Intranet app to accommodate the new functionality, adding a public interface for doing project uploads from outside the company network. This is what they used the C++ binary for btw... (like ssl hasn't been invented yet)
which actually was what the Managers should have done in the first place.
D@mn do I get angry when I think of incompetent management
-- no sig today
I've thought I'd seen a problem with our Netflix queue. I just assumed my wife had messed it up somehow :)
90% of the wealth is in 2% of the pockets. Bummer to be in the majority.
A fair question so here is a fair answer.
I am assuming (yes, yes, I know...) that you have some knowledge of basic tables, indexes, etc.
Start by reading and understanding Database Normalization
Someone much wiser than myself once said, "You have to completely understand a set of rules before you can break them". I mention this in reference to data normalization.
The people who are the best coder / data monkey combination have the innate ability to think in structures. This is not to say it cannot be learned, but it really is a way of thinking that is left & right brain.
Realize that data is NOT TRIVIAL. Data is why we write code. Data drives code, not the other way around. If you need further proof other than my word, go look at the source code for Linux. There are tens of thousands of data structures that make it work, and the code is designed to keep them updated and provide access to that data.
Build a non-trivial set of linked lists, then write the code that manipulates it without modifying the data structures. This will be illustrative of the importance of data. Keep working until you can't get any farther. When you have reached that point, throw away all the code and then go and find the errors in your data structures because that is where the fault will be located and then perhaps the light will turn on.
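As a starting point for the normalization reading suggested above, here is a minimal sketch using Python's built-in sqlite3 module. The customers/orders tables and their contents are invented for the example. The idea shown is that a fact stored in exactly one place (the customer's city) can be corrected with a single update, and every order picks up the change through the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    city TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    item        TEXT NOT NULL
);
INSERT INTO customers (id, name, city) VALUES (1, 'bob', 'Springfield');
INSERT INTO orders (customer_id, item) VALUES (1, 'keyboard'), (1, 'monitor');
""")

# The city lives in one row only, so one UPDATE fixes it for every order.
conn.execute("UPDATE customers SET city = 'Shelbyville' WHERE id = 1")

rows = conn.execute("""
    SELECT o.item, c.name, c.city
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.item
""").fetchall()
```

Had the city been copied onto every order row instead, the same correction would need to touch each copy, and any copy that was missed would leave the data contradicting itself.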
Nope you are wrong. In Oracle you can alter a table while the entire system is in full use and transactions are flying like mad, no locks, no break in service no sweat. Oracle simply updates the data dictionary and as statements come through older records are modified on the fly and new records are simply, well, inserted.
Nope, now all that has to be converted into some sort of escaped string due to all the bloody text.
While this is somewhat proper according to SQL-92 and later, what is the failure mode? Personally, I believe it violates the atomic nature of an SQL statement.
Now given a constraint that specifically only allows 'a'..'z' and 'A'..'Z' into the column lname, what part of this transaction fails? All of it or only (fred 3lmer) ? And if the entire transaction does not fail how does one determine which insert failed? "Insert" is an inherently atomic transaction and should, at least in my opinion, not be overloaded in this manner.
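The question can be tried out directly. The sketch below uses SQLite through Python's sqlite3 module, with a hypothetical people table and a CHECK constraint that only allows letters in lname. It shows how SQLite behaves with its default conflict handling; other engines may differ, so treat it as one data point rather than a statement about SQL in general.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: lname may only contain the letters a-z and A-Z.
conn.execute("""
    CREATE TABLE people (
        fname TEXT,
        lname TEXT CHECK (lname NOT GLOB '*[^a-zA-Z]*')
    )
""")

try:
    # One multi-row INSERT; the second row violates the constraint.
    conn.execute(
        "INSERT INTO people (fname, lname) VALUES "
        "('bob', 'smith'), ('fred', '3lmer'), ('jane', 'doe')"
    )
    failed = False
except sqlite3.IntegrityError:
    failed = True

# Count what actually made it into the table.
count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
```

With SQLite's default behaviour the whole statement is backed out, so none of the three rows is stored. The error only reports that the constraint failed, not which row caused it, which is the practical downside raised above.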
A perfect illustration of why good DBAs are worth having on staff.
Using such a DB can be a two edged sword. Especially when wielded by bored CTOs who have nothing to do but try new tech without "sweating the details."
The key thing I have taken away from the experience of using such a DB is that, typically, software architects will start migrating or building new functionality in earnest, only to succumb to a relational schema in the end.
Except the schema is backed by a non-relational database now. Which causes very, very high amounts of pain.
This is not to say correct use of NoSQL DBs is not possible. I just have yet to see it.
The "mess" seems like YAML to me. Not perfectly certain what it is - but I can parse it right now perfectly despite the two vodkas since the morning. What kind of shit are you on, anyway?
I know tobacco is bad for you, so I smoke weed with crack.