"Slacker DBs" vs. Old-Guard DBs
snydeq writes "Non-relational upstarts — tools that tack the letters 'db' onto a 'pile of code that breaks with the traditional relational model' — have grabbed attention in large part because they willfully ignore many of the rules that codify the hard lessons learned by the old database masters. Doing away with JOINs and introducing phrases like 'eventual consistency,' these 'slacker DBs' offer greater simplicity and improved means of storing data for Web apps, yet remain toys in the eyes of old guard DB admins. 'This distinction between immediate and eventual consistency is deeply philosophical and depends on how important the data happens to be,' writes InfoWorld's Peter Wayner, who let down his old-guard leanings and tested slacker DBs — Amazon SimpleDB, Apache CouchDB, Google App Engine, and Persevere — to see how they are affecting the evolution of modern IT."
Is it just me or did this article go out of its way to insult people who use "traditional" RDBMSs?
I mean, I'm well versed in SQL and data consistency et al, but I'm still more than willing to consider new technologies. What the hell?
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
Now that disk space is so cheap and many of the data models don't benefit as much from normalization, ...
You don't want to store the same data in multiple places. Your query might run faster, but your data integrity is going to suck.
And, uh, I have the pleasure of working now with a huge data warehouse that hasn't normalized status codes, so instead of quickly searching for an integer, the queries run slow as hell scanning char fields. It's not good.
Whale
Slacker DBs like CouchDB and SimpleDB, have taken off for the simple reason that most developers have absolutely mediocre database knowledge or skills, and rather than learning it's just as easy to just wave it all off as obsolete.
It's no surprise that the creator of CouchDB, for instance, hadn't a clue about databases when he began his project. All of that built up knowledge just ignored while someone invented their own, and it's as rational as rolling your own encryption from scratch without the slightest clue about encryption algorithms or theories.
"tools that tack the letters 'db' onto a 'pile of code that breaks with the traditional relational model"'
If "database" were intended to mean only "relational database", we wouldn't have had any need for the latter term...
the article is right that in some cases it doesn't matter if a transaction is lost. but in any case where money is involved it's a must. you can't just start a fund from your Oracle or SQL Server savings to pay for mistakes because it will kill your brand and you may lose a lot of future business. and any savings will be eaten up by the extra cost to hire people to solve all the data problems
i've seen this. no constraints on the data that is orginally put in, not enough referential integrity and you get customers opening up a lot of trouble tickets and you end up hiring people to clean up the data every time a mistake is found
MySQL strives to provide RDBMS and ACID semantics, though its quality of service (QoS) may fall short. By contrast, these "slacker" databases don't even try to support RDBMS or ACID; even if they operated perfectly, they won't provide RDBMS/ACID.
I work for one of the companies in question (no, I don't speak for them). We rely heavily on a combination of these "slacker" dbs, Berkeley dbs, memcached, Oracle, flat files, and tape backups. Each fills a niche. I wish these articles would quit trying to create a false dichotomy.
How does it work for searching though? If I just have my "freespace" file and my pointers to records, does a search for some piece of user requested data have to hit every record or is there a hash somewhere for the data contained in the record? You don't mention it in your description.
It seems that the biggest advantage to a relational DB is that the syntax for accessing it is well known, SQL. It has a human read-able interface and while sometimes whonky to work with for complex operations, it provides the simplest cross-platform way to access data. I don't need to know which data blocks hold the data, I just ask the database for them "SELECT slashdotid, name FROM users where slashdotid 20000"... and I get rows of data.
Could I just read it from a file? Yes. Would it be simpler? Maybe. But what if I have 200001 records, then I have to do some magic sorting in my program, and I have to manage memory for them, and disk space, etc. It is simpler to let the DB handle that mess and I just ask for the data I need.
It breaks up the process of programming into data storage and data manipulation/presentation. DB's for storage, my bad python for manipulation and presentation.
--Donald
www.rdex.net
Databases at a very abstract level are just data structures. Choosing a relational database when you don't need that much functionality is just as wrong as choosing a flat file when you need a database.
Knowing the ins & outs of your data structures is still a vital skill of programming.
Question everything