"Slacker DBs" vs. Old-Guard DBs

Posted by kdawson on Tuesday March 24, 2009 @06:25AM from the close-enough-for-web-work dept.

snydeq writes "Non-relational upstarts — tools that tack the letters 'db' onto a 'pile of code that breaks with the traditional relational model' — have grabbed attention in large part because they willfully ignore many of the rules that codify the hard lessons learned by the old database masters. Doing away with JOINs and introducing phrases like 'eventual consistency,' these 'slacker DBs' offer greater simplicity and improved means of storing data for Web apps, yet remain toys in the eyes of old guard DB admins. 'This distinction between immediate and eventual consistency is deeply philosophical and depends on how important the data happens to be,' writes InfoWorld's Peter Wayner, who let down his old-guard leanings and tested slacker DBs — Amazon SimpleDB, Apache CouchDB, Google App Engine, and Persevere — to see how they are affecting the evolution of modern IT."

slashdot insult? :( by FlashBuster3000 · 2009-03-24 06:34 · Score: 5, Funny

FTA: "The world won't end if some snarky, anonymous comment on Slashdot disappears."
What? Nothing more important than anonymous slashdot trolls to moderate :/
*mods article -1, Flamebait* by TheSpoom · 2009-03-24 06:34 · Score: 5, Insightful

Is it just me or did this article go out of its way to insult people who use "traditional" RDBMSs?
I mean, I'm well versed in SQL and data consistency et al, but I'm still more than willing to consider new technologies. What the hell?

--
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
Normalization doesn't exist to save disk space by qoncept · 2009-03-24 06:35 · Score: 5, Insightful

Now that disk space is so cheap and many of the data models don't benefit as much from normalization, ...
You don't want to store the same data in multiple places. Your query might run faster, but your data integrity is going to suck.

And, uh, I have the pleasure of working now with a huge data warehouse that hasn't normalized status codes, so instead of quickly searching for an integer, the queries run slow as hell scanning char fields. It's not good.

--
Whale
1. Re:Normalization doesn't exist to save disk space by Hognoxious · 2009-03-24 07:06 · Score: 5, Funny
  
  You don't want to store the same data in multiple places.
  
  But if one of them is wrong, you can check the others and correct it.
  My boss - a lead senior senior lead developer from Android Whorehouse & Douche - several years back, when I tried explaining "why I'd missed some fields out of one of the tables".
  
  --
  Confucius say, "Find worm in apple - bad. Find half a worm - worse."
2. Re:Normalization doesn't exist to save disk space by mooingyak · 2009-03-24 07:12 · Score: 4, Funny
  
  But if one of them is wrong, you can check the others and correct it.
  My boss - a lead senior senior lead developer from Android Whorehouse & Douche - several years back, when I tried explaining "why I'd missed some fields out of one of the tables".
  I was about to post something explaining to you why that's bad, and then I reread your post and the whooshing noise around me quieted down.
  
  --
  William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
3. Re:Normalization doesn't exist to save disk space by qoncept · 2009-03-24 07:25 · Score: 4, Informative
  
  Right, and, boss, which one is right?
  
  People that haven't done it don't realize how easy it is to end up in that situation. Say, I write reports about people, and Robin writes reports about assets, whose owners are people, and puts a person's name in her table to make it faster. Someone gets married, their name changes, and now Robin's reports are wrong.
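
A minimal sketch of that situation, using Python's built-in sqlite3 module. The table names, column names, and sample person are made up for illustration; the point is only that a copied name goes stale while a reference does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    -- Denormalized: the owner's name is copied into the asset row.
    CREATE TABLE assets_denorm (id INTEGER PRIMARY KEY, item TEXT, owner_name TEXT);
    -- Normalized: the asset row stores only a reference to the person.
    CREATE TABLE assets_norm (id INTEGER PRIMARY KEY, item TEXT,
                              owner_id INTEGER REFERENCES people(id));
    INSERT INTO people VALUES (1, 'Pat Smith');
    INSERT INTO assets_denorm VALUES (1, 'laptop', 'Pat Smith');
    INSERT INTO assets_norm VALUES (1, 'laptop', 1);
""")

# The person marries and changes name; only the people table is updated.
conn.execute("UPDATE people SET name = 'Pat Jones' WHERE id = 1")

stale = conn.execute(
    "SELECT owner_name FROM assets_denorm WHERE id = 1"
).fetchone()[0]
fresh = conn.execute("""
    SELECT p.name FROM assets_norm a JOIN people p ON p.id = a.owner_id
    WHERE a.id = 1
""").fetchone()[0]

print(stale)  # the copied name is now out of date
print(fresh)  # the join sees the current name
```

The denormalized report keeps working, which is the trap: nothing fails, it is just wrong until someone remembers to update every copy.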
  
  --
  Whale
Mod this down by Anonymous Coward · 2009-03-24 06:35 · Score: 5, Funny

Like the article says, "The world won't end if some snarky, anonymous comment on Slashdot disappears."
Laziness Rules by ergo98 · 2009-03-24 06:37 · Score: 5, Insightful

Slacker DBs like CouchDB and SimpleDB have taken off for the simple reason that most developers have absolutely mediocre database knowledge or skills, and rather than learning, it's just as easy to wave it all off as obsolete.
It's no surprise that the creator of CouchDB, for instance, hadn't a clue about databases when he began his project. All of that built-up knowledge just ignored while someone invented their own, and it's as rational as rolling your own encryption from scratch without the slightest clue about encryption algorithms or theories.
1. Re:Laziness Rules by metalhed77 · 2009-03-24 07:02 · Score: 4, Informative
  
  Damien Katz, CouchDB's creator, worked at MySQL prior to writing CouchDB, and worked on Lotus Notes prior to that...
  
  --
  Photos.
2. Re:Laziness Rules by Ambiguous+Puzuma · 2009-03-24 07:12 · Score: 4, Insightful
  
  If you want "a little more" than a simple flat file, perhaps SQLite is the answer? The people on the Firefox team seem to think so, for example.
  SQLite has been a pleasure to use for a small personal project involving a few Perl scripts. Granted my background is with SQL Server and Oracle, so perhaps I'm not the target audience, but I found it extremely easy to use and surprisingly efficient--and I didn't need to set up a server or anything. I didn't even need to explicitly create a database!
3. Re:Laziness Rules by Anonymous Coward · 2009-03-24 07:19 · Score: 5, Funny
  
  Thanks for validating the OP comments....
4. Re:Laziness Rules by diamondsw · 2009-03-24 07:25 · Score: 4, Insightful
  
  >Damien Katz, CouchDB's creator ... worked on Lotus Notes prior to that...
  That's not exactly a ringing endorsement.
  
  --
  I don't know what kind of crack I was on, but I suspect it was decaf.
a base of data by poot_rootbeer · 2009-03-24 06:44 · Score: 4, Insightful

"tools that tack the letters 'db' onto a 'pile of code that breaks with the traditional relational model'"
If "database" were intended to mean only "relational database", we wouldn't have had any need for the latter term...
distributed databases and P2P by thanasakis · 2009-03-24 06:52 · Score: 4, Informative

The problem of distributed consistency has kept researchers occupied for quite a while. For example, see project Scalaris. They are using a distributed hash table to distribute data among many nodes. This should be relatively easy, at least once you have a good hashing function on your hands. But a lot of research has been done on P2P networks during the last decade, so there is quite a lot of stuff to read and take ideas from.
The interesting part is that it can maintain consistency and support ACID properties. From the site it appears that they accomplish that by using a modified Paxos Algorithm which basically is a way to maintain consensus among many different peers in a non-Byzantine system (this means that there are no malevolent peers in the system -- peers can break down and cease working but not sabotage the system). Leslie Lamport of Microsoft Research has done a lot of work on this, anyone interested may take a look at his papers, very advanced stuff there.
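
For the "distribute data among many nodes" part, here is a rough sketch of a consistent-hash ring in Python. This is a generic textbook technique chosen for illustration, not a description of how Scalaris actually places keys, and it says nothing about the Paxos-based consistency layer. The class name, node names, and key names are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring using a stable hash."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: each key belongs to the next node clockwise."""

    def __init__(self, nodes):
        self._points = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        hashes = [h for h, _ in self._points]
        # First node whose position is >= the key's; wrap around at the end.
        i = bisect.bisect_left(hashes, _hash(key)) % len(self._points)
        return self._points[i][1]

    def add(self, node: str) -> None:
        bisect.insort(self._points, (_hash(node), node))


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Only keys that now belong to the new node change owner.
print(len(moved), "of", len(keys), "keys moved")
```

The useful property is that adding a node only reassigns the keys that land on the new node; everything else stays where it was.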
You young whippersnappers don't know nothing! by www.sorehands.com · 2009-03-24 07:00 · Score: 4, Interesting

Relational DB? People forget Network Model Databases (http://en.wikipedia.org/wiki/Network_model) and flat databases.
Network model databases will outperform relational all the time. You just don't have the same flexibility.
Newer models are not based on the design or performance issue, but the distribution of the data. These are not invalid reasons, but the old issues still apply.
I have had arguments with people who consider PC programming different from mainframe. The same rules apply. The difference is that many PC programmers are just sloppier. When you have cheap CPU and memory, people don't analyze and optimize as much.

--
Fight Spammers!
I've never understood the UNIX world's fascination by Richard+Steiner · 2009-03-24 07:00 · Score: 5, Informative

I've never understood the UNIX world's fascination with relational databases.
Speaking as a programmer in mainframe online transaction environments for the past 20+ years, I've become very familiar with very fast and simple database systems like the "freespace" files we use on the Unisys mainframe platform.
We don't need relations for real-time processing. Most programs just need a place to keep data, and a simple key to retrieve that data. Some efficiency in disk usage is nice, but the primary design factor is performance.
A freespace file is a collection of pre-allocated fixed-length records of various sizes (e.g. 256 bytes, 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, and 8192 bytes). Each record size is assigned a type number (e.g., 1 through 6 in the above case), and a given file is created and pre-allocated with a mix of various records depending on the usage pattern for that particular file. If you know all you need is tiny records, create a file containing a few hundred or thousand type 1 and maybe 2 records.
Records not allocated are filled with a deallocated fill pattern.
A program uses a record by performing a Write New operation. That tells the database manager to find a record in that file closest and >= to the size required, stick the presented buffer in the record, save it, and return a key to that record to the calling program. Typical key format is where Record Number is a number from 1 ... n. If your file has 1000 Type 3 records, it'd be from 1...1000 or 0...999.
To read a record, use a key from a previous Write New (stored away somewhere, perhaps in another file) to read that record from a file. Length is not required.
Programs use a very simple read-and-lock mechanism when modifying existing records. If one program has a record locked, another program must wait. Not a problem with intelligent coding.
We've used this system in airline systems for 40+ years. It works well. Sometimes an environment has robust commit and rollback/recovery features to allow for an entire series of changes to be rolled back on error, sometimes not. It doesn't seem to matter that much, especially for transient data like weather, flight schedule data, etc.
I would LOVE to see a freespace database ported to Solaris, personally. We'd use it heavily. :-)
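
The Write New and read-by-key scheme described above could be sketched in Python roughly like this. It is an in-memory toy under stated assumptions, not the Unisys implementation: the (type, record number) key layout, the zero-byte fill pattern, the stored-length bookkeeping, and every name in the code are invented for illustration, and the read-and-lock step is left out.

```python
from typing import Dict, List, Tuple

FILL = b"\x00"  # hypothetical "deallocated" fill byte
Key = Tuple[int, int]  # (type number, record number), an assumed key layout


class FreespaceFile:
    """In-memory toy: pre-allocated fixed-length records grouped by size type."""

    def __init__(self, layout: Dict[int, Tuple[int, int]]):
        # layout maps type number -> (record size in bytes, record count)
        self.sizes = {t: size for t, (size, _) in layout.items()}
        self.records: Dict[int, List[bytes]] = {
            t: [FILL * size for _ in range(count)]
            for t, (size, count) in layout.items()
        }
        # Free record numbers per type; pop() hands out the lowest first.
        self.free: Dict[int, List[int]] = {
            t: list(range(count - 1, -1, -1)) for t, (_, count) in layout.items()
        }
        self.lengths: Dict[Key, int] = {}

    def write_new(self, data: bytes) -> Key:
        """Store data in the smallest free record that fits; return its key."""
        for t in sorted(self.sizes, key=self.sizes.get):
            if self.sizes[t] >= len(data) and self.free[t]:
                n = self.free[t].pop()
                self.records[t][n] = data.ljust(self.sizes[t], FILL)
                self.lengths[(t, n)] = len(data)
                return (t, n)
        raise MemoryError("no free record large enough")

    def read(self, key: Key) -> bytes:
        """Fetch a record by key; the caller does not supply a length."""
        t, n = key
        return self.records[t][n][: self.lengths[key]]

    def release(self, key: Key) -> None:
        """Deallocate a record and overwrite it with the fill pattern."""
        t, n = key
        self.records[t][n] = FILL * self.sizes[t]
        del self.lengths[key]
        self.free[t].append(n)


f = FreespaceFile({1: (256, 4), 2: (512, 2)})
k1 = f.write_new(b"METAR KMSP 240653Z")  # fits a 256-byte type 1 record
k2 = f.write_new(b"x" * 300)             # too big for type 1, lands in type 2
print(k1, k2)
print(f.read(k1))
```

Everything is constant-time lookups by key, which is where the speed comes from; it is also why there is nothing here that could answer a question like "find every record mentioning flight 123".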

--
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.
SELECT * FROM SNARKY_COMMENTS by billstewart · 2009-03-24 07:13 · Score: 4, Funny

Can't quite fit the whole query into the title box, but if you were using one of those databases that Wayner's article talked about, you'd be able to query and find out if you were first...

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Berkeley DB is awesome by IGnatius+T+Foobar · 2009-03-24 07:18 · Score: 4, Interesting

I can't believe there hasn't been any mention of Berkeley DB yet. Guess what, folks: sometimes you just don't need the features of a full relational database. Sometimes all you need is fast, robust, reliable storage of indexed key/value pairs.

I can attest that Berkeley DB does exactly that, and does it really, really well. We use Berkeley DB for all of the data storage in the Citadel system, including the mailboxes themselves. Some sites have tens of gigabytes or even hundreds of gigabytes of data, and Berkeley DB just keeps chugging along, happily and reliably doing its thing. Our biggest problem? People who point at it and say "storing email in a database is unreliable" because they know it constantly explodes when Exchange does it. Well guess what, folks: Berkeley DB ain't the Exchange database (actually, maybe Exchange wouldn't be so unreliable if they switched to Berkeley DB).

Eschewing the full set of RDBMS features isn't slacking. It's choosing the right tool for the job.
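
To give a feel for what "indexed key/value pairs" means in practice, here is a small sketch using Python's standard-library dbm.dumb module. It is only a stand-in with the same put/get shape; it is not Berkeley DB and makes no claim about how Citadel stores mail. The key naming scheme and message bodies are made up.

```python
import dbm.dumb
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mailstore")

# "c" opens the store for read/write, creating it if needed.
with dbm.dumb.open(path, "c") as db:
    db[b"msg:1001"] = b"From: alice\n\nHello"
    db[b"msg:1002"] = b"From: bob\n\nHi back"

# Reopen and fetch by key, with no schema and no query language.
with dbm.dumb.open(path, "r") as db:
    body = db[b"msg:1001"]
    count = len(db)

print(body.decode())
print(count)
```

The whole interface is "store bytes under a key, get them back by key", and for a lot of workloads that really is all that's needed.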

--
Tired of FB/Google censorship? Visit UNCENSORED!
Re:I've never understood the UNIX world's fascinat by dcowart · 2009-03-24 07:38 · Score: 4, Insightful

How does it work for searching though? If I just have my "freespace" file and my pointers to records, does a search for some piece of user requested data have to hit every record or is there a hash somewhere for the data contained in the record? You don't mention it in your description.
It seems that the biggest advantage to a relational DB is that the syntax for accessing it is well known, SQL. It has a human-readable interface and while sometimes wonky to work with for complex operations, it provides the simplest cross-platform way to access data. I don't need to know which data blocks hold the data, I just ask the database for them "SELECT slashdotid, name FROM users where slashdotid 20000"... and I get rows of data.
Could I just read it from a file? Yes. Would it be simpler? Maybe. But what if I have 200001 records, then I have to do some magic sorting in my program, and I have to manage memory for them, and disk space, etc. It is simpler to let the DB handle that mess and I just ask for the data I need.
It breaks up the process of programming into data storage and data manipulation/presentation. DBs for storage, my bad python for manipulation and presentation.
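
A small sketch of that split, using Python's built-in sqlite3 module. The comparison operator did not survive in the query quoted above, so the "<" below is an assumption, as are the sample rows; the table and column names are the ones from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (slashdotid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        (1500, "early_bird"),
        (19999, "just_under"),
        (20000, "on_the_line"),
        (250000, "latecomer"),
    ],
)

# The "<" comparison is an assumption; the operator is missing from the post.
rows = conn.execute(
    "SELECT slashdotid, name FROM users WHERE slashdotid < ? ORDER BY slashdotid",
    (20000,),
).fetchall()

# Filtering, sorting, and storage layout are the database's problem;
# the program only handles presentation.
for row in rows:
    print(row)
```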
--Donald

--
www.rdex.net
Just data structures by Thaelon · 2009-03-24 07:45 · Score: 4, Insightful

Databases at a very abstract level are just data structures. Choosing a relational database when you don't need that much functionality is just as wrong as choosing a flat file when you need a database.
Knowing the ins & outs of your data structures is still a vital skill of programming.

--
Question everything