Slashdot Mirror


"Slacker DBs" vs. Old-Guard DBs

snydeq writes "Non-relational upstarts — tools that tack the letters 'db' onto a 'pile of code that breaks with the traditional relational model' — have grabbed attention in large part because they willfully ignore many of the rules that codify the hard lessons learned by the old database masters. Doing away with JOINs and introducing phrases like 'eventual consistency,' these 'slacker DBs' offer greater simplicity and improved means of storing data for Web apps, yet remain toys in the eyes of old guard DB admins. 'This distinction between immediate and eventual consistency is deeply philosophical and depends on how important the data happens to be,' writes InfoWorld's Peter Wayner, who let down his old-guard leanings and tested slacker DBs — Amazon SimpleDB, Apache CouchDB, Google App Engine, and Persevere — to see how they are affecting the evolution of modern IT."

54 of 267 comments

  1. slashdot insult? :( by FlashBuster3000 · · Score: 5, Funny

    FTA: "The world won't end if some snarky, anonymous comment on Slashdot disappears."
    What? Nothing more important than anonymous slashdot trolls to moderate :/

  2. *mods article -1, Flamebait* by TheSpoom · · Score: 5, Insightful

    Is it just me or did this article go out of its way to insult people who use "traditional" RDBMSs?

    I mean, I'm well versed in SQL and data consistency et al, but I'm still more than willing to consider new technologies. What the hell?

    --
    It's better to vote for what you want and not get it than to vote for what you don't want and get it.
    - E. Debs
    1. Re:*mods article -1, Flamebait* by Just+Some+Guy · · Score: 3, Funny

      You seem to imply there was more to the story than the summary. This confuses me.

      --
      Dewey, what part of this looks like authorities should be involved?
  3. Normalization doesn't exist to save disk space by qoncept · · Score: 5, Insightful

    Now that disk space is so cheap and many of the data models don't benefit as much from normalization, ...

    You don't want to store the same data in multiple places. Your query might run faster, but your data integrity is going to suck.

    And, uh, I have the pleasure of working now with a huge data warehouse that hasn't normalized status codes, so instead of quickly searching for an integer, the queries run slow as hell scanning char fields. It's not good.

    --
    Whale
    1. Re:Normalization doesn't exist to save disk space by qoncept · · Score: 2, Informative

      "CHAR(50)"

      Oracle doesn't have a "string" datatype.

      --
      Whale
    2. Re:Normalization doesn't exist to save disk space by TheSpoom · · Score: 2, Informative

      Ah, my apologies. Really, it should be an indexed enum (or whatever Oracle equivalent there is... it's been a while since I used it) if there's no additional data to go along with the status code... or another table if there is additional data.

      --
      It's better to vote for what you want and not get it than to vote for what you don't want and get it.
      - E. Debs
    3. Re:Normalization doesn't exist to save disk space by qoncept · · Score: 2, Interesting

      My point exactly. :) There are a lot of things our data warehouse should be and it's not. We're working on redesigning it now though so we should be resolving a lot of the issues. But most people aren't just about to redesign their databases because it's a huge deal. We have 8 different apps using the warehouse, hundreds of reports and people hitting it we don't even know about that will all be obsolete. The cost to redesign is huge, and we only have the opportunity now because a project it is dependent on is being redesigned.

      --
      Whale
    4. Re:Normalization doesn't exist to save disk space by Hognoxious · · Score: 5, Funny

      You don't want to store the same data in multiple places.

      But if one of them is wrong, you can check the others and correct it.

      My boss - a lead senior senior lead developer from Android Whorehouse & Douche - several years back, when I tried explaining "why I'd missed some fields out of one of the tables".

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    5. Re:Normalization doesn't exist to save disk space by mooingyak · · Score: 4, Funny

      But if one of them is wrong, you can check the others and correct it.

      My boss - a lead senior senior lead developer from Android Whorehouse & Douche - several years back, when I tried explaining "why I'd missed some fields out of one of the tables".

      I was about to post something explaining to you why that's bad, and then I reread your post and the whooshing noise around me quieted down.

      --
      William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
    6. Re:Normalization doesn't exist to save disk space by qoncept · · Score: 4, Informative

      Right, and, boss, which one is right?

      People that haven't done it don't realize how easy it is to end up in that situation. Say, I write reports about people, and Robin writes reports about assets, whose owners are people, and puts a person's name in her table to make it faster. Someone gets married, their name changes, and now Robin's reports are wrong.
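
      A minimal sketch of that situation, using Python's built-in sqlite3 module. The people and assets tables and their columns are hypothetical, made up for illustration; the only point is that a copied name goes stale while a joined one does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Denormalized: the owner's name is copied into the assets table.
    CREATE TABLE assets (
        id INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES people(id),
        owner_name TEXT NOT NULL,
        description TEXT NOT NULL
    );
    INSERT INTO people VALUES (1, 'Pat Smith');
    INSERT INTO assets VALUES (10, 1, 'Pat Smith', 'laptop');
""")

# Someone gets married and the name changes in the people table only.
conn.execute("UPDATE people SET name = 'Pat Jones' WHERE id = 1")

# The copied column is now stale ...
stale_name = conn.execute(
    "SELECT owner_name FROM assets WHERE id = 10").fetchone()[0]

# ... while a join against the single source of truth stays correct.
joined_name = conn.execute(
    "SELECT p.name FROM assets a JOIN people p ON p.id = a.owner_id "
    "WHERE a.id = 10").fetchone()[0]

print(stale_name, "|", joined_name)
```

      Nothing in the schema forces the two copies to agree, so the report built on owner_name is wrong until someone notices.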

      --
      Whale
    7. Re:Normalization doesn't exist to save disk space by PseudoIdiot · · Score: 2, Informative

      Why in the world would you allow access to a line in a database while simultaneously allowing access to another DB with the same line of data? This is easily disallowed by properly using reference IDs, which should have been implemented during the conceptualization of the DB at the very beginning but could still easily be attached to the entire DB. Don't allow data edits without first locking that line in the database, cross-referencing the reference ID, and preventing that same reference ID (regardless of which other DB it exists on) from being modified. If you don't know how to accomplish this with Oracle, or even SQL, you do not have any business touching the database to begin with. Based on your answers, and how ridiculously they've been modded, qoncept, you barely understand databasing at all.

    8. Re:Normalization doesn't exist to save disk space by Hognoxious · · Score: 2, Funny

      Total failure to understand the situation. You, I mean he, didn't understand the concept of "factoring out" common information - say, the customer details on an order - from the variable per item data - product code, quantity.

      What he, er, you appear to be talking about is natural vs surrogate keys.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    9. Re:Normalization doesn't exist to save disk space by petermgreen · · Score: 2, Insightful

      and in the Orders table once per order.
      I'd disagree on this one, it seems to me like it would be a good idea to record the customer's name and address (probably both billing and shipping) at the time of an order even if they later change the details on their account.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    10. Re:Normalization doesn't exist to save disk space by Tupper · · Score: 2, Informative

      I'll buy that: the name on a mailing label or shipment may be a different concept than the primary name associated with the account. If so, it's wrong to conflate them.

  4. Mod this down by Anonymous Coward · · Score: 5, Funny

    Like the article says, "The world won't end if some snarky, anonymous comment on Slashdot disappears."

  5. Laziness Rules by ergo98 · · Score: 5, Insightful

    Slacker DBs like CouchDB and SimpleDB, have taken off for the simple reason that most developers have absolutely mediocre database knowledge or skills, and rather than learning it's just as easy to just wave it all off as obsolete.

    It's no surprise that the creator of CouchDB, for instance, hadn't a clue about databases when he began his project. All of that built up knowledge just ignored while someone invented their own, and it's as rational as rolling your own encryption from scratch without the slightest clue about encryption algorithms or theories.

    1. Re:Laziness Rules by Samschnooks · · Score: 3, Interesting

      ... and rather than learning it's just as easy to just wave it all off as obsolete.

      I don't know about that. But maybe these slacker DBs are perfect for what they're doing? Glancing at those mentioned in the FA, it just looks like they're simple tools to do simple things.

      Don't get me wrong. I once had the pleasure of working with an Oracle god. This dude was about to take his final Oracle exam in a series of exams and he turned my Join that took ten seconds into a Join that took less than a thousandth. I have no idea what he did to this day, but it took several lines of PL/SQL. We were dealing with tens of millions of rows that had to be processed every night.

      My point is if it's something simple to do, why all the RDBMS overhead? Many times, just a simple flatfile is all you need and maybe a little more.

    2. Re:Laziness Rules by KagatoLNX · · Score: 3, Interesting

      In the end, the problem is that people just want a "default tool". They don't want to think about their requirements for data consistency. The really scary bit is that while RDBMses are the "default tool" of yesterday and slacker DBs are the "default tool" of tomorrow, neither of them are really the "problem".

      The "default tool" attitude IS the problem. Unless you carefully weigh your data consistency requirements, you shouldn't be making that call at all.

      I welcome the slackers and all of their new options along the spectrum of speed versus consistency. It's just that most of the people developing applications scare the shit out of me. They're so cavalier (or should I say, "agile", or maybe "pragmatic") about requirements that it's truly disturbing.

      That said, if you're really interested in all of the options, I also recommend checking out memcachedb, memcacheq, and redis.

      --
      I think Mauve has the most RAM. --PHB (Dilbert Comic)
    3. Re:Laziness Rules by phoenix321 · · Score: 2, Insightful

      Problem is, you're re-inventing the wheel several times over in the process. Hint: "a flatfile and maybe a little more" could very well be all the storage technology invented today only a few years down the road.

      At first, all you need is to store key:value pairs. That works with a flat file or with Oracle. Then you need some consistency checks, which can be modelled fast in Oracle or reasonably fast in your software. Then you need some triggers, which could be written fast in Oracle and not-so fast in your software. And so on until you have progressed through the whole platform effect with several squeaky wheels invented and thousands of hours wasted.

      Any project worth doing that involves storing key:value pairs is worth a real database. Take the tiniest, lowliest member of the crowd as long as it can somehow speak SQL and can be linked into and unlinked from the project. Everything else will require at least a medium rewrite at some point when you switch over to a real database. You could of course extend everything upon a glorified flatfile until your reinvented wheels strangle all your progress.
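
      A rough sketch of that progression with SQLite through Python's built-in sqlite3 module, rather than Oracle. The kv and kv_log tables and the kv_audit trigger are hypothetical names; the sketch only shows that the check and the trigger each cost a few lines of SQL instead of application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 1: a plain key:value table.
    -- Step 2: a consistency check, here that values are never empty.
    CREATE TABLE kv (
        key   TEXT PRIMARY KEY,
        value TEXT NOT NULL CHECK (length(value) > 0)
    );
    -- Step 3: a trigger that records every change to a value.
    CREATE TABLE kv_log (key TEXT, old_value TEXT, new_value TEXT);
    CREATE TRIGGER kv_audit AFTER UPDATE ON kv
    BEGIN
        INSERT INTO kv_log VALUES (OLD.key, OLD.value, NEW.value);
    END;
""")

conn.execute("INSERT INTO kv VALUES ('colour', 'mauve')")
conn.execute("UPDATE kv SET value = 'teal' WHERE key = 'colour'")

# The CHECK constraint rejects bad data without any application code.
try:
    conn.execute("INSERT INTO kv VALUES ('size', '')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

log_rows = conn.execute("SELECT * FROM kv_log").fetchall()
print(rejected, log_rows)
```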

    4. Re:Laziness Rules by metalhed77 · · Score: 4, Informative

      Damien Katz, CouchDB's creator, worked at MySQL prior to writing CouchDB, and worked on Lotus Notes prior to that...

      --
      Photos.
    5. Re:Laziness Rules by ergo98 · · Score: 3, Interesting

      I'm just going on the statements he made about his own (lack of) knowledge in this video.

    6. Re:Laziness Rules by Ambiguous+Puzuma · · Score: 4, Insightful

      If you want "a little more" than a simple flat file, perhaps SQLite is the answer? The people on the Firefox team seem to think so, for example.

      SQLite has been a pleasure to use for a small personal project involving a few Perl scripts. Granted my background is with SQL Server and Oracle, so perhaps I'm not the target audience, but I found it extremely easy to use and surprisingly efficient--and I didn't need to set up a server or anything. I didn't even need to explicitly create a database!

    7. Re:Laziness Rules by Anonymous Coward · · Score: 5, Funny

      Thanks for validating the OP comments....

    8. Re:Laziness Rules by sl0ppy · · Score: 2, Informative

      first some context. i architect data warehouses for a living. i also live in a world of building fairly specialized frameworks to deal with data warehouses architected as star and snowflake schemas. i tend to spend quite a lot of time in pseudo-relational databases that don't fully implement codd's rules.

      for fun, i like to spend some time toying with couchdb, using it for loose data warehousing, extending it, and generally enjoying the application development freedom it gives me.

      that said, let me respond to some of your points:

      Slacker DBs like CouchDB and SimpleDB, have taken off for the simple reason that most developers have absolutely mediocre database knowledge or skills, and rather than learning it's just as easy to just wave it all off as obsolete.

      map/reduce solves a specific problem in data warehousing - column based lookups given specific rules, able to be broken down into atomics and performed in massive parallel. this allows for very cheap horizontal scaling over a large dataset.

      It's no surprise that the creator of CouchDB, for instance, hadn't a clue about databases when he began his project.

      this just shows ignorance. even just a cursory scan of damien's resume says otherwise.

    9. Re:Laziness Rules by diamondsw · · Score: 4, Insightful

      >Damien Katz, CouchDB's creator ... worked on Lotus Notes prior to that...

      That's not exactly a ringing endorsement.

      --
      I don't know what kind of crack I was on, but I suspect it was decaf.
    10. Re:Laziness Rules by Anonymous Coward · · Score: 2, Informative

      Damien Katz, CouchDB's creator, worked at MySQL prior to writing CouchDB, and worked on Lotus Notes prior to that...

      He started work on CouchDB in 2005. Prior to that he was a Notes grunt of little significance.

      He started at MySQL in 2007.

      The point holds.

    11. Re:Laziness Rules by sl0ppy · · Score: 2, Insightful

      Everything else will require at least a medium rewrite at some point when you switch over to a real database. You could of course extend everything upon a glorified flatfile until your reinvented wheels strangle all your progress.

      not really. i think that you (and, unfortunately, the FA) are missing the point that the map and reduce functionality, while powerful, have one major advantage: scalability. simply put, a query can be, by definition of the map function, broken up into several discrete operations and performed simultaneously on the data.

      while this can be done in Oracle, using RAC, to some extent, the cost and complication is a major barrier to entry. Cache-Fusion, while typically good, can also end up being a liability when the cost based optimizer attempts to split up the query into atomic tasks in order to correctly parallelize the query. for instance, on one application of RAC (multiple multi-core servers, fibrechannel disks, and oracle clustered filesystem), across 100,000,000+ rows, when heavy writes were occurring, it was cheaper computationally to force a full disk scan, using hints, than to rely on Cache-Fusion to figure out what data was stale and what data was fresh. this was discovered after several days spent neck deep in tkprof output.

      conversely, map, by design, already does this.
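
      a plain-Python sketch of the idea, not CouchDB's own view API. the partitions, the document fields, and the function names are all made up for illustration; each partition is mapped on its own, and the partial results are merged afterwards.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical documents, already split across three partitions ("nodes").
partitions = [
    [{"status": "open"}, {"status": "closed"}],
    [{"status": "open"}, {"status": "open"}],
    [{"status": "closed"}],
]

def map_partition(docs):
    """Map step: count statuses within one partition, touching no other data."""
    return Counter(doc["status"] for doc in docs)

def merge_counts(left, right):
    """Reduce step: combine two partial results."""
    return left + right

# Each partition can be mapped independently, so the work can run in parallel.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_partition, partitions))

totals = reduce(merge_counts, partials, Counter())
print(dict(totals))
```

      because the map step never looks outside its own partition, adding machines only means adding partitions; nothing has to work out which rows are fresh on which node.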

    12. Re:Laziness Rules by tepples · · Score: 2, Insightful

      If you want ridiculous speed, and actively hate your data, use SQLite.

      Care to explain why SQLite requires one to "actively hate [one's] data"?

  6. Well, it's like... by oldhack · · Score: 3, Funny

    Either is cool with me, as long they are cool and takes care of business, you know what I am saying?

    It's all good.

    --
    Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
  7. a base of data by poot_rootbeer · · Score: 4, Insightful

    "tools that tack the letters 'db' onto a 'pile of code that breaks with the traditional relational model"'

    If "database" were intended to mean only "relational database", we wouldn't have had any need for the latter term...

  8. who needs transactions? by alen · · Score: 3, Insightful

    the article is right that in some cases it doesn't matter if a transaction is lost. but in any case where money is involved it's a must. you can't just start a fund from your Oracle or SQL Server savings to pay for mistakes because it will kill your brand and you may lose a lot of future business. and any savings will be eaten up by the extra cost to hire people to solve all the data problems

    i've seen this. no constraints on the data that is originally put in, not enough referential integrity and you get customers opening up a lot of trouble tickets and you end up hiring people to clean up the data every time a mistake is found

    1. Re:who needs transactions? by Dragonslicer · · Score: 2, Interesting

      Oh, come on. MySQL suffers from the same thing that PHP does; that it's industry standard and easy to use.

      The bigger thing that they both suffer from is having a rather poor history. The problem with people saying how bad they are is that the complaints are based on old versions. PHP5 is much better than PHP4 or PHP3, and MySQL is steadily becoming something resembling a real database (5.0 is good, in particular if you use InnoDB, 4.1 was decent, but anything below 4.1 barely qualifies as a database).

  9. distributed databases and P2P by thanasakis · · Score: 4, Informative

    The problem of distributed consistency has kept researchers occupied for quite a while. For example, see project Scalaris. They are using a distributed hash table to distribute data among many nodes. This should be relatively easy, at least once you have a good hashing function on your hands. But a lot of research has been done on P2P networks during the last decade, so there is quite a lot of stuff to read and take ideas from.
    The interesting part is that it can maintain consistency and support ACID properties. From the site it appears that they accomplish that by using a modified Paxos Algorithm which basically is a way to maintain consensus among many different peers in a non-Byzantine system (this means that there are no malevolent peers in the system -- peers can break down and cease working but not sabotage the system). Leslie Lamport of Microsoft Research has done a lot of work on this, anyone interested may take a look at his papers, very advanced stuff there.
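
    For the hashing part only, here is a minimal sketch of how a key can be assigned to a node on a hash ring. It is not taken from Scalaris, it says nothing about Paxos or transactions, and the node names and the HashRing class are hypothetical.

```python
import bisect
import hashlib

def ring_position(text):
    """Hash a string to an integer position on the ring."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Sort nodes by their position on the ring.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key):
        """Return the first node at or after the key's position, wrapping around."""
        index = bisect.bisect_left(self._positions, ring_position(key))
        return self._ring[index % len(self._ring)][1]

ring = HashRing(["node-a", "node-b", "node-c"])
placement = {key: ring.node_for(key) for key in ("alice", "bob", "carol")}
print(placement)
```

    With this layout, removing a node only moves the keys that lived on it; keys on the surviving nodes stay where they were.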

  10. You young whippersnappers don't know nothing! by www.sorehands.com · · Score: 4, Interesting

    Relational DB? People forget Network Model Databases (http://en.wikipedia.org/wiki/Network_model) and flat databases.

    Network model databases will outperform relational all the time. You just don't have the same flexibility.

    Newer models are not based on the design or performance issue, but the distribution of the data. These are not invalid reasons, but the old issues still apply.

    I have had arguments with people who consider PC programming different from mainframe. The same rules apply. The difference is that many PC programmers are just sloppier. When you have cheap CPU and memory, people don't analyze and optimize as much.

  11. I've never understood the UNIX world's fascination by Richard+Steiner · · Score: 5, Informative

    I've never understood the UNIX world's fascination with relational databases.

    Speaking as a programmer in mainframe online transaction environments for the past 20+ years, I've become very familiar with very fast and simple database systems like the "freespace" files we use on the Unisys mainframe platform.

    We don't need relations for real-time processing. Most programs just need a place to keep data, and a simple key to retrieve that data. Some efficiency in disk usage is nice, but the primary design factor is performance.

    A freespace file is a collection of pre-allocated fixed-length records of various sizes (e.g. 256 bytes, 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, and 8192 bytes). Each record size is assigned a type number (e.g., 1 through 6 in the above case), and a given file is created and pre-allocated with a mix of various records depending on the usage pattern for that particular file. If you know all you need is tiny records, create a file containing a few hundred or thousand type 1 and maybe 2 records.

    Records not allocated are filled with a deallocated fill pattern.

    A program uses a record by performing a Write New operation. That tells the database manager to find a record in that file closest and >= to the size required, stick the presented buffer in the record, save it, and return a key to that record to the calling program. Typical key format is where Record Number is a number from 1 ... n. If your file has 1000 Type 3 records, it'd be from 1...1000 or 0...999.

    To read a record, use a key from a previous Write New (stored away somewhere, perhaps in another file) to read that record from a file. Length is not required.

    Programs use a very simple read-and-lock mechanism when modifying existing records. If one program has a record locked, another program must wait. Not a problem with intelligent coding.
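
    A toy in-memory model of Write New and read as described above, in Python. This is not the Unisys implementation: the FreespaceFile class, the (type, record number) key tuple, and the fall-through to the next larger size when a size is full are all assumptions, and locking is left out entirely.

```python
# Record sizes by type number, taken from the example sizes in the description.
RECORD_SIZES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096, 6: 8192}

class FreespaceFile:
    def __init__(self, counts):
        # counts maps type number -> how many records to pre-allocate.
        # None stands in for the deallocated fill pattern.
        self._slots = {t: [None] * n for t, n in counts.items()}

    def write_new(self, data):
        """Store data in the smallest free record that fits; return its key."""
        for rec_type in sorted(self._slots, key=RECORD_SIZES.get):
            size = RECORD_SIZES[rec_type]
            if size < len(data):
                continue
            slots = self._slots[rec_type]
            for number, slot in enumerate(slots, start=1):
                if slot is None:
                    # Pad to the fixed record length, as a fixed-size slot would be.
                    slots[number - 1] = data.ljust(size, b"\0")
                    return (rec_type, number)
        raise RuntimeError("no free record large enough")

    def read(self, key):
        """Read a whole record by key; the caller supplies no length."""
        rec_type, number = key
        record = self._slots[rec_type][number - 1]
        if record is None:
            raise KeyError(key)
        return record

f = FreespaceFile({1: 2, 2: 1, 3: 1})
key = f.write_new(b"KORD 251651Z 27010KT")
print(key, f.read(key).rstrip(b"\0"))
```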

    We've used this system in airline systems for 40+ years. It works well. Sometimes an environment has robust commit and rollback/recovery features to allow for an entire series of changes to be rolled back on error, sometimes not. It doesn't seem to matter that much, especially for transient data like weather, flight schedule data, etc.

    I would LOVE to see a freespace database ported to Solaris, personally. We'd use it heavily. :-)

    --
    Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
    The Theorem Theorem: If If, Then Then.
  12. Harsh? by Bobb+Sledd · · Score: 2, Insightful

    I'm a DB admin, and I use things that aren't toys; but what I've heard here is kinda harsh.

    Look, it's all about "right tool for the right job." Why do you need a nuclear-powered drill that can make a tunnel from here to China, when really all you needed was a shovel?

    For most daily projects that have small amounts of data, they may be using something like Crystal Reports or Excel or SPSS that just does all the number-crunching client-side anyway. You don't always need Oracle or [favorite DB flavor] for that.

    --
    "They said I probly shouldn't fly with just one eye," "I am Bender. Please insert girder."
  13. I feel old by a2wflc · · Score: 2, Informative

    When I saw the title I thought "I'm old-guard". Then I read the article and JOINs are a key concept to the old-guard.

    My first few DB apps involved using a b-tree or ISAM library (or writing our own). Then the "new guys" started wanting to pay for a server that did JOINs. We did JOINs, just at the app layer and without the guaranteed consistency that a good relational design gives you. And getting a server that does it was expensive.

    I wouldn't want to go back to pre-relational server days, but am also very thankful that I did write my own DBs from the ground up. I will probably never need to use the entire experience, but can often use bits and pieces of it, and I appreciate a good key/value store.
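
    For anyone who never had to do it, an app-layer JOIN looks roughly like the sketch below. The customers and orders rows and the hash_join helper are hypothetical, not code from any of those old apps.

```python
# Hypothetical tables held as lists of dicts, as an app might read them
# from two separate key/value or ISAM files.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 100, "customer_id": 1, "item": "disk"},
    {"order_id": 101, "customer_id": 1, "item": "tape"},
    {"order_id": 102, "customer_id": 3, "item": "card"},  # no such customer
]

def hash_join(left, right, key):
    """Inner join: index the left rows by key, then probe with the right rows."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    return [
        {**match, **row}
        for row in right
        for match in index.get(row[key], [])
    ]

joined = hash_join(customers, orders, "customer_id")
for row in joined:
    print(row["order_id"], row["name"], row["item"])
```

    Order 102 quietly drops out of the result, and nothing stopped it from being written in the first place. That is the missing consistency guarantee in practice.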

    1. Re:I feel old by __aasqbs9791 · · Score: 3, Funny

      I was listening to the radio (didn't pay attention to the station it was on) one day and generally liking the music I was listening to on it. Then the station ID came across between songs. It was the "oldies" station. I suddenly felt like I needed a cane (or perhaps a walker). Why does that happen? And is it going to happen every 10 years or so? I don't think I can take too many more of those moments.

  14. SELECT * FROM SNARKY_COMMENTS by billstewart · · Score: 4, Funny

    Can't quite fit the whole query into the title box, but if you were using one of those databases that Wayner's article talked about, you'd be able to query and find out if you were first...

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  15. Re:MySQL not listed as toy ? by dacut · · Score: 3, Insightful

    MySQL strives to provide RDBMS and ACID semantics, though its quality of service (QoS) may fall short. By contrast, these "slacker" databases don't even try to support RDBMS or ACID; even if they operated perfectly, they won't provide RDBMS/ACID.

    I work for one of the companies in question (no, I don't speak for them). We rely heavily on a combination of these "slacker" dbs, Berkeley dbs, memcached, Oracle, flat files, and tape backups. Each fills a niche. I wish these articles would quit trying to create a false dichotomy.

  16. Berkeley DB is awesome by IGnatius+T+Foobar · · Score: 4, Interesting

    I can't believe there hasn't been any mention of Berkeley DB yet. Guess what, folks: sometimes you just don't need the features of a full relational database. Sometimes all you need is fast, robust, reliable storage of indexed key/value pairs.

    I can attest that Berkeley DB does exactly that, and does it really, really well. We use Berkeley DB for all of the data storage in the Citadel system, including the mailboxes themselves. Some sites have tens of gigabytes or even hundreds of gigabytes of data, and Berkeley DB just keeps chugging along, happily and reliably doing its thing. Our biggest problem? People who point at it and say "storing email in a database is unreliable" because they know it constantly explodes when Exchange does it. Well guess what, folks: Berkeley DB ain't the Exchange database (actually, maybe Exchange wouldn't be so unreliable if they switched to Berkeley DB).

    Eschewing the full set of RDBMS features isn't slacking. It's choosing the right tool for the job.
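
    To give a feel for the programming model, here is a small sketch using Python's standard dbm module as a stand-in. It is not Berkeley DB itself and not how Citadel stores mail; the file name and the msg: keys are hypothetical.

```python
import dbm
import os
import tempfile

# Python's standard dbm module, used here as a stand-in for a
# Berkeley-style store; it is not Berkeley DB itself.
path = os.path.join(tempfile.mkdtemp(), "mailbox")

with dbm.open(path, "c") as db:      # "c" creates the file if needed
    db[b"msg:1"] = b"Subject: hello"
    db[b"msg:2"] = b"Subject: lunch?"

# Reopen read-only to show the data persisted on disk.
with dbm.open(path, "r") as db:
    subject = db[b"msg:2"]
    count = len(db.keys())

print(subject, count)
```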

    --
    Tired of FB/Google censorship? Visit UNCENSORED!
    1. Re:Berkeley DB is awesome by Foresto · · Score: 3, Informative

      For others who are interested in Berkeley-style key-value stores, check out Tokyo Cabinet.

  17. Old vs. New Simple DB's by billstewart · · Score: 2, Funny

    Wayner's usually a good writer, and did some good theoretical-computer-science work back in the day, but this article was too short to answer the questions he asks at the beginning, and he mostly highlighted the new shiny things from big ASPs, which is generally what Infoworld wants.

    I'm particularly disappointed that while he referred to the name and history of Berkeley DB, aka Sleepycat, aka Oracle Renamed-foo, he didn't actually talk about using it. (OTOH, Infoworld did review one version of it in 2005.) I no longer have my 4.1BSD manual on the shelf, but it was useful if you wanted something faster than using grep/sed/awk/look on tab-separated text files (which were the canonical Unix database format, and what I normally used for databases.)

    These days if I want a lightweight database, I usually just build tables in Excel, and then bitch about how it doesn't have a join or even decent text-editing and filtering capabilities, and occasionally have to save it as a CSV file and install vim on Yet Another Work-owned Windows box so I can get some bloody work done. I suppose if Excel did have a join function there'd be fewer people buying MS Access...

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
    1. Re:Old vs. New Simple DB's by petermgreen · · Score: 2, Insightful

      the thing that always puzzled me about Berkeley DB is its incessant format breakage requiring dumps and restores.

      On a database server at least data upgrading can be handled centrally but on a file based DB where datafiles can be scattered anywhere a lack of a stable data format seems like a fatal flaw.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  18. Re:I've never understood the UNIX world's fascinat by dcowart · · Score: 4, Insightful

    How does it work for searching though? If I just have my "freespace" file and my pointers to records, does a search for some piece of user requested data have to hit every record or is there a hash somewhere for the data contained in the record? You don't mention it in your description.

    It seems that the biggest advantage to a relational DB is that the syntax for accessing it is well known, SQL. It has a human-readable interface and while sometimes wonky to work with for complex operations, it provides the simplest cross-platform way to access data. I don't need to know which data blocks hold the data, I just ask the database for them "SELECT slashdotid, name FROM users where slashdotid 20000"... and I get rows of data.

    Could I just read it from a file? Yes. Would it be simpler? Maybe. But what if I have 200001 records, then I have to do some magic sorting in my program, and I have to manage memory for them, and disk space, etc. It is simpler to let the DB handle that mess and I just ask for the data I need.
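
    A runnable version of that kind of request, using Python's built-in sqlite3 module. The comparison operator is missing from the query as quoted in this comment, so the "<" below is a guess, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (slashdotid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1234, "early_bird"), (19999, "just_made_it"), (654321, "latecomer")],
)

# The operator is an assumption; the quoted query lost it somewhere.
rows = conn.execute(
    "SELECT slashdotid, name FROM users WHERE slashdotid < ? ORDER BY slashdotid",
    (20000,),
).fetchall()
print(rows)
```

    The program never sorts, pages, or manages memory for the table; it names the rows it wants and the database does the rest.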

    It breaks up the process of programming into data storage and data manipulation/presentation. DB's for storage, my bad python for manipulation and presentation.

    --Donald

    --
    www.rdex.net
  19. Ignore The Rules At Your Peril by Prototerm · · Score: 2, Insightful

    You may have seen in the news recently how in the last decade or so Wall Street ignored some of the hard-won regulations and guidelines developed in the wake of the Great Depression.

    We all know what happened as a result.

    The same is true when dealing with data. You don't ignore the rules completely, or follow them only when you feel like it, or when you have time. As the old joke goes, Quality is *not* Job 1.1.

    If the data isn't important enough to store correctly, then it's not important enough to be stored at all.

    --
    "My country, right or wrong; if right, to be kept right; and if wrong, to be set right." --Senator Carl Schurz (1872)
  20. Just data structures by Thaelon · · Score: 4, Insightful

    Databases at a very abstract level are just data structures. Choosing a relational database when you don't need that much functionality is just as wrong as choosing a flat file when you need a database.

    Knowing the ins & outs of your data structures is still a vital skill of programming.

    --

    Question everything

  21. The problem is scaling by plopez · · Score: 2, Insightful

    so you start a small project, "we just need a few hundred/thousand records, a few key value links and the occasional transaction". so you start with a slacker DB. A slacker DB far too often implies a slacker hack software d00d.

    Then it grows. Instead of educating themselves (Q: what's the difference between those who can't read and those who don't? A: nothing. ) and finding a better DB solution they thrash around trying to hack in DB functions into their code.

    So they lose consistency etc. Soon they have a polluted DB that breaks all the time. Often they are proud of the heroics of the wasted effort they put into it. A good programmer knows how to be the correct form of lazy: do not reinvent the wheel.

    --
    putting the 'B' in LGBTQ+
  22. Re:I've never understood the UNIX world's fascinat by LWATCDR · · Score: 2, Insightful

    Okay how do you find the data without a record number? I can see the value of the system but it also seems very inflexible.
    I do agree that way too many programmers use MySQL for a file system, flat files, configs, and goodness knows what else.

    --
    See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
  23. All toys by zig43 · · Score: 2, Interesting

    Every database covered in the article is a toy.

    From TFA: "The problem is that JOINs are really, really slow when the data is spread out over several machines."

    This is the result of a poor design, not a database flaw. If you are running a web application against multiple databases, either cluster them or store all the data for a user in one database. (i.e. hash the login_id and select the database based on the result). If someone is doing JOINs across multiple machines and doesn't have a very good reason for doing so, then nothing short of a lobotomy is going to help them.
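
    A minimal sketch of the "hash the login_id and select the database" idea above. The shard names and the `shard_for` helper are hypothetical, and SHA-256 is just one reasonable choice of stable hash; nothing here comes from TFA.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to connection settings.
SHARDS = ["userdb0", "userdb1", "userdb2", "userdb3"]


def shard_for(login_id: str, shards=SHARDS) -> str:
    """Return the database that holds all data for this login_id."""
    # Use a stable hash. Python's built-in hash() is salted per process for
    # strings, so it would send the same user to different shards after a restart.
    digest = hashlib.sha256(login_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Every query for one user goes to the same database, so JOINs stay local.
home = shard_for("alice")
```

    Note that a plain modulo like this remaps most users whenever the number of shards changes, so it only fits a fixed shard count.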

    From TFA: "Each query can only run 5 seconds. The answer can only hold 250 items. Each item can have only 250 pairs."

    Yeah, I'd say that meets the definition of a toy database alright.

    From TFA: "Many of the complaints about the other toy databases revolve around how a missing feature makes it impossible to find the right data. If you want to add a bit more functionality to the database here, you can whip up many of the features locally in Python. If you want a JOIN, you can synthesize one in Python and probably customize the memory cache at the same time. This is especially useful for Web applications that let users store their data in the service. If you need to add security to restrict each user to the right data, you can code that in Python too."

    The writer must be joking. Who would do this when there are better options that don't involve implementing your own database?
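
    For a sense of what "synthesizing a JOIN in Python" means in practice, here is a rough sketch of an application-side inner join over rows already fetched from the datastore. The `hash_join` helper and the sample rows are made up for illustration, and it ignores everything a real database would add (indexes, a query planner, memory limits).

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key`, entirely in application code."""
    # Build an in-memory index over one side...
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # ...then probe it once per row of the other side.
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [{"user_id": 1, "item": "disk"}, {"user_id": 1, "item": "ram"}]
rows = hash_join(users, orders, "user_id")
```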

    From TFA: "there's no big reason to use Ruby, Python, Java, or PHP on the server when it can all be packaged in JavaScript"

    Many people who write web applications actually want to do useful things with the data they store like generate reports, keep logs, track inventory, or run queries. This doesn't work very well when the "database" is a text file sitting on the user's hard drive.

  24. "Schema-less" storage with MySQL by kc8jhs · · Score: 2, Interesting

    Yeah, when I first read this article I thought that was the dumbest thing I'd ever heard, but reading it made a lot of sense. It's basically just using a simple schema like the "slacker" DBs for canonical storage, and then using additional tables as 'indexes.'

    How FriendFeed uses MySQL to store schema-less data

    Given their needs in terms of adding features, altering the schema, and building indexes, being able to make the indexes "eventually consistent" was huge. You have to remember that to keep things nice and normalized, you need lots of tables and joins, and that MySQL (or any other FOSS RDBMS) CANNOT build indexes across tables.
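
    A rough sketch of the pattern described above: one canonical table of opaque blobs, plus a separate table acting as an index that is filled in later. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the table names, JSON encoding, and helper functions are assumptions for illustration, not FriendFeed's actual code.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Canonical storage: an id and a schema-less blob.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
# An "index" is just another table mapping one property to entity ids.
conn.execute(
    "CREATE TABLE index_user_id ("
    "user_id TEXT, entity_id TEXT, PRIMARY KEY (user_id, entity_id))"
)


def save_entity(entity: dict) -> str:
    """Write only the canonical row; the index is updated separately."""
    entity_id = entity.setdefault("id", uuid.uuid4().hex)
    conn.execute(
        "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
        (entity_id, json.dumps(entity)),
    )
    conn.commit()
    return entity_id


def update_user_index(entity_id: str) -> None:
    """Catch the index up for one entity (would run later, e.g. in a background job)."""
    row = conn.execute(
        "SELECT body FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    entity = json.loads(row[0])
    if "user_id" in entity:
        conn.execute(
            "INSERT OR IGNORE INTO index_user_id VALUES (?, ?)",
            (entity["user_id"], entity_id),
        )
        conn.commit()


def find_by_user(user_id: str) -> list:
    """Look up ids in the index, then fetch the blobs by primary key."""
    ids = conn.execute(
        "SELECT entity_id FROM index_user_id WHERE user_id = ?", (user_id,)
    ).fetchall()
    found = []
    for (entity_id,) in ids:
        row = conn.execute(
            "SELECT body FROM entities WHERE id = ?", (entity_id,)
        ).fetchone()
        entity = json.loads(row[0])
        # Re-check the property, since the index may be stale.
        if entity.get("user_id") == user_id:
            found.append(entity)
    return found
```

    Between `save_entity` and `update_user_index` the entity exists but `find_by_user` does not return it yet, which is the "eventually consistent" index in miniature.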

  25. Music from your teenage years gets extra cred by billstewart · · Score: 2, Interesting

    It turns out that there actually _are_ neurological reasons that music from your teenage years is extra-evocative, just as language-learning works better with young kids. Go read "This is Your Brain on Music" for more details.

    A certain amount of music sensitivity appears to be hardwired into our brains, and the extra hormones after puberty increase music-remembering ability and the emotional aspects of it that younger kids don't have as much of. There's also a lot of intellectual development going on in those years, and it's easier to pick up more complex ideas from the music than you could when you were younger.

    As you get older, that still happens a bit, and you'll still run into music that's new and cool which you'll enjoy years later, but now it's competing with lots of other cool music that's in your head which your teenage-years music wasn't.

    What's much more annoying is when you find yourself tuning by a different radio station and wondering "What is all this noise those kids are listening to? They should turn that crap down and listen to good stuff" just like your parents said when you were a kid. Some of that's because 90% of everything is crap, and it's not the crap that you find evocative because it was around when you were a kid, and some of it's because 90% of everything on the radio is highly-packaged commercial crap, making it 99% crap instead of only 90%. And some of that's because kids always want to listen to new stuff and piss off their parents, and musicians always like to do new stuff, and if you want to bust into the Top 40 you've either got to do identical commercial crap better than anybody who's already there or else do something new. Rap was creative and interesting, but the whole gangstas-dissing-women motifs that dominated it were offensive. Hip-hop took that music and started doing lots of interesting things with it, though I haven't followed it. I'm finding myself playing a lot of old-timey (average hair color in our jam session == gray, leaning toward white :-), and starting to listen to jazz more (lots of deep classical stuff in there, which I haven't had the patience to listen to for a while.)

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  26. They're a niche, not a full replacement by PostPhil · · Score: 2, Interesting

    I get tired of hearing the same old discussion about whether or not the relational database is going to die. It isn't. But the new breed of *specialized* databases works well for their *specialized* purposes. Big surprise. But all of them inevitably make a trade-off. Anyone who works seriously with database design knows that it's all about trade-offs.

    One of the main motivations for the new breed of databases is that the standard SQL database relies on things such as foreign keys and other constraints for data consistency, but that requires the data to be directly managed by that running DBMS process. When you require data to be distributed over a network (i.e. over many separate processes), then the only way a *foreign key* can work is if the DBMS process has some sort of link over the network to the separate DBMS process and then uses that somewhat as if it were local. (Other strategies involve using external application code for consistency rather than foreign keys, etc.) Of course, the DBMS process can't use its usual local low-level optimizations behind the scenes in order to handle that query efficiently over the network, so it doesn't scale. Specialized DBMSs for distributed data focus on optimizing being distributed, while the typical SQL DBMS optimizes storage and retrieval of data as if it were local. The bottom line is that the traditional SQL database scales well vertically, but not horizontally concerning hardware. Or rather, when you scale horizontally, you forgo a lot of its advantages. The new breed of databases trade off consistency and other assurances for the sake of "good enough" consistency and really fast retrieval of domain-specific data.
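
    To make the foreign-key point concrete, here is a small sketch of the local case, where one DBMS process owns both tables and can refuse an inconsistent row on its own. It uses Python's built-in sqlite3 as a convenient single-process stand-in; the `accounts` and `transfers` tables are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE transfers ("
    "id INTEGER PRIMARY KEY, "
    "account_id INTEGER NOT NULL REFERENCES accounts(id))"
)
conn.execute("INSERT INTO accounts (id) VALUES (1)")
conn.execute("INSERT INTO transfers (account_id) VALUES (1)")


def try_orphan_insert() -> bool:
    """Try to insert a transfer for an account that does not exist."""
    try:
        conn.execute("INSERT INTO transfers (account_id) VALUES (999)")
    except sqlite3.IntegrityError:
        # The database itself rejected the row; no application check was needed.
        return False
    return True


orphan_accepted = try_orphan_insert()
```

    The check is cheap here because both tables live in the same process. Once `accounts` lives on another machine, the same guarantee needs a network round trip or application code.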

    But not everyone is trying to be Google or Amazon. Financial institutions such as banks can't tolerate "good enough" consistency. The biggest problem with relational databases I see nowadays is that people are ignorant about why "relational" is such a good idea, and how SQL only gets you part of the way to "relational" and that SQL's shortcomings are a different issue. The second biggest problem is that most people are used to only one or two data usage patterns, and if it "works for them", then they assume it should *always* be done that way. For example, the hordes of people who barely know Excel (i.e. not a relational database) or Access, and then like to give "expert" advice. Or a web programmer who believes that ORMs are the One True Way because they abstract away choices of DBMS in order to keep favorite language X, even though other people's needs are the opposite: perhaps we want to abstract away the choice of programming language so that we can keep the same database, and so maybe it's a good idea if the database itself can ensure data consistency rather than relying on the ORM, etc.

  27. Pointless by EvilIntelligence · · Score: 2, Insightful

    As a DB admin myself, I find these "Us vs Them" arguments to be ultimately pointless. A company will choose a database based on the application's needs. If "immediate consistency" is needed they will choose a standard relational database. If "eventual consistency" is acceptable, the company may opt for one of the other "not-so-relational" databases. The fact that there are other options is actually a good thing. The "old guard" needs to find the positives and embrace change, or run the risk of being left behind in an evolving world of technology.