Slashdot Mirror


Moving From CouchDB To MySQL

itwbennett writes "Sauce Labs had outgrown CouchDB and too much unplanned downtime made them switch to MySQL. With 20-20 hindsight they wrote about their CouchDB experience. But Sauce certainly isn't the first organization to switch databases. Back in 2009, Till Klampaeckel wrote a series of blog posts about moving in the opposite direction — from MySQL to CouchDB. Klampaeckel said the decision was about 'using the right tool for the job.' But the real story may be that programmers are never satisfied with the tool they have." Of course, then they say things like: "We have a TEXT column on all our tables that holds JSON, which our model layer silently treats the same as real columns for most purposes. The idea is the same as Rails' ActiveRecord::Store. It’s not super well integrated with MySQL's feature set — MySQL can’t really operate on those JSON fields at all — but it’s still a great idea that gets us close to the joy of schemaless DBs."
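
As a rough illustration of the pattern described in that quote (not Sauce Labs' actual code), here is a minimal Python sketch using the standard sqlite3 and json modules. The jobs table, its extra column, and the Row class are all hypothetical names chosen for this example.

```python
import json
import sqlite3


class Row:
    """Hypothetical model object: real columns plus fields from a JSON TEXT column."""

    def __init__(self, columns, extra_json):
        self._columns = dict(columns)
        self._extra = json.loads(extra_json) if extra_json else {}

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._columns:
            return self._columns[name]
        if name in self._extra:
            return self._extra[name]
        raise AttributeError(name)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, extra TEXT)")
conn.execute(
    "INSERT INTO jobs (id, status, extra) VALUES (?, ?, ?)",
    (1, "done", json.dumps({"browser": "firefox"})),
)

job_id, status, extra = conn.execute("SELECT id, status, extra FROM jobs").fetchone()
job = Row({"id": job_id, "status": status}, extra)

print(job.status)   # read from a real column
print(job.browser)  # read from inside the JSON blob
```

The database only ever sees an opaque string in extra, so filtering or indexing on browser has to happen in application code, which is the trade-off the quote itself admits.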

26 of 283 comments

  1. Not getting RDMS by Anonymous Coward · · Score: 5, Insightful

    And in another three years they will switch to whatever is the coolest up-and-coming storage solution. Incompetent developers will always be incompetent developers.

    1. Re:Not getting RDMS by gbjbaanb · · Score: 5, Insightful

      true, just reading their blog

      "Things like SQL injection attacks simply should not exist."

      "HTTP API. Being able to query the DB from anything that could speak HTTP (or run curl) was handy."

      so sql injection is real bad, bad design of SQL... yet allowing any old HTTP javascript queries is somehow ok. Yes, incompetent developers indeed.

      They also say

      "Why are we still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?"

      That ignores query caches and stored procedures. And so what if the language is related to COBOL? JavaScript is closely related to C, which is almost as old, and C has plenty of relations to Algol, which is even older.

      So yes, it sounds like they haven't really got a clue. Great advert for their business!

    2. Re:Not getting RDMS by gorzek · · Score: 4, Insightful

      I think the main problem is application developers not understanding anything about database theory. The vast majority of databases I encounter are not normalized at all, and it's almost always because they were designed by a developer with no database background.

      Granted, I didn't come into this field with that background, either, but I made a point to learn it, and now I'm very cognizant of implementing sound database designs. This whole idea of throwing random strings of structured text into a database column, and then relying entirely on the program code to parse and use it... well, why the hell even use a relational database, then?

      Relational databases aren't suitable for every application, nor are "bigtable" and other NoSQL implementations. The problem is that developers use a particular kind of database without really understanding how to use it properly. If they can get data in, and get data out, that's basically all they care about. Never mind if they make it a maintenance nightmare in the process.

    3. Re:Not getting RDMS by Xest · · Score: 5, Insightful

      "Why are we still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?"

      I couldn't agree with you more, this quote makes me want to vomit. Is this really how low the average competence of today's web developer has stooped? Between PHP developers not getting why PHP is a pretty shitly designed and developed language and stuff like this, I barely get how the web even runs anymore.

      To answer the original quote, the reason we're "still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?" is because SQL is a language based on mathematically sound principles, and which is supported widely, and known widely, and is processed by database engines across the globe that have literally decades of stability behind them, data in them and so forth.

      There's absolutely no reason to change SQL, because if you build a new query language that is based on the same mathematically sound principles of relational algebra then it will er... look just like SQL. The fact the kiddie (I can only assume he's a kiddie due to his blatant lack of knowledge and/or experience in the field) who wrote that blog post doesn't get this suggests he should absolutely not be trusted with your data as he'll only lose it.

      This is a classic example of someone bitching about something not because it's bad, but because they simply don't understand it and believe that rather than learn about it properly, it's better to bitch and hope you can somehow effect change by bitching.

      The advantage of most SQL/RDBMS is that they do adhere to the ACID principles, and for people who want to be able to have some degree of trust in their data source that's pretty fucking important. It's no surprise that they've moved over to MySQL though as it's one of the few RDBMS that is completely shit at adhering to the ACID principles and keeping up to date with solid, stable implementations of modern database functionality.

    4. Re:Not getting RDMS by SQLGuru · · Score: 4, Insightful

      I completely agree. A lot of non-DB centric people think that they can do more in the app tier, effectively using their databases as glorified file stores. Why even have a database server in those instances? I'm not saying that everything should be done in the database, either, but take advantage of every tool you have.

      NoSQL has a place, so does relational. Learn their strengths and determine which is the best fit for your project. Then, learn how to use the tool to its fullest.

    5. Re:Not getting RDMS by serviscope_minor · · Score: 4, Insightful

      "so sql injection is real bad, bad design of SQL..."

      SQL injection actually has nothing to do with SQL.

      Exactly the same attacks happen in any system where you build up a string from user data and pass it off to an interpreter. SQL has nothing to do with it.

      Exactly the same thing used to happen with sudo shell scripts.

      Exactly the same thing happened with javascript injection in very early webmail systems.

      There are plenty of opportunities for code injection on poorly written PHP, too.
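
      To make the point concrete, here is a minimal Python sketch using the standard sqlite3 module. The users table and the find_unsafe/find_safe helpers are hypothetical names invented for this example; the first helper builds a string from user data and hands it to an interpreter, the second passes the data as a bound parameter.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)]
    )
    return conn


def find_unsafe(conn, name):
    # Vulnerable: user data is spliced into the code string itself.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def find_safe(conn, name):
    # Bound parameter: the value never becomes part of the SQL text.
    query = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
payload = "nobody' OR '1'='1"

unsafe_rows = find_unsafe(conn, payload)  # the payload rewrites the WHERE clause
safe_rows = find_safe(conn, payload)      # the payload is just an odd-looking name
```

      The same split between code and data applies to shell commands and HTML output; the query language is incidental.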

      --
      SJW n. One who posts facts.
    6. Re:Not getting RDMS by K.+S.+Kyosuke · · Score: 4, Interesting

      "There's absolutely no reason to change SQL, because if you build a new query language that is based on the same mathematically sound principles of relational algebra then it will er... look just like SQL."

      False. First of all, SQL is NOT based on mathematically sound principles of relational algebra. SQL took the mathematically sound principles of relational algebra and fucked them up. There should be no NULLs, there should be no natural ordering of "columns", there should be no possibility of having duplicate rows, there should be no possibility of inconsistent intermediate states in transactions (no deferred checking) etc. SQL has them all, and then some. Why? Because SQL simply ignores the relational model and "does what IBM and Oracle always did". That's not the same thing as "implementing the relational model".

      Second, there is a separation between the surface structures of a language and its foundations. I really don't think that a language based on relational algebra has to look like SQL. That's like saying that a language with nouns having singular and plural and verbs having tenses has to look like English. Nope, it doesn't have to at all. Just look at VB.NET and C#. Basically two front-ends to virtually identical language semantics, only one of them does not avoid non-alphabetic structural delimiters like the plague (and is so much more pleasant for it).

      --
      Ezekiel 23:20
    7. Re:Not getting RDMS by TheRealMindChild · · Score: 5, Interesting

      "There should be no NULLs"
      Then how do I, say, indicate the date of death for someone who hasn't died? An IsDead field? Really? (Yes, a NULL in a field is a shortcut for proper relationship, but a lack of relationship when using a linking table will still be represented by NULL)

      "there should be no natural ordering of 'columns'"
      Does it really matter? The natural ordering of columns is the order in which you added them to the table. Ignore it. It isn't important, and not in need of a "solution"

      "there should be no possibility of having duplicate rows"
      Firstly, get to know your DISTINCT SQL keyword. Secondly, data in real life sometimes IS duplicate. What the hell should people do? Have a DuplicatedThisManyTimes field? Ugh.

      "possibility of inconsistent intermediate states in transactions"
      That is a property of the database engine, not SQL.

      "Because SQL simply ignores the relational model and 'does what IBM and Oracle always did'. That's not the same thing as 'implementing the relational model'."
      Where do you get this shit? Are you telling me the function of foreign key constraints and referential integrity, and the good ol INNER/RIGHT/LEFT join keywords are just smoke and mirrors and everything is really just a chaotic bowl of soup? References please.

      --

      "When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
    8. Re:Not getting RDMS by Anonymous Coward · · Score: 4, Insightful

      That's not how debate works. If you can't take a position and defend it against questioning, without resorting to "go away and learn more", then you have no position and shouldn't have posted in the first place.

    9. Re:Not getting RDMS by Joey+Vegetables · · Score: 4, Informative

      From a purely pragmatic point of view, it may not seem unreasonable to model it that way. But you should be aware that you are trading one form of complexity for another, probably bigger one.

      For instance, now, if you want to know who was alive on some specific date, you have to write something like "WHERE DateOfDeath IS NULL OR DateOfDeath > @date." You also will not know for certain whether a NULL means "person is still alive" versus "person is dead but we do not know his or her date of death." When you try to compare different people's death dates any comparison to NULL will yield NULL and you will need special case logic in every such comparison. You will need tristate logic throughout any part of your application that does logical tests based on the date of death. Nullable values will sometimes require special treatment in your code, depending on the language (e.g., whether date/time values are considered to be nullable in that language). I could go on.

      I also could build you both tables, an updateable view, and a set of SPs to do your basic CRUD stuff on both tables plus "show me living people" and "show me dead people", in a LOT less time than it would take to handle all the code problems that would result from breaking 1NF.

      I am not an extremist on this subject, but I wear both DBA and developer hats, and when I'm acting as a DBA or in any other situation where I have control over the DB, I do try to get into 3NF, and then denormalize only if there are demonstrated reasons to do so. As a developer, I will sometimes take shortcuts if it's genuinely necessary, but, more often than not, I end up regretting them.
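
      The NULL comparison behaviour described above can be seen in a minimal Python sketch using the standard sqlite3 module. The person table, its rows, and the cutoff date are made up for this example, and "alive on a date" is simplified to ignore birth dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT PRIMARY KEY, date_of_death TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("alice", "1850-06-01"), ("bob", "1950-06-01"), ("carol", None)],
)
as_of = "1900-01-01"  # ISO dates compare correctly as strings

# Comparing NULL with anything yields NULL, which sqlite3 returns as None.
null_comparison = conn.execute("SELECT NULL > ?", (as_of,)).fetchone()[0]

# Naive test: rows whose comparison is NULL are silently filtered out.
naive = [r[0] for r in conn.execute(
    "SELECT name FROM person WHERE date_of_death > ? ORDER BY name",
    (as_of,),
)]

# Explicit special case for NULL, as in the WHERE clause quoted above.
explicit = [r[0] for r in conn.execute(
    "SELECT name FROM person "
    "WHERE date_of_death IS NULL OR date_of_death > ? ORDER BY name",
    (as_of,),
)]
```

      Here naive silently drops carol, who has no recorded date of death, while explicit keeps her. Neither query can say whether her NULL means "still alive" or "date unknown".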

  2. Re:The decision the simple by Sarten-X · · Score: 4, Interesting

    That's actually a rather insightful point...

    If your application fits well with the methodologies of a traditional RDBMS, use a traditional RDBMS, and hire people who are trained and experienced in using those methodologies to their full potential.

    If you're dealing with the latest Big Data paradigms and designs, where you can sacrifice some of the rigidity of a RDBMS to gain some flexibility and cheaper scalability, use a NoSQL database, and hire people who aren't stuck in their old RDBMS ways.

    --
    You do not have a moral or legal right to do absolutely anything you want.
  3. Why not PostgreSQL? by JamesA · · Score: 5, Interesting
    1. Re:Why not PostgreSQL? by squiggleslash · · Score: 5, Funny

      Because it's an urban myth.

      The reality is there are only two SQL databases in the entire universe: MySQL and Oracle. You might have been told others exist, hell, you might even have worked on something called "SQL Server" in your .NET shop, but in reality: they don't. They're all figments of your imagination. Your imagination is SO determined to find better, more robust, faster, powerful, alternatives to MySQL and Oracle that an entire fantasy world comprised of "a successor to Ingres that makes MySQL look like a piece of crap" and "A Microsoft product that doesn't feel like a thirty year old mainframe product hacked onto a modern platform" develops in your head.

      C'mon, if these mythical products actually existed, sites like Slashdot wouldn't ignore them, right? Right?

      --
      You are not alone. This is not normal. None of this is normal.
  4. Nosql in Postgres by rla3rd · · Score: 4, Interesting

    You can get JSON support using the PLV8 extension: http://code.google.com/p/plv8js/wiki/PLV8

    Alternatively, you can use the hstore data type.

  5. Re:Has to be said by Anonymous Coward · · Score: 5, Funny

    MongoDB is Webscale. MySQL is not Webscale, because it uses joins. SQL also has impetus mismatch.

  6. Re:The decision the simple by Anonymous Coward · · Score: 5, Insightful

    "That's actually a rather insightful point...

    "If your application fits well with the methodologies of a traditional RDBMS, use a traditional RDBMS, and hire people who are trained and experienced in using those methodologies to their full potential.

    "If you're dealing with the latest Big Data paradigms and designs, where you can sacrifice some of the rigidity of a RDBMS to gain some flexibility and cheaper scalability, use a NoSQL database, and hire people who aren't stuck in their old RDBMS ways."

    The real key is for the person doing the hiring to understand which of those methodologies fits their application.

  7. Re:Wikipedia and Slashdot use MySQL by SuiteSisterMary · · Score: 5, Informative
    --
    Vintage computer games and RPG books available. Email me if you're interested.
  8. Re:The decision the simple by TheSpoom · · Score: 4, Informative

    "The real key is for the person doing the hiring to understand which of those methodologies fits their application."

    This is insightful. I've worked extensively with RDBMS solutions and now quite a bit with NoSQL technologies. They each have their place. An entire article could be written on where each fits most naturally, but in general if you don't need to join between tables, need to throw data to your store at a high velocity (e.g. logging), and/or need a loose schema, a NoSQL solution works best. If what you're doing can be naturally modeled (i.e. users HAVE AND BELONG TO stations, stations HAVE MANY playlists, etc. etc.), use an RDBMS.

    One can see in the subtext of the GP that they may not get this, with their comment that people using RDBMS solutions are "stuck in old ways". It seems like they are saying that NoSQL is effectively always best. I'm curious why they think that. Nail, hammer, etc...

    --
    It's better to vote for what you want and not get it than to vote for what you don't want and get it.
    - E. Debs
  9. Then what's it called instead of a join? by tepples · · Score: 5, Insightful

    In NoSQL systems such as MongoDB and CouchDB, what do you call the operation where you retrieve one document, pull an identifier out of that document, and use that identifier as the key to retrieve another document?
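
    The operation being asked about can be sketched in a few lines of Python. This is a plain in-memory dictionary standing in for a document store, not MongoDB's or CouchDB's API; the document ids, field names, and helper functions are invented for the example.

```python
# A dictionary stands in for a key/document store.
documents = {
    "playlist:1": {"title": "Morning mix", "owner_id": "user:7"},
    "user:7": {"name": "alice"},
}


def get(doc_id):
    """Fetch one document by key (one round trip in a real store)."""
    return documents[doc_id]


def playlist_with_owner(playlist_id):
    # Step 1: retrieve the first document.
    playlist = get(playlist_id)
    # Step 2: pull an identifier out of it and use it as the next key.
    owner = get(playlist["owner_id"])
    # Step 3: combine the two documents in application code.
    return {**playlist, "owner": owner}


result = playlist_with_owner("playlist:1")
```

    Functionally this is a join on owner_id, carried out by the application one lookup at a time instead of by the database.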

    1. Re:Then what's it called instead of a join? by Anonymous Coward · · Score: 5, Funny

      Witchcraft.

  10. Re:Is a DB even needed sometimes? by rtaylor · · Score: 4, Informative

    A CSV or XML or JSON file is a db (a DB is just structured data).

    Are relational DBs always required? Certainly not.

    The big benefit to a relational DB with lots of enforcement at the data layer is that you can have one or more applications reading/writing to it with minimal concern of data corruption.

    What isn't obvious is that second application is often aggregate reporting for management. "How many customers are using $foo and where do they live geographically". With a relational DB, I might knock that query out in a few minutes across millions of customers.

    With a flat XML file per customer spread across a number of servers, this could take days to assemble, particularly if $foo is nested deep in the structure.

    Having spent far too much time writing one-off scripts to gather customer data because the middleware didn't support that type of query, I've actually gone the other way and started shoving some business logic into the DB.

    Functions such as isCustomerPaymentOverdue are now in the relational DB with a very thin model in the middleware to allow for much easier and faster reporting.
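
    As an illustration of the kind of report mentioned above, here is a minimal Python sketch using the standard sqlite3 module. The customer and subscription tables, their columns, and the sample rows are assumptions made for this example, not the poster's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id     INTEGER PRIMARY KEY,
        region TEXT NOT NULL
    );
    CREATE TABLE subscription (
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        product     TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'EU'), (2, 'EU'), (3, 'US');
    INSERT INTO subscription VALUES (1, 'foo'), (2, 'foo'), (3, 'bar'), (3, 'foo');
""")

# "How many customers are using $foo and where do they live geographically"
report = conn.execute(
    """
    SELECT c.region, COUNT(DISTINCT c.id) AS customers
    FROM customer AS c
    JOIN subscription AS s ON s.customer_id = c.id
    WHERE s.product = ?
    GROUP BY c.region
    ORDER BY c.region
    """,
    ("foo",),
).fetchall()
```

    With one file per customer, the same answer needs a script that opens every file and digs the product out of each one.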

    --
    Rod Taylor
  11. Re:Wikipedia and Slashdot use MySQL by Hognoxious · · Score: 4, Funny

    MongoDB can write its data to /dev/null for extra performance.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  12. Not quite true by Viol8 · · Score: 4, Informative

    If all your application is ever going to do is read and write fixed-size, record-structured data with few (or no) relational attributes, then COBOL will suit you fine, as that's what it was designed for. Unfortunately those sorts of apps are few and far between these days, but in its ever-decreasing niche COBOL is still good.

  13. Re:Normalisation isn't a panacea by gorzek · · Score: 4, Insightful

    Yeah, it really depends on what you are doing. But any time you break normalization there should be a good reason. Performance is certainly a valid reason. "I'm too lazy to make a well-designed database," however, is not.

    If you find yourself breaking normalization all the time, then you've probably found a use case where a relational database isn't the best tool for the job.

    While there is a "right" way to use a given tool, there is no one tool that is right for every situation. People who get this backwards are zealots and will often make poor decisions.

  14. Re:Is a DB even needed sometimes? by serviscope_minor · · Score: 4, Insightful

    "The big benefit to a relational DB with lots of enforcement at the data layer is that you can have one or more applications reading/writing to it with minimal concern of data corruption."

    Not just that, but good use of relations and normalization makes whole classes of bug impossible.
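
    One such class of bug is the orphaned reference. The minimal Python sketch below uses the standard sqlite3 module; the customer and invoice tables are hypothetical, and SQLite only enforces foreign keys once the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer VALUES (1, 'alice')")
conn.execute("INSERT INTO invoice VALUES (10, 1)")  # valid reference

try:
    # No customer 999 exists, so the database refuses the row outright.
    conn.execute("INSERT INTO invoice VALUES (11, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
```

    Whatever application writes to the table, an invoice pointing at a nonexistent customer never gets stored.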

    --
    SJW n. One who posts facts.
  15. COBOL is cool! by NotesSensei · · Score: 4, Funny

    In what other language would this statement compile without error:

    PERFORM makemoney UNTIL rich.

    (Note the full stop at the end)