Slashdot Mirror


Digg Says Yes To NoSQL Cassandra DB, Bye To MySQL

donadony writes "After Twitter, now it's Digg that has decided to replace MySQL and most of its infrastructure components, moving away from LAMP to a NoSQL architecture based on Cassandra, an open source, highly scalable second-generation distributed database. Cassandra was open sourced by Facebook in 2008 and is licensed under the Apache License. The reason for this move, as explained by Digg, is the increasing difficulty of building a high-performance, write-intensive application on a data set that is growing quickly, with no end in sight. This growth has forced them into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead."

34 of 271 comments (clear)

  1. Facebook, Twitter and now Digg by clarkkent09 · · Score: 5, Funny

    In other news, Cassandra developers are celebrating the fact that their database is now used to store the largest amount of worthless information in history.

    --
    Negative moral value of force outweighs the positive value of good intentions.
    1. Re:Facebook, Twitter and now Digg by h4rr4r · · Score: 3, Informative

      Fits, before that mysql was the best way to store data no one cared about.

    2. Re:Facebook, Twitter and now Digg by seanadams.com · · Score: 4, Insightful

      Are you seriously arguing that unless the first derivative of one's salary is positive, there's no incentive to work?

      No, I did not say that one's salary needs to be monotonically increasing. That is not the point at all. And did you really have to turn this into a calculus problem?

      To state it differently, many entrepreneurs are willing to work temporarily for little or even nothing, and to make great sacrifices such as giving up health benefits, vacations, and normal family/social life... things most 9-5 workers would never consider. Being someone's bitch for $1M/yr (or to be pedantic let's say $1M/yr + 5%/yr^2) may sound like a splendid deal to you but there are others who would work much harder for sweat equity in their own venture.

      These people exist even if you can't fathom it. I'm one of them.

  2. Reddit by Gudeldar · · Score: 3, Informative

    Reddit also recently switched to Cassandra.

    1. Re:Reddit by h4rr4r · · Score: 5, Funny

      I was not aware metallurgy was popular amongst the youth.

  3. Away from LAMP? by Anonymous Coward · · Score: 3, Insightful

    Or away from MySQL? There is a difference.

  4. New acronym in order? by mgkimsal2 · · Score: 5, Funny

    From the Digg blog - http://about.digg.com/node/564

    "And if that doesn't sound like a big enough challenge, we're replacing most of our infrastructure components and moving away from LAMP."

    Cassandra Linux Apache PHP?

    1. Re:New acronym in order? by Anonymous Coward · · Score: 3, Funny

      Trust me, you don't want the clap!!!!

    2. Re:New acronym in order? by Tablizer · · Score: 3, Funny

      [...moving away from LAMP] Cassandra Linux Apache PHP?

      try: Cassandra Ruby Apache PHP
         

  5. Re:Good for them by Bill,+Shooter+of+Bul · · Score: 3, Insightful

    100% of hosting companies do not have Twitter, Facebook, Reddit, or Digg as their clients. It's a different market. MySQL does have a competitor in this space called PostgreSQL. It's pretty good. Pretty much every hosting company I would consider doing business with also offers it. But again, PostgreSQL wouldn't have saved the day for these companies; they've reached a different sector of the market due to their enormous scale.

    --
    Well.. maybe. Or Maybe not. But Definitely not sort of.
  6. Re:Which DB is better? by Anonymous Coward · · Score: 3, Insightful

    If you need a comparison chart... you don't need to switch.

    It's probably not necessary to change such a huge part of your architecture if it's not worth investing serious time investigating and benchmarking the alternatives.

  7. Re:Which DB is better? by h4rr4r · · Score: 5, Informative

    Postgres, for people who care about their data.

  8. Allergic reaction to MySQL by QuoteMstr · · Score: 5, Insightful

    These slides present a balanced and comprehensive overview of the current state of free databases. Whether you're in the NoSQL camp or not, they're worth reading.

    That said, here's my take:

    It's currently fashionable to replace MySQL with some "NoSQL" database or other. This trend is driven by two factors:

    • MySQL's community is fragmenting into several forks as Oracle purchases the rights, which has created the impression that MySQL's development is entering a riskier, unstable period.
    • "NoSQL" is the technology buzzword du jour in the Bay Area. It's difficult to overstate the impact of social forces on technology choice: most technology selections are governed more by what our friends say than by an impartial and disinterested weighing of merits.

    I haven't seen any consideration from potential "NoSQL" adopters of the benefits of using a good relational database like PostgreSQL. There's a world of difference between it and MySQL, and condemning all relational database systems because of bad experiences with MySQL is like condemning all sandwiches because McDonald's once made you sick. In giving up RDBMSes entirely, these developers lose quite a bit of safety, flexibility, and convenience. It's a huge over-reaction.

    This field should not be about following trends, though unfortunately, that's how most people choose which technologies to use: it should be about choosing the best tool for the job. And I believe that in the vast majority of cases, the advantages conferred by a relational system --- enforced integrity, interoperability based on SQL, query flexibility, storage flexibility --- make an RDBMS the best choice for almost any job. If you need sloppier semantics for some cases (for example, "eventual consistency"), you can layer that on top of a robust RDBMS.

    1. Re:Allergic reaction to MySQL by TubeSteak · · Score: 3, Interesting

      I haven't seen any consideration from potential "NoSQL" adopters of the benefits of using a good relational database like PostgreSQL.
      ...
      If you need sloppier semantics for some cases (for example, "eventual consistency"), you can layer that on top of a robust RDBMS.

      When you're dealing with TB/PB of data that doesn't require relational capabilities, there's no reason to use a "good relational database like PostgreSQL" when you can dispense altogether with the relational aspect and its performance hit.

      NoSQL may seem like the fad du jour, but until recently, nobody was working with such enormous dynamic datasets. When you look at the growth of all these hi-tech companies, they did an incredible amount of in-house hacking to develop the software necessary to glue together their enormous hardware infrastructure.

      --
      [Fuck Beta]
      o0t!
    2. Re:Allergic reaction to MySQL by kmike · · Score: 3, Insightful

      As several MySQL experts already noted, Digg isn't even using the indexes that provide maximum performance in the query that they present as problematic for MySQL:
      http://mysqlha.blogspot.com/2010/03/index-only.html
      http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Performance_Lie/

      So you are right about the NoSQL fashion trend. Looks like for some companies it's easier to throw a pile of cheap commodity hardware driven by some NoSQL BigTable wannabe at the problem instead of carefully optimizing queries and indexes for the best performance.
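
      The "index-only" point in the first link can be illustrated with a minimal sketch. It uses SQLite from Python's standard library as a stand-in for MySQL, and the table and column names (diggs, user_id, item_id, dug_at) are hypothetical, not Digg's actual schema. When an index contains every column a query touches, the engine can answer from the index alone:

```python
import sqlite3

# Hypothetical schema for illustration only; not Digg's real tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE diggs ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, item_id INTEGER, dug_at TEXT)"
)
conn.executemany(
    "INSERT INTO diggs (user_id, item_id, dug_at) VALUES (?, ?, ?)",
    [(u, i, "2010-03-01") for u in range(100) for i in range(50)],
)

# The index holds both user_id and item_id, so a query that only needs
# those two columns never has to visit the table rows.
conn.execute("CREATE INDEX idx_user_item ON diggs (user_id, item_id)")


def query_plan(sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)  # last field is the detail text


# Answerable from the index alone.
covered_plan = query_plan(
    "SELECT item_id FROM diggs WHERE user_id = ?", (42,)
)
# Needs dug_at, which is not in the index, so table rows must be read too.
uncovered_plan = query_plan(
    "SELECT dug_at FROM diggs WHERE user_id = ?", (42,)
)

print(covered_plan)
print(uncovered_plan)
```

      The first plan should mention a covering index while the second should not. MySQL's optimizer and storage engines behave differently in detail, so treat this only as a demonstration of the general idea.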

    3. Re:Allergic reaction to MySQL by jrumney · · Score: 5, Insightful

      I haven't seen any consideration from potential "NoSQL" adopters of the benefits of using a good relational database like PostgreSQL.

      The adopters of NoSQL deal with huge volumes of worthless information. They don't care about transactional integrity as much as they care about performance, which is why they chose MySQL over a good relational database in the first place.

    4. Re:Allergic reaction to MySQL by ducomputergeek · · Score: 3, Informative

      When you're dealing with the TB/PB range, you call Teradata. At last check they handle 4 of the 5 largest databases in the world, including eBay/PayPal's 13 PB monster and Walmart.

      --
      "The problem with socialism is eventually you run out of other people's money" - Thatcher.
    5. Re:Allergic reaction to MySQL by jbellis · · Score: 4, Informative

      Teradata and the other big relational db products (Vertica, Greenplum, etc) are all _analytical_ databases, designed for small numbers of complex queries, where adding new data to the system takes minutes if not hours. They are completely unsuitable for running a live application against.

  9. Re:Wow... by Anrego · · Score: 3, Interesting

    Don't be too quick to put Java down.. it's slower but it scales fairly well.

  10. Re:Which DB is better? by QuoteMstr · · Score: 3, Informative

    The page you cited, on column-oriented databases, describes an implementation strategy that's applicable to many types of databases. There are database engines that present a perfectly normal SQL interface to a column store, and there's actually a direct link to LucidDB from the article. Likewise, there's nothing stopping a Cassandra-like database from serializing its on-disk bits the other way around.

    Column-orientation has nothing to do with the "NoSQL" databases that are in vogue. It's completely orthogonal. You're talking about using vectors or linked lists when everyone else is arguing over whether to serialize data with XML or JSON.
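
    A minimal sketch of that point, with made-up data and hypothetical helper names (to_columns, to_rows): the same logical table can be laid out row by row or column by column, and converting between the two loses nothing. The layout is a storage decision that sits underneath whatever query interface is exposed.

```python
# One logical table, stored row-oriented: each record keeps its fields together.
rows = [
    {"id": 1, "user": "alice", "score": 5},
    {"id": 2, "user": "bob", "score": 3},
    {"id": 3, "user": "carol", "score": 4},
]


def to_columns(rows):
    """Row-oriented -> column-oriented: one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def to_rows(columns):
    """Column-oriented -> row-oriented: the inverse transformation."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


columns = to_columns(rows)

# An aggregate over one column only has to scan that column's list.
total = sum(columns["score"])
print(columns)
print(total)
```

    The column layout favors scans and aggregates over a single field, while the row layout favors fetching whole records. Neither choice says anything about whether the database speaks SQL.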

  11. Re:Good for them by QuoteMstr · · Score: 3, Informative

    But since most developers model their domains Object Oriented, why is MySql the default choice for any small application? Why not a document database or a native oo one?

    The relational model is consistent and easy to work with. It's easy to specify constraints that describe what the data should look like, and to allow several applications to interact with the data. It's also easier to optimize a database when you can describe discrete queries instead of directly following links from program code as you would in a navigational/object/document/etc. database.

    Furthermore, application data models aren't all that object-oriented. Most of the time, the manipulated data types (say, "story", "post", and "user") fall into well-defined categories that correspond well to rows in a table. The few mismatches are easily dealt with in application code.

    Sure, using an object database might be "easier" for the first 15 minutes, but you'll kick yourself when you have to manipulate it in any kind of sophisticated fashion.

  12. Re:Which DB is better? by RelliK · · Score: 5, Informative

    Go with PostgreSQL. Reliable, standards-compliant, fast.

    --
    ___
    If you think big enough, you'll never have to do it.
  13. "NoSQL"? by Stan+Vassilev · · Score: 5, Insightful

    Am I the only one who frowns at this moniker?

    First, it creates a false premise where people need to pick "SQL" versus "no SQL", while many real-world systems intelligently combine relational and non-relational data storage for their needs. There is no conflict.

    Second, there's nothing wrong with SQL as a language in particular, and in fact many of the "noSQL" engines are starting to support and extend basic SQL queries, instead of reinventing their own query language for the same purpose.

    I suppose "lessRDBMSabuse" was less catchy...

  14. Re:Wow... by QuoteMstr · · Score: 3, Insightful

    Bullshit. Languages don't scale: programs do.

    Writing a program in Java makes it scalable in the same way that painting a car red makes it fast. The JVM is quite good these days, but don't make up advantages that don't exist.

  15. Re:Which DB is better? by QuoteMstr · · Score: 3, Insightful

    First of all, if he's asking Slashdot for advice (which is barely a step above reading tea leaves [which itself is a step above asking 4chan]), he doesn't need Facebook-level scalability.

    Second, you're confusing scalability and performance. Scalable solutions tend to actually be slower than non-scalable ones: the difference is that a scalable system increases in capacity linearly with the number of machines you throw at it ("horizontal" scalability), whereas a fast non-scalable system generally needs the same number of faster, individual machines to increase capacity ("vertical" scaling).

    Third, PostgreSQL has excellent performance, and PostgreSQL does, in fact, scale horizontally.
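
    The "horizontal" scaling described in the second point can be sketched as hash partitioning: each key is assigned to one of N nodes, so adding nodes adds capacity. This is a toy in-memory illustration with a hypothetical ShardedStore class, not how Cassandra or PostgreSQL actually distribute data.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys over N in-memory 'nodes'."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash of the key picks the node, so every client
        # agrees on where a given key lives.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)


store = ShardedStore(node_count=4)
for n in range(1000):
    store.put(f"story:{n}", {"diggs": n})

# Each node ends up holding only a share of the keys.
sizes = [len(node) for node in store.nodes]
print(sizes)
```

    Each read or write pays for a hash and, in a real deployment, a network hop, which is the sense in which a scalable design can be slower per operation than a single fast machine. Plain modulo placement also moves most keys when the node count changes, which is why production systems tend to use schemes such as consistent hashing instead.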

  16. Re:Reddit's reliability has been shitty lately. by uncqual · · Score: 3, Interesting

    One aspect of the "cloud" (as in EC2) is that you can not only scale up easily (for $ of course), you can scale down easily (to save $).

    When you have fixed "in house" infrastructure to handle peak loads, there's not a lot of motivation to power off absolutely as many servers as you possibly can when you're not at peak load - all you save is the energy costs (and, if you're using remote hosting, you don't get rewarded for this except for whatever value you attach to feeling "green"). You still pay for the floor space, the machines, and perhaps some sort of maintenance contracts regardless of if the server is powered up or down.

    Using EC2 (depending on how you've structured it - some dedicated, some non-dedicated instances etc), if utilization drops to 80% over 20 instances, the temptation is to release a couple instances to save a couple bucks and drive utilization up to 90% on the remaining instances -- with potentially unfortunate consequences.

    Although I have no idea, I wonder if Reddit is just releasing instances too aggressively now "because they can" in order to save money? If so, the finger should be pointed at Reddit, not the cloud (or EC2 specifically).

    --
    Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
  17. Re:Reddit's reliability has been shitty lately. by Neoncow · · Score: 3, Informative

    The reddit blog discussed the issue recently.

    They claim it is not an EC2 issue, but simply the site getting bigger than it was designed to.

    Their latest entry discusses why they switched to Cassandra. I guess we'll wait for next week to see if the expected performance benefits materialize.

  18. Re:Good for them by QuoteMstr · · Score: 3, Interesting

    Thanks for the comprehensive reply.

    How else can you explain the wealth of approaches like ORM mappers, the repository and active record patterns, etc ? They are just patches on the relational model to make them friendly to application code.

    ORMs are syntactic sugar for the underlying database operations. It's possible to bypass them when you need SQL's full power and access the same data store.

    I for one do not want to use an API with Address1 - Address5 string properties.

    So create a table of addresses and use foreign keys to connect them to whatever other table you'd like. Since when does a relational structure require a garbage schema like your example? But surely you know all that.

    Further, since most object databases are defined and consumed in the languages you develop against with them, the sophistication is limited to the language

    But doesn't that then preclude accessing the same data set from programs written in other languages? The beauty of SQL is that it's language-agnostic.

    You also make several points relating to toolchains and testing: sure, some databases have better tools than others. But we're talking about differences between models, not differences between particular tools.
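
    The address-table suggestion above can be sketched as follows, using SQLite from Python's standard library. The users and addresses tables and their columns are hypothetical, chosen only to contrast with an Address1 - Address5 layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE addresses (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        line    TEXT NOT NULL
    );
    """
)

conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
# Any number of addresses per user, with no Address1..Address5 columns.
conn.executemany(
    "INSERT INTO addresses (user_id, line) VALUES (?, ?)",
    [(1, "1 Main St"), (1, "2 Side Ave")],
)

addresses = [
    row[0]
    for row in conn.execute(
        "SELECT a.line FROM users u "
        "JOIN addresses a ON a.user_id = u.id "
        "WHERE u.name = ? ORDER BY a.id",
        ("alice",),
    )
]

# The foreign key rejects an address that points at a nonexistent user.
try:
    conn.execute("INSERT INTO addresses (user_id, line) VALUES (99, 'nowhere')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(addresses, orphan_rejected)
```

    Because the schema lives in the database, a program written in any other language can run the same join against the same data.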

  19. Re:so does it use sql or not? by Anonymous Coward · · Score: 3, Informative

    i can't tell from the 4 lines of text buried in ads that is this supposed article, but i'm guessing this "nosql" still uses an sql database backend?

    and why wouldn't a relational database system not be perfect for facebook?

    1) NoSQL databases are just that: NO SQL. There is no relational database involved.

    2) No, relational models are not good for Facebook-style data. Facebook uses a lot of trees, networks and graphs, none of which are easy to store in a relational system. Facebook also has a lot of dynamic schema requirements, which SQL again does not cope with well. And at the scale Facebook operates at, they are forced to use techniques like sharding and partitioning of their data sets, at which point a lot of what makes the relational model useful becomes difficult to use, i.e. joins across database servers are really hard to do.

  20. Re:Which DB is better? by Billly+Gates · · Score: 3, Informative

    PostgreSQL is a real relational database that supports views, nested SQL, triggers, foreign keys, and even statistical analysis.

    I think MySQL supports foreign keys now and my info might be dated. But if a database does not support foreign keys then it's not a real relational database, and MySQL had that problem for years.

    Once you switch over, you can find out how processor-intensive tasks that took minutes can be done easily in seconds with the PostgreSQL features I described above. You can save a lot of time on complex queries with PostgreSQL.

  21. Re:Wow... by Billly+Gates · · Score: 3, Insightful

    Java is a whole platform that is scalable. It's not just about using identifiers and objects but using the vast APIs. Some would say Java is even an OS, as it has its own I/O, threads, etc.

    I suppose you could write your own threading and process code, but most Java developers just use what's built into the API.

  22. Re:Which DB is better? by alexkorban · · Score: 4, Informative

    I have worked with large PostgreSQL databases (150GB or so) and really, Postgres isn't a solution. You run into issues anyway when some of your tables contain millions or even billions of rows. At that stage things like vacuuming or altering the schema start to become damn near impossible, and even querying starts to become a bottleneck.

    Now how do you scale that if your database is still growing? Postgres doesn't have a decent clustering solution that I know of, so your options are either to roll your own, or to scale vertically. Both of those are expensive options.

    Based on my experience, I don't think that relational databases are appropriate for really large databases, and at present the only realistic option is horizontal scaling which is a lot easier with things like Cassandra or MongoDB.

    --
    Free posters and articles for business analysts and project managers
  23. It's "Not Only SQL" by Otis_INF · · Score: 3, Informative

    The 'n' stands for 'Not' and the 'o' stands for 'Only', so it's wrong to read it as NO SQL; it should be seen as Not Only SQL. I.o.w.: not a move away from SQL, but exploring other options besides SQL.

    --
    Never underestimate the relief of true separation of Religion and State.
  24. Re:The Monty crowd will blame this on Oracle by PietjeJantje · · Score: 5, Insightful

    You have to understand the slashdot memes. These are constructed around the state of technology over a decade ago. So, PHP is always bad, Javascript and Ajax are always bad, and when someone mentions MySQL, the karma whores come out to bash it and mention PostgreSQL. They don't need an argument, the authors and upvoters are operating in old-man auto-bot mode. Like I said, it typically involves notions which were fixed years ago if they did exist to begin with. These are elitist-wannabes, using simple rules of engagement, to show you how smart they are. Similar to grammar nazis. It is actually quite a lower-class thing to do. As Hannibal Lecter would say, you have to wonder if they still hear the lambs screaming.