Slashdot Mirror


Digg Says Yes To NoSQL Cassandra DB, Bye To MySQL

donadony writes "After Twitter, it is now Digg that has decided to replace MySQL and most of its infrastructure components, moving away from LAMP to a NoSQL architecture based on Cassandra, an open source project that develops a highly scalable second-generation distributed database. Cassandra was open sourced by Facebook in 2008 and is licensed under the Apache License. The reason for this move, as explained by Digg, is the increasing difficulty of building a high-performance, write-intensive application on a data set that is growing quickly, with no end in sight. This growth has forced them into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead."

13 of 271 comments

  1. Reddit by Gudeldar · · Score: 3, Informative

    Reddit also recently switched to Cassandra.

  2. Re:Which DB is better? by h4rr4r · · Score: 5, Informative

    Postgres, for people who care about their data.

  3. Re:Facebook, Twitter and now Digg by h4rr4r · · Score: 3, Informative

    Fits; before that, MySQL was the best way to store data no one cared about.

  4. Re:Which DB is better? by QuoteMstr · · Score: 3, Informative

    The page you cited, on column-oriented databases, describes an implementation strategy that's applicable to many types of databases. There are database engines that present a perfectly normal SQL interface to a column store, and there's actually a direct link to LucidDB from the article. Likewise, there's nothing stopping a Cassandra-like database from serializing its on-disk bits the other way around.

    Column-orientation has nothing to do with the "NoSQL" databases that are in vogue. It's completely orthogonal. You're talking about using vectors or linked lists when everyone else is arguing over whether to serialize data with XML or JSON.
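
    To make the orthogonality point concrete, here is a minimal sketch (not taken from any real engine) that holds the same records in a row layout and a column layout and answers the same query from both. The sample data and the helper names `to_columns` and `to_rows` are made up for illustration.

```python
# Illustrative only: the same three records stored two ways.
rows = [
    {"id": 1, "user": "alice", "score": 3},
    {"id": 2, "user": "bob", "score": 5},
    {"id": 3, "user": "carol", "score": 4},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def to_rows(columns):
    """Pivot a dict of column lists back into a list of row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


columns = to_columns(rows)

# The same logical query ("sum of all scores") against either physical layout.
total_from_rows = sum(record["score"] for record in rows)
total_from_columns = sum(columns["score"])
```

    The query result is the same either way; only the access pattern differs (the column layout reads one contiguous list, the row layout touches every record). Whether the interface on top is SQL or something else is a separate decision.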

  5. Re:Good for them by QuoteMstr · · Score: 3, Informative

    But since most developers model their domains Object Oriented, why is MySql the default choice for any small application? Why not a document database or a native oo one?

    The relational model is consistent and easy to work with. It's easy to specify constraints that describe what the data should look like, and to allow several applications to interact with the data. It's also easier to optimize a database when you can describe discrete queries instead of directly following links from program code as you would in a navigational/object/document/etc. database.

    Furthermore, application data models aren't all that object-oriented. Most of the time, the manipulated data types (say, "story", "post", and "user") fall into well-defined categories that correspond well to rows in a table. The few mismatches are easily dealt with in application code.

    Sure, using an object database might be "easier" for the first 15 minutes, but you'll kick yourself when you have to manipulate it in any kind of sophisticated fashion.
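
    As a rough sketch of what "specify constraints" and "describe discrete queries" can look like in practice, the example below uses Python's built-in sqlite3 module with an in-memory database. The `users` and `stories` tables and the helper `try_insert_orphan` are hypothetical, chosen only to echo the "story" and "user" types mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE stories (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        user_id INTEGER NOT NULL REFERENCES users(id)
    );
""")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO stories (title, user_id) VALUES ('First post', 1)")


def try_insert_orphan(connection):
    """Return True if the database rejected a story whose user does not exist."""
    try:
        connection.execute(
            "INSERT INTO stories (title, user_id) VALUES ('Orphan', 99)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = try_insert_orphan(conn)

# A discrete, declarative query: the engine decides how to execute the join.
titles_by_author = conn.execute(
    "SELECT users.name, stories.title "
    "FROM stories JOIN users ON users.id = stories.user_id"
).fetchall()
```

    The rule "every story has a real author" lives in the schema, so any application touching the data gets it enforced without reimplementing the check.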

  6. Re:Which DB is better? by RelliK · · Score: 5, Informative

    Go with PostgreSQL. Reliable, standards-compliant, fast.

    --
    ___
    If you think big enough, you'll never have to do it.
  7. Re:Reddit's reliability has been shitty lately. by Neoncow · · Score: 3, Informative

    The reddit blog discussed the issue recently.

    They claim it is not an EC2 issue, but simply the site getting bigger than it was designed to.

    Their latest entry discusses why they switched to Cassandra. I guess we'll wait for next week to see if the expected performance benefits materialize.

  8. Re:so does it use sql or not? by Anonymous Coward · · Score: 3, Informative

    i can't tell from the 4 lines of text buried in ads that is this supposed article, but i'm guessing this "nosql" still uses an sql database backend?

    and why wouldn't a relational database system not be perfect for facebook?

    1) NoSQL databases are just that: NO SQL. There is no relational database involved.

    2) No, relational models are not good for Facebook-style data. Facebook uses a lot of trees, networks and graphs, none of which are easy to store in a relational system. Facebook also has a lot of dynamic schema requirements, which SQL does not cope with well. And at the scale Facebook operates at, they are forced to use techniques like sharding and partitioning of their data sets, at which point a lot of what makes the relational model useful becomes difficult to use; joins across database servers, for example, are really hard to do.
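
    A toy sketch of why joins get hard once data is sharded, assuming a simple hash-modulo placement scheme (this is not a description of Facebook's actual setup). The shard count, the in-memory "shards", and every function name here are hypothetical.

```python
import zlib

NUM_SHARDS = 4  # hypothetical cluster size

# Each "shard" stands in for a separate database server.
shards = [{"users": {}, "posts": []} for _ in range(NUM_SHARDS)]


def shard_for(key):
    """Pick a shard by hashing the key; crc32 is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def add_user(user_id, name):
    shards[shard_for(user_id)]["users"][user_id] = name


def add_post(post_id, author_id, text):
    # Posts are placed by post id, so a post and its author
    # may well end up on different shards.
    shards[shard_for(post_id)]["posts"].append(
        {"id": post_id, "author": author_id, "text": text}
    )


def posts_with_author_names():
    """Application-side join: visit every shard, then look up each author."""
    result = []
    for shard in shards:
        for post in shard["posts"]:
            author_shard = shards[shard_for(post["author"])]
            name = author_shard["users"][post["author"]]
            result.append((post["id"], name, post["text"]))
    return sorted(result)


add_user("u1", "alice")
add_user("u2", "bob")
add_post("p1", "u1", "hello")
add_post("p2", "u2", "world")
add_post("p3", "u1", "again")
joined = posts_with_author_names()
```

    What a single server would express as one JOIN becomes a scatter over every shard plus a lookup per row, each of which would be a network round trip in a real deployment.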

  9. Re:Which DB is better? by Billly+Gates · · Score: 3, Informative

    PostgreSQL is a real relational database that supports views, nested SQL, triggers, foreign keys, and even statistical analysis.

    I think MySQL supports foreign keys now, so my info might be dated. But if a database does not support foreign keys then it's not a real relational database, and MySQL had that problem for years.

    Once you switch over, you'll find that processor-intensive tasks that took minutes can be done in seconds with the PostgreSQL features I described above. You can save a lot of time on complex queries with PostgreSQL.

  10. Re:Which DB is better? by alexkorban · · Score: 4, Informative

    I have worked with large PostgreSQL databases (150GB or so) and really, Postgres isn't a solution. You run into issues anyway when some of your tables contain millions or even billions of rows. At that stage things like vacuuming or altering the schema start to become damn near impossible, and even querying starts to become a bottleneck.

    Now how do you scale that if your database is still growing? Postgres doesn't have a decent clustering solution that I know of, so your options are either to roll your own, or to scale vertically. Both of those are expensive options.

    Based on my experience, I don't think that relational databases are appropriate for really large databases, and at present the only realistic option is horizontal scaling which is a lot easier with things like Cassandra or MongoDB.

    --
    Free posters and articles for business analysts and project managers
  11. Re:Allergic reaction to MySQL by ducomputergeek · · Score: 3, Informative

    When you're dealing with the TB/PB range, you call Teradata. At last check they handle 4 of the 5 largest databases in the world, including eBay/PayPal's 13 PB monster and Walmart.

    --
    "The problem with socialism is eventually you run out of other people's money" - Thatcher.
  12. It's "Not Only SQL" by Otis_INF · · Score: 3, Informative

    The 'n' stands for 'Not' and the 'o' stands for 'Only', so it's wrong to read it as NO SQL; it should be read as Not Only SQL. In other words: not a move away from SQL, but exploring other options besides SQL.

    --
    Never underestimate the relief of true separation of Religion and State.
  13. Re:Allergic reaction to MySQL by jbellis · · Score: 4, Informative

    Teradata and the other big relational db products (Vertica, Greenplum, etc.) are all _analytical_ databases, designed for small numbers of complex queries, where adding new data to the system takes minutes if not hours. They are completely unsuitable for running a live application against.