Digg Says Yes To NoSQL Cassandra DB, Bye To MySQL
donadony writes "After Twitter, now it's Digg that has decided to replace MySQL and most of their infrastructure components and move away from LAMP to another architecture called NoSQL that is based on Cassandra, an open source project that develops a highly scalable second-generation distributed database. Cassandra was open sourced by Facebook in 2008 and is licensed under the Apache License. The reason for this move, as explained by Digg, is the increasing difficulty of building a high-performance, write-intensive application on a data set that is growing quickly, with no end in sight. This growth has forced them into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead."
Don't be too quick to put Java down. It's slower, but it scales fairly well.
One aspect of the "cloud" (as in EC2) is that you can not only scale up easily (for $ of course), you can scale down easily (to save $).
When you have fixed "in house" infrastructure to handle peak loads, there's not a lot of motivation to power off as many servers as you possibly can when you're not at peak load - all you save is the energy costs (and, if you're using remote hosting, you don't get rewarded for this except for whatever value you attach to feeling "green"). You still pay for the floor space, the machines, and perhaps some sort of maintenance contracts regardless of whether the server is powered up or down.
Using EC2 (depending on how you've structured it - some dedicated, some non-dedicated instances etc), if utilization drops to 80% over 20 instances, the temptation is to release a couple instances to save a couple bucks and drive utilization up to 90% on the remaining instances -- with potentially unfortunate consequences.
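To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes the total load stays constant and spreads evenly across whatever instances remain, which real traffic won't do exactly; the function names are made up for illustration.

```python
def utilization_after_release(instances, utilization, released):
    """Average utilization if the same total load runs on fewer instances.

    Assumes load is constant and perfectly balanced (a simplification).
    """
    remaining = instances - released
    if remaining <= 0:
        raise ValueError("must keep at least one instance")
    total_load = instances * utilization  # measured in "instance-equivalents"
    return total_load / remaining


def spike_headroom(utilization):
    """Fractional load increase the fleet can absorb before hitting 100%."""
    return 1.0 / utilization - 1.0


before = 0.80
after = utilization_after_release(instances=20, utilization=before, released=2)

print(f"utilization: {before:.1%} -> {after:.1%}")
print(f"headroom:    {spike_headroom(before):.1%} -> {spike_headroom(after):.1%}")
```

Under those assumptions, dropping 2 of 20 instances moves average utilization from 80% to roughly 89%, and the traffic spike you can absorb before saturating is cut in half, from 25% to 12.5%.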
Although I have no idea, I wonder if Reddit is just releasing instances too aggressively now "because they can" in order to save money? If so, the finger should be pointed at Reddit, not the cloud (or EC2 specifically).
Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading
Thanks for the comprehensive reply.
ORMs are syntactic sugar for the underlying database operations. It's possible to bypass them when you need SQL's full power and access the same data store.
So create a table of addresses and use foreign keys to connect them to whatever other table you'd like. Since when does a relational structure require a garbage schema like your example? But surely you know all that.
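For anyone following along, a minimal sketch of that layout, using Python's built-in sqlite3 module so it runs anywhere. The table and column names (addresses, customers, suppliers, address_id) are my own placeholders, not anything from the parent's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# One shared addresses table; any other table points at it by key.
conn.executescript("""
CREATE TABLE addresses (
    id     INTEGER PRIMARY KEY,
    street TEXT NOT NULL,
    city   TEXT NOT NULL
);
CREATE TABLE customers (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    address_id INTEGER NOT NULL REFERENCES addresses(id)
);
CREATE TABLE suppliers (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    address_id INTEGER NOT NULL REFERENCES addresses(id)
);
""")

cur = conn.execute(
    "INSERT INTO addresses (street, city) VALUES (?, ?)",
    ("1 Main St", "Springfield"),
)
addr_id = cur.lastrowid
conn.execute("INSERT INTO customers (name, address_id) VALUES (?, ?)", ("Alice", addr_id))
conn.execute("INSERT INTO suppliers (name, address_id) VALUES (?, ?)", ("Acme", addr_id))

# A join recovers the address for any table that references it.
rows = conn.execute(
    "SELECT c.name, a.street, a.city "
    "FROM customers c JOIN addresses a ON a.id = c.address_id"
).fetchall()
print(rows)

# A row pointing at a nonexistent address is rejected by the database itself.
try:
    conn.execute("INSERT INTO customers (name, address_id) VALUES (?, ?)", ("Bob", 999))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("dangling reference rejected:", rejected)
```

The address is stored once, any number of tables can reference it, and the database rather than the application refuses dangling references.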
But doesn't that then preclude accessing the same data set from programs written in other languages? The beauty of SQL is that it's language-agnostic.
You also make several points relating to toolchains and testing: sure, some databases have better tools than others. But we're talking about differences between models, not differences between particular tools.
I haven't seen any consideration from potential "NoSQL" adopters of the benefits of using a good relational database like PostgreSQL.
...
If you need sloppier semantics for some cases (for example, "eventual consistency"), you can layer that on top of a robust RDBMS.
When you're dealing with TB/PB of data that doesn't require relational capabilities, there's no reason to use a "good relational database like PostgreSQL" when you can dispense altogether with the relational aspect and its performance hit.
NoSQL may seem like the fad du jour, but until recently, nobody was working with such enormous dynamic datasets. When you look at the growth of all these hi-tech companies, they did an incredible amount of in-house hacking to develop the software necessary to glue together their enormous hardware infrastructure.