Digg Says Yes To NoSQL Cassandra DB, Bye To MySQL
donadony writes "After Twitter, now it's Digg that has decided to replace MySQL and most of its infrastructure components and move away from LAMP to another architecture, called NoSQL, that is based on Cassandra, an open source project that develops a highly scalable second-generation distributed database. Cassandra was open sourced by Facebook in 2008 and is licensed under the Apache License. The reason for this move, as explained by Digg, is the increasing difficulty of building a high-performance, write-intensive application on a data set that is growing quickly, with no end in sight. This growth has forced them into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead."
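To make the partitioning point concrete, here is a minimal sketch of horizontal partitioning by key. It is not Digg's code: the `shard_for` helper, the four in-memory "shards" and the function names are all assumptions made up for illustration. It only shows why a lookup by the shard key stays cheap while any query across keys turns into application-side scatter-gather, which is roughly the "lost the value of a relational database" complaint.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # assumption: four database servers, modeled as dicts


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash (hash() is salted per process)."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Each shard stands in for one server's "diggs" table: user_id -> [story_id, ...]
shards = [defaultdict(list) for _ in range(NUM_SHARDS)]


def record_digg(user_id: str, story_id: int) -> None:
    """A write touches exactly one shard, chosen by the user key."""
    shards[shard_for(user_id)][user_id].append(story_id)


def diggs_by_user(user_id: str) -> list:
    """Lookup by the shard key: a single-shard read."""
    return list(shards[shard_for(user_id)].get(user_id, []))


def users_who_dugg(story_id: int) -> list:
    """Lookup by anything else: no shard has the whole table, so ask them all."""
    return sorted(
        user
        for shard in shards
        for user, stories in shard.items()
        if story_id in stories
    )
```

With a single database the second query would be one indexed SELECT or a JOIN; once the rows are split by user, the application has to visit every shard and merge the results itself.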
Cassandra is basically a sloppy implementation of UniVerse and related products. Why sloppy? Because the idea of a separate file access for each column sucks - use a union or struct as necessary, people!
In other news, Cassandra developers are celebrating the fact that their database is now used to store the largest amount of worthless information in history.
Negative moral value of force outweighs the positive value of good intentions.
Reddit also recently switched to Cassandra.
I imagine with the continual growth of these social networks, high performance DB methodologies will experience tremendous growth, and perhaps even paradigm shifts in the way we logically think and design database architectures. Instead of this flat 2D table mentality, imagine n-dimensional matrices of data, scaling dimensions instead of table and rowcounts.
I bet if you converted Facebook to this n-dimensional 'table' model, and did a couple inner-joins and unions, you could rip space-time wide-open!
'We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.' RPF
Or away from MySQL? There is a difference.
MySQL is the leading bottleneck and point of failure when your project starts to grow. MySQL is a monoculture. On the lowest end of the spectrum (after SQLite) it rules the landscape. Virtually 95% of all hosting companies offer MySQL as the only option for customers. It would be nice if some alternatives emerged and we had some competition in that space.
From the Digg blog - http://about.digg.com/node/564
"And if that doesn't sound like a big enough challenge, we're replacing most of our infrastructure components and moving away from LAMP."
Cassandra Linux Apache PHP?
They'll be able to suck off Kevin Rose's dick that much faster
The sad thing is that Monty's MySQL fanboys will blame this on Oracle, when in reality the move to Cassandra (or other NoSQL databases) is what a lot of web sites should be doing regardless of who holds the MySQL reins.
Why not lighttpd?
And why wouldn't a relational database system be perfect for Facebook?
If you mod me down, I will become more powerful than you can imagine....
I became interested in this for use with my projects (probably won't ever outgrow MySQL's capabilities, but it looked like maybe it'd make redundancy easier).
I immediately lost interest when I read the following line:
"Also, unless you've downloaded a binary distribution, you'll need to compile the software by invoking ant from the top-level directory."
Do I really need Java to run this? Does that sound ridiculous to anyone else? Not just because of how much slower it is, but think of how much overhead is required. On a lighter server configuration, this could easily double memory usage.
Richard Stallman resigns from Free Software Foundation, announces bid for GNAA presidency
I too have a site running on MySQL and I am thinking of switching.
Can anyone tell me if there is any "comparison chart" listing the various features / usability of the various OSS DB packages available so I can make a better educated decision?
Please help !
Thank you !
Muchas Gracias, Señor Edward Snowden !
Will Slashdot switch?
MongoDB is another "NoSQL" solution. You can still have LAMP. I think they do a disservice to the LAMP stack when lumping it in with their issues with MySQL (unless of course they really are getting rid of Linux, Apache and PHP too).
So what's the advantage of switching?
I have a policy: if it ain't broke, don't fix it.
These slides present a balanced and comprehensive overview of the current state of free databases. Whether you're in the NoSQL camp or not, they're worth reading.
That said, here's my take:
It's currently fashionable to replace MySQL with some "NoSQL" database or other. This trend is driven by two factors:
I haven't seen any consideration from potential "NoSQL" adopters of the benefits of using a good relational database like PostgreSQL. There's a world of difference between it and MySQL, and condemning all relational database systems because of bad experiences with MySQL is like condemning all sandwiches because McDonald's once made you sick. In giving up RDBMSes entirely, these developers lose quite a bit of safety, flexibility, and convenience. It's a huge over-reaction.
This field should not be about following trends, though unfortunately that's how most people choose which technologies to use; it should be about choosing the best tool for the job. And I believe that in the vast majority of cases, the advantages conferred by a relational system (enforced integrity, interoperability based on SQL, query flexibility, storage flexibility) make an RDBMS the best choice for almost any job. If you need sloppier semantics for some cases (for example, "eventual consistency"), you can layer that on top of a robust RDBMS.
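One loose reading of "layering sloppier semantics on top" is a write-behind queue in front of an ordinary transactional table: writes are acknowledged immediately, reads may be stale for a while, and a periodic flush brings the table up to date inside a normal transaction. The sketch below uses Python's built-in `sqlite3` as a stand-in for the robust RDBMS; the `votes` table, the `pending` queue and the function names are assumptions for illustration, not anything from the slides or from a real site.

```python
import sqlite3
from collections import deque

# In-memory SQLite stands in for the "robust RDBMS" underneath.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votes (story_id INTEGER PRIMARY KEY, vote_count INTEGER NOT NULL)"
)

pending = deque()  # writes that were accepted but not yet applied


def vote(story_id: int) -> None:
    """Accept the write immediately; the database is not touched yet."""
    pending.append(story_id)


def read_count(story_id: int) -> int:
    """Read straight from the table, so the result may lag behind accepted votes."""
    row = conn.execute(
        "SELECT vote_count FROM votes WHERE story_id = ?", (story_id,)
    ).fetchone()
    return row[0] if row else 0


def flush() -> None:
    """Apply all queued writes in one transaction; readers converge afterwards."""
    batch = list(pending)
    with conn:  # commits on success, rolls back if anything raises
        for story_id in batch:
            cur = conn.execute(
                "UPDATE votes SET vote_count = vote_count + 1 WHERE story_id = ?",
                (story_id,),
            )
            if cur.rowcount == 0:
                conn.execute(
                    "INSERT INTO votes (story_id, vote_count) VALUES (?, 1)",
                    (story_id,),
                )
    # Only drop the queued writes once the transaction has committed.
    for _ in batch:
        pending.popleft()
```

The relaxed part lives entirely in the application layer, while the table itself keeps its integrity guarantees. Whether this is enough for a Digg-sized write load is a separate question that the sketch does not answer.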
MySQL has never been a good example of a relational database; the underlying implementation is limited. It's MySQL that is the problem here, not relational databases.
I suspect that it is not the relational model at fault here, but the lack of creativity and competence in implementing a relational database technology. MySQL has perhaps never been a particularly scalable platform; it has a number of severe limitations and does not seem to have been designed with much thought for a distributed environment. Its developers seem to have developed it for small-scale web pages and have been notorious for leaving out many advanced features, and thus have limited its effectiveness to small, low-powered pages.
It's all in the implementation. It's not the relational database model that needs fixing; it is the underlying implementations.
On a related note, Reddit's performance and reliability have dropped off significantly since switching to Amazon's "Cloud", and dropped off even further after this switch to Cassandra.
The constant 503 errors, plus horrendous load times when it does manage to work, have driven me and many others away from Reddit. That's why I'm posting here on Slashdot.
Cloud hosting is a stupid idea for anything beyond a blog getting 10 hits per day. All the talk about scalability is pure bunk. I mean, even with the extensive knowledge and infrastructure of Amazon, the Reddit site is slow (and it wasn't like that before they switched).
Am I the only one who frowns at this moniker?
First, it creates a false premise where people need to pick "SQL" versus "no SQL", while many real-world systems intelligently combine relational and non-relational data storage for their needs. There is no conflict.
Second, there's nothing wrong with SQL as a language in particular, and in fact many of the "noSQL" engines are starting to support and extend basic SQL queries, instead of reinventing their own query language for the same purpose.
I suppose "lessRDBMSabuse" was less catchy...
There is this thing, it's called archiving. Sounds like another example of software developers pretending to be DBAs, if you ask me.
So much horseshit in just one slide deck. No matter what you do, unless you have at least a hundred machines at your disposal, Hadoop won't be faster than a single box grep from SSDs. LucidDB is excruciatingly slow for all but tiniest datasets. I've tried a good half dozen "solutions" from this slide deck (including Aster), and other than Postgres all of them suck ass, more or less. If you see ANYTHING other than Nutch with Hadoop as a backend, head for the hills right away.
MySQL sucked for many years but is getting better with each release. It was never designed to be a full RDBMS.
In Japan people use PostgreSQL, and I am surprised that it's not common among geeks. Many ISPs now offer it as well as MySQL. The problem is that the trendy word is NoSQL, and mostly non-database programmers are promoting the movement due to bad experiences of trying to learn MySQL to do things that are very complicated.
It is very easy to switch your existing code to PostgreSQL if you used SQL-compliant code in languages such as PHP. Triggers, views, stored procedures, and the ability to self-repair after a power failure make PostgreSQL an easier platform to develop for.
Try putting petabytes of data on SSDs and let me know how that works out for you.
Which is more expensive, a few extra machines or developer time? (I'm assuming a solution that scales properly here; you can write scalable solutions in any language.)
HAND.
Search has nothing to do with "relational". And SQL is a query language and also has nothing to do with how well or badly a given model (relational or whatever) scales.
You are talking out of your ass.
Moving from LAMP to CLAP sounds like a new STD stack for open source development.
Many thanks for the explanation ! :)
Okay, I keep hearing about these noSQL solutions, but I can't find a single example!
For example, how do you do a SELECT with a couple of JOINs? How do you do a SUM over a GROUP of things, etc.? Or, for that matter, how does one create a table?
And, yes, I have searched for Hadoop and others, but all I get are these odd pages with no examples.
I'm probably too much of an idiot for these NoSQL solutions, since I can't find a good tutorial for converting an SQL app to one of these.
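For what it's worth, here is a rough sketch of how those three questions tend to be answered in a key/column store. It uses plain Python dicts, not the Cassandra client API, and the `orders_by_customer` layout and function names are invented for illustration. The usual pattern, as far as I understand it, is that there is no CREATE TABLE step (a row exists once you write to its key), JOINs are replaced by copying the needed fields into the row at write time, and SUM/GROUP BY is either precomputed on write or done in application code.

```python
from collections import defaultdict

# Row key -> {column name -> value}. A rough stand-in for a column family;
# nothing is declared up front, a row appears when it is first written.
orders_by_customer = defaultdict(dict)

# Aggregate maintained at write time instead of SUM(...) GROUP BY at read time.
total_by_customer = defaultdict(int)


def place_order(customer: str, order_id: str, item: str, price: int) -> None:
    """Denormalize: store item details inside the customer's row, so no JOIN later."""
    orders_by_customer[customer][order_id] = {"item": item, "price": price}
    total_by_customer[customer] += price


def orders_for(customer: str) -> dict:
    """Roughly SELECT ... FROM orders JOIN items ... WHERE customer = ?, as one key lookup."""
    return dict(orders_by_customer.get(customer, {}))


def group_sum() -> dict:
    """Roughly SELECT customer, SUM(price) ... GROUP BY customer, done in the application."""
    return {
        customer: sum(order["price"] for order in rows.values())
        for customer, rows in orders_by_customer.items()
    }
```

The trade-off is that every query you want to serve has to be planned for when you write the data; ad hoc queries are exactly what you give up.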
The 'N' stands for 'Not' and the 'o' stands for 'Only', so it's wrong to read it as NO SQL; it should be seen as Not Only SQL. In other words: not a move away from SQL, but exploring other options besides SQL.
Never underestimate the relief of true separation of Religion and State.
There seems to be this angry pushback from a core of dedicated SQL programmers, acting as if someone had insulted their tin god and wanted to invalidate their lives' work. Not at all. All that has been developing is the realization that RDBMS's are not the best fit for all applications, and that other storage schemes might have a better impedance match with the needs of a particular design. RDBMS's are still robust and reliable and useful for (maybe most) applications. Only some apps' data does not fit nicely into rows and columns. And you should design your code around the data, not try to morph the data to your software.
The Linux kernel, Firefox, and graphical toolkits (and things built on top of them) are running on their momentum (i.e. the "viral" effect), but have you noticed how newly successful projects overwhelmingly tend to have permissive licenses like Apache (as is the case with Cassandra), BSD, MIT, PHP, Python, and so on?
Great, a number of sites have switched to Cassandra, that's an interesting social benchmark. What about some real engineering benchmarks? I'd like to consider Cassandra but where is the objective data?
Cassandra's data model page states that "Cassandra is much, much faster at writes than relational systems". Great, so how about some comparative data? There is a slide show on the main Cassandra page with a snippet of data about read latency. Reads range from 7 ms to 44 ms. That's pretty anemic in the RDBMS world. There is a statement that writes are limited by network bandwidth.
There is also a presentation from IBM that shows reads ranging from 25 to 900 ms, but with no write data. The fact that read latency gets worse (increases) by a factor of 2 or more when you go from a 3 node to 6 node Cassandra cluster would seem to be worrisome on the surface.
The Facebook Engineering Notes presentation has almost nothing quantitative (only two sentences have numbers) and nothing is documented about read or write performance.
Some other bloke in another part of the discussion said, "Third, PostgreSQL has excellent performance, and PostgreSQL does, in fact, scale horizontally."
Can't say I know which of you is right.
Uh - these DBs have been around for years and years. C-Tree, Raima, and other DBs are non-SQL and FAST. No DB server involved, so high concurrency wasn't a good idea. We used them for individual per-user DBs. Also, they are fine for write-once, read-many data needs.
I had 10+ years using them before I was introduced to SQL-based DBs. Back then, MySQL and mSQL weren't mature enough to trust with any data. The only other options were the expensive SQL vendor DBs. Those didn't work for our world-wide royalty-free software distribution requirements. We started with Raima (before Velocis) and ended up migrating to C-Tree. BLAZINGLY FAST doesn't describe how fast it was. I think Raima could have been fast too, but that part of the program was written by an engineer, not a CS artist. The engineer left our team and the CS guy rewrote everything in C-Tree. I think it costs 20x less that way too.
No idea where you got that particular piece of misinformation. :)
Another example of technically knowledgeable people picking a really bad name.
The term shared-nothing architecture dates back to 1986. The concept goes back further; for example, the Teradata RDBMS dates back to 1976-1983.
So ignorance about database technologies and products is one issue, and the one hardest to excuse. There are however other issues that are more understandable:
The existing solutions are proprietary, and web startups tend to prefer open source solutions.
The existing solutions are biased toward OLAP, analytics and data mining, i.e., taking large volumes of data and analyzing large sets of it at a time. Some of the "NoSQL" products are built for this (e.g., Hadoop + HDFS), but there are others that are more aimed toward simple transactional processing.
So it really is the case that there do not exist good relational products that tackle something like Digg, Facebook or Twitter, which want to use commodity hardware in geographically dispersed locations. However, it is also the case that the "NoSQL movement" that has sprung up to fill this gap has a combination of ignorance and animosity towards the relational model, and is just not thinking the problem through.
Are you adequate?
http://www.firebirdsql.org/
Firebird is a relational database offering many ANSI SQL standard features that runs on Linux, Windows, and a variety of Unix platforms. Firebird offers excellent concurrency, high performance, and powerful language support for stored procedures and triggers. It has been used in production systems, under a variety of names, since 1981.
Name one Web 2.0 application that is able to properly manage multiple MySQL. I am aware of only one (http://novaquantum.com), but the point is that MySQL is not gaining any ground on the 2.0 frontend!