Digg Says Yes To NoSQL Cassandra DB, Bye To MySQL

Facebook, Twitter and now Digg by clarkkent09 · 2010-03-12 14:53 · Score: 5, Funny

In other news, Cassandra developers are celebrating the fact that their database is now used to store the largest amount of worthless information in history.

--
Negative moral value of force outweighs the positive value of good intentions.

Re:Facebook, Twitter and now Digg by DigiShaman · 2010-03-12 15:31 · Score: 2, Insightful

I used to think that also applied to Slashdot. But no, I've learned a lot both directly and indirectly over the many years (ten years, wow). Even if most of it is crap, the debates and discussions are still quality entries worth keeping.

--
Life is not for the lazy.
Re:Facebook, Twitter and now Digg by h4rr4r · 2010-03-12 15:35 · Score: 3, Informative

Fits. Before that, MySQL was the best way to store data no one cared about.
Re:Facebook, Twitter and now Digg by OnlyJedi · 2010-03-12 15:46 · Score: 2, Insightful

According to various internet sources (so take with a grain of salt):
Mark Zuckerberg's net worth: $2 billion. Made entirely from Facebook.
Twitter's net worth: $589 million.
Digg's net worth: $24.34 million.
Even if each individual datum is nearly worthless, the combined value is far from it. Do you think any of those companies would still be worth what they are if their databases were irretrievably wiped?
Re:Facebook, Twitter and now Digg by jo42 · 2010-03-12 15:56 · Score: 2, Funny

Sorry, I just can't resist...
> databases were irretrievably wiped
The expression to describe such a fortunate event would be "and nothing of value was [would be] lost".
Re:Facebook, Twitter and now Digg by plover · 2010-03-12 16:15 · Score: 2, Insightful

Worthless?
That data reflects our culture!
Nobody said it couldn't be both at the same time.

--
John
Re:Facebook, Twitter and now Digg by OakDragon · 2010-03-12 16:17 · Score: 2, Funny

In other news, Cassandra developers are celebrating the fact that their database is now used to store the largest amount of worthless information in history.

I used to think that also applied to Slashdot. But no...

Correct - Slashdot doesn't use Cassandra!

--
Dark Reflection
Re:Facebook, Twitter and now Digg by prockcore · 2010-03-12 16:20 · Score: 2, Informative

Reddit also switched from memcachedb to Cassandra for their kvstore. From research to launch took 10 days.
Re:Facebook, Twitter and now Digg by seanadams.com · 2010-03-12 16:24 · Score: 4, Insightful

Are you seriously arguing that unless the first derivative of one's salary is positive, there's no incentive to work?
No, I did not say that one's salary needs to be monotonically increasing. That is not the point at all. And did you really have to turn this into a calculus problem?
To state it differently, many entrepreneurs are willing to work temporarily for little or even nothing, and to make great sacrifices such as giving up health benefits, vacations, and normal family/social life... things most 9-5 workers would never consider. Being someone's bitch for $1M/yr (or to be pedantic let's say $1M/yr + 5%/yr^2) may sound like a splendid deal to you but there are others who would work much harder for sweat equity in their own venture.
These people exist even if you can't fathom it. I'm one of them.

Reddit by Gudeldar · 2010-03-12 14:54 · Score: 3, Informative

Reddit also recently switched to Cassandra.

Re:Reddit by h4rr4r · 2010-03-12 15:36 · Score: 5, Funny

I was not aware metallurgy was popular amongst the youth.

Away from LAMP? by Anonymous Coward · 2010-03-12 14:58 · Score: 3, Insightful

Or away from MySQL? There is a difference.

Re:Away from LAMP? by DMUTPeregrine · 2010-03-12 20:17 · Score: 2, Funny

Lamp = one of those things with a lightbulb in. Also already taken.

--
Not a sentence!

New acronym in order? by mgkimsal2 · 2010-03-12 14:58 · Score: 5, Funny

From the Digg blog - http://about.digg.com/node/564

"And if that doesn't sound like a big enough challenge, we're replacing most of our infrastructure components and moving away from LAMP."

Cassandra Linux Apache PHP?

--
creation science book

Re:New acronym in order? by Anonymous Coward · 2010-03-12 15:05 · Score: 3, Funny

Trust me, you don't want the clap!!!!
Re:New acronym in order? by Tablizer · 2010-03-12 17:29 · Score: 3, Funny

[...moving away from LAMP] Cassandra Linux Apache PHP?
try: Cassandra Ruby Apache PHP

--
Table-ized A.I.

The Monty crowd will blame this on Oracle by heathm · 2010-03-12 15:04 · Score: 2, Insightful

The sad thing is that Monty's MySQL fanboys will blame this on Oracle, when in reality the move to Cassandra (or other NoSQL databases) is what a lot of web sites should be doing regardless of who holds the MySQL reins.

Re:The Monty crowd will blame this on Oracle by PietjeJantje · 2010-03-12 23:04 · Score: 5, Insightful

You have to understand the slashdot memes. These are constructed around the state of technology over a decade ago. So, PHP is always bad, Javascript and Ajax are always bad, and when someone mentions MySQL, the karma whores come out to bash it and mention PostgreSQL. They don't need an argument; the authors and upvoters are operating in old-man auto-bot mode. Like I said, it typically involves notions which were fixed years ago, if they did exist to begin with. These are elitist wannabes, using simple rules of engagement to show you how smart they are. Similar to grammar nazis. It is actually quite a lower-class thing to do. As Hannibal Lecter would say, you have to wonder if they still hear the lambs screaming.

Re:Good for them by Bill,+Shooter+of+Bul · 2010-03-12 15:14 · Score: 3, Insightful

100% of hosting companies do not have twitter, facebook, reddit, or digg as their clients. It's a different market. MySQL does have a competitor in this space called PostgreSQL. It's pretty good. Pretty much every hosting company I would consider doing business with also offers it. But again, PostgreSQL wouldn't have saved the day for these companies; they've reached a different sector of the market due to their enormous scale.

--
Well.. maybe. Or Maybe not. But Definitely not sort of.

Re:Nothing new ... by FooAtWFU · 2010-03-12 15:24 · Score: 2, Funny

UniVerse and elated products

Yes! These products are wonderful! They are spectacular! They are a beam of sunshine refreshing my soul! I'm so happy with them! Daisies!

--
The World Wide Web is dying. Soon, we shall have only the Internet.

Re:Which DB is better? by Anonymous Coward · 2010-03-12 15:28 · Score: 3, Insightful

If you need a comparison chart... you don't need to switch.

It's probably not necessary to change such a huge part of your architecture if it's not worth investing serious time investigating and benchmarking the alternatives.

Re:Which DB is better? by h4rr4r · 2010-03-12 15:34 · Score: 5, Informative

Postgres, for people who care about their data.

Allergic reaction to MySQL by QuoteMstr · 2010-03-12 15:39 · Score: 5, Insightful

These slides present a balanced and comprehensive overview of the current state of free databases. Whether you're in the NoSQL camp or not, they're worth reading.

That said, here's my take:

It's currently fashionable to replace MySQL with some "NoSQL" database or other. This trend is driven by two factors:

MySQL's community is fragmenting into several forks as Oracle purchases the rights, which creates the impression that MySQL's development is entering a riskier, unstable period.
"NoSQL" is the technology buzzword du jour in the Bay Area. It's difficult to overstate the impact of social forces on technology choice: most technology selections are governed more by what our friends say than by an impartial and disinterested weighing of merits.

I haven't seen any consideration from potential "NoSQL" adopters of the benefits of using a good relational database like PostgreSQL. There's a world of difference between it and MySQL, and condemning all relational database systems because of bad experiences with MySQL is like condemning all sandwiches because McDonald's once made you sick. In giving up RDBMSes entirely, these developers lose quite a bit of safety, flexibility, and convenience. It's a huge over-reaction.

This field should not be about following trends, though unfortunately, that's how most people choose which technologies to use: it should be about choosing the best tool for the job. And I believe that in the vast majority of cases, the advantages conferred by a relational system --- enforced integrity, interoperability based on SQL, query flexibility, storage flexibility --- make an RDBMS the best choice for almost any job. If you need sloppier semantics for some cases (for example, "eventual consistency"), you can layer that on top of a robust RDBMS.

Re:Allergic reaction to MySQL by Tablizer · 2010-03-12 17:38 · Score: 2, Informative

Sigh. Most people seem to be stuck on following trends--in pretty much every aspect of their lives. Why think when you can conform to the crowd?
One can potentially make good money surfing bullshit. It's like the dot-com bubble: get in early, lie about your ability, rake in big bucks, and then get out and move on to the next hype bubble while the last one crashes on those left holding the bag.
However, I do believe there's perhaps a place for big non-relational databases. They tend to be single-purpose and suited to situations where few will care much if a few records are lost every week or so. If you have a million customers who only make money for you from occasional ad clicks, then losing a few dozen due to lack of A.C.I.D. is not going to be a bottleneck from a business standpoint. And the info can be delay-copied into an RDBMS where traditional statistics and reports can be done.

--
Table-ized A.I.
Re:Allergic reaction to MySQL by TubeSteak · 2010-03-12 17:58 · Score: 3, Interesting

I haven't seen any consideration from potential "NoSQL" adopters of the benefits of using a good relational database like PostgreSQL.
...
If you need sloppier semantics for some cases (for example, "eventual consistency"), you can layer that on top of a robust RDBMS.
When you're dealing with TB/PB of data that doesn't require relational capabilities, there's no reason to use a "good relational database like PostgreSQL" when you can dispense altogether with the relational aspect and its performance hit.
NoSQL may seem like the fad du jour, but until recently, nobody was working with such enormous dynamic datasets. When you look at the growth of all these hi-tech companies, they did an incredible amount of in-house hacking to develop the software necessary to glue together their enormous hardware infrastructure.

--
[Fuck Beta]
o0t!
Re:Allergic reaction to MySQL by kmike · 2010-03-12 19:10 · Score: 3, Insightful

As several MySQL experts already noted, Digg isn't even using the indexes that provide maximum performance in the query that they present as problematic for MySQL:
http://mysqlha.blogspot.com/2010/03/index-only.html
http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Performance_Lie/
So you are right about the NoSQL fashion trend. Looks like for some companies it's easier to throw a pile of cheap commodity hardware driven by some NoSQL BigTable wannabe at the problem instead of carefully optimizing queries and indexes for the best performance.
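
To make the "index-only" point concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The `diggs` table and its columns are hypothetical stand-ins, not Digg's real schema, and SQLite's planner output is only an analogy for what the linked posts describe in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per (user, item) vote.
conn.execute(
    "CREATE TABLE diggs ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, item_id INTEGER, created TEXT)"
)
conn.executemany(
    "INSERT INTO diggs (user_id, item_id, created) VALUES (?, ?, ?)",
    [(u, i, "2010-03-12") for u in range(50) for i in range(20)],
)

QUERY = "SELECT user_id FROM diggs WHERE item_id = ?"


def plan(sql, params):
    """Return the planner's description of how it will run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


plan_before = plan(QUERY, (7,))  # no usable index yet: full table scan

# The index holds every column the query touches, so the table itself
# never has to be read ("covering" or "index-only" access).
conn.execute("CREATE INDEX idx_item_user ON diggs (item_id, user_id)")
plan_after = plan(QUERY, (7,))

voters = conn.execute(QUERY, (7,)).fetchall()
print(plan_before)
print(plan_after)
```

The query text is identical in both runs; only the index changes what the engine has to read.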
Re:Allergic reaction to MySQL by jrumney · 2010-03-12 19:20 · Score: 5, Insightful

I haven't seen any consideration from potential "NoSQL" adopters of the benefits of using a good relational database like PostgreSQL.
The adopters of NoSQL deal with huge volumes of worthless information. They don't care about transactional integrity as much as they care about performance, which is why they chose MySQL over a good relational database in the first place.
Re:Allergic reaction to MySQL by ducomputergeek · 2010-03-12 20:48 · Score: 3, Informative

When you're dealing with the TB/PB range, you call Teradata. At last check they handle 4 of the 5 largest databases in the world, including eBay/PayPal's 13PB monster and Walmart.

--
"The problem with socialism is eventually you run out of other people's money" - Thatcher.
Re:Allergic reaction to MySQL by jbellis · 2010-03-13 02:51 · Score: 4, Informative

Teradata and the other big relational db products (Vertica, Greenplum, etc) are all _analytical_ databases, designed for small numbers of complex queries, where adding new data to the system takes minutes if not hours. They are completely unsuitable for running a live application against.

Re:Which DB is better? by larry+bagina · 2010-03-12 15:40 · Score: 2, Insightful

you should probably look at what queries you're running and what the planner/optimizer is doing with them to verify the problem is mysql and not your schema and indexes.

--
Do you even lift?

These aren't the 'roids you're looking for.

Re:Wow... by Anrego · 2010-03-12 15:41 · Score: 3, Interesting

Don't be too quick to put Java down.. it's slower but it scales fairly well.

Re:Which DB is better? by QuoteMstr · 2010-03-12 15:47 · Score: 3, Informative

The page you cited, on column-oriented databases, describes an implementation strategy that's applicable to many types of databases. There are database engines that present a perfectly normal SQL interface to a column store, and there's actually a direct link to LucidDB from the article. Likewise, there's nothing stopping a Cassandra-like database from serializing its on-disk bits the other way around.

Column-orientation has nothing to do with the "NoSQL" databases that are in vogue. It's completely orthogonal. You're talking about using vectors or linked lists when everyone else is arguing over whether to serialize data with XML or JSON.
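
A small sketch of that orthogonality, in plain Python with made-up records: the same data is laid out row-wise and column-wise, and the same logical question gets the same answer from either layout. The field names are illustrative assumptions only.

```python
# Row-oriented layout: one record after another.
rows = [
    {"id": 1, "user": "alice", "score": 5},
    {"id": 2, "user": "bob", "score": 3},
    {"id": 3, "user": "carol", "score": 4},
]


def to_columns(records):
    """Re-serialize row records as one list per field (column-oriented)."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def to_rows(columns):
    """Rebuild the row records from the per-field lists."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


columns = to_columns(rows)

# The same logical query ("total score") against each physical layout.
total_from_rows = sum(record["score"] for record in rows)
total_from_columns = sum(columns["score"])  # touches one column only
```

The column layout reads less data for a single-field aggregate, but nothing about the query interface had to change, which is the point being made above.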

Reddit's reliability has been shitty lately. by Anonymous Coward · 2010-03-12 15:48 · Score: 2, Interesting

On a related note, Reddit's performance and reliability have dropped off significantly since switching to Amazon's "Cloud", and dropped off even further after this switch to Cassandra.

The constant 503 errors, plus horrendous load times when it does manage to work, have driven me and many others away from Reddit. That's why I'm posting here on Slashdot.

Cloud hosting is a stupid idea for anything beyond a blog getting 10 hits per day. All the talk about scalability is pure bunk. I mean, even with the extensive knowledge and infrastructure of Amazon, the Reddit site is slow (and it wasn't like that before they switched).

Re:Reddit's reliability has been shitty lately. by uncqual · 2010-03-12 17:25 · Score: 3, Interesting

One aspect of the "cloud" (as in EC2) is that you can not only scale up easily (for $ of course), you can scale down easily (to save $).

When you have fixed "in house" infrastructure to handle peak loads, there's not a lot of motivation to power off absolutely as many servers as you possibly can when you're not at peak load - all you save is the energy costs (and, if you're using remote hosting, you don't get rewarded for this except for whatever value you attach to feeling "green"). You still pay for the floor space, the machines, and perhaps some sort of maintenance contracts regardless of if the server is powered up or down.

Using EC2 (depending on how you've structured it - some dedicated, some non-dedicated instances etc), if utilization drops to 80% over 20 instances, the temptation is to release a couple instances to save a couple bucks and drive utilization up to 90% on the remaining instances -- with potentially unfortunate consequences.
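
The arithmetic behind that temptation, as a rough sketch. The function name and the 15% spike are illustrative assumptions; only the 20 instances at 80% come from the paragraph above.

```python
def utilization_after_release(instances, utilization, released):
    """Average utilization once `released` instances are handed back.

    Assumes the total load stays constant and spreads evenly.
    """
    remaining = instances - released
    if remaining <= 0:
        raise ValueError("cannot release every instance")
    total_load = instances * utilization  # load in "instance-equivalents"
    return total_load / remaining


# 20 instances at 80% carry 16 instances' worth of work.
after_release = utilization_after_release(20, 0.80, 2)  # 16 / 18, roughly 0.89

# Hypothetical spike: if load then grows by 15%, the 18 remaining
# instances are asked for more than they can deliver.
after_spike = after_release * 1.15
overloaded = after_spike > 1.0
```

Releasing two instances lands at about 89%, close to the 90% mentioned, and leaves little headroom for a burst.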

Although I have no idea, I wonder if Reddit is just releasing instances too aggressively now "because they can" in order to save money? If so, the finger should be pointed at Reddit, not the cloud (or EC2 specifically).

--
Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
Re:Reddit's reliability has been shitty lately. by Neoncow · 2010-03-12 17:49 · Score: 3, Informative

The reddit blog discussed the issue recently.
They claim it is not an EC2 issue, but simply the site getting bigger than it was designed to.
Their latest entry discusses why they switched to Cassandra. I guess we'll wait for next week to see if the expected performance benefits materialize.

Re:Wow... by M.+Baranczak · 2010-03-12 15:54 · Score: 2, Insightful

If you're trying to run a site on a $15/month hosting account, then no, this is probably not for you. But if you're at the stage where MySQL isn't able to handle all the data you're throwing at it, then chances are you won't care about the extra few MB of memory that the Java runtime requires.

Re:Wow... by John+Hasler · 2010-03-12 16:00 · Score: 2

> But if you're at the stage where MySQL isn't able to handle all the data
> you're throwing at it... ...it's time to move up to PostgreSQL.

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.

Re:Good for them by QuoteMstr · 2010-03-12 16:00 · Score: 3, Informative

But since most developers model their domains Object Oriented, why is MySql the default choice for any small application? Why not a document database or a native oo one?

The relational model is consistent and easy to work with. It's easy to specify constraints that describe what the data should look like, and to allow several applications to interact with the data. It's also easier to optimize a database when you can describe discrete queries instead of directly following links from program code as you would in a navigational/object/document/etc. database.

Furthermore, application data models aren't all that object-oriented. Most of the time, the manipulated data types (say, "story", "post", and "user") fall into well-defined categories that correspond well to rows in a table. The few mismatches are easily dealt with in application code.

Sure, using an object database might be "easier" for the first 15 minutes, but you'll kick yourself when you have to manipulate it in any kind of sophisticated fashion.

Re:Which DB is better? by RelliK · 2010-03-12 16:13 · Score: 5, Informative

Go with PostgreSQL. Reliable, standards-compliant, fast.

--
___
If you think big enough, you'll never have to do it.

"NoSQL"? by Stan+Vassilev · 2010-03-12 16:27 · Score: 5, Insightful

Am I the only one who frowns at this moniker?

First, it creates a false premise where people need to pick "SQL" versus "no SQL", while many real-world systems intelligently combine relational and non-relational data storage for their needs. There is no conflict.

Second, there's nothing wrong with SQL as a language in particular, and in fact many of the "noSQL" engines are starting to support and extend basic SQL queries, instead of reinventing their own query language for the same purpose.

I suppose "lessRDBMSabuse" was less catchy...

Re:"NoSQL"? by shic · 2010-03-12 21:28 · Score: 2, Informative

Second, there's nothing wrong with SQL as a language
I beg to differ - SQL is preposterously baroque!
That said, if your problem is of a particular kind, it is a perfectly reasonable, practical solution.

Re:Wow... by QuoteMstr · 2010-03-12 16:38 · Score: 3, Insightful

Bullshit. Languages don't scale: programs do.

Writing a program in Java makes it scalable in the same way that painting a car red makes it fast. The JVM is quite good these days, but don't make up advantages that don't exist.

Re:Nothing new ... by hibiki_r · 2010-03-12 16:55 · Score: 2, Interesting

Come on, it cannot be any sloppier than actual UniVerse: It performs extremely poorly on large files, especially when record sizes vary wildly. I've seen in-memory files in which any insert or update operation took 5+ seconds! In my experience, even Postgres on far weaker hardware just spanks UniVerse even on the simple queries where it should have an advantage. If you ever need to read two or three files, either by hand or through I dictionary entries, UniVerse is orders of magnitude slower. When you add the low quality of the system monitoring and debugging tools that are available for it, it turns into one big stinker.

If Cassandra is any slower, it'd have to lock the system up while idle.

Re:Why? by mysidia · 2010-03-12 17:09 · Score: 2, Insightful

A bad policy when dealing with your data.

Once it's broke, it is way too late.

You can't un-LOSE the past 6 hours of transactions or table referential integrity that MySQL trashed, due to an unclean shutdown.

MySQL's great until it comes up to bite you in the arse.

Re:Which DB is better? by Bill,+Shooter+of+Bul · 2010-03-12 17:14 · Score: 2, Insightful

Note: Facebook, twitter, digg: they aren't moving to PostgreSQL. It's not better enough to make any kind of difference at that kind of scale. They don't need features, they need speed.

--
Well.. maybe. Or Maybe not. But Definitely not sort of.

Re:Which DB is better? by QuoteMstr · 2010-03-12 17:20 · Score: 3, Insightful

First of all, if he's asking Slashdot for advice (which is barely a step above reading tea leaves [which itself is a step above asking 4chan]), he doesn't need Facebook-level scalability.

Second, you're confusing scalability and performance. Scalable solutions tend to actually be slower than non-scalable ones: the difference is that a scalable system increases in capacity linearly with the number of machines you throw at it ("horizontal" scalability), whereas a fast non-scalable system generally needs the same number of faster, individual machines to increase capacity ("vertical" scaling).

Third, PostgreSQL has excellent performance, and PostgreSQL does, in fact, scale horizontally.
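
A toy model of the scalability-versus-performance distinction in the second point. Every number here is invented for illustration, and the linear scaling and flat 20% coordination overhead are simplifying assumptions, not measurements of any real system.

```python
def horizontal_capacity(nodes, per_node_rps, overhead=0.2):
    """Requests/sec for a cluster in which each node loses a fixed
    fraction of its throughput to coordination (assumed constant)."""
    return nodes * per_node_rps * (1.0 - overhead)


def vertical_capacity(base_rps, speedup):
    """Requests/sec for a single machine made `speedup` times faster."""
    return base_rps * speedup


single_fast_box = vertical_capacity(1000, speedup=1.0)
one_cluster_node = horizontal_capacity(1, 1000)   # slower than the single box
ten_cluster_nodes = horizontal_capacity(10, 1000)  # but capacity grows with nodes
```

In this model the scalable design is slower per machine, yet adding machines raises capacity in proportion, while the single box can only grow by buying a faster one.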

Re:Good for them by Anonymous Coward · 2010-03-12 17:40 · Score: 2, Interesting

In my experiences developing applications in both the business and gaming industries, most applications beyond a simple cookbook app/crappy blog are highly object oriented. How else can you explain the wealth of approaches like ORM mappers, the repository and active record patterns, etc ? They are just patches on the relational model to make them friendly to application code. If your domain objects are consistently flat, you are probably doing something wrong. I for one do not want to use an API with Address1 - Address5 string properties. What you just listed as story, post, etc are all just objects, usually with nesting. Relational databases suck at dealing with complex object hierarchies, hence all the joins just because object A has a collection of object Bs which contain an object C.

Can you please define what a sophisticated fashion means? Unless you are a DBA and love SQL/config work, it is far easier to write constraints using an object database. You simply use the same validation and rules you should already be using in your application. If you rely on your database alone to enforce things like required fields, atomicity, etc, then you have failed at creating a good application and likely are ripe for exploits, security holes, bad data, etc anyway. It is true that relational DBs provide certain easy facilities, but any decent Object Database provides most if not more of these same constructs in another form through its API. For instance, most object databases I have used provide some sort of transactional data structure that supports far more types of locking and concurrency/conflict management than any relational DBs I have ever seen. Further, since most object databases are defined and consumed in the languages you develop against with them, the sophistication is limited to the language. I'd say you can do a lot more in Smalltalk than SQL for instance.

If you're referring to querying, apparently you've never queried in Smalltalk, C# with LINQ, LISP, or even just using lambdas in python or ruby. Querying using the actual object is typically far easier than writing a SQL query. These days it is becoming increasingly rare that someone rolls all their own queries in your average app anyway (see ORMs). You'll often end up with something like an ORM translating some things from the UI into a boat load of queries, then you'll have to go and find fixes for the ORM to avoid making the application grind to a halt due to all the chatter. Although a lot of that is often the function of UI elements, ultimately there is a lot of overhead created by patching the relational and object disconnect.

I am wondering how you think going from relational back to objects, even flat ones, is somehow easier and more consistent. You're adding an extra language, more layers, and more configuration/management for what gain? Object databases hold records for things like throughput for transactions, data population, etc. The performance thing is a myth of the past. I'd say the stumbling block if anything is simply bad developers. An RDBMS does add somewhat of an idiot-proof layer, but really in the end you just end up with even crappier code in other spots.

Finally, you mention that discrete queries are easier to optimize. I again must disagree. If you want discrete queries, you could describe each query on an object with another object. This is exactly what any good developer should be doing with an ORM anyway. For instance, you could use the specification pattern with the repository pattern to describe and issue your queries, object db or rdbms. Secondly, instead of some crappy tools from the maker of the RDBMS, using an object DB I now have the full facilities of the language to do performance optimization, profiling, logging, etc rather than what a vendor provides. MSSQL provides some great tools for example, but most other DBs while nice implementation wise, provide horrific tool chains.
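
For readers unfamiliar with the pattern named above, a minimal sketch of a specification object combined with a repository, in Python. The `Story`, `Spec`, and `StoryRepository` names are hypothetical, and the in-memory list stands in for whatever store (object DB or RDBMS) sits behind the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    title: str
    author: str
    score: int


class Spec:
    """A query described as an object: wraps a predicate and composes with &."""

    def __init__(self, predicate):
        self._predicate = predicate

    def is_satisfied_by(self, item):
        return self._predicate(item)

    def __and__(self, other):
        return Spec(lambda item: self.is_satisfied_by(item)
                    and other.is_satisfied_by(item))


class StoryRepository:
    """Hides the storage; callers only hand over specifications."""

    def __init__(self, stories):
        self._stories = list(stories)

    def find(self, spec):
        return [story for story in self._stories if spec.is_satisfied_by(story)]


repo = StoryRepository([
    Story("Digg moves to Cassandra", "kevin", 120),
    Story("Postgres tips", "alice", 45),
    Story("Why NoSQL", "alice", 300),
])

by_alice = Spec(lambda s: s.author == "alice")
popular = Spec(lambda s: s.score >= 100)
hits = repo.find(by_alice & popular)
```

Each query is a discrete, named object, so it can be logged, tested, or profiled on its own regardless of the backing store.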

It is true there are some problems an RDBMS is good for, but your post comes off like someone who has never really use

Re:Good for them by QuoteMstr · 2010-03-12 17:53 · Score: 3, Interesting

Thanks for the comprehensive reply.

How else can you explain the wealth of approaches like ORM mappers, the repository and active record patterns, etc ? They are just patches on the relational model to make them friendly to application code.

ORMs are syntactic sugar for the underlying database operations. It's possible to bypass them when you need SQL's full power and access the same data store.

I for one do not want to use an API with Address1 - Address5 string properties.

So create a table of addresses and use foreign keys to connect them to whatever other table you'd like. Since when does a relational structure require a garbage schema like your example? But surely you know all that.
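
A minimal sketch of that schema, using Python's built-in sqlite3 module. The `users` and `addresses` tables and their columns are made up for illustration; note that SQLite only enforces foreign keys once the pragma below is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE addresses ("
    "id INTEGER PRIMARY KEY, "
    "user_id INTEGER NOT NULL REFERENCES users(id), "
    "line TEXT NOT NULL)"
)

conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
# Any number of addresses per user, with no Address1..Address5 columns.
conn.executemany(
    "INSERT INTO addresses (user_id, line) VALUES (?, ?)",
    [(1, "1 Main St"), (1, "PO Box 42")],
)

alice_addresses = [
    row[0]
    for row in conn.execute(
        "SELECT a.line FROM users u JOIN addresses a ON a.user_id = u.id "
        "WHERE u.name = ? ORDER BY a.id",
        ("alice",),
    )
]

# The database itself refuses an address that points at no user.
try:
    conn.execute("INSERT INTO addresses (user_id, line) VALUES (99, 'nowhere')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The constraint lives with the data, so every program that touches the tables gets the same guarantee.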

Further, since most object databases are defined and consumed in the languages you develop against with them, the sophistication is limited to the language

But doesn't that then preclude accessing the same data set from programs written in other languages? The beauty of SQL is that it's language-agnostic.

You also make several points relating to toolchains and testing: sure, some databases have better tools than others. But we're talking about differences between models, not differences between particular tools.

Re:so does it use sql or not? by Anonymous Coward · 2010-03-12 18:35 · Score: 3, Informative

i can't tell from the 4 lines of text buried in ads that is this supposed article, but i'm guessing this "nosql" still uses an sql database backend?

and why wouldn't a relational database system not be perfect for facebook?

1) NoSQL databases are just that: no SQL. There is no relational database involved.

2) No, relational models are not good for Facebook-style data. Facebook uses a lot of trees, networks and graphs, none of which are easy to store in a relational system. Facebook also has a lot of dynamic schema requirements; again, SQL does not cope with this well. And at the scale that Facebook operates at, they are forced to use techniques like sharding and partitioning of their data sets, at which point a lot of what makes the relational model useful becomes difficult to use, i.e. joins across database servers are really hard to do etc.
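
To illustrate why sharding pushes join logic into the application, here is a small sketch with in-memory dictionaries standing in for separate database servers. The shard count, the hashing scheme, and all names are assumptions for the example, not how Facebook actually partitions data.

```python
import hashlib

NUM_SHARDS = 4  # arbitrary for the example

# Each dict stands in for one independent database server.
shards = [{"users": {}, "posts": []} for _ in range(NUM_SHARDS)]


def shard_for(key):
    """Map a key to a shard with a stable hash (md5 used only for spreading)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def add_user(user_id, name):
    shards[shard_for(user_id)]["users"][user_id] = name


def add_post(author_id, text):
    # A post lives on its author's shard, so per-user reads stay local.
    shards[shard_for(author_id)]["posts"].append((author_id, text))


def friends_posts(friend_ids):
    """What would be a single SQL join becomes a fan-out in application code:
    one lookup per friend, each possibly on a different server."""
    results = []
    for friend_id in friend_ids:
        shard = shards[shard_for(friend_id)]
        name = shard["users"].get(friend_id)
        results.extend(
            (name, text) for author, text in shard["posts"] if author == friend_id
        )
    return results


add_user("u1", "alice")
add_user("u2", "bob")
add_post("u1", "hello")
add_post("u2", "moving to Cassandra")
add_post("u1", "second post")
feed = friends_posts(["u1", "u2"])
```

No single server can answer the feed query by itself, which is the difficulty described in point 2.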

Re:Which DB is better? by Billly+Gates · 2010-03-12 18:51 · Score: 3, Informative

PostgreSQL is a real relational database that supports views, nested SQL, triggers, foreign keys, and even statistical analysis.

I think MySQL supports foreign keys now and my info might be dated. But if a database does not support foreign keys then it's not a real relational database, and MySQL had that problem for years.

After switching over, you can find out how processor-intensive tasks that took minutes can be done in seconds with the PostgreSQL features I described above. You can save a lot of time on complex queries with PostgreSQL.

--
http://saveie6.com/

Re:Database Evolution by DogDude · 2010-03-12 19:17 · Score: 2, Interesting

I imagine with the continual growth of these social networks, high performance DB methodologies will experience tremendous growth, and perhaps even paradigm shifts in the way we logically think and design database architectures.

Your statement that social networks push databases to their theoretical limits is laughable. Larger, more frequently accessed, more complicated databases have existed for years (decades?) before the current crop of Friendster clones existed. Just because Facebook is the largest, most "high performance" database application that you can think of doesn't make it remotely true.

The problem of dealing with very large, frequently changing databases has been addressed and solved, already. The problem is that most PHP-monkeys have -zero- database knowledge, and instead of doing the work to figure out the right way to do things, they feel like they need to re-invent the wheel. A better solution is to pick up a book written by somebody who's been working with RDBMS' for a few decades. It's not a quick fix, but this problem has already been solved many, many times over.

--
I don't respond to AC's.

Re:Wow... by Billly+Gates · 2010-03-12 19:25 · Score: 3, Insightful

Java is a whole platform that is scalable. It's not just about using identifiers and objects but using the vast APIs. Some would say Java is even an OS, as it has its own I/O, threads, etc.

I suppose you could write your own threading and process code, but most Java developers just use what's built into the API.

--
http://saveie6.com/

Re:Which DB is better? by alexkorban · 2010-03-12 19:58 · Score: 4, Informative

I have worked with large PostgreSQL databases (150GB or so) and really, Postgres isn't a solution. You run into issues anyway when some of your tables contain millions or even billions of rows. At that stage things like vacuuming or altering the schema start to become damn near impossible, and even querying starts to become a bottleneck.

Now how do you scale that if your database is still growing? Postgres doesn't have a decent clustering solution that I know of, so your options are either to roll your own, or to scale vertically. Both of those are expensive options.

Based on my experience, I don't think that relational databases are appropriate for really large databases, and at present the only realistic option is horizontal scaling which is a lot easier with things like Cassandra or MongoDB.

--
Free posters and articles for business analysts and project managers

It's "Not Only SQL" by Otis_INF · 2010-03-12 21:02 · Score: 3, Informative

The 'n' stands for 'Not' and the 'o' stands for 'Only', so it's wrong to read it as NO SQL, it should be seen as Not Only SQL. I.o.w.: not a move away from sql, but exploring other options besides SQL

--
Never underestimate the relief of true separation of Religion and State.

Re:Which DB is better? by roman_mir · 2010-03-12 22:12 · Score: 2, Informative

I just read your comment and checked the PostgreSQL DB I am working with. It's only 1.7GB at this point, but growing, and the most rows in a table is 12.6 million. This DB is heavily used by a number of background processes, which select, insert, update and delete large volumes of data, and by 14 people at this point, who run about 400 various reports per day each as well as updating some data. The average time that a single user has to wait is 6 seconds per report. Those reports are optimized of course, but they normally span between 1 day and one month worth of sales data, average being 1 week, while in a day there are on average 5000 sales (the DB grows by that number of sales a day, plus various other product data, client data etc.). The DB is on a single quad-core Intel 5504, 12GB of RAM, RAID 1 on two of Intel's 160GB X25 SSDs, and it's a Gigabit network. This DB is used by the app server, which is 2 x quad-core Intel 5405, 16GB RAM, Java 6 and Tomcat 6 for the front end, with a number of back end systems also talking to the DB from the app server.

My point is that for this given setup, PostgreSQL is showing good performance; however, I am sure that differences in the data model can really make or break how well the DB works.
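For a rough sense of the load those figures imply, the arithmetic can be sketched as below. The numbers are the ones quoted in the comment; the function name is hypothetical, and the "rows per report" figure assumes one row per sale, which the comment does not state.

```python
def workload_summary(users=14, reports_per_user=400, avg_wait_s=6,
                     sales_per_day=5000, avg_span_days=7):
    """Back-of-the-envelope figures derived from the numbers in the comment."""
    reports_per_day = users * reports_per_user
    total_wait_s = reports_per_day * avg_wait_s
    return {
        # Total reports run per day across all users.
        "reports_per_day": reports_per_day,
        # Cumulative time all users spend waiting on reports, in hours.
        "total_wait_hours_per_day": round(total_wait_s / 3600, 2),
        # Assumption: one row per sale, so an average (1 week) report
        # scans roughly this many sales rows.
        "sales_rows_per_average_report": sales_per_day * avg_span_days,
    }


if __name__ == "__main__":
    for name, value in workload_summary().items():
        print(f"{name}: {value}")
```

So the reports themselves each touch a fairly small slice of the data, which is consistent with a single well-indexed server keeping up.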

--
You can't handle the truth.

Re:Database Evolution by Anpheus · 2010-03-12 22:27 · Score: 2, Interesting

Now, I'm not an expert on database use and don't want to come across as sarcastic, but it's my impression that a lot of the questions that are being asked of these new types of databases simply don't have past analogues, or if they did, they were solved with this sort of approach in an RDBMS, basically using an RDBMS but without the relational part. Hadoop, Google, and all these social networking sites surely aren't all just... confused? Are they?

Please elaborate on how an RDBMS is applicable to what I guess is now called "scaling horizontally", or perhaps more formally known as sharding, or partitioning with redundancy. It's my impression that most of the RDBMS products available today are simply atrocious at this, but if you can point out which books I need to look at, and which products have good support for this sort of scale, I'd love to learn.

Thanks.
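For readers unfamiliar with the term, "partitioning with redundancy" can be illustrated with a minimal sketch. This is not how any particular product does it: the function names, the node names and the simple hash-modulo placement are all assumptions made for illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (not Python's hash())."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replicas_for(key: str, nodes: list, replication_factor: int = 2) -> list:
    """Return the nodes that hold copies of `key`.

    The first node is chosen by hashing the key; the remaining copies go to
    the next nodes in list order, wrapping around at the end.
    """
    count = min(replication_factor, len(nodes))
    start = shard_for(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


if __name__ == "__main__":
    # Hypothetical node names.
    nodes = ["db0", "db1", "db2", "db3"]
    for key in ("user:42", "story:1001", "comment:77"):
        print(key, "->", replicas_for(key, nodes))
```

Every key lives on two nodes, so losing one node loses no data, and adding nodes spreads the keys further. The weakness of plain modulo placement is that changing the node count moves most keys; consistent hashing, which Cassandra uses, is the usual way around that.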

Re:Which DB is better? by alexkorban · 2010-03-12 22:33 · Score: 2, Informative

Oh, absolutely, I'm not surprised that your setup works well; Postgres is a great RDBMS. Of course, how you design your schema matters a great deal too.

But here is another issue I thought of: backup. For our database it was 24 hours to do a full restore, which isn't practical. The only reasonable solution I know is to use replication, which is a nuisance with Postgres and adds maintenance overhead (keeping the schemas in sync). I'd prefer to have built-in redundancy. Again, I think you get that with Cassandra and MongoDB.

I guess in a few years we'll probably end up with something that combines good properties of both key-value stores (redundancy and scalability) and RDBMS (powerful query language, transactions).

--
Free posters and articles for business analysts and project managers

Re:Seems odd to be keeping PhP by maxume · 2010-03-13 01:51 · Score: 2, Informative

Or you could just sporge some jargonistic keywords together in an attempt to advertise your get-rich-slowly scheme.

--
Nerd rage is the funniest rage.

Re:Which DB is better? by DarkOx · 2010-03-13 04:19 · Score: 2, Informative

MSSQL Server is a good RDBMS engine, as much as people pooh-pooh it. I have used it for databases in the 150TB range. If you do your schema right, build your indexes correctly, and plan your partitions and file groups well, you can get great performance out of affordable hardware. Now, you do need to maintain this thing, or develop the automation around building those partitions and moving data into and out of them based on tombstones or some other criteria, or you get underwater real fast.
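The kind of partition automation described above might start from a planning step like the sketch below. It is only an illustration: it assumes one partition per day and a simple age-based retention rule rather than tombstones, the function name is hypothetical, and the actual create and drop statements depend on the database and are left out.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def plan_partition_maintenance(existing: Iterable[date], today: date,
                               retention_days: int,
                               create_ahead_days: int = 1
                               ) -> Tuple[List[date], List[date]]:
    """Decide which daily partitions to create and which to drop.

    Partitions older than `retention_days` are dropped; partitions for today
    and the next `create_ahead_days` days are created if they are missing.
    """
    existing_set = set(existing)
    cutoff = today - timedelta(days=retention_days)
    to_drop = sorted(d for d in existing_set if d < cutoff)
    wanted = {today + timedelta(days=i) for i in range(create_ahead_days + 1)}
    to_create = sorted(wanted - existing_set)
    return to_create, to_drop


if __name__ == "__main__":
    existing = [date(2010, 3, 1), date(2010, 3, 10), date(2010, 3, 13)]
    create, drop = plan_partition_maintenance(existing, date(2010, 3, 13),
                                              retention_days=7)
    print("create:", create)
    print("drop:", drop)
```

Run from a scheduled job, the two lists would drive whatever partition DDL the chosen engine provides.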

I don't care what technology you pick; if you are going to deal with that much data you need to:
1. Understand the problem well.
2. Spend the time with whatever tools you select to really understand how they work, and build whatever you need to fill in where they are deficient.

When you start doing anything that big, it's not plug and play anymore, no matter how you go about it.

--
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html

Re:Which DB is better? by wshs · 2010-03-13 07:59 · Score: 2, Insightful

Putting a proxy between the client and the server to handle the replication does not make Postgres horizontally scalable. Nor does doing a periodic table dump and copying it to the other machines. Postgres might be a ton more efficient than MySQL, but it is in no way scalable.

Re:Which DB is better? by Bill,+Shooter+of+Bul · 2010-03-13 08:58 · Score: 2, Informative

While insightful and informative in its own right, that isn't a logical response to my post.

He was asking for an alternative to MySQL. I was pointing out that large companies with a lot of smart people working for them did not move from MySQL to PostgreSQL, because any performance improvements were not worth it. PostgreSQL's vertical and horizontal scalability did not represent an improvement over MySQL. I didn't even mention vertical vs horizontal scalability. In the end you end up with a raw number saying we can handle X many requests in our total system, regardless of the individual performance numbers of any part of the system.

You're right, he probably isn't the lead engineer of Flickr and probably doesn't need Cassandra's power, but I think it really says something that while a lot of these companies are switching away from MySQL, they aren't switching towards PostgreSQL. But as always, anyone considering any kind of switch must do their due diligence in assessing the potential performance improvements of any new solution.

--
Well.. maybe. Or Maybe not. But Definitely not sort of.
