Digg Says Yes To NoSQL Cassandra DB, Bye To MySQL

Nothing new ... by tomhudson · 2010-03-12 14:51 · Score: 1, Interesting

Cassandra is basically a sloppy implementation of UniVerse and elated products. Why sloppy? Because the idea of a separate file access for each column sucks - use a union or struct as necessary, people!

Re:Nothing new ... by FooAtWFU · 2010-03-12 15:24 · Score: 2, Funny

UniVerse and elated products
Yes! These products are wonderful! They are spectacular! They are a beam of sunshine refreshing my soul! I'm so happy with them! Daisies!

--
The World Wide Web is dying. Soon, we shall have only the Internet.
Re:Nothing new ... by WrongSizeGlass · 2010-03-12 15:43 · Score: 1

It pleases us all to see you were able to relate to his post. :-)
Re:Nothing new ... by nb caffeine · 2010-03-12 16:27 · Score: 1

Seriously, are these like Universe products? I'm working in Unidata on a project, and you're right, it f'ing blows.

--

"Something's wrong with you...and I hope we never do meet again." - Deftones When Girls Telephone Boys
Re:Nothing new ... by hibiki_r · 2010-03-12 16:55 · Score: 2, Interesting

Come on, it cannot be any sloppier than actual UniVerse: It performs extremely poorly on large files, especially when record sizes vary wildly. I've seen in-memory files in which any insert or update operation took 5+ seconds! In my experience, even Postgres in far weaker hardware just spanks UniVerse even on the simple queries where it should have an advantage. If you ever need to read two or three files, either by hand or through I dictionary entries, UniVerse is orders of magnitude slower. When you add the low quality of the system monitoring and debugging tools that are available for it, it turns into one big stinker.
If Cassandra is any slower, it'd have to lock the system up while idle.
Re:Nothing new ... by Anonymous Coward · 2010-03-12 17:12 · Score: 0

I love UniVerse as a database. UVBasic sucks though. Too bad there isn't a "native" way (not that UV SQL/UV.NET crap that tries to map a MV database into a SQL database) to access the data from .NET...
Re:Nothing new ... by tyrione · 2010-03-12 23:56 · Score: 1

Cassandra is basically a sloppy implementation of UniVerse and elated products. Why sloppy? Because the idea of a separate file access for each column sucks - use a union or struct as necessary, people!
They go this route and not PostrgreSQL 8.5? Seriously?
Re:Nothing new ... by tyrione · 2010-03-12 23:57 · Score: 1

PostrgreSQL ==> PostgreSQL.
Re:Nothing new ... by tyrione · 2010-03-13 00:00 · Score: 1

My apologies for not getting the memo that PostgreSQL 8.5 is now 9.0.

Facebook, Twitter and now Digg by clarkkent09 · 2010-03-12 14:53 · Score: 5, Funny

In other news, Cassandra developers are celebrating the fact that their database is now used to store the largest amount of worthless information in history.

--
Negative moral value of force outweighs the positive value of good intentions.

Re:Facebook, Twitter and now Digg by Anonymous Coward · 2010-03-12 15:23 · Score: 0

You should let them know that you are the only person who gets to decide on the value of information. I'm sure they'd love to know what information YOU find so important.
Re:Facebook, Twitter and now Digg by DigiShaman · 2010-03-12 15:31 · Score: 2, Insightful

I used to think that also applied to Slashdot. But no, I've learned a lot both directly and indirectly over the many years (ten years, wow). Even if most of it is crap, the debates and discussions are still quality entries worth keeping.

--
Life is not for the lazy.
Re:Facebook, Twitter and now Digg by Anonymous Coward · 2010-03-12 15:33 · Score: 0

Worthless?
That data reflects our culture!
Sad* but true.
* Actually, if previous cultures had preserved the data of their masses, it wouldn't look much different than what you see today. Toilet jokes and sexual humor are always fashionable.
Re:Facebook, Twitter and now Digg by h4rr4r · 2010-03-12 15:35 · Score: 3, Informative

Fits, before that mysql was the best way to store data no one cared about.
Re:Facebook, Twitter and now Digg by Anonymous Coward · 2010-03-12 15:35 · Score: 0

It's not just him.. the group voted around April of last year.. digg is crap!
Jury (so to speak) is still out on slashdot..
Re:Facebook, Twitter and now Digg by OnlyJedi · 2010-03-12 15:46 · Score: 2, Insightful

According to various internet sources (so take with a grain of salt):
Mark Zuckerberg's net worth: $2 billion. Made entirely from Facebook.
Twitter's net worth: $589 million.
Digg's net worth: $24.34 million.
Even if each individual datum is nearly worthless, the combined value is far from it. Do you think any of those companies would still be worth what they are if their databases were irretrievably wiped?
Re:Facebook, Twitter and now Digg by QuoteMstr · 2010-03-12 15:51 · Score: 1

Made entirely from Facebook.
No. It was made by schmoozing investors. None of the companies you list has ever turned a profit.
This is the kind of reckless behavior that leads to financial bubbles. Pay should be much lower initially. I doubt Zuckerberg would have worked any less hard (or hacked any fewer email accounts) if he had been paid the mere subsistence wage of $1 million per year.
Re:Facebook, Twitter and now Digg by jo42 · 2010-03-12 15:56 · Score: 2, Funny

Sorry, I just can't resist...
> databases were irretrievably wiped
The expression to describe such an fortunate event would be "and nothing of value was [would be] lost".
Re:Facebook, Twitter and now Digg by seanadams.com · 2010-03-12 16:03 · Score: 1

I doubt Zuckerberg would have worked any less hard (or hacked any fewer email accounts) if he had been paid the mere subsistence wage of $1 million per year.
Entrepreneurs are a funny breed. It's the extreme risk and reward - the prospect of riches just around the corner that drives them, not the daily feed bag (which keeps corporate drones climbing the ladder). Of course $1M/year is a lot of dough but it doesn't matter what the number is, once it's rolling in steady the motivation is gone. In other words, I disagree.
Re:Facebook, Twitter and now Digg by John Hasler · 2010-03-12 16:09 · Score: 1

The risk is not total loss of the entire database but occasional corruption here and there. However, for Facebook that's tolerable as long as it doesn't rise to a level such that it irritates the users. Given that the average Facebook user can't remember her best friend's phone number, that's a pretty high level.

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:Facebook, Twitter and now Digg by QuoteMstr · 2010-03-12 16:10 · Score: 1

Are you seriously arguing that unless the first derivative of one's salary is positive, there's no incentive to work?
Re:Facebook, Twitter and now Digg by plover · 2010-03-12 16:15 · Score: 2, Insightful

Worthless?
That data reflects our culture!
Nobody said it couldn't be both at the same time.

--
John
Re:Facebook, Twitter and now Digg by Pojut · 2010-03-12 16:16 · Score: 1

Unfortunately, this is in fact true for many people -_-;;

--
Living With a Nerd
Re:Facebook, Twitter and now Digg by OakDragon · 2010-03-12 16:17 · Score: 2, Funny

In other news, Cassandra developers are celebrating the fact that their database is now used to store the largest amount of worthless information in history.

I used to think that also applied to Slashdot. But no...

Correct - Slashdot doesn't use Cassandra!

--
Dark Reflection
Re:Facebook, Twitter and now Digg by prockcore · 2010-03-12 16:20 · Score: 2, Informative

Reddit also switched from memcachedb to Cassandra for their kvstore. From research to launch took 10 days.
Re:Facebook, Twitter and now Digg by seanadams.com · 2010-03-12 16:24 · Score: 4, Insightful

Are you seriously arguing that unless the first derivative of one's salary is positive, there's no incentive to work?
No, I did not say that one's salary needs to be monotonically increasing. That is not the point at all. And did you really have to turn this into a calculus problem?
To state it differently, many entrepreneurs are willing to work temporarily for little or even nothing, and to make great sacrifices such as giving up health benefits, vacations, and normal family/social life... things most 9-5 workers would never consider. Being someone's bitch for $1M/yr (or to be pedantic let's say $1M/yr + 5%/yr^2) may sound like a splendid deal to you but there are others who would work much harder for sweat equity in their own venture.
These people exist even if you can't fathom it. I'm one of them.
Re:Facebook, Twitter and now Digg by QuoteMstr · 2010-03-12 16:31 · Score: 1, Interesting

Let me ask the question a different way then: which particular tasks related to founding a company would you personally perform in exchange for $2 billion, but not in exchange for $1 million? Would you work longer hours? Talk to your family less?
I cannot conceive of incentive to work increasing appreciably after about $1 million. We can talk about the exact figure, but clearly $2 billion is ludicrous for a private individual.
Excessive compensation is rent seeking and harms society in numerous ways: it distorts the political process through over-concentration of resources; it leads to production of luxury goods that have less utilitarian benefit than mass-marked ones; and worst of all, excessive compensation leads to financial bubbles because it causes too many dollars to chase too few investment opportunities.
Re:Facebook, Twitter and now Digg by seanadams.com · 2010-03-12 16:41 · Score: 1

Let me ask the question a different way then: which particular tasks related to founding a company would you personally perform in exchange for $2 billion, but not in exchange for $1 million? Would you work longer hours? Talk to your family less?
Would I prefer $1M now vs $2B later? Are you seriously that obtuse, or have I been trolled?
Do you have any notion of what the "tasks related to founding a company" even are? Just some legal paperwork, I suppose?
Re:Facebook, Twitter and now Digg by QuoteMstr · 2010-03-12 16:46 · Score: 0, Troll

No. To simplify the scenario, let's pretend instead that you receive a lump-sum payment of $2 billion or $1 million two years. What specific actions would take for the former that you would not take for the latter?
As for the "tasks related to founding a company" bit: the intent was to screen out irrelevant answers like "I'd have sex with Newt Gingrich for $2 billion but not for $1 million".
Re:Facebook, Twitter and now Digg by seanadams.com · 2010-03-12 17:05 · Score: 0, Offtopic

You're not the brightest crayon in the box are you? Nobody knows who the winners will be until years down the road. There are no billions of dollars in play on day one.
Re:Facebook, Twitter and now Digg by QuoteMstr · 2010-03-12 17:10 · Score: 0, Flamebait

You're being deliberately obtuse. The question under discussion has to do with incentive to work, not with speculation. My point, to which you still have not responded, is that obscene financial rewards don't cause people to work any harder than high but normal rewards do.
Re:Facebook, Twitter and now Digg by seanadams.com · 2010-03-12 17:22 · Score: 1

Your rhetorical question is drivel. It doesn't even parse in English, let alone relate to any plausible scenario I can imagine an entrepreneur encountering:

To simplify the scenario, let's pretend instead that you receive a lump-sum payment of $2 billion or $1 million two years. What specific actions would take for the former that you would not take for the latter?

Why don't you put down your drink for a minute and see if you can muster a moment of clarity. If you can express yourself a little better then I might continue this conversation. Otherwise, good night.
Re:Facebook, Twitter and now Digg by QuoteMstr · 2010-03-12 17:26 · Score: 0, Flamebait

Yes, I omitted the word "in" between "million" and "two". Therefore, I am wrong. The strength of your argument being overwhelming, I am forced to concede.
Re:Facebook, Twitter and now Digg by seanadams.com · 2010-03-12 17:49 · Score: 0, Troll

Yes, I omitted the word "in" between "million" and "two". Therefore, I am wrong. The strength of your argument being overwhelming, I am forced to concede.
I might have guessed that, or I might have guessed "for", which would have made only slightly less sense.
If you have a point, why don't you come out and state it instead of asking me to answer an implausible question?
Re:Facebook, Twitter and now Digg by Anonymous Coward · 2010-03-12 18:59 · Score: 1, Insightful

seanadams, you are a blathering idiot who knows he's lost the argument but doesn't have the maturity to accept it. QtrMstr has patiently, and repeatedly, made his point in very clear terms (I for one, had no trouble understanding him). And you've responded with the equivalent of clapping your hands to your ears and singing "naa-na-na-na-na". I've already spent all the moderation points I had in modding you down. Posting this as anonymous because I can't imagine how your moronic and immature babblings have gotten whatever points they did get.
Re:Facebook, Twitter and now Digg by MichaelSmith · 2010-03-12 19:09 · Score: 1

Digg's net worth: $24.34 million.
Makes me wonder why its owners put so much effort into making it suck. Their discussion system used to be half decent. Then they changed it and it is totally useless again.

--
http://michaelsmith.id.au
Re:Facebook, Twitter and now Digg by Anonymous Coward · 2010-03-12 20:12 · Score: 0

No. It was made by schmoozing investors.
None of that has even been "made" yet. That's his piece of the pie on paper, but he doesn't earn the big payday until the investors get theirs. And even if it goes the other way, the investors' preferred shares protect them to a degree.
In reality, his present compensation is probably much closer to the $1m you're imagining.
Re:Facebook, Twitter and now Digg by Billly Gates · 2010-03-12 20:48 · Score: 1

I would be willing to work hard for 2 billion if I felt 1 million was merely adequate.
One of my dreams is to start a business some day and that big pay off makes it worth it. With wages going down and cost of living going up I do not want someone else deciding the fate of myself, retirement, and family. It's time I became one of them rather than work for them.

--
http://saveie6.com/
Re:Facebook, Twitter and now Digg by QuoteMstr · 2010-03-12 21:16 · Score: 1

With wages going down and cost of living going up I do not want someone else deciding the fate of myself, retirement, and family. It's time I became one of them rather than work for them.
Obviously, you should adjust my proposed figures for changes in the price level. My question stands: what specific, personal actions would you undertake for billions that you would not undertake for mere millions?
Re:Facebook, Twitter and now Digg by seanadams.com · 2010-03-12 22:09 · Score: 1

Dude you are really fixated on this question. Do you actually think that people get to entertain such figures when starting a business? Facebook is a one-in-million success, and the big rewards come long after the risks are taken and the hard work is done. Furthermore, in the case of facebook the founders still haven't actually collected those billions - maybe if they have an IPO it'll happen.
Just for shits I will try to answer your question. But I can not ignore the point that the big reward is only a chance, and then only if one works very very hard. here goes
If you paid me $1M/yr, I would show up for work and do whatever you reasonably asked of me until I had a nice house that was paid off and enough to retire. This might take 3 yrs at which point I'd lose interest unless you were willing to pay me a lot more. Or if I didn't totally hate working for you (entrepreneurs hate working for anyone) I might stick around.
But if my own business paid me next to nothing, but with a very good chance at earning $2B+ in the future, I would work my fucking balls off day and night. And the poorer I were, the harder I would work as long as that carrot was out there. In terms of specific personal actions I might sacrifice my physical health (seeing the doctor when I should), sanity (getting enough sleep), eating properly, personal relations (dating), social life (going out drinking), appearance (eg taking time to buy fashionable clothes and dress properly), whatever it takes. It's a totally different ballgame than going to work and getting paid well every day.
I should point out though that such wild capital appreciation is not the only reason people go into business for themselves. Many people just don't want to have a boss, would like to fit their job around a certain lifestyle, etc. That doesn't entail these sorts of sacrifices. Bottom line is not everyone has the same values as you - don't think it's so ridiculous that what's just not worth it to you might be motivating to someone else.
Re:Facebook, Twitter and now Digg by dwarfsoft · 2010-03-12 22:16 · Score: 1

...The expression to describe such an fortunate event would be ...
I conclude from your use of "an" that you originally thought that this was an unfortunate event...

--
Cheers, Chris
Re:Facebook, Twitter and now Digg by Billly Gates · 2010-03-12 22:26 · Score: 1

I understand what you are saying fully.
One of the things ladder climbers do is just what you described, which is thinking that hard work *might* pay off as your employer grows. After all no one was a CEO overnight (a few exceptions of course).
I guess you can say I would sacrifice a lot of myself if I had a job that might pay more. This is because I have large amounts of debt from student loans and from being laid off for a year. So much that I will be working for free for 10 years and live off my wife's salary. If I did not have that problem I probably would work hard but leave it at work and enjoy home life if I were making a good 80k a year.
I see your point and it has to do with bargaining power. Right now the bank and employers have me by the balls so I would be willing to do that. But having your own business finally means doing things your way.

--
http://saveie6.com/
Re:Facebook, Twitter and now Digg by Anonymous Coward · 2010-03-12 22:42 · Score: 0

The other day I was looking to get Fallout 3 expansions, and the best advice came from a reddit thread.
Sure that's not digg, but maybe there's some hope to find actual relevant information there.
Re:Facebook, Twitter and now Digg by maxume · 2010-03-13 01:17 · Score: 1

She didn't call you back, huh?

--
Nerd rage is the funniest rage.
Re:Facebook, Twitter and now Digg by dfghjk · 2010-03-13 02:02 · Score: 1

Curious that you seem to think this distinguishes entrepreneurs from workers. It does not. There are many "workers" that invest sweat equity for a share of success as well but that does not make them entrepreneurs. Meanwhile, many entrepreneurs aren't willing to give up a regular paycheck.
Re:Facebook, Twitter and now Digg by CyDharttha · 2010-03-13 02:21 · Score: 1

Facebook did turn a profit last year, earlier than predicted:
http://www.marketingvox.com/facebook-sees-profit-on-target-for-500m-year-045281/
Re:Facebook, Twitter and now Digg by roman_mir · 2010-03-13 03:42 · Score: 1

many entrepreneurs aren't willing to give up a regular paycheck
- r u sure those are 'entrepreneurs'? I gave up a regular paycheck a few months back, about 12-16K /month, depending on overtime to try and do my own thing, which may or may not materialize. It could be the worst idea I had so far, but it's mine and I am doing it, if it fails, I will blame myself, but if I never tried, I would have certainly ended up blaming myself for never trying.

--
You can't handle the truth.
Re:Facebook, Twitter and now Digg by teknopurge · 2010-03-13 07:11 · Score: 1

Incorrect - Excel still maintains that title.

--
Website Hosting
Re:Facebook, Twitter and now Digg by Michael Kristopeit · 2010-03-13 12:24 · Score: 1

every new idea you have is the worst idea you've had so far.
you are stupid.
Re:Facebook, Twitter and now Digg by roman_mir · 2010-03-13 12:29 · Score: 1

Looks like I have a cult of followers forming there.

--
You can't handle the truth.

Reddit by Gudeldar · 2010-03-12 14:54 · Score: 3, Informative

Reddit also recently switched to Cassandra.

Re:Reddit by jibjibjib · 2010-03-12 15:15 · Score: 1

Just for persistent cache, not for their main database.
Re:Reddit by Anonymous Coward · 2010-03-12 15:22 · Score: 0

So if you run a site full of teenagers with zero buying power who think that steeling is the best thing ever - Cassandra is for YOU!
Re:Reddit by h4rr4r · 2010-03-12 15:36 · Score: 5, Funny

I was not aware metallurgy was popular amongst the youth.
Re:Reddit by WrongSizeGlass · 2010-03-12 15:46 · Score: 1

Clearly you haven't spent enough time at reddit.com lately.

Database Evolution by cosm · 2010-03-12 14:56 · Score: 1

I imagine with the continual growth of these social networks, high performance DB methodologies will experience tremendous growth, and perhaps even paradigm shifts in the way we logically think and design database architectures. Instead of this flat 2D table mentality, imagine n-dimensional matrices of data, scaling dimensions instead of table and rowcounts.

I bet if you converted Facebook to this n-dimensional 'table' model, and did a couple inner-joins and unions, you could rip space-time wide-open!

--
'We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.' RPF

Re:Database Evolution by Anonymous Coward · 2010-03-12 15:04 · Score: 1, Insightful

its already multi-dimensional. you have a record, it has keys in it, the values can be objects. that's three or more dimensions there depending on how complicated the objects are.
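
To make the "record, keys, values can be objects" point concrete, here is a rough Python sketch that models such a store as nested dictionaries. It is only an illustration: `store`, `put`, `get` and the sample data are made-up names, not Cassandra's actual API.

```python
from collections import defaultdict

# Hypothetical in-memory model: row key -> column name -> value.
# A value may itself be a dict or list, which adds further "dimensions".
store = defaultdict(dict)

def put(row_key, column, value):
    """Store a value under (row_key, column)."""
    store[row_key][column] = value

def get(row_key, column, *path):
    """Fetch a value, optionally descending into a nested object."""
    value = store[row_key][column]
    for step in path:
        value = value[step]
    return value

put("user:42", "name", "alice")
put("user:42", "profile", {"city": "Boston", "langs": ["en", "de"]})

print(get("user:42", "name"))                 # alice
print(get("user:42", "profile", "city"))      # Boston
print(get("user:42", "profile", "langs", 1))  # de
```

Each extra argument to `get` walks one level deeper, which is roughly what "three or more dimensions" amounts to here.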
Re:Database Evolution by DogDude · 2010-03-12 19:17 · Score: 2, Interesting

I imagine with the continual growth of these social networks, high performance DB methodologies will experience tremendous growth, and perhaps even paradigm shifts in the way we logically think and design database architectures.

Your statement that social networks push databases to their theoretical limits is laughable. Larger, more frequently accessed, more complicated databases have existed for years (decades?) before the current crop of Friendster clones existed. Just because Facebook is the largest, most "high performance" database application that you can think of doesn't make it remotely true.

The problem of dealing with very large, frequently changing databases has been addressed and solved, already. The problem is that most PHP-monkeys have -zero- database knowledge, and instead of doing the work to figure out the right way to do things, they feel like they need to re-invent the wheel. A better solution is to pick up a book written by somebody who's been working with RDBMS' for a few decades. It's not a quick fix, but this problem has already been solved many, many times over.

--
I don't respond to AC's.
Re:Database Evolution by Anpheus · 2010-03-12 22:27 · Score: 2, Interesting

Now, I'm not an expert on database use and don't want to come across as sarcastic, but it's my impression that a lot of the questions that are being asked of these new types of databases simply don't have past analogues, or if they did, they were solved with this sort of approach in an RDBMS, basically using an RDBMS but without the relational part. Hadoop, Google, and all these social networking sites surely aren't all just... confused? Are they?
Please elaborate on how an RDBMS is applicable to what I guess is now called "scaling horizontally", or perhaps more formally known as sharding, or partitioning with redundancy. It's my impression that most of the RDBMS products available today are simply atrocious at this, but if you can point out which books I need to look at, and which products have good support for this sort of scale, I'd love to learn.
Thanks.
Re:Database Evolution by PietjeJantje · 2010-03-12 22:47 · Score: 1

Yes, they are all stupid monkeys, these people running the biggest sites, and you are sooo smart, yet can only point to "a book". The rest of my reply is in a book too, and has been given many, many times over.
Re:Database Evolution by tyrione · 2010-03-13 10:44 · Score: 1

its already multi-dimensional. you have a record, it has keys in it, the values can be objects. that's three or more dimensions there depending on how complicated the objects are.
I guess the original poster doesn't have a classic background in Calculus, Linear Algebra, Linear/Non-linear Programming, basic Physics, Finite Automata and Discrete Mathematics, but holds their knowledge to learning SQL.
Re:Database Evolution by cervo · 2010-03-14 03:58 · Score: 1

DB2 contains a shared nothing architecture for partitioning horizontally. Back in 1999 I worked for a company that had like 20 nodes for about 10 TB of data. Now computers are significantly more powerful, so those 20 nodes could handle much more data. Also you can always throw more nodes at it. Each table had a partitioning key which would determine how the data is split. The partitioning key does not seem that different from something like Dryad LINQ's scheme for partitioning tables.
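
For anyone who hasn't seen a partitioning key in action, a minimal Python sketch of the general idea follows. It is an assumption-laden toy, not DB2's actual hashing scheme: `NODES`, `node_for`, `partition` and the `customer_id` column are all invented for illustration.

```python
import hashlib

NODES = 4  # assumed cluster size, purely for illustration

def node_for(partition_key: str, nodes: int = NODES) -> int:
    """Map a partitioning key to a node index using a stable hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes

def partition(rows, key_column, nodes: int = NODES):
    """Split rows into per-node lists according to their partitioning key."""
    shards = [[] for _ in range(nodes)]
    for row in rows:
        shards[node_for(str(row[key_column]), nodes)].append(row)
    return shards

rows = [{"customer_id": i, "total": i * 10} for i in range(1000)]
shards = partition(rows, "customer_id")
print([len(s) for s in shards])  # rows per node
```

The same key always lands on the same node, so a lookup by `customer_id` only has to touch one machine. Note that with plain modulo hashing, adding a node reshuffles most keys, which real systems work around in various ways.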

Of course I think the big difference is that there aren't really good open source solutions. You have to pay a lot for IBM DB2. I have also heard people swear that Oracle can handle massive amounts of data. Even a normal RDBMS implementation of Oracle costs a fortune. I'm sure whatever their clustering solution is cannot be obtained by the average company..... Microsoft SQL Server doesn't seem to have a good clustering solution just yet.

MySQL is practically free, assuming you don't subscribe to Monty's bastardized reading of the GPL. PostgreSQL is also free....... IBM DB2 not so much.... I have seen some clustering for both MySQL and PostgreSQL but the consensus seems to be that it is not adequate.

Away from LAMP? by Anonymous Coward · 2010-03-12 14:58 · Score: 3, Insightful

Or away from MySQL? There is a difference.

Re:Away from LAMP? by morgan_greywolf · 2010-03-12 15:22 · Score: 1

Considering that LAMP stands for Linux + Apache + MySQL + [ PHP | Perl | Python ], I'd have to say that no, there isn't.

--
My blog
Re:Away from LAMP? by DMUTPeregrine · 2010-03-12 15:31 · Score: 1

LAMP stands for "Linux Apache MySQL PHP". Moving away from MySQL IS moving away from LAMP. In this case, they seem to be moving to LACP. If they had moved to PostgreSQL it might be termed LAPP.

--
Not a sentence!
Re:Away from LAMP? by Anonymous Coward · 2010-03-12 16:11 · Score: 0

Now they just need to switch to FreeBSD so they can use FAPP.
Re:Away from LAMP? by mysidia · 2010-03-12 17:23 · Score: 1

Sorry, LACP is already taken. You Linux folken can't have it, it belongs to the network itself, (IEEE 802.3ad) :)
LAMP was meant to mean Linux Apache Mysql and Perl though.
Re:Away from LAMP? by Loconut1389 · 2010-03-12 18:01 · Score: 1

LACP = Link Aggregation Control Protocol. Already taken. But I'm up for a LAPP dance.
Re:Away from LAMP? by DMUTPeregrine · 2010-03-12 20:17 · Score: 2, Funny

Lamp = one of those things with a lightbulb in. Also already taken.

--
Not a sentence!
Re:Away from LAMP? by nacturation · 2010-03-13 21:29 · Score: 1

Sorry, LACP is already taken. You Linux folken can't have it, it belongs to the network itself, (IEEE 802.3ad) :)
How about saying "Digg got the CLAP"? Surely that's not taken?

--
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
Re:Away from LAMP? by mysidia · 2010-03-14 04:48 · Score: 1

Ok, well, CLAP is free sort of, seems a decent choice.
It's an abbreviation for cleft lip and palate, a physical deformity, but I doubt that there could be any confusion.
No more so than there could be confusion between LAMP and Lamps you plug in to produce light with.
Re:Away from LAMP? by Anonymous Coward · 2010-03-15 21:35 · Score: 0

Cassandra
Linux
Apache
Perl

New acronym in order? by mgkimsal2 · 2010-03-12 14:58 · Score: 5, Funny

From the Digg blog - http://about.digg.com/node/564

"And if that doesn't sound like a big enough challenge, we're replacing most of our infrastructure components and moving away from LAMP."

Cassandra Linux Apache PHP?

--
creation science book

Re:New acronym in order? by Anonymous Coward · 2010-03-12 15:05 · Score: 3, Funny

Trust me, you don't want the clap!!!!
Re:New acronym in order? by Anonymous Coward · 2010-03-12 16:42 · Score: 0

I use Cassandra Linux IIS TCL
Dont ask how that works out, because every month or so it makes me want to kill someone.
Re:New acronym in order? by solferino · 2010-03-12 16:58 · Score: 1

This reminds me of the original name for the Daihatsu Applause, before they did their complete model name reaction testing.
Re:New acronym in order? by Tablizer · 2010-03-12 17:29 · Score: 3, Funny

[...moving away from LAMP] Cassandra Linux Apache PHP?"
try: Cassandra Ruby Apache PHP

--
Table-ized A.I.
Re:New acronym in order? by Anonymous Coward · 2010-03-12 18:24 · Score: 0

Cassandra Redhat Apache PHP = CRAP!
Re:New acronym in order? by Anonymous Coward · 2010-03-12 21:20 · Score: 0

You were doing so well until the exclamation marks.

The Monty crowd will blame this on Oracle by heathm · 2010-03-12 15:04 · Score: 2, Insightful

The sad thing is that Monty's MySQL fan boys will blame this on Oracle when in reality the move to Cassandra (or other NoSQL databases) is what a lot of web sites should be doing regardless of who holds the MySQL reins.

Re:The Monty crowd will blame this on Oracle by DarkofPeace · 2010-03-12 15:12 · Score: 1

True, but sometimes a friendly nudge don't hurt.
Re:The Monty crowd will blame this on Oracle by Taco Cowboy · 2010-03-12 19:58 · Score: 1

... in reality the move to Cassandra (or other NoSQL databases) is what a lot of web sites should be doing regardless of who holds the MySQL reins
Please pardon me for being dense ...
Can someone please tell me why web sites should move to Non-SQL databases from SQL?

--
Muchas Gracias, Señor Edward Snowden !
Re:The Monty crowd will blame this on Oracle by PietjeJantje · 2010-03-12 23:04 · Score: 5, Insightful

You have to understand the slashdot memes. These are constructed around the state of technology over a decade ago. So, PHP is always bad, Javascript and Ajax are always bad, and when someone mentions MySQL, the karma whores come out to bash it and mention PostgreSQL. They don't need an argument, the authors and upvoters are operating in old-man auto-bot mode. Like I said, it typically involves notions which were fixed years ago if they did exist to begin with. These are elitist-wannabees, using simple rules of engagement, to show you how smart they are. Similar to grammar nazi. It is actually a quite lower-class thing to do. As Hannibal Lecter would say, you have to wonder if they still hear the lambs screaming.
Re:The Monty crowd will blame this on Oracle by Exitar · 2010-03-13 02:46 · Score: 1

DB fanboism?
Now I've really seen everything!
Re:The Monty crowd will blame this on Oracle by tyrione · 2010-03-13 10:47 · Score: 1

You have to understand the slashdot memes. These are constructed around the state of technology over a decade ago. So, PHP is always bad, Javascript and Ajax are always bad, and when someone mentions MySQL, the karma whores come out to bash it and mention PostgreSQL. They don't need an argument, the authors and upvoters are operating in old-man auto-bot mode. Like I said, it typically involves notions which were fixed years ago if they did exist to begin with. These are elitist-wannabees, using simple rules of engagement, to show you how smart they are. Similar to grammar nazi. It is actually a quite lower-class thing to do. As Hannibal Lecter would say, you have to wonder if they still hear the lambs screaming.
You managed to not answer his technical question and take a pot shot at a much more robust database solution to boot.
Re:The Monty crowd will blame this on Oracle by mjwalshe · 2010-03-13 10:53 · Score: 1

Hannibal Lecter was one generation from the slums a typical tory boy
Re:The Monty crowd will blame this on Oracle by jpkunst · 2010-03-13 23:14 · Score: 1

Exactly right. I'm sure PostgreSQL is a great database but thanks to its relentless Slashdot troll army I've really grown to dislike it and I don't feel any inclination to look into it any further.

Lighttpd by Anonymous Coward · 2010-03-12 15:05 · Score: 0

Why not lighttpd?

so does it use sql or not? by timmarhy · 2010-03-12 15:12 · Score: 1

i can't tell from the 4 lines of text buried in ads that is this supposed article, but i'm guessing this "nosql" still uses an sql database backend?

and why wouldn't a relational database system not be perfect for facebook?

--
If you mod me down, I will become more powerful than you can imagine....

Re:so does it use sql or not? by larry+bagina · 2010-03-12 15:23 · Score: 1

no...

--
Do you even lift?
These aren't the 'roids you're looking for.
Re:so does it use sql or not? by Anonymous Coward · 2010-03-12 18:35 · Score: 3, Informative

i can't tell from the 4 lines of text buried in ads that is this supposed article, but i'm guessing this "nosql" still uses an sql database backend?
and why wouldn't a relational database system not be perfect for facebook?
1) NoSQL databases are just that: NO SQL. There is no relational database involved.
2) No, relational models are not good for Facebook-style data. Facebook uses a lot of trees, networks and graphs, none of which are easy to store in a relational system. Facebook also has a lot of dynamic schema requirements, and again SQL does not cope with this well. And at the scale Facebook operates at, they are forced to use techniques like sharding and partitioning of their data sets, at which point a lot of what makes the relational model useful becomes difficult to use, e.g. joins across database servers are really hard to do.
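To make the sharding point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the shard count, the `shard_for` helper and the sample data are made up, and plain dicts stand in for separate database servers. It is not how Facebook does it. The point is that once rows are spread over servers by key, the "join" turns into application code doing one lookup per related row.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical; picked only for the example

def shard_for(user_id):
    """Pick a shard by hashing the key (stable across runs and processes)."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Each dict stands in for a separate database server.
shards = [{} for _ in range(NUM_SHARDS)]

def save_user(user_id, profile):
    shards[shard_for(user_id)][user_id] = profile

def load_user(user_id):
    return shards[shard_for(user_id)].get(user_id)

def friends_profiles(user_id):
    """The join the database used to do for you: one lookup per friend,
    each of which may land on a different server."""
    user = load_user(user_id)
    return [load_user(friend) for friend in user["friends"]]

save_user("alice", {"name": "Alice", "friends": ["bob", "carol"]})
save_user("bob", {"name": "Bob", "friends": ["alice"]})
save_user("carol", {"name": "Carol", "friends": []})

print([p["name"] for p in friends_profiles("alice")])  # ['Bob', 'Carol']
```

With a single SQL server that last call would be one JOIN; here it is up to the application to fan out, and to cope when one of the shards is slow or down.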
Re:so does it use sql or not? by Anonymous Coward · 2010-03-12 21:28 · Score: 1, Informative

Scale. Getting MySQL to survive in a world where you need hundreds of machines to host the data layer and manage 100k+ operations per second is actually quite hard. The replication layer is laughable at best, and fault tolerance towards disk and machine failure is awful.
These systems have no SQL in them at all, hence the name. In Cassandra you have what amounts, in Python, to a set of dict objects. It's a large hashed table that stores key/value objects.
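Roughly what that "set of dict objects" picture looks like, as a toy Python model. This is not the Cassandra client API; `insert` and `get` are hypothetical helpers written just for this sketch, and the real thing spreads the rows over many machines.

```python
# row key -> {column name: value}; a stand-in for one column family
column_family = {}

def insert(row_key, column, value):
    """Set one column on a row, creating the row if it does not exist yet."""
    column_family.setdefault(row_key, {})[column] = value

def get(row_key, column=None):
    """Return the whole row as a dict, or a single column's value if named."""
    row = column_family.get(row_key, {})
    return row if column is None else row.get(column)

insert("user:42", "name", "Kevin")
insert("user:42", "email", "kevin@example.com")
insert("user:99", "name", "Jay")  # rows do not have to share the same columns

print(get("user:42", "name"))  # Kevin
print(get("user:99"))          # {'name': 'Jay'}
```

Note there is no schema and no query language: you can only ask for a row by key, which is exactly why it is easy to spread across machines.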
Re:so does it use sql or not? by JamesP · 2010-03-13 01:14 · Score: 1

No... as in NoSql
But actually, in some cases NoSQL is used in a way that uses a MySQL/PostgreSQL backend (in a totally no-SQL way, just for the replication/backup benefits)

--
how long until /. fixes commenting on Chrome?

Which DB is better? by Taco+Cowboy · 2010-03-12 15:13 · Score: 1

I too have a site running on MySQL and I am thinking of switching.

Can anyone tell me if there is any "comparison chart" listing the various features / usability of the various OSS DB packages available so I can make a better educated decision?

Please help !

Thank you !

--
Muchas Gracias, Señor Edward Snowden !

Re:Which DB is better? by Anonymous Coward · 2010-03-12 15:28 · Score: 3, Insightful

If you need a comparison chart... you don't need to switch.
It's probably not necessary to change such a huge part of your architecture if it's not worth investing serious time investigating and benchmarking the alternatives.
Re:Which DB is better? by Taco+Cowboy · 2010-03-12 15:32 · Score: 1

Thank you for the reply.
It's not "switching for switching's sake".
The reason for switching is simple: when my site first launched, MySQL was more than enough for it.
As it grows and grows, it's taxing MySQL more and more, and right now it's already at the brim.
So ... does anyone have any info on where to look for a "comparison chart" or anything like that?
Please help !

--
Muchas Gracias, Señor Edward Snowden !
Re:Which DB is better? by h4rr4r · 2010-03-12 15:34 · Score: 5, Informative

Postgres, for people who care about their data.
Re:Which DB is better? by larry+bagina · 2010-03-12 15:40 · Score: 2, Insightful

you should probably look at what queries you're running and what the planner/optimizer is doing with them to verify the problem is mysql and not your schema and indexes.
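A minimal sketch of that kind of check. It uses Python's built-in sqlite3 as a stand-in so it runs without a server; that is an assumption of this example, and the table, index name and query are made up. On MySQL the equivalent would be putting EXPLAIN in front of the SELECT, and the output format differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

QUERY = "SELECT name FROM users WHERE email = ?"

def plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

before = plan(QUERY, ("user500@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(QUERY, ("user500@example.com",))

print("before:", before)  # a full table scan
print("after: ", after)   # a search using idx_users_email
```

If the "before" plan is what your slow queries look like, the fix is an index, not a new database.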

--
Do you even lift?
These aren't the 'roids you're looking for.
Re:Which DB is better? by WrongSizeGlass · 2010-03-12 15:41 · Score: 1

The whole NoSQL concept takes a little getting used to. I'm not knocking it by any means, I've just been using the whole relational model for decades and need to digest this new approach before I can fully embrace it.

You can try this wiki page for an explanation of the concept.
Re:Which DB is better? by QuoteMstr · 2010-03-12 15:47 · Score: 3, Informative

The page you cited, on column-oriented databases, describes an implementation strategy that's applicable to many types of databases. There are database engines that present a perfectly normal SQL interface to a column store, and there's actually a direct link to LucidDB from the article. Likewise, there's nothing stopping a Cassandra-like database from serializing its on-disk bits the other way around.
Column-orientation has nothing to do with the "NoSQL" databases that are in vogue. It's completely orthogonal. You're talking about using vectors or linked lists when everyone else is arguing over whether to serialize data with XML or JSON.
Re:Which DB is better? by itzdandy · 2010-03-12 15:56 · Score: 1

What is at the brim? do you think that you have a performance issue? Considered a master/slave/slave/etc cluster? Do you do a ton of reads and few writes or many writes?
It's hard to say that mysql is at the brim without some explanation.
Re:Which DB is better? by RelliK · 2010-03-12 16:13 · Score: 5, Informative

Go with PostgreSQL. Reliable, standards-compliant, fast.

--
___
If you think big enough, you'll never have to do it.
Re:Which DB is better? by Bill,+Shooter+of+Bul · 2010-03-12 17:14 · Score: 2, Insightful

Note: Facebook, twitter, digg: they aren't moving to postgreSQL. It's not better enough to make any kind of difference for that kind of a scale. They don't need features, they need speed.

--
Well.. maybe. Or Maybe not. But Definitely not sort of.
Re:Which DB is better? by QuoteMstr · 2010-03-12 17:20 · Score: 3, Insightful

First of all, if he's asking Slashdot for advice (which is barely a step above reading tea leaves [which itself is a step above asking 4chan]), he doesn't need Facebook-level scalability.
Second, you're confusing scalability and performance. Scalable solutions tend to actually be slower than non-scalable ones: the difference is that a scalable system increases in capacity linearly with the number of machines you throw at it ("horizontal" scalability), whereas a fast non-scalable system generally needs the same number of faster, individual machines to increase capacity ("vertical" scaling).
Third, PostgreSQL has excellent performance, and PostgreSQL does, in fact, scale horizontally.
Re:Which DB is better? by Anonymous Coward · 2010-03-12 17:40 · Score: 0, Insightful

If you need a chart to help you pick the best database for your site God help you. Either hire a DBA or just stick with LAMP.
Re:Which DB is better? by Billly+Gates · 2010-03-12 18:51 · Score: 3, Informative

PostgreSQL is a real relational database that supports views, nested SQL, triggers, foreign keys, and even statistical analysis.
I think MySQL supports foreign keys now and my info might be dated. But if a database does not support foreign keys then it's not a real relational database, and MySQL had that problem for years.
Once you switch over, you can find out how processor-intensive tasks that took minutes can be done in seconds with the PostgreSQL features I described above. You can save a lot of time on complex queries with PostgreSQL.

--
http://saveie6.com/
Re:Which DB is better? by TheLink · 2010-03-12 19:26 · Score: 1

You have to figure out whether your company and user base are the sort that might grow fast or not.

If you're only at the "brim" now with MySQL and you are only growing 10-30% every year, just switch to a better RDBMS product and your needs might be well taken care of by Intel, AMD, Broadcom, Cisco and the SSD/storage manufacturers for the next 5-10 years.

If you are growing really fast, then sure you need something that really scales well horizontally. Horizontal scaling comes at a cost though.

Just look at facebook, google, amazon, ebay etc. There seem to be about as many different custom solutions as there are these sort of "internet companies" (if not more :) ). So what works for Google might not work as well for Ebay.

However the concepts of scaling out, dealing and processing lots of data are common, so you might just poach a few good people from those companies so that they can set up a system that works well for your requirements. If you are growing that fast, they'll be worth the investment.
--
- Too many replies beneath your current threshold
Re:Which DB is better? by Taco+Cowboy · 2010-03-12 19:49 · Score: 1

Note: Facebook, twitter, digg: they aren't moving to postgreSQL. It's not better enough to make any kind of difference for that kind of a scale. They don't need features, they need speed.
That is exactly what I am looking for ---
There are so many DB offerings, open-source and/or closed-source, but so far I can't find a comparison chart or anything similar comparing the various DBs.
Of course, I can talk to the sales reps. But as sales reps go, they will tell you that the product they sell is the best.
I don't only need speed; I also have to think about scalability, and I need to know what features each one offers.
Please help !
Thank you all !

--
Muchas Gracias, Señor Edward Snowden !
Re:Which DB is better? by alexkorban · 2010-03-12 19:58 · Score: 4, Informative

I have worked with large PostgreSQL databases (150GB or so) and really, Postgres isn't a solution. You run into issues anyway when some of your tables contain millions or even billions of rows. At that stage things like vacuuming or altering the schema start to become damn near impossible, and even querying starts to become a bottleneck.

Now how do you scale that if your database is still growing? Postgres doesn't have a decent clustering solution that I know of, so your options are either to roll your own, or to scale vertically. Both of those are expensive options.

Based on my experience, I don't think that relational databases are appropriate for really large databases, and at present the only realistic option is horizontal scaling which is a lot easier with things like Cassandra or MongoDB.

--
Free posters and articles for business analysts and project managers
Re:Which DB is better? by fotoguzzi · 2010-03-12 20:24 · Score: 1

I think this is a troll. Four-digit slashid. Vague Leading question.

--
Their they're doing there hair.
Re:Which DB is better? by Anonymous Coward · 2010-03-12 20:42 · Score: 0

I think MySQL supports foreign keys now and my info might be dated. But if a database does not support foreign keys then it's not a real relational database, and MySQL had that problem for years.
Nice troll - the thread you linked to contains NO discussion of foreign key support as far as I can see.
That and the fact that MySQL's lack of foreign key support was for a storage engine that they didn't fucking own!
Re:Which DB is better? by shic · 2010-03-12 21:25 · Score: 1

My two-pence...
I don't think you'll ever find a useful comparison chart - because, while charts exist, they won't focus on what your application needs. Your application is the issue - not the DBMS choice; you need to think about what interactions you need (want) from an application perspective - then look for a solution that meets that specification. A lot depends upon granularity of transactions; on the volumes of data per transaction; on the volumes of data over time; on acceptable latency; on acceptable redundancy and fault tolerance - etc. etc.
The package I'd advise you to look at is Oracle-Embedded (which has an open source licence, as well as a commercial one), which moves you out of the realm of relational databases altogether. If this is a suitable solution, it suggests an entirely new way to think about what sort of back end you really need.
In summary, no-one else's comparison of RDBMS is likely to be relevant to your domain. All such comparisons are extremely fragile in the context of application demand.
Re:Which DB is better? by Anonymous Coward · 2010-03-12 21:36 · Score: 0

And have a low number of queries per second.
Re:Which DB is better? by RichardJenkins · 2010-03-12 21:47 · Score: 1

There's one on wikipedia http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
In my experience, though, this probably won't be a whole lot of help in deciding whether you need to switch to any of the DBs listed.
Re:Which DB is better? by Anonymous Coward · 2010-03-12 22:02 · Score: 0

Postgres all the way. It's like night and day compared with MySQL. It may sound incredible, but on my workload (tables with nearly one billion records) on the same machines, Postgres works on average 400% faster. You almost cannot buy this kind of difference with money.
Re:Which DB is better? by roman_mir · 2010-03-12 22:12 · Score: 2, Informative

I just read your comment and checked the PostgreSQL DB I am working with. It's only 1.7GB at this point, but growing, and the largest table has 12.6 million rows. This DB is heavily used by a number of background processes, which select, insert, update and delete large volumes of data, and by 14 people at this point, who each run about 400 various reports per day as well as updating some data. The average time that a single user has to wait is 6 seconds per report. Those reports are optimized of course, but they normally span between 1 day and 1 month worth of sales data, the average being 1 week, while in a day there are on average 5000 sales (the DB grows by that number of sales a day, plus various other product data, client data etc.). The DB is on a single quad-core 5504 Intel with 12GB of RAM and RAID 1 on two of Intel's 160GB X25 SSDs, and it's a Gigabit network. This DB is used by the app server, which is a 2 x quad-core 5405 Intels, 16GB RAM, running Java 6 and Tomcat 6 for the front end, with a number of back end systems also talking to the DB from the app server.
My point is that for this given setup, PostgreSQL is showing good performance, however I am sure there are differences in the data model setup that really can kill or make the DB work.

--
You can't handle the truth.
Re:Which DB is better? by alexkorban · 2010-03-12 22:33 · Score: 2, Informative

Oh, absolutely, I'm not surprised that your setup works well, Postgres is a great RDBMS. Of course, how you design your schema matters a great deal too.

But here is another issue I thought of: backup. For our database it was 24 hours to do a full restore, which isn't practical. The only reasonable solution I know is to use replication, which is a nuisance with Postgres and adds maintenance overhead (keeping the schemas in sync). I'd prefer to have built-in redundancy. Again, I think you get that with Cassandra and MongoDB.

I guess in a few years we'll probably end up with something that combines good properties of both key-value stores (redundancy and scalability) and RDBMS (powerful query language, transactions).

--
Free posters and articles for business analysts and project managers
Re:Which DB is better? by MemoryDragon · 2010-03-13 01:16 · Score: 1

The problem with PostgreSQL in web scenarios always has been and will be replication. Sure, there is Slony, but that one is definitely subpar to what MySQL delivers in the replication area. All other aspects are way superior.
But for sites like Twitter and Digg, which probably just have large tables and not many of them, moving to a non-SQL DB with a strong cloud and replication infrastructure might be a better option than going to the next big relational DB (which then has deficits in the area where it counts for them).
Re:Which DB is better? by Taco+Cowboy · 2010-03-13 01:33 · Score: 1

Many, MANY THANKS for your reply !

you need to think about what interactions you need (want) from an application perspective - then look for a solution that meets that specification
What you've said is very true !
Since I am more of a user than a DB programmer, I have yet to find a way to look for the right solution without knowing which package has which features.
That was why I was looking for a "comparison chart" or something similar, to enable me to, at a glance, know which package has what, and which hasn't.
MANY THANKS AGAIN for your reply !! I sure will take a look at Oracle.

--
Muchas Gracias, Señor Edward Snowden !
Re:Which DB is better? by Anonymous Coward · 2010-03-13 02:51 · Score: 0

Welcome to 1998, asshole.
Re:Which DB is better? by DarkOx · 2010-03-13 04:19 · Score: 2, Informative

Use a good RDBMS engine, and as much as people pooh-pooh MS SQL Server, it's a good engine. I have used it for databases in the 150TB range. If you do your schema right, build your indexes correctly, and plan your partitions and file groups well, you can get great performance out of affordable hardware. Now, you do need to maintain this thing, or develop the automation around building those partitions and moving data into and out of them based on tombstones or some other criteria, or you get underwater real fast.
I don't care what technology you pick: if you are going to deal with that much data you need to:
1. Understand the problem well.
2. Spend the time with whatever tools you select to really understand how they work, and build whatever you need to fill in where they are deficient.
When you start doing anything that big it's not plug and play anymore, no matter how you go about it.

--
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
Re:Which DB is better? by PhilDin · 2010-03-13 04:59 · Score: 1

I can't help asking, what (if anything) did you do to keep your system running in light of the chunky scaling requirements? As an aside, it would be nice to see some "recipes" from the PostgreSQL developers on how to make your installation scale under different types of usage profile. I haven't seen anything like this around.

--
Mia kusenveturilo estas plena da angiloj
Re:Which DB is better? by SanityInAnarchy · 2010-03-13 05:21 · Score: 1

plan your partitions and file groups well you can get great performance out of affordable hardware. Now you do need to maintain this thing or develop the automation around building those partitions and moving data into and out of them...
At which point you're basically killing a lot of the advantages of relational databases, and reinventing a lot of the advantages of the newer databases. At some point, it makes sense to move off of RDBMS altogether.

--
Don't thank God, thank a doctor!
Re:Which DB is better? by Anonymous Coward · 2010-03-13 05:26 · Score: 0

I have worked with large PostgreSQL databases (150GB or so) and really, Postgres isn't a solution. You run into issues anyway when some of your tables contain millions or even billions of rows. At that stage things like vacuuming or altering the schema start to become damn near impossible, and even querying starts to become a bottleneck.
Now how do you scale that if your database is still growing? Postgres doesn't have a decent clustering solution that I know of, so your options are either to roll your own, or to scale vertically. Both of those are expensive options.
If you partition your tables (yes, you can do that in PostgreSQL) properly, you shouldn't be having single tables with billions of records in them. Vacuuming smaller ones is acceptable.
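The idea, sketched in plain Python rather than PostgreSQL syntax. The month-based naming scheme and the helper functions are made up for illustration, and dict entries stand in for the per-partition tables. Rows are routed to a partition by date, so queries and maintenance only touch the partition they need.

```python
from collections import defaultdict
from datetime import date

# partition name -> list of (day, amount) rows; stands in for child tables
partitions = defaultdict(list)

def partition_name(day):
    """Hypothetical naming scheme: one partition per calendar month."""
    return f"sales_{day.year:04d}_{day.month:02d}"

def insert_sale(day, amount):
    partitions[partition_name(day)].append((day, amount))

def total_for_month(year, month):
    """Reads a single partition instead of scanning every row ever stored."""
    rows = partitions.get(partition_name(date(year, month, 1)), [])
    return sum(amount for _, amount in rows)

def drop_month(year, month):
    """Expiring old data means dropping one partition, not one huge DELETE."""
    partitions.pop(partition_name(date(year, month, 1)), None)

insert_sale(date(2010, 2, 27), 10)
insert_sale(date(2010, 3, 1), 20)
insert_sale(date(2010, 3, 12), 5)

print(sorted(partitions))        # ['sales_2010_02', 'sales_2010_03']
print(total_for_month(2010, 3))  # 25
```

Each partition stays small enough that vacuuming or reindexing it is a routine job rather than an outage.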
Re:Which DB is better? by teknopurge · 2010-03-13 07:09 · Score: 1

mod parent up.

--
Website Hosting
Re:Which DB is better? by Anonymous Coward · 2010-03-13 07:44 · Score: 0

They don't need features, they need speed.
They need scalability. Speed might be a welcome byproduct.
Re:Which DB is better? by wshs · 2010-03-13 07:59 · Score: 2, Insightful

Putting a proxy between the client and the server to handle the replication does not make Postgre horizontally scalable. Nor does doing a periodic table dump and copying it to the other machines. Postgre might be a ton more efficient than MySQL, but it is in no way scalable.
Re:Which DB is better? by Bill,+Shooter+of+Bul · 2010-03-13 08:58 · Score: 2, Informative

While insightful and informative in its own right, that isn't a logical response to my post.
He was asking for an alternative to Mysql. I was pointing out that moving from mysql to postgresql was not done by large companies with a lot of smart people working for them, because any performance improvements were not worth it. Postgresql's vertical and horizontal scalability did not represent an improvement over mysql. I didn't even mention vertical vs horizontal scalability. In the end you end up with a raw number saying we can handle X many requests in our total system, regardless of the individual performance numbers of any part of the system.
You're right, he probably isn't the lead engineer of flickr and probably doesn't need Cassandra's power, but I think it really says something that while a lot of these companies are switching away from mysql, they aren't switching towards postgresql. But as always, anyone considering any kind of switch must do their due diligence in assessing the potential performance improvements of any new solution.

--
Well.. maybe. Or Maybe not. But Definitely not sort of.
Re:Which DB is better? by alexkorban · 2010-03-13 10:24 · Score: 1

One thing we did was upgrade from Postgres 8.1 to 8.3. From what I read, 8.1 performance degrades rapidly with multiple concurrent long queries. 8.3 also has more efficient storage, which helps with the main problem - hard drive throughput. IIRC, we got about a 10% improvement in query times with 8.3.

We also had two databases on one server, so the other thing that helped a lot was to run them on two separate servers. The largest table we had was clustered by one of the fields which made queries on that field fast. We didn't use autovacuuming and instead vacuumed overnight. A hardware upgrade also helped. We did some query profiling and made sure everything was indexed appropriately. None of this is rocket science of course, and just shows that as your database grows you have to get more and more involved in ensuring good performance.

We investigated vertical scaling with a better, more expensive server, and that would have helped for a while, but the database was projected to double in size in 1-2 years, so that would be no more than a stopgap measure. The conclusion I came to was that we had to move away from standard relational databases. One option was to use sharding (but I think sharding is a workaround for RDBMS limitations, so I don't like it that much), and the other option was to use something like a key-value store that can scale horizontally. Unfortunately, I didn't stay at the company long enough to implement this, so I can't tell you which of those would be a successful solution.

--
Free posters and articles for business analysts and project managers
Re:Which DB is better? by alexkorban · 2010-03-13 10:38 · Score: 1

I agree that there isn't a plug and play solution for large amounts of data (at least not yet), and of course doing things right helps immensely.

I still think that things could be a lot easier than what we have with the current generation of RDBMS. As an example, Skype uses Postgres but they have to jump through a lot of hoops to make it work for them. For one thing, they can't just run SQL queries anymore, and they have to maintain the shards somehow (e.g. they probably need a way of balancing them). Backup/restore probably isn't viable for them either, so they must have implemented some form of redundancy. Another limitation is that with shards you need to route all queries through an indexing server which can also become a bottleneck. In short, this is a very difficult problem to solve.

The appropriate solution also depends on the structure of your data. For example, in my case we had a massive table with hundreds of millions of rows that dwarfed everything else, and we did relatively simple queries on the data. A more suitable dataset for RDBMS would have a lot of tables with roughly the same number of rows in them, where you run queries with lots of joins and filters.

I'm actually curious what the data in your 150TB database was like and what sort of hardware was required for it.

--
Free posters and articles for business analysts and project managers
Re:Which DB is better? by petit_robert · 2010-03-13 15:09 · Score: 1

You must have been doing something wrong :
http://www.computerworld.com/s/article/9087918/Size_matters_Yahoo_claims_2_petabyte_database_is_world_s_biggest_busiest
Re:Which DB is better? by rycamor · 2010-03-13 15:53 · Score: 1

While insightful and informative in its own right, that isn't a logical response to my post.
He was asking for an alternative to Mysql. I was pointing out that moving from mysql to postgresql was not done by large companies with a lot of smart people working for them, because any performance improvements were not worth it.
That really depends on the kind of company and the kinds of applications involved.
MySQL can probably still handle a higher # of raw queries per second *using MyISAM tables* than PostgreSQL, however once you test with a table type like InnoDB, which has to support such things as constraints and transactions, the MySQL speed advantage is lost.
Many large web companies don't really need transactions, MVCC, constraints and other such niceties because the type of data they work with doesn't demand such precision. That's a perfect answer as to why many of them are investigating non-relational alternatives. However, a company doing business applications involving money, or other critical, non-trivial data can't afford to take such chances with data consistency.
Developers need to understand the real nature of the trade-offs involved in all these things. NoSQL type systems can be fine as long as you don't need complicated ad-hoc queries, provable consistency or mission-critical business rules. Once you start needing those things, though, you ignore relational databases at your peril.
Of course, there's always another way to skin a cat. It is possible to work with a hybrid set-up, such as relational database at the core for business rules and consistency, but high volume demand-driven data cached out to NoSQL systems. For example, in 2008 Yahoo created a custom PostgreSQL hybrid with cloud storage backend to handle clickstream data, weighing in at 2 petabytes. I can only imagine how big it is now.
Re:Which DB is better? by rycamor · 2010-03-13 16:02 · Score: 1

MySQL replication barely deserves the name. It plain and simple DOES NOT guarantee data consistency. Maybe it's nice and easy to work with, but I would never trust mission-critical data to it.
I know that replication has been the sore point of PostgreSQL, mainly because there were several different approaches being developed--all provided as external solutions, and it could be confusing to try and wade through all the data to find the best option. Fortunately, version 9.0 (moving from alpha to beta soon) will come with built-in hot standby and streaming replication.
Re:Which DB is better? by Bill,+Shooter+of+Bul · 2010-03-13 18:49 · Score: 1

Very true. The best tool is always dependent on the particular application. My whole point in my first reply was essentially saying that the previous post had nothing to do with the story. It's not a question of what is the best relational system, but really a question of how much of the relational model you can afford at high volume. These companies were not in absolute need of relational database features. It would have been easier if relational systems could scale as well as NoSQL systems. The relational model is very nice and has many great features.
And FYI, those companies were not using MyIsam. They were using innodb. There are many situations in which innodb is faster than MyIsam, even if you aren't using transactions, due to non blocking reads.

--
Well.. maybe. Or Maybe not. But Definitely not sort of.
Re:Which DB is better? by dave87656 · 2010-03-13 19:20 · Score: 1

I don't have a chart but I can tell you that we switched to PostgreSQL for performance reasons and haven't regretted it. IMHO PostgreSQL is significantly faster for transaction environments.
Re:Which DB is better? by dave87656 · 2010-03-13 19:26 · Score: 1

he doesn't need Facebook-level scalability.
True, and I suspect he doesn't need the complexity of a key/value solution. It sounds simple at first, but as soon as your application gets any kind of sophistication, it might get dicey.

Third, PostgreSQL has excellent performance, and PostgreSQL does, in fact, scale horizontally
For almost all real world applications PG will be more than sufficient. How many applications have facebook or twitter volume?
Re:Which DB is better? by dave87656 · 2010-03-13 19:31 · Score: 1

That and the fact that MySQL's lack of foreign key support was for a storage engine that they didn't fucking own!
It really doesn't matter if MySQL owned InnoDB or not. It's now all from the same company and you can bet that Oracle will not let MySQL compete with their bread and butter. InnoDB will remain limited. Good enough to get the low end but real applications will have to pony up.
Re:Which DB is better? by dave87656 · 2010-03-13 19:38 · Score: 1

The package I'd advise you to look at, is Oracle-Embedded (which has an open source licence, as well as a commercial one)

Isn't that Berkeley DB?
I ran some tests on a simple key-value store and it is extremely fast and the Java version allows you to store java objects and read them without having to convert your data to and from DB fields.
The problem I found is that it has no network access. If your application is truly embedded, it's an excellent solution and very fast (tens of thousands of stores per second on small amounts of data).
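For a feel of what "embedded key-value store that hands your objects back" means, here is a small Python sketch. It uses the standard library's shelve module as a stand-in, not Berkeley DB itself; the file name and the sample record are made up. Like the embedded case described above, it lives inside your process and has no network access.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store")  # hypothetical file name

    # Store a native object under a key; no mapping to DB fields needed.
    with shelve.open(path) as db:
        db["user:1"] = {"name": "Alice", "tags": ["admin", "beta"]}

    # Reopen the store and read the object back as-is.
    with shelve.open(path) as db:
        loaded = db["user:1"]

print(loaded)  # {'name': 'Alice', 'tags': ['admin', 'beta']}
```

The trade-off is the same one: great when the data lives next to one application, no help at all when several machines need to reach it.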
Re:Which DB is better? by alexkorban · 2010-03-13 21:13 · Score: 1

I think you missed the part where it says "based on a heavily modified PostgreSQL engine". I'm aware of Yahoo's database, and there's no way you can say that it's a "Postgres database". This was my point right from the start: when you have a lot of data, you are forced to move away from a stock standard RDBMS and do something else.

--
Free posters and articles for business analysts and project managers
Re:Which DB is better? by nagnamer · 2010-03-13 22:15 · Score: 1

Oracle for people who care more about their data than about the budget. :)

--
Every harsh word you utter has the right address. It only sounds harsh because the one on the envelope is the wrong one.
Re:Which DB is better? by Dalroth · 2010-03-14 03:00 · Score: 1

Your memory is nearly 10x the size of your database. Come back when your database is 1000x the size of your RAM.
Re:Which DB is better? by lemonjelo · 2010-03-14 06:10 · Score: 1

Been waiting to see mention of this without having to read up on nosql yet =) it seems there could be a db engine that provides sql features backed by nosql data stores when things mature there.

--

pimtamf
Re:Which DB is better? by airjrdn · 2010-03-15 01:27 · Score: 1

I'm glad you posted an actual size. I'm always curious what others feel is a large database; to me, 150G isn't overly large at all. The company I work for processes phone records, where we receive about 6M new records each day. Depending on the database (some for KPI metrics, some for warehousing duties, etc.) the sizes vary dramatically, but our "monthly" databases are typically in the 250G range, while our largest is currently just under 3TB. I checked what I believe is our largest table, and it currently has just under a billion records. It's been higher in the past, but we've lowered the # of months of data stored there recently. We are a Microsoft shop, and all of this is in SQL Server. For normal storage/queries, a decent SQL box will suffice depending on how many users are hitting it, etc., but we built a custom distributed processing system to do the actual processing work.

--

My Tech Posts on Twitter

Re:Good for them by Bill,+Shooter+of+Bul · 2010-03-12 15:14 · Score: 3, Insightful

100% of hosting companies do not have twitter, facebook, reddit, or digg as their clients. It's a different market. MySQL does have a competitor in this space called PostgreSQL. It's pretty good. Pretty much every hosting company I would consider doing business with also offers it. But again, PostgreSQL wouldn't have saved the day for these companies; they've reached a different sector of the market due to their enormous scale.

--
Well.. maybe. Or Maybe not. But Definitely not sort of.

What about Slashdot? by Futurepower(R) · 2010-03-12 15:22 · Score: 1

Will Slashdot switch?

Re:Wow... by Thantik · 2010-03-12 15:30 · Score: 1

You couldn't even be bothered to read up on what ANT actually was, could you...

"Ant is a Java-based build tool. In theory, it is kind of like Make, without Make's wrinkles and with the full portability of pure Java code."

Re:Wow... by Anonymous Coward · 2010-03-12 15:33 · Score: 0

I know exactly what ant is. I use it on a regular basis. I was pointing out that cassandra uses Java to at least some extent, which is disgusting (which is proven by the fact that jdk is part of the dependencies for it with apt).

Re:Wow... by FooAtWFU · 2010-03-12 15:34 · Score: 1

Well, I don't know too many people who program in C and use Ant. And a glance at the FAQ implies it's Java-based (it talks about the JVM a bit).

I guess Cassandra just isn't really targeted at the market segment where the overhead of a JVM would make much of a difference, even if it would make redundancy easier.

--
The World Wide Web is dying. Soon, we shall have only the Internet.

Linux Apache MongoDB [PHP | Perl | Python] by Anonymous Coward · 2010-03-12 15:35 · Score: 0

MongoDB is another "NoSQL" solution. You can still have LAMP. I think they do a disservice to the LAMP stack when lumping it in with their issues with MySQL (unless of course they really are getting rid of Linux, Apache and PHP too).

Why? by toastar · 2010-03-12 15:35 · Score: 1

So what's the advantage of switching?

I have a policy of if it ain't broke don't fix it

Re:Why? by Miseph · 2010-03-12 16:59 · Score: 1

Presumably the advantage is that what they have now doesn't work well and that they are concerned it will continue to work less and less until some arbitrary point in the future where they would have to declare it no longer works at all, and that what they're changing to seems to resolve the issue.
Call me crazy, but I'm pretty sure that somebody at Digg is aware of that particular catchphrase.

--
Try not to take me more seriously than I take myself.
Re:Why? by mysidia · 2010-03-12 17:09 · Score: 2, Insightful

A bad policy when dealing with your data.
Once it's broke, it is way too late.
You can't un-LOSE the past 6 hours of transactions or table referential integrity that MySQL trashed, due to an unclean shutdown.
MySQL's great until it comes up to bite you in the arse.
Re:Why? by berzerke · 2010-03-12 18:26 · Score: 1

...Call me crazy, but I'm pretty sure that somebody at Digg is aware of that particular catchphrase...
Well, I've seen switches to new software and OSs because a new exec decreed it, regardless of how well what they had was working. Could be he (or she) got kickbacks, or some smooth talking salesman pulled the wool over his eyes, his son got a job with the new company, etc. Change is sometimes made for political reasons rather than technical ones.
While I see no evidence that was the case this time, it has happened, and will happen again.
Re:Why? by TheLink · 2010-03-12 20:26 · Score: 1

Yes, but how fast are they growing?

If they aren't growing that fast, they don't need to scale that fast. Then they could just switch to Postgresql, and then ride comfortably behind the cutting edge of Intel and friends. Lots of companies aren't growing faster than the hardware performance increases. If you're a widget maker, you may grow fast, but not that fast, and maybe only fast in the initial stages.

I can see lots of scenarios where MySQL would be hitting limits, but where Postgresql would be OK.

In my opinion, MySQL never worked well, at best it worked OK. So many MySQL features were mutually exclusive - you want read speed, you can't have transactions, you want concurrent write speed you can't have full text search.
--
- Too many replies beneath your current threshold
Re:Why? by DarkOx · 2010-03-13 04:22 · Score: 1

One of the many reasons you backup your data right?

--
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
Re:Why? by mysidia · 2010-03-13 05:17 · Score: 1

You don't get it, apparently. When I say you lost 6 hours of transactions, that means you restored to your most recent backup (6 hours or so old).
When you are running an E-Commerce site, that could be thousands of dollars in lost orders.
Even worse... someone cancelled their account, but your database went 6 hours back in time, before they clicked 'Cancel account', so not only did the cancel not get through, but they thought it did, and they'll be pissed off when you bill next month.
There are a lot of applications where you can't afford to lose minutes of data, let-alone hours.
MySQL doesn't provide the level of data integrity of other DBMs.
Heck, it doesn't even really support an automated recovery procedure.
Its point-in-time recovery capabilities are limited (and not very good), not really usable on a large scale, and basically unknown to 90% of its users.
With PostgreSQL you get a lot of data integrity protections by default.
You don't have to make a decision to switch to InnoDB to get basic integrity protections and basic transactional support.
In MySQL you have to. And InnoDB performance sucks, and generally can't be used for large DBs, they are almost always MyISAM.
Re:Why? by wshs · 2010-03-13 08:05 · Score: 1

A good DBA knows to back up his database via replication, not daily dumps.
Re:Why? by mysidia · 2010-03-13 08:40 · Score: 1

A good DBA knows to back up his database via replication, not daily dumps.
No they don't. Replication is a load distribution technology. For a large database, it is not a data backup strategy.
With replication, if your DB encounters a software error, or the hacker issues 'DELETE ALL'... there goes your web site.... You need point in time recovery. Your replication setup will faithfully replicate just the error that causes you to need a restore.
And there you go (in MySQL), right back to restoring a dumpfile, only now you have two or more servers to fix instead of one, problem magnification, oops!
MySQL does not support true replication or synchronous mirroring. Only asynchronous transaction duplication, by the way.
And MySQL's "replication" feature has some serious problems with it (granted PostgreSQL doesn't even include a replication feature in the current version, you pick between a few external add-ons or commercial solutions, some of which are fully synchronous).
Oh yeah, as for the problems with MySQL's replication feature... it doesn't provide an assurance of transactional integrity. It doesn't replicate all statements correctly.
There can be errors in the replication.
An error in your application could cause it to make changes on the backup/replica server instead. In this case, the replica and the source dataset won't match -- MySQL replication does not implement a multi-master functionality, all modifications need to be done on the source ("master") which propagate to the slave, but MySQL doesn't help in any way for you to ensure a mistake doesn't foobar your replica.
Bringing replication into the mix just underscores more ways in which MySQL is lacking (in terms of data integrity), and could use some serious improvements.
MySQL brings in these binary log things, it's actually a mess since it keeps multiple sets of binary logs (master log VS relay log), but I digress...
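To make the replication-versus-backup distinction above concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not MySQL's binary log format: the database is a dict, and the `apply` and `restore_to` helpers and the statement log are hypothetical names invented for illustration. A replica that replays every statement ends up with the same damage as the master, while replaying a log on top of an older backup can stop just before the bad statement.

```python
# Toy model: a "database" is a dict, and the log is a list of
# (timestamp, operation, key, value) tuples. All names are hypothetical.

def apply(db, entry):
    """Apply one logged statement to a dict-based toy database."""
    _, op, key, value = entry
    if op == "set":
        db[key] = value
    elif op == "delete_all":
        db.clear()

def restore_to(backup, log, stop_before):
    """Start from a backup and replay the log up to (not including) stop_before."""
    db = dict(backup)
    for entry in log:
        if entry[0] >= stop_before:
            break
        apply(db, entry)
    return db

backup = {"order:1": "paid"}            # snapshot taken at t=0
log = [
    (1, "set", "order:2", "paid"),
    (2, "set", "order:3", "pending"),
    (3, "delete_all", None, None),      # the software error or the attacker
]

# A replica faithfully replays everything, including the destructive statement.
replica = dict(backup)
for entry in log:
    apply(replica, entry)

# Point-in-time recovery replays the same log but stops before t=3.
recovered = restore_to(backup, log, stop_before=3)

print(replica)    # {}
print(recovered)  # {'order:1': 'paid', 'order:2': 'paid', 'order:3': 'pending'}
```

The point of the sketch is only that the two mechanisms answer different questions: replication keeps a second copy current, whereas a backup plus a replayable log lets you choose which moment to return to.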
Re:Why? by wshs · 2010-03-13 08:57 · Score: 1

MySQL does support multi master replication, and it even has auto increment offsets. Not sure if older versions support point in time recovery, but with periodic backups (peh), replication, and query logging, you can achieve the goal quite easily.
Re:Why? by mysidia · 2010-03-13 09:37 · Score: 1

Um "auto-increment offsets" are a kludgy workaround for the fact MySQL doesn't support multi-master replication.
You can't perform for example, "UPDATE xytable SET x = 5 WHERE y = 2 And category=6" on a server and get a usable result. You cannot guarantee no additional (conflicting) row will be inserted into 'xytable' during your update.
MySQL "replication" is not synchronous, and therefore, you cannot determine insertion order, or even transaction order.
Due to the fact that your "SELECT FOR UPDATE" or "UPDATE" statement only locks the table for writes on the local mysqld.
There is no such thing as remote locking in MySQL.
Don't be fooled by the simpleness of that example though. Integrity of the data and database transactions are a very serious issue. MySQL has no multi-master replication mechanism that is suitable for use in an OLTP system, sorry.
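A rough Python sketch of the argument above, as a toy model rather than anything MySQL actually runs. The `id_sequence` function mimics what the real `auto_increment_increment` and `auto_increment_offset` settings achieve (disjoint key ranges per node); the two-node "tables" and the `update_x` / `insert_row` helpers are hypothetical. It shows that non-colliding IDs do not stop two nodes from diverging when each applies the other's statement asynchronously and in a different order.

```python
def id_sequence(offset, increment, count):
    """IDs one node would hand out, given an offset and a shared increment."""
    return [offset + i * increment for i in range(count)]

# Two masters, increment 2, offsets 1 and 2: the key ranges never overlap.
ids_a = id_sequence(offset=1, increment=2, count=3)
ids_b = id_sequence(offset=2, increment=2, count=3)
print(ids_a, ids_b)  # [1, 3, 5] [2, 4, 6]

# Each statement is a function that mutates a table: {id: {"x": ..., "y": ...}}.
def update_x(table):
    # UPDATE xytable SET x = 5 WHERE y = 2   (issued on node A)
    for row in table.values():
        if row["y"] == 2:
            row["x"] = 5

def insert_row(table):
    # INSERT INTO xytable (id, x, y) VALUES (2, 0, 2)   (issued on node B)
    table[2] = {"x": 0, "y": 2}

node_a = {1: {"x": 0, "y": 2}}
node_b = {1: {"x": 0, "y": 2}}

# Each node runs its own statement first, then replays the other's later.
update_x(node_a)
insert_row(node_a)   # B's insert arrives after A's update already ran
insert_row(node_b)
update_x(node_b)     # A's update arrives after B's insert already ran

print(node_a[2]["x"], node_b[2]["x"])  # 0 5
```

Both nodes hold a row with id 2 and no key collided, yet the row's contents differ, which is the kind of conflict a lock held only on the local server cannot prevent.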

Allergic reaction to MySQL by QuoteMstr · 2010-03-12 15:39 · Score: 5, Insightful

These slides present a balanced and comprehensive overview of the current state of free databases. Whether you're in the NoSQL camp or not, they're worth reading.

That said, here's my take:

It's currently fashionable to replace MySQL with some "NoSQL" database or other. This trend is driven by two factors:

MySQL's community is fragmenting into several forks as Oracle purchases the rights, which created the impression that MySQL's development is entering a riskier, unstable period.
"NoSQL" is the technology buzzword du jour in the Bay Area. It's difficult to overstate the impact of social forces on technology choice: most technology selections are governed more by what our friends say than by an impartial and disinterested weighing of merits.

I haven't seen any consideration from potential "NoSQL" adopters of the benefits of using a good relational database like PostgreSQL. There's a world of difference between it and MySQL, and condemning all relational database systems because of bad experiences with MySQL is like condemning all sandwiches because McDonalds once made you sick. In giving up RDBMSes entirely, these developers lose quite a bit of safety, flexibility, and convenience. It's a huge over-reaction.

This field should not be about following trends, though unfortunately, that's how most people choose which technologies to use: it should be about choosing the best tool for the job. And I believe that in the vast majority of cases, the advantages conferred by a relational system --- enforced integrity, interoperability based on SQL, query flexibility, storage flexibility --- make an RDBMs the best choice for almost any job. If you need sloppier semantics for some cases (for example, "eventual consistency"), you can layer that on top of a robust RDBMs.

Re:Allergic reaction to MySQL by __aasqbs9791 · 2010-03-12 16:11 · Score: 1, Offtopic

I think it comes down to the sad fact that most people aren't good at their jobs. They tend to rise to one level above where they are actually competent, and stay there. And from my experience, they aren't usually very happy in whatever that position is, which (and IMHO) might be the reason that people in modern societies are often less happy (overall) than people in less advanced societies. Not many people enjoy that.
Re:Allergic reaction to MySQL by timmarhy · 2010-03-12 16:19 · Score: 1

go live in a mud hut then if it'll make you happy (i suspect it wont).
society isn't miserable, it's the media telling us we are miserable that has people thinking it. just look at how every single event has to be a crisis or the worst ever of something. and then, a word from our sponsors who sell product X that is the cure for what ales you.
if you want to destress and be happy, go on a total media black out. it's amazing how much less pressure you feel and happier you are if you refuse to read the news or watch the news on TV.

--
If you mod me down, I will become more powerful than you can imagine....
Re:Allergic reaction to MySQL by QuoteMstr · 2010-03-12 16:22 · Score: 1

Actually, compared to people in industrial nations like ours, hunter-gatherers are happier and have more leisure time. After all, that's the environment to which we're biologically adapted. You can make a serious argument that agriculture is the worst thing to ever befall humanity.
Re:Allergic reaction to MySQL by Anonymous Coward · 2010-03-12 16:53 · Score: 0

This field should not be about following trends, though unfortunately, that's how most people choose which technologies to use
Sigh. Most people seem to be stuck on following trends—in pretty much every aspect of their lives. Why think when you can conform to the crowd?
Re:Allergic reaction to MySQL by Tablizer · 2010-03-12 17:38 · Score: 2, Informative

Sigh. Most people seem to be stuck on following trends--in pretty much every aspect of their lives. Why think when you can conform to the crowd?
One can potentially make good money surfing bullshit. It's like the dot-com bubble: get in early, lie about your ability, rake in big bucks, and then get out and move on to the next hype bubble while the last one crashes on those left holding the bag.
However, I do believe there's perhaps a place for big non-relational databases. They tend to be single-purpose and situations were few will care much if a few records are lost every week or so. If you have a million customers who only make money for you from occasional ad clicks, then losing a few dozen due to lack of A.C.I.D. is not going to be a bottleneck from a business standpoint. And the info can be delay-copied into a RDBMS where traditional statistics and reports can be done.

--
Table-ized A.I.
Re:Allergic reaction to MySQL by Anonymous Coward · 2010-03-12 17:44 · Score: 0

I would like some *ale* but I wouldn't like something *ailing* me.
Re:Allergic reaction to MySQL by TubeSteak · 2010-03-12 17:58 · Score: 3, Interesting

I haven't seen any consideration from potential "NoSQL" adopters of the benefits of using a good relational database like PostgreSQL.
...
If you need sloppier semantics for some cases (for example, "eventual consistency"), you can layer that on top of a robust RDBMs.
When you're dealing with TB/PB of data that doesn't require relational capabilities, there's no reason to use a "good relational database like PostgreSQL" when you can dispense altogether with the relational aspect and its performance hit.
NoSQL may seem like the fad du jour, but until recently, nobody was working with such enormous dynamic datasets. When you look at the growth of all these hi-tech companies, they did an incredible amount of in-house hacking to develop the software necessary to glue together their enormous hardware infrastructure.

--
[Fuck Beta]
o0t!
Re:Allergic reaction to MySQL by kmike · 2010-03-12 19:10 · Score: 3, Insightful

As several MySQL experts already noted, Digg isn't even using the indexes that provide maximum performance in the query that they present as problematic for MySQL:
http://mysqlha.blogspot.com/2010/03/index-only.html
http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Performance_Lie/
So you are right about the NoSQL fashion trend. Looks like for some companies it's easier to throw a pile of cheap commodity hardware driven by some NoSQL BigTable-wannabie at the problem instead of carefully optimizing queries and indexes for the best performance.
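For readers unfamiliar with what an "index-only" query is, here is a small runnable illustration. It is an assumption-laden sketch: it uses SQLite through Python's standard `sqlite3` module rather than MySQL, and the `friends` table, its columns, and the index name are hypothetical, not Digg's actual schema. The idea it shows is general, though: when an index contains every column a query touches, the engine can answer from the index without visiting the table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO friends VALUES (?, ?, ?)",
    [(u, f, "x") for u in range(100) for f in range(10)],
)

query = "SELECT friend_id FROM friends WHERE user_id = 42"

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in rows)

before = plan(query)   # no index yet, so the whole table is scanned
conn.execute("CREATE INDEX idx_user_friend ON friends (user_id, friend_id)")
after = plan(query)    # the index alone holds both columns the query needs

print(before)
print(after)
```

The exact plan text varies between SQLite versions, but the first plan should report a scan of the table and the second a search using a covering index. MySQL exposes the same idea through its own `EXPLAIN` output.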
Re:Allergic reaction to MySQL by Billly+Gates · 2010-03-12 19:13 · Score: 1

It's not only that we have less leisure time, but also the fact that our worth is based on money. Inflation is very high if you count insurance, food, rent/mortgage, and gas prices (economists don't count this); add depressed wages and you have misery.
There is always someone richer than you who is busy trying to take away what you have.

--
http://saveie6.com/
Re:Allergic reaction to MySQL by Billly+Gates · 2010-03-12 19:20 · Score: 1

If you do a search you need some relational abilities. If PostgreSQL cannot handle this then Oracle can. Is it just me, or are a lot of the NoSQL databases reimplementing SQL to make up for the shortcomings?

--
http://saveie6.com/
Re:Allergic reaction to MySQL by jrumney · 2010-03-12 19:20 · Score: 5, Insightful

I haven't seen any consideration from potential "NoSQL" adopters of the benefits of using a good relational database like PostgreSQL.
The adopters of NoSQL deal with huge volumes of worthless information. They don't care about transactional integrity as much as they care about performance, which is why they chose MySQL over a good relational database in the first place.
Re:Allergic reaction to MySQL by scribblej · 2010-03-12 19:33 · Score: 1

While I agree with you, I'm a developer of ... medium-sized systems using Postgresql, and this article greatly piqued my interest, considering the single biggest problem I've had with Postgres is its lack of any good replication or redundancy methods. Right now I tend to use WAL replication to a "warm-standby" server, but this is hardly ideal in any sense.
Don't misunderstand me, I dearly love Postgres. It's just the replication where it really falls flat. Yes, I am aware of all the projects like Slony and pgcluster. I like the idea of pgcluster, but last time I was able to test it, it would fail in funny ways and didn't seem ready for production.
Re:Allergic reaction to MySQL by QuoteMstr · 2010-03-12 19:48 · Score: 1

The adopters of NoSQL deal with huge volumes of worthless information. They don't care about transactional integrity as much as they care about performance, which is why they chose MySQL over a good relational database in the first place.

If you'd read the linked slides, you would have learned that ACID and the relational data model are orthogonal concepts. You can have neither, either, or both.
Re:Allergic reaction to MySQL by Splab · 2010-03-12 20:31 · Score: 1

I love the fact that you mention PgSQL as a good database - it absolutely is, but there is no way in hell it would ever be able to handle the load of Facebook et al. PgSQL has no clustering option and thus is unable to scale out, only up - you could move some partitioning into the software, but that would lose the point of having a database.
Also, while MySQL is in trouble, the reason for moving away to a database like Cassandra for these companies isn't as much buzzwords as it is just a good business decision. When you don't care about your dataset, that is, you can accept "eventually correct", using Cassandra over *SQL makes perfect sense.
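As a hedged illustration of what "moving some partitioning into the software" can look like, here is a minimal Python sketch of hash-based sharding done in application code. The shard names and the `shard_for` helper are hypothetical; nothing here is a PostgreSQL or Cassandra API.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical database server names

def shard_for(user_id: str) -> str:
    """Pick a shard deterministically from a hash of the key."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

for uid in ["alice", "bob", "carol"]:
    print(uid, "->", shard_for(uid))
```

The routing itself is trivial. The cost is everything the database no longer does for you: joins and transactions that span shards become the application's problem, and with plain modulo routing, adding a server remaps most keys.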
Re:Allergic reaction to MySQL by ducomputergeek · 2010-03-12 20:48 · Score: 3, Informative

When you're dealing with TB/PB range, you call Teradata. At last check they handle 4 of the 5 largest databases in the world, including eBay/Paypal's 13PB's monster and Walmart.

--
"The problem with socialism is eventually you run out of other people's money" - Thatcher.
Re:Allergic reaction to MySQL by Anonymous Coward · 2010-03-13 01:55 · Score: 0

What about Walmart's 13PB customers?
Re:Allergic reaction to MySQL by Lennie · 2010-03-13 02:09 · Score: 1

Not sure about agriculture in general, but the animals for meat products are probably the biggest carbon-dioxide producers of all.

So if things really are as bad as some people who talk about the climate say, then yes, maybe agriculture was a really bad decision.

But I won't hold it against them, although the Indians were good examples of how to live with the earth, and they tried to tell us.

--
New things are always on the horizon
Re:Allergic reaction to MySQL by Lennie · 2010-03-13 02:12 · Score: 1

Slony works really well, but schema changes are not a lot of fun.

--
New things are always on the horizon
Re:Allergic reaction to MySQL by jbellis · 2010-03-13 02:51 · Score: 4, Informative

Teradata and the other big relational db products (Vertica, Greenplum, etc) are all _analytical_ databases, designed for small numbers of complex queries, where adding new data to the system takes minutes if not hours. They are completely unsuitable for running a live application against.
Re:Allergic reaction to MySQL by Wrath0fb0b · 2010-03-13 03:06 · Score: 1

So you are right about the NoSQL fashion trend. Looks like for some companies it's easier to throw a pile of cheap commodity hardware driven by some NoSQL BigTable-wannabie at the problem instead of carefully optimizing queries and indexes for the best performance.

Hardware is generally cheaper than developers -- especially the really rare MySQL wizard that groks the SELECT procedure deeply enough to be able to rewrite them to use fewer disk seeks. That kind of talent is expensive if you can even find and hire it (and pray you don't get some poser that has no idea what he's doing).
I was just casually browsing this article because I don't know much about DBs, but if you tell me that there's a problem that can be solved by throwing more hardware at the problem or hiring a very skilled optimizing DBA, I would take hardware 19 times out of 20. I'm not disputing the software solution is technically feasible, just that it seems like a risky bet.
Re:Allergic reaction to MySQL by TedZ · 2010-03-13 03:07 · Score: 1

So you are right about the NoSQL fashion trend. Looks like for some companies it's easier to throw a pile of cheap commodity hardware driven by some NoSQL BigTable-wannabie at the problem instead of carefully optimizing queries and indexes for the best performance.
Companies do whatever is cheapest. Today, it's cheaper to scale with hardware than with optimizing queries and indices. This is just what Richard Gabriel described in his classic essays; see http://www.dreamsongs.com/WorseIsBetter.html
Do you know the cost (salary or consulting) of a MySQL expert? How about the cost of optimizing for that one database, tying yourself down to it with non-standard SQL? How about MySQL's historical baggage, piles upon piles of backwards compatibility?
It's not as simple as it seems, the business of data.
Re:Allergic reaction to MySQL by kmike · 2010-03-13 03:47 · Score: 1

Hardware is generally cheaper than developers -- especially the really rare MySQL wizard that groks the SELECT procedure deeply enough to be able to rewrite them to use fewer disk seeks.

The thing is, the stuff they missed in their SQL queries doesn't even need a MySQL wizard in blue cape to grok. There were no JOINs, no subselects, nothing high SQL magic at all - an average self-taught DBA would spot the suboptimal index usage. They should have totally solved it themselves.

I was just casually browsing this article because I don't know much about DBs, but if you tell me that there's a problem that can be solved by throwing more hardware at the problem or hiring a very skilled optimizing DBA, I would take hardware 19 times out of 20. I'm not disputing the software solution is technically feasible, just that it seems like a risky bet.
The funny thing is that they still can't skip the "very skilled optimizing DBA" step even with the NoSQL solution. They still need a database architect, and they still need to optimize their queries. But this time, finding a good DBA would be much harder, since I imagine the number of NoSQL specialists (and among them the number of experts specializing in Cassandra) must be much lower than the number of good MySQL DBAs.
Of course, now that they have a system that supposedly scales with a simple addition of new hardware to the farm, they may get away from optimization for some time - if their DB architecture is good.
Re:Allergic reaction to MySQL by kmike · 2010-03-13 03:51 · Score: 1

Do you know the cost (salary or consulting) of a MySQL expert? How about the cost of optimizing for that one database, tying yourself down to it with non-standard SQL?
But now they are optimizing for another, even less standard database (Cassandra), tying themselves down to it with non-standard query syntax. What was your point again?
Re:Allergic reaction to MySQL by ergo98 · 2010-03-13 06:54 · Score: 1

Teradata and the other big relational db products (Vertica, Greenplum, etc) are all _analytical_ databases
Many of the very expensive database products focus on analytics because that is where the big, big money is.
Greenplum, however, is essentially a clusterable version of Postgresql. Column-oriented tables are an option for your table, but otherwise it's just a really, really scalable version of the open source product.

where adding new data to the system takes minutes if not hours
They load many TB per hour.
Re:Allergic reaction to MySQL by TedZ · 2010-03-13 07:53 · Score: 1

But now they are optimizing for another, even less standard database (Cassandra), tying themselves down to it with non-standard query syntax.

Er, that the "worse is better" AKA simpler solution will win? That near-linear scalability with additional hardware is more valuable than backwards compatibility with 40-year old technology? That you are not considering the practical side but just arguing technical merit, which gets us back to Gabriel's "Worse Is Better"?

What was your point again?
Heh, newbies.
Re:Allergic reaction to MySQL by tyrione · 2010-03-13 10:51 · Score: 1

I think it comes down to the sad fact that most people aren't good at their jobs. They tend to rise to one level above where they are actually competent, and stay there. And from my experience, they aren't usually very happy in whatever that position is, which (and IMHO) might be the reason that people in modern societies are often less happy (overall) than people in less advanced societies. Not many people enjoy that.
This comment is not Off-topic. It's symptomatic of today's pool of talent. You've got plenty of tools and very few technicians. The poster is spot on.
Re:Allergic reaction to MySQL by GWBasic · 2010-03-13 15:01 · Score: 1

make an RDBMs the best choice for almost any job
Some NoSQL databases are much better when it comes to the Object/Relational impedance mismatch. This is why I'm a huge fan of MongoDB. Even though it's not intended to be embedded, I find that it's much easier to work with than SQLite.

--
No, I will not work for your startup
Re:Allergic reaction to MySQL by Bengie · 2010-03-14 08:07 · Score: 1

Depends on your queries. A poorly written query could easily be tens of thousands of times slower if written by someone who doesn't understand how DBs optimize. I've taken queries that ran in 3 seconds and had them run in under 100ms. I've taken queries that ran for 10 minutes and had them run in under 3 seconds.
I guess it depends on what you're doing.

MySQL not best example of relational technology by Eravnrekaree · 2010-03-12 15:40 · Score: 1

MySQL has never been a good example of a relational database; the underlying implementation is limited. It's MySQL that is the problem here, not relational databases.

I suspect that it is not the relational model at fault here, but the lack of creativity and competence in implementing relational database technology. MySQL has perhaps never been a particularly scalable platform; it has a number of severe limitations and does not seem to be designed with much thought for a distributed environment. Its developers seem to have developed it for small-scale webpages, and have been notorious for leaving out many advanced features, and thus have limited its effectiveness to small, low-powered pages.

It's all in the implementation: it's not the relational database model that needs fixing, it is the underlying implementations.

Re:MySQL not best example of relational technology by FlyingGuy · 2010-03-12 19:08 · Score: 1

If you want to scale without limits and have the money to pay for it buy Oracle.
If you want to scale with a few limits, but don't have any money get Postgres
If you want to play around and write some of the most stupid syntax on the face of the earth then play with any of the aforementioned text databases.

--
Hey KID! Yeah you, get the fuck off my lawn!

Re:Wow... by Anrego · 2010-03-12 15:41 · Score: 3, Interesting

Don't be too quick to put Java down.. it's slower but it scales fairly well.

Reddit's reliability has been shitty lately. by Anonymous Coward · 2010-03-12 15:48 · Score: 2, Interesting

On a related note, Reddit's performance and reliability has dropped off significantly since switching to Amazon's "Cloud", and dropped off even further after this switch to Cassandra.

The constant 503 errors, plus horrendous load times when it does manage to work, have driven me and many others away from Reddit. That's why I'm posting here on Slashdot.

Cloud hosting is a stupid idea for anything beyond a blog getting 10 hits per day. All the talk about scalability is pure bunk. I mean, even with the extensive knowledge and infrastructure of Amazon, the Reddit site is slow (and it wasn't like that before they switched).

Re:Reddit's reliability has been shitty lately. by uncqual · 2010-03-12 17:25 · Score: 3, Interesting

One aspect of the "cloud" (as in EC2) is that you can not only scale up easily (for $ of course), you can scale down easily (to save $).

When you have fixed "in house" infrastructure to handle peak loads, there's not a lot of motivation to power off absolutely as many servers as you possibly can when you're not at peak load - all you save is the energy costs (and, if you're using remote hosting, you don't get rewarded for this except for whatever value you attach to feeling "green"). You still pay for the floor space, the machines, and perhaps some sort of maintenance contracts regardless of if the server is powered up or down.

Using EC2 (depending on how you've structured it - some dedicated, some non-dedicated instances etc), if utilization drops to 80% over 20 instances, the temptation is to release a couple instances to save a couple bucks and drive utilization up to 90% on the remaining instances -- with potentially unfortunate consequences.

Although I have no idea, I wonder if Reddit is just releasing instances too aggressively now "because they can" in order to save money? If so, the finger should be pointed at Reddit, not the cloud (or EC2 specifically).
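The arithmetic behind the 80%-to-90% example can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the `utilization_after_release` helper is a made-up name, and it assumes load spreads evenly across instances, which real traffic rarely does.

```python
def utilization_after_release(instances, utilization, released):
    """Average utilization if the same total load is spread over fewer instances."""
    load = instances * utilization        # total work, in "instance units"
    return load / (instances - released)

after = utilization_after_release(instances=20, utilization=0.80, released=2)
print(f"{after:.1%}")  # 88.9%
```

Releasing two of twenty instances moves the average from 80% to roughly 89%, so the headroom for absorbing a traffic spike shrinks from 20% to about 11%.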

--
Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
Re:Reddit's reliability has been shitty lately. by Neoncow · 2010-03-12 17:49 · Score: 3, Informative

The reddit blog discussed the issue recently.
They claim it is not an EC2 issue, but simply the site getting bigger than it was designed to.
Their latest entry discusses why they switched to Cassandra. I guess we'll wait for next week to see if the expected performance benefits materialize.
Re:Reddit's reliability has been shitty lately. by uncqual · 2010-03-12 22:57 · Score: 0, Offtopic

Get a life. You obviously understood my comment (else you wouldn't have been able to post your response). The goal of writing is COMMUNICATION -- I obviously succeeded. You, sir or madam, appear to be a pedantic idiot who has nothing better to do than syntax check comments. What do you think of Hemingway or e. e. cummings (or, have you ever even heard of them?)?

Oh, and if you can't express yourself without using about 6% profanity, you might want to take a class in English or English Literature.

(Oh, I'm around 4x a "teenager" in age and have hired a lot of 22 year olds who have much greater intellects than you seem to have. Life Suggestion: Learn something about technology and THEN be a pedantic idiot if you still, in your pathetic world, need to validate yourself by being a pedantic idiot. Please post when you feel you can comfortably retire long before "traditional retirement age" - let us know how that works out after you have graduated from DeVry.)

--
Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
Re:Reddit's reliability has been shitty lately. by maxume · 2010-03-13 01:19 · Score: 1

They switched to Cassandra yesterday or so. It has been faster and more reliable in my experience.

--
Nerd rage is the funniest rage.
Re:Reddit's reliability has been shitty lately. by Anonymous Coward · 2010-03-13 02:33 · Score: 1, Funny

Says the guy who can't decide whether to put his periods inside or outside the quotes.

Re:Good for them by Anonymous Coward · 2010-03-12 15:53 · Score: 0

But since most developers model their domains Object Oriented, why is MySql the default choice for any small application? Why not a document database or a native oo one?

Re:Wow... by M.+Baranczak · 2010-03-12 15:54 · Score: 2, Insightful

If you're trying to run a site on a $15/month hosting account, then no, this is probably not for you. But if you're at the stage where MySQL isn't able to handle all the data you're throwing at it, then chances are you won't care about the extra few MB of memory that the Java runtime requires.

Re:Wow... by John+Hasler · 2010-03-12 16:00 · Score: 2

> But if you're at the stage where MySQL isn't able to handle all the data
> you're throwing at it...

...it's time to move up to PostgreSQL.

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.

Re:Good for them by QuoteMstr · 2010-03-12 16:00 · Score: 3, Informative

But since most developers model their domains Object Oriented, why is MySql the default choice for any small application? Why not a document database or a native oo one?

The relational model is consistent and easy to work with. It's easy to specify constraints that describe what the data should look like, and to allow several applications to interact with the data. It's also easier to optimize a database when you can describe discrete queries instead of directly following links from program code as you would in a navigational/object/document/etc. database.

Furthermore, application data models aren't all that object-oriented. Most of the time, the manipulated data types (say, "story", "post", and "user") fall into well-defined categories that correspond well to rows in a table. The few mismatches are easily dealt with in application code.

Sure, using an object database might be "easier" for the first 15 minutes, but you'll kick yourself when you have to manipulate it in any kind of sophisticated fashion.

Re:Wow... by Anonymous Coward · 2010-03-12 16:04 · Score: 0

/etc/init.d/cassandra stop
free -m
-/+ buffers/cache: 213 1259
/etc/init.d/cassandra start
free -m
-/+ buffers/cache: 308 1164

Note that memory usage increases by 100MB, and that's immediately after installing it.
Sites with such large volume could easily benefit from a low memory usage configuration.

Re:Wow... by QuoteMstr · 2010-03-12 16:07 · Score: 1

This isn't your grandfather's JVM.

These days, Java is quite fast and efficient, and there are even a lot of different alternative VMs you can try. Sure, startup time isn't the best, and Swing is still a lumbering, over-engineered, ill-fitting albatross: but these problems don't matter for server applications.

IMHO, the best part is that you can write programs that run on the JVM in a dialect of Lisp and interact seamlessly with other code on the JVM.

Re:Wow... by Anonymous Coward · 2010-03-12 16:24 · Score: 0

And even after that stop it left 100 tasks in 2 processes running jsvc. This on a server that used to only have 50 processes at a time (with both apache and mysql running with about 10 processes each).

"NoSQL"? by Stan+Vassilev · 2010-03-12 16:27 · Score: 5, Insightful

Am I the only one who frowns at this moniker?

First, it creates a false premise where people need to pick "SQL" versus "no SQL", while many real-world systems intelligently combine relational and non-relational data storage for their needs. There is no conflict.

Second, there's nothing wrong with SQL as a language in particular, and in fact many of the "noSQL" engines are starting to support and extend basic SQL queries, instead of reinventing their own query language for the same purpose.

I suppose "lessRDBMSabuse" was less catchy...

Re:"NoSQL"? by shic · 2010-03-12 21:28 · Score: 2, Informative

Second, there's nothing wrong with SQL as a language
I beg to differ - SQL is preposterously baroque!
That said, if your problem is of a particular kind, it is a perfectly reasonable, practical solution.
Re:"NoSQL"? by Anonymous Coward · 2010-03-12 22:00 · Score: 1, Informative

Many of the systems that support SQL as a wrapper do so at a much lower scalability and performance rating.
The reason it's NoSQL vs SQL is that SQL comes with a mindset of "complicated queries." When you say SQL, people think of transactions, joins, wheres and such. NoSQL is by design far simpler than that. It's pushing the complexity into the application layer, and as such it must be thought of as something that is specifically not SQL.
Re:"NoSQL"? by Anpheus · 2010-03-12 22:34 · Score: 1

As long as you stick to a strict subset of SQL and don't get fancy, you mean. Otherwise you'll have to rewrite it for every database engine you want your code to run on.
Re:"NoSQL"? by Tablizer · 2010-03-13 04:23 · Score: 1

SQL is preposterously baroque!
Hey, what's wrong with baroque music?
But I agree that SQL could use an overhaul, rework, or replacements studied. It has poor decomposability, for one: it's hard to break up into referenced parts, resulting in long run-on sentences.

--
Table-ized A.I.
Re:"NoSQL"? by Anonymous Coward · 2010-03-13 16:39 · Score: 0

Then XQuery must be rococo.

Something's not right by Anonymous Coward · 2010-03-12 16:36 · Score: 0

There is this thing, it's called archiving. Sounds like another example of software developers pretending to be DBA's, if you ask me.

Re:Wow... by QuoteMstr · 2010-03-12 16:38 · Score: 3, Insightful

Bullshit. Languages don't scale: programs do.

Writing a program in Java makes it scalable in the same way that painting a car red makes it fast. The JVM is quite good these days, but don't make up advantages that don't exist.

Re:Wow... by Anonymous Coward · 2010-03-12 16:42 · Score: 0

1999 called. They want their bitching about Java back.

Re:Wow... by salemboot · 2010-03-12 16:47 · Score: 1

so does ASP.net and C#.

Re:Wow... by Anonymous Coward · 2010-03-12 16:48 · Score: 0

Totally!!! It scales from slow to glacial with no effort at all!!!

Re:Wow... by LBArrettAnderson · 2010-03-12 16:49 · Score: 1

I'm sorry, but Java still doesn't compare to C, and those differences *especially* apply to high load server applications.

Re:Wow... by Anonymous Coward · 2010-03-12 16:50 · Score: 0

You'll have to live with it. "It scales", and it scales great in "the cloud". I know, I cringe too.

Re:Wow... by Anonymous Coward · 2010-03-12 16:53 · Score: 1, Insightful

Java the language isn't scalable on its own.. there's no magic scaling technology built into the JVM.. but the general Java "culture" tends to (in my opinion) achieve at least medium scalability.

When judging a language, you _have_ to look at the culture around it. These days nothing is 100% custom built.. a sizable project is going to import a wide variety of 3rd-party libraries. The general attitude of the community is going to determine how suitable these libraries are for whatever scale you will be using them at.

Same as how languages like perl on their own don't produce unmaintainable code.. it's the perl "write once, read never" culture that leads to so much unreadable code.

Seems odd to be keeping PhP by physburn · 2010-03-12 17:00 · Score: 0

The data store is of course the heavily taxed part of the stack, depending as it does on how much previous activity there has been on a subject at Digg. Still, it seems strange to keep the other parts of LAMP rather than move to a structure where everything is clustered, including the web server and the application code. Cassandra is based on Java and stores maps and objects, so it would make sense to me if they moved over from Apache and PHP to Apache Tomcat, or maybe GlassFish. I guess now we'll all have to have Cassandra on our CVs to look professional.

---

Databases Feed @ Feed Distiller

Re:Seems odd to be keeping PhP by maxume · 2010-03-13 01:51 · Score: 2, Informative

Or you could just sporge some jargonistic keywords together in an attempt to advertise your get-rich-slowly scheme.

--
Nerd rage is the funniest rage.

Re:Good for them by mysidia · 2010-03-12 17:29 · Score: 1

No DBM will save the day if you aren't very careful about how you design your usage of very large databases.

In the case of SQL, that would be things like Schema, choice of Index columns, views, stored procs, joins, SQL statement contents.

In some cases, the performance of a SQL statement can be horrible, but can be rewritten in a different way to answer the same question but have stellar performance.
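
To make that last point concrete, here is a rough sketch using Python's bundled sqlite3 module. The story/vote tables and their contents are invented for illustration. It runs the same question ("what is each story's vote total?") once as a correlated subquery and once as a join with GROUP BY, and checks that both give the same answer. Which form is actually faster depends on the engine, the indexes and the data, so treat this as a demonstration of equivalence only and look at your own query plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE story (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE vote  (story_id INTEGER NOT NULL, value INTEGER NOT NULL);
    INSERT INTO story VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO vote  VALUES (1, 1), (1, 1), (2, -1), (1, 1);
""")

# Form 1: one correlated subquery evaluated per story row.
CORRELATED = """
    SELECT s.title,
           (SELECT COALESCE(SUM(v.value), 0)
              FROM vote AS v
             WHERE v.story_id = s.id)
      FROM story AS s
     ORDER BY s.id
"""

# Form 2: the same question expressed as a join plus GROUP BY.
JOINED = """
    SELECT s.title, COALESCE(SUM(v.value), 0)
      FROM story AS s
      LEFT JOIN vote AS v ON v.story_id = s.id
     GROUP BY s.id, s.title
     ORDER BY s.id
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
joined_rows = conn.execute(JOINED).fetchall()
```

Prefixing either statement with EXPLAIN QUERY PLAN is the usual next step in SQLite when deciding which form to keep.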

Re:Wow... by Anonymous Coward · 2010-03-12 17:39 · Score: 0

Not necessarily true - simple things like the standard algorithm for matrix multiplication could easily scale if most matrix operations are done on explicit and well defined matrices rather than by someone who re-invented the wheel. True, we should distinguish between the language and the library (it seems more of the library is built into Java than with C, but that is natural as Java is a much younger language). Still there are certain tendencies that one develops in programming in different languages that can alter performance and even internally, and of course the nature of garbage collection can always affect things.

Re:Good for them by Anonymous Coward · 2010-03-12 17:40 · Score: 2, Interesting

In my experiences developing applications in both the business and gaming industries, most applications beyond a simple cookbook app/crappy blog are highly object oriented. How else can you explain the wealth of approaches like ORM mappers, the repository and active record patterns, etc ? They are just patches on the relational model to make them friendly to application code. If your domain objects are consistently flat, you are probably doing something wrong. I for one do not want to use an API with Address1 - Address5 string properties. What you just listed as story, post, etc are all just objects, usually with nesting. Relational databases suck at dealing with complex object hierarchies, hence all the joins just because object A has a collection of object Bs which contain an object C.

Can you please define what "a sophisticated fashion" means? Unless you are a DBA and love SQL/config work, it is far easier to write constraints using an object database. You simply use the same validation and rules you should already be using in your application. If you rely on your database alone to enforce things like required fields, atomicity, etc, then you have failed at creating a good application and likely are ripe for exploits, security holes, bad data, etc anyway. It is true that relational DBs provide certain easy facilities, but any decent Object Database provides most if not more of these same constructs in another form through its API. For instance, most object databases I have used provide some sort of transactional data structure that supports far more types of locking and concurrency/conflict management than any relational DBs I have ever seen. Further, since most object databases are defined and consumed in the languages you develop against with them, the sophistication is limited to the language. I'd say you can do a lot more in Smalltalk than SQL for instance.

If you're referring to querying, apparently you've never queried in Smalltalk, C# with LINQ, LISP, or even just using lambdas in python or ruby. Querying using the actual object is typically far easier than writing a SQL query. These days it is becoming increasingly rare that someone rolls all their own queries in your average app anyway (see ORMs). You'll often end up with something like an ORM translating some things from the UI into a boat load of queries, then you'll have to go and find fixes for the ORM to avoid making the application grind to a halt due to all the chatter. Although a lot of that is often the function of UI elements, ultimately there is a lot of overhead created by patching the relational and object disconnect.

I am wondering how you think going from relational back to objects, even flat ones, is somehow easier and more consistent. You're adding an extra language, more layers, and more configuration/management for what gain? Object databases hold records for things like throughput for transactions, data population, etc. The performance thing is a myth of the past. I'd say the stumbling block if anything is simply bad developers. An RDBMS does add somewhat of an idiot-proof layer, but really in the end you just end up with even crappier code in other spots.

Finally, you mention that discrete queries are easier to optimize. I again must disagree. If you want discrete queries, you could describe each query on an object with another object. This is exactly what any good developer should be doing with an ORM anyway. For instance, you could use the specification pattern with the repository pattern to describe and issue your queries, object db or rdbms. Secondly, instead of some crappy tools from the maker of the RDBMS, using an object DB I now have the full facilities of the language to do performance optimization, profiling, logging, etc rather than what a vendor provides. MSSQL provides some great tools for example, but most other DBs while nice implementation wise, provide horrific tool chains.
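
For anyone who has not met the two patterns named above, a minimal sketch in Python follows. Everything in it (Story, Spec, StoryRepository, by_author, min_score) is a hypothetical name invented for this illustration, and the repository is an in-memory stand-in; a real one would translate the specification into whatever its backend understands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    title: str
    author: str
    score: int


class Spec:
    """A query described as an object (specification pattern)."""

    def __init__(self, predicate, description):
        self.predicate = predicate
        self.description = description

    def is_satisfied_by(self, item):
        return self.predicate(item)

    def __and__(self, other):
        # Combining two specs yields a new, equally discrete query object.
        return Spec(
            lambda item: self.is_satisfied_by(item) and other.is_satisfied_by(item),
            f"({self.description} AND {other.description})",
        )


def by_author(name):
    return Spec(lambda s: s.author == name, f"author = {name!r}")


def min_score(n):
    return Spec(lambda s: s.score >= n, f"score >= {n}")


class StoryRepository:
    """In-memory repository; records each query so it can be logged or profiled."""

    def __init__(self, stories):
        self._stories = list(stories)
        self.query_log = []

    def find(self, spec):
        self.query_log.append(spec.description)
        return [s for s in self._stories if spec.is_satisfied_by(s)]


repo = StoryRepository([
    Story("A", "alice", 5),
    Story("B", "alice", 1),
    Story("C", "bob", 9),
])
hits = repo.find(by_author("alice") & min_score(3))
```

Because each query is an object with a description, the repository can log, count or time queries in one place, which is the "discrete query" property being argued about here.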

It is true there are some problems an RDBMS is good for, but your post comes off like someone who has never really use

Re:Good for them by QuoteMstr · 2010-03-12 17:53 · Score: 3, Interesting

Thanks for the comprehensive reply.

How else can you explain the wealth of approaches like ORM mappers, the repository and active record patterns, etc ? They are just patches on the relational model to make them friendly to application code.

ORMs are syntactic sugar for the underlying database operations. It's possible to bypass them when you need SQL's full power and access the same data store.

I for one do not want to use an API with Address1 - Address5 string properties.

So create a table of addresses and use foreign keys to connect them to whatever other table you'd like. Since when does a relational structure require a garbage schema like your example? But surely you know all that.
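
A minimal sketch of that layout, using Python's bundled sqlite3 module. The table and column names (customer, address, line) and the helper addresses_for are invented for illustration and are not taken from any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE address (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        line        TEXT NOT NULL
    );
""")


def addresses_for(conn, customer_id):
    """Return every address line for one customer: any number, no Address1..Address5."""
    rows = conn.execute(
        "SELECT a.line FROM address AS a "
        "JOIN customer AS c ON c.id = a.customer_id "
        "WHERE c.id = ? ORDER BY a.id",
        (customer_id,),
    )
    return [line for (line,) in rows]


conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO address (customer_id, line) VALUES (?, ?)",
    [(1, "1 Main St"), (1, "PO Box 42")],
)
alice_addresses = addresses_for(conn, 1)
```

Application code that wants something shaped like a customer's address collection can build it from addresses_for; the one-to-many relationship lives in the address table rather than in numbered columns.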

Further, since most object databases are defined and consumed in the languages you develop against with them, the sophistication is limited to the language

But doesn't that then preclude accessing the same data set from programs written in other languages? The beauty of SQL is that it's language-agnostic.

You also make several points relating to toolchains and testing: sure, some databases have better tools than others. But we're talking about differences between models, not differences between particular tools.

So much horseshit in just one slide deck. by melted · 2010-03-12 18:00 · Score: 1

So much horseshit in just one slide deck. No matter what you do, unless you have at least a hundred machines at your disposal, Hadoop won't be faster than a single box grep from SSDs. LucidDB is excruciatingly slow for all but tiniest datasets. I've tried a good half dozen "solutions" from this slide deck (including Aster), and other than Postgres all of them suck ass, more or less. If you see ANYTHING other than Nutch with Hadoop as a backend, head for the hills right away.

Re:Wow... by Randle_Revar · 2010-03-12 18:45 · Score: 1

It's kind of like Make, but with a lot more XML

--
Climate Progress - Hell and High Water

There is postgreSQL by Billly+Gates · 2010-03-12 19:04 · Score: 1

MySQL sucked for many years but is getting better with each release. It was never designed to be a full RDBMS.

In Japan people use PostgreSQL, and I am surprised that it's not more common among geeks. Many ISPs now offer it as well as MySQL. The problem is that the trendy word is NoSQL, and it is mostly non-database programmers promoting the movement, due to bad experiences of trying to learn MySQL to do things that are very complicated.

It is very easy to switch your existing code to PostgreSQL if you used SQL-compliant code in languages such as PHP. Triggers, views, stored procedures, and the ability to repair itself after a power failure make PostgreSQL an easier platform to develop for.

--
http://saveie6.com/

Re:There is postgreSQL by maxume · 2010-03-13 01:48 · Score: 1

Imagine if Mysql got worse with every release.

--
Nerd rage is the funniest rage.

Re:Wow... by MichaelSmith · 2010-03-12 19:14 · Score: 1

Languages with namespaces scale better than languages without namespaces.

--
http://michaelsmith.id.au

Re:Wow... by Billly+Gates · 2010-03-12 19:25 · Score: 3, Insightful

Java is a whole platform that is scalable. It's not just about using identifiers and objects but using the vast APIs. Some would say Java is even an OS, as it has its own I/O, threads, etc.

I suppose you could write your own threading and process code, but most Java developers just use what's built into the API.

--
http://saveie6.com/

Try putting petabytes of data by Anonymous Coward · 2010-03-12 19:27 · Score: 0

Try putting petabytes of data on SSDs and let me know how that works out for you.

Except it does. by warrax_666 · 2010-03-12 19:40 · Score: 1

Which is more expensive, a few extra machines or developer time? (I'm assuming a solution that scales properly here; you can write scalable solutions in any language.)

--
HAND.

Re:Except it does. by Anonymous Coward · 2010-03-13 01:26 · Score: 0

It depends on how many "a few" actually are. If you have more than a handful of customers, the money they would have to spend on hardware because your software is too slow will easily pay for months of developer time. This may not seem to be your problem, but it becomes your problem when your customers decide to use another product, because yours requires more expensive hardware and is thus more expensive overall.
Re:Except it does. by Lennie · 2010-03-13 02:00 · Score: 1

Depends on the site or sites and the budget. Facebook has one developer for each 1 million users. Guess again.

--
New things are always on the horizon

Are you trolling? by Anonymous Coward · 2010-03-12 19:55 · Score: 0

Search has nothing to do with "relational". And SQL is a query language and also has nothing to do with how well/badly a given model (relational/whatever) scales.

You are talking out of your ass.

New standard stack for open source development by tommis · 2010-03-12 20:10 · Score: 1

Moving from LAMP to CLAP sounds like a new STD stack for open source development

Thanks !! by Taco+Cowboy · 2010-03-12 20:35 · Score: 1

Many thanks for the explanation ! :)

--
Muchas Gracias, Señor Edward Snowden !

How does one use these NoSQL things anyway? by Anonymous Coward · 2010-03-12 20:56 · Score: 0

Okay, I keep hearing about these noSQL solutions, but I can't find a single example!

For example, how do you do a SELECT with a couple of JOINs? How do you do a SUM over a GROUP of things, etc. ... or for that matter, how does one create a table?

And, yes, I have searched for Hadoop and others, but all I get are these odd pages with no examples.

I'm probably too much of a damn idiot for these NoSQL solutions, since I can't find a good tutorial for converting a SQL app to one of these.
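
Not a tutorial for any particular product, but a rough sketch of the general idea in plain Python: when the store only offers lookup by key, the JOIN and the GROUP BY/SUM typically move into application code. The dicts below stand in for two keyed collections, and the names (users, orders, total_per_user) are made up for the example; no real NoSQL client API is being shown.

```python
from collections import defaultdict

# Hypothetical keyed collections: row key -> record.
users = {
    "u1": {"name": "alice"},
    "u2": {"name": "bob"},
}
orders = {
    "o1": {"user": "u1", "total": 10},
    "o2": {"user": "u1", "total": 5},
    "o3": {"user": "u2", "total": 7},
}


def total_per_user(users, orders):
    """Rough equivalent of:
        SELECT u.name, SUM(o.total)
          FROM orders o JOIN users u ON u.id = o.user
         GROUP BY u.name
    """
    sums = defaultdict(int)
    for order in orders.values():
        sums[order["user"]] += order["total"]  # the GROUP BY + SUM part
    # The "JOIN": look each user key up in the other collection.
    return {users[user_id]["name"]: total for user_id, total in sums.items()}


totals = total_per_user(users, orders)
```

In practice many such designs avoid doing this at read time at all and instead keep a running per-user total that is updated on every write, trading extra write work for cheap reads.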

Re:How does one use these NoSQL things anyway? by Anonymous Coward · 2010-03-12 22:28 · Score: 0

MongoDB has a nice tutorial. In one of their examples the following JavaScript statement "db.things.find({x:4}, {j:true}).forEach(function(x) { print(tojson(x));});" performs the equivalent of the following SQL statement "SELECT j FROM things WHERE x=4" http://www.mongodb.org/display/DOCS/Tutorial
Re:How does one use these NoSQL things anyway? by Lennie · 2010-03-13 02:31 · Score: 1

How nice and readable ! ;-)
I deal with javascript pretty much everyday and like it, but even I think it was ugly the first time I looked at it.
A second look just tells me that indentation would really help here.
db.things.find ( {x:4}, {j:true} ).forEach (
    function (x) {
        print ( tojson(x) );
    }
);
I can see how that would work with map/reduce.

--
New things are always on the horizon

It's "Not Only SQL" by Otis_INF · 2010-03-12 21:02 · Score: 3, Informative

The 'n' stands for 'Not' and the 'o' stands for 'Only', so it's wrong to read it as NO SQL; it should be seen as Not Only SQL. I.o.w.: not a move away from SQL, but exploring other options besides SQL.

--
Never underestimate the relief of true separation of Religion and State.

Re:Wow... by Anonymous Coward · 2010-03-12 21:27 · Score: 0

What large scale website is running some retarded JVM and Java? I can tell you because they are all slow as hell.

Re:Wow... by Anonymous Coward · 2010-03-12 22:10 · Score: 0

If you get to the point where MySQL won't be able to manage all that data, then the Java overhead would probably be way more than a few megabytes.

Why the angry SQLers? by ishmalius · 2010-03-12 23:22 · Score: 1

There seems to be this angry pushback from a core of dedicated SQL programmers, acting as if someone had insulted their tin god and wanted to invalidate their lives' work. Not at all. All that has been developing is the realization that RDBMS's are not the best fit for all applications, and that other storage schemes might have a better impedance match with the needs of a particular design. RDBMS's are still robust and reliable and useful for (maybe most) applications. Only some apps' data does not fit nicely into rows and columns. And you should design your code around the data, not try to morph the data to your software.

Re:Why the angry SQLers? by Macka · 2010-03-13 02:15 · Score: 1

Well said.

Re:Wow... by Anonymous Coward · 2010-03-12 23:38 · Score: 0

You just don't get it. Yeah, anyone can use the threading code built INTO java. It doesn't stop anyone from writing a pile of shit, or a masterpiece. Nor does it prevent someone from creating the worst layout in JPA, or the best one.

Where are the Engineering Benchmarks? by Anonymous Coward · 2010-03-13 01:14 · Score: 0

Great, a number of sites have switched to Cassandra, that's an interesting social benchmark. What about some real engineering benchmarks? I'd like to consider Cassandra but where is the objective data?

Cassandra's data model page states that "Cassandra is much, much faster at writes than relational systems". Great, so how about some comparative data? There is a slide show on the main Cassandra page with a snippet of data about read latency. Reads range from 7 ms to 44 ms. That's pretty anemic in the RDBMS world. There is a statement that writes are limited by network bandwidth.

There is also a presentation from IBM that shows reads ranging from 25 to 900 ms, but with no write data. The fact that read latency gets worse (increases) by a factor of 2 or more when you go from a 3 node to 6 node Cassandra cluster would seem to be worrisome on the surface.

The Facebook Engineering Notes presentation has almost nothing quantitative (only two sentences have numbers) and nothing is documented about read or write performance.

Re:Wow... by JamesP · 2010-03-13 01:18 · Score: 1

Java is a whole platform that is scalable. It's not just about using identifiers and objects but using the vast APIs. Some would say Java is even an OS, as it has its own I/O, threads, etc.

OMFG! The amount of fanboyism is amazing...

Java libraries may be good but they, IN NO WAY make a program 'automatically scale'

You can't just write a non-trivial program in Java and have it automatically scale horizontally.

Don't take away from the merits of the Cassandra developers by saying it was easy because of Java.

--
how long until /. fixes commenting on Chrome?

Re:Wow... by MemoryDragon · 2010-03-13 01:18 · Score: 1

I think the entire platform is the issue, on language level you get threading,vast concurrency apis etc...
on platform level you get cloud features like distribution over multiple servers in realtime, transactional locking over an entire cloud etc...
It really depends on which features of the stack you choose but the scaling features of Java and JEE are phenomenal without too much effort.

Postgresql Horizontal scalability by Anonymous Coward · 2010-03-13 02:07 · Score: 0

Some other bloke in another part of the discussion said, Third, PostgreSQL has excellent performance, and PostgreSQL does, in fact, scale horizontally

Can't say I know which of you is right.

Uh - these DBs have been around for years and year by Anonymous Coward · 2010-03-13 02:13 · Score: 0

Uh - these DBs have been around for years and years. C-Tree, Raima, and other DBs are non-SQL and FAST. No DB server involved, so high concurrency wasn't a good idea. We used them for individual per-user DBs. Also, they are fine for write-once, read-many data needs.

I had 10+yrs using them before I was introduced to SQL-based DBs. Back then, MySQL, msql weren't mature enough to trust any data. The only other options were the expensive SQL vendor DBs. Those didn't work for our world-wide royalty free software distribution requirements. We started with Raima (before Velocis) and ended up migrating to C-Tree. BLAZINGLY FAST doesn't describe how fast it was. I think Raima could have been fast too, but that part of the program was written by an engineer, not a CS artist. The Engineer left our team and the CS guy rewrote everything in C-Tree. I think it costs 20x less that way too.

Re:Wow... by Anonymous Coward · 2010-03-13 02:16 · Score: 0

Bullshit. Languages don't scale

If that language is PHP then you are correct, it doesn't scale. Which is why people have to make heavy use of memcached and similar. Before you argue with me, wannabe coders, how many thousand hits per second does your site get? That's what I thought. Facebook etc took the more hardware, less dev time approach, and that's how they scaled with PHP. It's pretty pathetic.

Java all the way.

There is no "separate file access for each column" by jbellis · 2010-03-13 02:48 · Score: 1

No idea where you got that particular piece of misinformation. :)

Re:Wow... by Anonymous Coward · 2010-03-13 02:53 · Score: 0

Amazon, eBay, Gmail and Google Search to name a few.

Re:Good for them by tepples · 2010-03-13 03:33 · Score: 1

If you rely on your database alone to enforce things like required fields, atomicity, etc, then you have failed at creating a good application

At some point, I don't want to create a good application. I want to create a good model for the business, and then create applications that interact with that model. If I enforce constraints only in an application, someone else will forget to enforce the constraints when she writes a different application written in a different language sitting on the same database. So I enforce some constraints, such as not null or foreign key, in the table definitions. In trickier cases, I write test case reports to make sure no other application interacting with the same database has made it less than consistent. Where I denormalize to work around limitations in the RDBMS's query optimizer *cough*MySQL subqueries with GROUP BY*cough*, I put the denormalized columns into a separate table whose name ends with "cache", with a table comment pointing to which module of which app is responsible for maintaining it.
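
A small sketch of that approach using Python's bundled sqlite3 module. The tables (account, post, post_count_cache) and the helpers try_insert and stale_cache_rows are all invented for illustration. It shows constraints declared in the table definitions rejecting bad rows no matter which application sends them, plus a consistency report for a denormalized "cache" table.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE post (
            id         INTEGER PRIMARY KEY,
            account_id INTEGER NOT NULL REFERENCES account(id),
            body       TEXT NOT NULL
        );
        -- Denormalized counts, maintained by one designated application module.
        CREATE TABLE post_count_cache (
            account_id INTEGER PRIMARY KEY REFERENCES account(id),
            n_posts    INTEGER NOT NULL
        );
    """)
    return conn


def try_insert(conn, sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


def stale_cache_rows(conn):
    """Consistency report: cache rows that disagree with the real post counts."""
    return conn.execute("""
        SELECT c.account_id, c.n_posts, COUNT(p.id)
          FROM post_count_cache AS c
          LEFT JOIN post AS p ON p.account_id = c.account_id
         GROUP BY c.account_id, c.n_posts
        HAVING c.n_posts <> COUNT(p.id)
    """).fetchall()


conn = open_db()
ok_account = try_insert(conn, "INSERT INTO account (id, name) VALUES (?, ?)", (1, "alice"))
bad_null = try_insert(conn, "INSERT INTO account (id, name) VALUES (?, ?)", (2, None))
bad_fk = try_insert(conn, "INSERT INTO post (account_id, body) VALUES (?, ?)", (99, "orphan"))
try_insert(conn, "INSERT INTO post (account_id, body) VALUES (?, ?)", (1, "hello"))
try_insert(conn, "INSERT INTO post_count_cache (account_id, n_posts) VALUES (?, ?)", (1, 5))
stale = stale_cache_rows(conn)
```

The last query is the kind of check that can be run on a schedule: any row it returns means some application wrote to the cache table without keeping it in step with the real data.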

Re:Good for them by Anonymous Coward · 2010-03-13 03:40 · Score: 0

ORMs are not syntactic sugar. Reading code, yes, but they are doing a lot of work in the background and therefore are far more than sugar. They are often the first place to look when you have a slow, data-driven application. It is true you can bypass them as you mention, but that negates all the benefits. Instead you are left with tedious programming tasks that are some of the most error prone and boring parts of the application. I also don't think you answered how they are not patches on the relational model. The truth is often for serious applications, you will have to bypass an ORM, and then what have you gained? Why not use an object database and build an application with a consistent API? Building exceptions around an ORM only leads to code that is hard to test, debug, and read.

Regarding the address fields, I think you missed the point. When you get those fields into code, you are left with two choices. The first choice is having Address1-5 as I mentioned. The second choice is having a 1:many or many:many object. Neither make sense in application terms. What would make more sense for example is Customer.Addresses collection for example, or something else more specific to the objects. My point being not that relational databases cause a garbage structure because of normalization, but rather cause bad objects because relational tables do not match the structure of object oriented languages and simple application logic.

I don't believe object databases preclude access to other languages. If you want other languages to access the data, you can simply build web services on top of the data. IMO, for key functionality in anything but a small application, you should be doing this anyways regardless of using SOA or another architecture. It is easy enough to build a generic service layer for example that simply serializes objects to XML, JSON, or SOAP without actually having to build a sophisticated service layer. This is exactly what some CRM software today does to allow people to program against the internals (ironically, on top of a relational store). The database is not a great integration point and there is a lot of hard evidence that shows this. It is a tempting and easy one, but most likely two applications have different requirements. If you try to make your data a jack of all trades, you will likely fail at everything. Data warehousing is a good example here because without wildly warping the schema, doing reports on the relational model becomes nearly impossible.

Finally, the point about different tools is that languages tend to have far better tools and more of them than databases. It eliminates having to deal with the burden that some systems place on the developer due to their poor tool chains. Frankly, there is not a DB tool chain that can compete with a language tool chain for debugging, optimization, development environment, and consistency. MS SQL as I mentioned is the closest, and I can guess how people around here feel about that.

With respect, I am getting the feeling you have never programmed a serious application against a relational model. There is evidence with ORMs, design patterns, and the wealth of other kinds of databases that the relational model is not good at solving many problems. The fact is that most real applications that aren't some crappy ruby on rails demo are highly object oriented and require modeling the domain as it exists, not to match a database. Read up on domain driven design and that approach alone should make the points clear, and in particular some of the benefits of using proper objects. Your application and your data are two different things. The application should not morph to the data and vice-versa. Object databases solve most of this problem, but even in those cases at some point you will need to do things like use DTOs and presentation objects to get away from the disconnect between data, domain, and presentation. The fact is I can't remember the last time I simply had an RDBMS that only consisted of simple relationships that didn't necess

The name NOSQL is misleading. by Futurepower(R) · 2010-03-13 04:54 · Score: 1

Another example of technically knowledgeable people picking a really bad name.

Re:Good for them by Anonymous Coward · 2010-03-13 05:25 · Score: 0

If you want to create a good model for your business, why not write an API? An API is a lot better as an integration point. You can then build libraries you can reference and your model will always be consistent. Data and model are two different things often as well. You probably need more than one layer to properly enforce things and expose the correct functionality across applications. Maybe you're not aware, but object databases do offer the data to multiple channels, giving you the objects themselves as the model. I think that's a lot better than some crappy relational column store. Most also offer integration for multiple languages, for example Gemstone now supports Java, C, C++, Smalltalk, and Ruby. You could build more if you wanted, but I think services is the way to go anyway. Pick the language that deals with the Object Database best, then build your UI or other layers in the language of your choice (ideally the same, but not necessarily).

Different languages require different database drivers, adapters, ORMs, code, etc. This opens the door for a myriad of mistakes and inconsistencies. The constraints you speak of might be honored, but you can't possibly make all your DB constraints as smart and bulletproof as application constraints. This is again less of a data store issue and more of a design issue. You could solve this by building what I speak of above on your relational store. My point is just that the relational store is not a crutch for bad programming and not a place to enforce business logic or otherwise. I understand the need for data logic, but the data store itself should be protected by most of these things long before the data ever reaches it. Is it better if my app tells me it is the wrong data type or the DB spits out some ridiculous error that I then have to propagate back to the application anyway? Then I also have to hope that everything was wrapped in transactions because god forbid (MySQL especially) the database should try to perform an operation and then fail halfway through, requiring someone to manually untangle the mess. If a bad object makes it in, just delete the object (equivalent in most implementations to dereferencing a pointer). Simple, effective, and easy to be atomic and consistent.

Your example of implementing a cache with some table comments exactly proves my point. That is error prone, far from consistent, and ensures some idiot will overlook your careful work. This is less likely in an object database as what you get out already has everything you need. It is true that if you have more layers, you'd run into the same problem, but again, an API will solve this. Simply reference the API or web services and you're good to go across multiple applications. Is it any wonder some of the largest web sites are integrating via service APIs and not directly via the database? You could have multiple stores, shards, whatever, and a service will hide that complexity.
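
To make the "a service will hide that complexity" point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the class name, the in-memory dicts standing in for separate stores, and the hash-based routing are assumptions for illustration, not how any particular site does it.

```python
import hashlib


class ShardedStoreService:
    """Hypothetical service facade: callers see put/get, never the shards."""

    def __init__(self, shard_count=4):
        # Each dict stands in for a separate backing store (an assumption).
        self._shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Stable hash, so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)


service = ShardedStoreService(shard_count=4)
service.put("user:42", {"name": "example"})
print(service.get("user:42"))
```

Applications only ever call put and get, so the number of shards or the kind of store behind them could change without touching any caller.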

Although it's not an Object Database, look at Azure Tables or even the relational SQL Data Services from Microsoft. I think it's a crap implementation, but if it demonstrates one thing, it is that integration via an API possibly built on services is the way to go to build massively scalable databases and provide integration points.

It's been done decades ago. by Estanislao+Martínez · 2010-03-13 08:39 · Score: 1

Please elaborate on how an RDBMS is applicable to what I guess is now called "scaling horizontally", or perhaps more formally known as sharding, or partitioning with redundancy. It's my impression that most of the RDBMS products available today are simply atrocious at this, but if you can point out which books I need to look at, and which products have good support for this sort of scale, I'd love to learn.

The term shared-nothing architecture dates back to 1986. The concept goes back further; for example, the Teradata RDBMS dates back to 1976-1983.

So ignorance about database technologies and products is one issue, and the one hardest to excuse. There are however other issues that are more understandable:

Existing solutions for achieving horizontally scalable RDBMSs tend to use special hardware. For example, Teradata uses a proprietary interconnect between nodes. Small web startups definitely are not going to be able to afford this, and even the large ones would rather use commodity hardware, and have it geographically dispersed.
The products are designed for and marketed toward large, established enterprises, who already have a large volume of data, and are already willing to throw a lot of money at hardware, software licenses and support so they can analyze that data. Web startups, in contrast, want to start small.
The existing solutions are proprietary, and web startups tend to prefer open source solutions.
The existing solutions are biased toward OLAP, analytics and data mining, i.e., taking large volumes of data and analyzing large sets of it at a time. Some of the "NoSQL" products are built for this (e.g., Hadoop + HDFS), but there are others that are more aimed toward simple transactional processing.

So it really is the case that there do not exist good relational products that tackle something like Digg, Facebook or Twitter, which want to use commodity hardware in geographically dispersed locations. However, it is also the case that the "NoSQL movement" that has sprung up to fill this gap has a combination of ignorance and animosity towards the relational model, and is just not thinking the problem through.

--

Are you adequate?

Firebird by ProfessionalHostage · 2010-03-13 14:30 · Score: 1

http://www.firebirdsql.org/

Firebird is a relational database offering many ANSI SQL standard features that runs on Linux, Windows, and a variety of Unix platforms. Firebird offers excellent concurrency, high performance, and powerful language support for stored procedures and triggers. It has been used in production systems, under a variety of names, since 1981.

Re:Wow... by MemoryDragon · 2010-03-14 19:20 · Score: 1

Slower than what? Than C++, definitely, in most scenarios; but it runs circles around PHP and other scripting languages.

Web 2.0 world by Anonymous Coward · 2010-03-23 04:49 · Score: 0

Name one Web 2.0 application that is able to properly manage multiple MySQL instances. I am aware of only one (http://novaquantum.com), but the point is that MySQL is not gaining any ground on the Web 2.0 front end!

271 comments