Horizontal Scaling of SQL Databases?

Posted by timothy on Thursday November 18, 2010 @08:48AM from the side-to-side dept.

still_sick writes "I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. We've been looking at various NoSQL stores and I've been following Adrian Cockcroft's blog at Netflix which compares the various options. I was intrigued by the most recent entry, about Translattice, which purports to provide many of the same scaling advantages for SQL databases. Is this even possible given the CAP theorem? Is anyone using a system like this in production?"

222 comments

XML by Anonymous Coward · 2010-11-18 08:49 · Score: 2, Funny

Just store everything in a big XML file.
1. Re:XML by Anonymous Coward · 2010-11-18 08:53 · Score: 5, Funny
  
  XXXML
2. Re:XML by icebraining · 2010-11-18 08:54 · Score: 0
  
  Accessed by a Samba share.
  
  --
  Dilbert RSS feed
3. Re:XML by word_virus · 2010-11-18 09:02 · Score: 1
  
  Sounds like someone's been watching my screencasts.
4. Re:XML by drouil11 · 2010-11-18 09:04 · Score: 1
  
  Have you tried big and tall?
5. Re:XML by suomynonAyletamitlU · 2010-11-18 10:49 · Score: 1
  
  "Extra-extra-extra-medium" large? I'm not sure how you get "extra-medium" in the first place, much less even more so.
6. Re:XML by Anonymous Coward · 2010-11-18 12:26 · Score: 0
  
  1080?
7. Re:XML by MichaelSmith · 2010-11-18 14:14 · Score: 1
  
  Make it four X and you've got a deal.
  
  --
  http://michaelsmith.id.au
8. Re:XML by yuje · 2010-11-18 14:54 · Score: 1
  
  Better yet, a big Microsoft Excel file. It'll be a database that even upper management will understand!
9. Re:XML by Stuarticus · 2010-11-18 22:37 · Score: 1
  
  Nah, we just rooted you.
  
  --
  If you think someone isn't free to have a different definition of "freedom" you may be a tyrant.
What limitations are you running into? by Anonymous Coward · 2010-11-18 08:55 · Score: 5, Insightful

It would be a lot easier to talk about solutions if you said which limitations you run into.
Is your dataset too large (large tables), are you doing too many joins, too many transactions per second? In short, what is the problem we're trying to solve here?
1. Re:What limitations are you running into? by Anonymous Coward · 2010-11-18 09:04 · Score: 5, Interesting
  
  It would be a lot easier to talk about solutions if you said which limitations you run into.
  Is your dataset too large (large tables), are you doing too many joins, too many transactions per second? In short, what is the problem we're trying to solve here?
  My money is on "No one here likes SQL" and "There aren't any experts on RDBMSes to help us get things set up properly".
2. Re:What limitations are you running into? by DarkOx · 2010-11-18 10:11 · Score: 4, Insightful
  
  I would have to agree; it's really hard to imagine a startup that can't make anything work on a traditional SQL RDBMS. If you put the right hardware underneath it, even SQL Server 2000 (64-bit, anyway) will scale just fine to terabyte-size databases at thousands of transactions per second. That is not impossible hardware for a successful startup to buy, either: we are talking about a dedicated storage controller with a gigabyte or so of cache and a few dozen SAS drives. I know; I have worked on such projects.
  You need to get the schema right, and if it's more reads than writes you might even denormalize a little, and you will need to partition the data appropriately, but it can be done. This is why real DBAs still make the big bucks: there is a lot to know in that domain. You should probably hire an expert on whatever you are using now to consult before you go down the NoSQL path. All you told us is that you are a growing startup, which is not much to go on, and without knowing what you are doing it's hard for me to believe you are doing anything on a scale that can't be done well with a relational database; but maybe I am wrong and you are doing something huge. Remember, as soon as you go down the NoSQL path you are going to be doing a great deal of heavy lifting yourself, because there aren't many libraries and off-the-shelf tools out there.
  
  --
  Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
3. Re:What limitations are you running into? by Skal Tura · 2010-11-18 11:35 · Score: 1
  
  Which is the most likely scenario, as SQL scales tremendously given some thought.
  
  --
  Pulsed Media Seedboxes
4. Re:What limitations are you running into? by Foofoobar · 2010-11-19 02:48 · Score: 1
  
  Well as someone who was hesitant about NOSQL, I've learned some of the benefits. NATURALLY, you want to use it in coordination with an RDBMS but it allows for caching of large 'one-off' datasets.
  
  Often I want the speed of the database but don't want to index large strings (as they slow down a query). NOSQL allows one to move those large datasets out of the RDBMS but still maintain the relationship in the RDBMS.
  
  --
  This is my sig. There are many like it but this one is mine.
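A minimal sketch of the pattern described above, assuming Python's built-in `sqlite3` as the RDBMS and a plain dict standing in for the key/value store. The table, column and key names are hypothetical. The relational row keeps only a small key pointing at the large body, so the body never has to be indexed:

```python
import sqlite3
import uuid
from typing import Dict

# Hypothetical stand-in for an external key/value store.
blob_store: Dict[str, str] = {}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, owner_id INTEGER NOT NULL, "
    "title TEXT NOT NULL, body_key TEXT NOT NULL)"
)
# Only the small columns are indexed; the large body lives outside the database.
conn.execute("CREATE INDEX idx_documents_owner ON documents (owner_id)")


def save_document(owner_id: int, title: str, body: str) -> int:
    """Store the large body in the KV store and only its key in the relational row."""
    key = f"doc-body:{uuid.uuid4()}"
    blob_store[key] = body
    with conn:  # commits the relational insert
        cur = conn.execute(
            "INSERT INTO documents (owner_id, title, body_key) VALUES (?, ?, ?)",
            (owner_id, title, key),
        )
    return cur.lastrowid


def load_body(doc_id: int) -> str:
    """Follow the relationship from the RDBMS row to the KV store."""
    (key,) = conn.execute(
        "SELECT body_key FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return blob_store[key]


doc_id = save_document(7, "Quarterly report", "x" * 100_000)
print(len(load_body(doc_id)))  # 100000
```

Note that the two writes are not atomic: if the INSERT fails, the blob is orphaned in the KV store. That is one of the consistency costs of splitting data across two systems.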
Re:XML Go Diagonal by Anonymous Coward · 2010-11-18 08:56 · Score: 0

After that go diagonal, you get a preemptive database which can guess your sql needs.
Relational stuff scales by Anonymous Coward · 2010-11-18 09:00 · Score: 5, Insightful

Learn partitioning principles, get a database product that does partitioning properly, learn normalization, never worry again about not being able to scale with relational databases. It just requires some real skills but relational databases really do scale all the way up.
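As a minimal sketch of one partitioning principle (hash partitioning by key), assume a hypothetical setup of four database nodes named `db0` to `db3`, with the customer ID as the partition key. Real products do this inside the database; this only illustrates the routing idea:

```python
import hashlib

# Hypothetical node names; in practice these would be connection strings.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(customer_id: int) -> str:
    """Map a partition key to a shard using a stable hash.

    hashlib is used because Python's built-in hash() of strings is
    randomized per process (PYTHONHASHSEED), so it is not stable across restarts.
    """
    digest = hashlib.sha1(str(customer_id).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# All rows for one customer land on the same shard, so per-customer
# queries and joins stay local to a single node.
for cid in (1, 2, 3, 42):
    print(cid, shard_for(cid))
```

Plain modulo hashing reshuffles most keys when a shard is added; consistent hashing or a lookup (directory) table are the usual alternatives.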
1. Re:Relational stuff scales by ani23 · 2010-11-18 09:07 · Score: 2, Interesting
  
  Partitioning does complicate backups and HA/DR scenarios, as the entire system depends on all machines being up and running. Also, in most commercial DBs (I know about DB2) this feature takes you to the enterprise tier of the software, which is usually very expensive.
2. Re:Relational stuff scales by h4rr4r · 2010-11-18 09:13 · Score: 4, Informative
  
  Postgres seems to not charge extra for that.
3. Re:Relational stuff scales by TooMuchToDo · 2010-11-18 10:34 · Score: 1
  
  OH SNAP
4. Re:Relational stuff scales by Steeltoe · 2010-11-18 11:08 · Score: 1
  
  Ditto on Postgres :-)
  And if someone's having performance problems on Postgres, learn:
  A) Indexes
  B) CLUSTER
  C) RTFM
  Really, that's all there is to it! I'm sure more advanced setups can be made, but Postgres will scale fine for a small startup using just the basics. However, if you never CLUSTER or VACUUM (not preferable) Postgres, it can become a dog if you have a lot of UPDATEs.
  Basically, the poster should just RTFM. It is time spent educating yourself, and it makes things better next time. Asking Slashdot a generic question such as this is a bit silly IMHO.
  
  --
  http://www.debunkingskeptics.com/
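To see what point A (indexes) buys you, here is a small sketch that compares a query plan before and after adding an index. It uses SQLite only because it ships with Python; in Postgres the equivalent would be `CREATE INDEX` followed by `EXPLAIN`. The `orders` table is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)


def plan(sql: str) -> str:
    # Each EXPLAIN QUERY PLAN row has the human-readable detail in column 3.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT total FROM orders WHERE customer_id = 7"
before = plan(query)  # no usable index: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # the planner now searches the index instead
print("before:", before)
print("after: ", after)
```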
5. Re:Relational stuff scales by TooMuchToDo · 2010-11-18 11:12 · Score: 0
  
  Exactly. If Postgres can drive .org, I think it can drive near anything if done properly (well, not google, but anything smaller than that).
6. Re:Relational stuff scales by atomic777 · 2010-11-18 11:58 · Score: 2, Insightful
  
  Right. "We are hitting limitations in what we can do with X" means they cannot solve an underlying difficult problem Z, and are hoping that by swapping X with magic fairy dust Y, that somehow Z will go away. Sales people owe their BMWs to this simple fallacy.
7. Re:Relational stuff scales by psilambda · 2010-11-18 13:15 · Score: 1
  
  Other comments, I think, are talking about just throughput, but just in case anyone is talking about processing the data I would like to note the following: if you think that PostgreSQL does not provide the processing performance you want to scale to, then you have not been paying attention: check out my Field Forge announcement (currently the top news item on the PostgreSQL.org web site) and then go check out Amazon's EC2 GPU cloud announcement. Put them together and I am not sure anyone knows yet how high you can scale to. You probably now have available more distributed, high performance parallel processing power than you can imagine.
8. Re:Relational stuff scales by Splab · 2010-11-18 19:28 · Score: 1
  
  Postgres doesn't have clustering, so how exactly are you achieving this?
  The new "hot" standby option for Postgres is a step in the right direction, but most can't live with "eventually consistent" in their hot standby environment.
9. Re:Relational stuff scales by theshowmecanuck · 2010-11-18 19:50 · Score: 1
  
  I listened to EnterpriseDB's intro lecture (web presentation) on the new hot standby feature, and IMHO it is not ready for prime time. It doesn't feature automatic cut-over for high availability, and frankly point-in-time recovery seems kludgy. And I LIKE Postgres. It is getting closer to enterprise-ready, but until it can do HA it is hard to say yes. Yeah, I know there are third-party apps you can plug in to provide close to this (if not all of it), but unless it is a built-in feature of the RDBMS, it is still kludgy in my book. But if you don't need HA, it is a great RDBMS without a doubt.
  
  --
  -- I ignore anonymous replies to my comments and postings.
10. Re:Relational stuff scales by Anonymous Coward · 2010-11-18 22:04 · Score: 0
  
  If you only need a massive distributed hash-table, there's no problem. But once you start needing something similar to what an RDBMS provides, you start hitting workarounds, compromises and kludges :).
  This is probably related to what the submitter said: http://en.wikipedia.org/wiki/CAP_theorem
  Put enough buzzword wrapping on it and maybe it starts to look better :).
11. Re:Relational stuff scales by Anonymous Coward · 2010-11-19 03:10 · Score: 0
  
  Postgres seems to not charge extra for that.
  Are you crazy? It's like 500 times the cost of standard Postgres!
Consider scaling via other layers? by mlts · 2010-11-18 09:01 · Score: 3, Interesting

Another idea is to scale using other layers, if there are problems at the SQL server level.
At the lower areas, one can go with a mainframe (parallel sysplex) and have geographically separate pieces of hardware acting coherently.
At the higher layers, have the app use multiple SQL servers and handle the redundancy in this layer.
1. Re:Consider scaling via other layers? by Anonymous Coward · 2010-11-18 09:05 · Score: 0
  
  Also, I assume that if you're looking at stuff like NoSQL, you've already done the obvious, like using an in-memory caching system with tons of memory.
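The "obvious" caching step is usually the cache-aside pattern: read from the cache, fall back to the database on a miss, and invalidate on writes. A minimal in-process sketch follows; a dict with a TTL stands in for memcached or similar, and `load_user` is a hypothetical loader in place of a real SELECT:

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Minimal in-process cache-aside wrapper with a time-to-live.

    A real deployment would share the cache across app servers.
    """

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float = 60.0):
        self._loader = loader
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}
        self.db_hits = 0  # how many times we had to go to the database

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # cache hit: no database round trip
        self.db_hits += 1
        value = self._loader(key)  # cache miss: read from the database
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after writing the row so readers do not keep a stale value.
        self._store.pop(key, None)


def load_user(key: str) -> str:
    return f"row for {key}"  # hypothetical stand-in for a database query


cache = CacheAside(load_user)
cache.get("user:1")
cache.get("user:1")
print(cache.db_hits)  # 1
```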
2. Re:Consider scaling via other layers? by mlts · 2010-11-18 09:24 · Score: 1
  
  That is what mainframes are for. Yes, the technology is old and not exciting, but one of the strong points of mainframes is I/O, which is critical to most database architectures.
Call me skeptical by Kjella · 2010-11-18 09:05 · Score: 5, Insightful

Call me skeptical, but there are companies out there with massive amounts of data in relational databases; if you as a startup are "constantly hitting limitations" you're either a very odd startup or using it very wrong. As long as the volume is small you can make almost anything happen on SQL. Hell, most small businesses I've known run mostly on Excel. Somehow I don't see a startup needing NoSQL unless it specializes in processing huge amounts of data, in which case asking Slashdot to work out your core business seems stupid. But maybe I missed something...

--
Live today, because you never know what tomorrow brings
1. Re:Call me skeptical by ani23 · 2010-11-18 09:13 · Score: 1
  
  We have a winner. No more discussions on this topic. mmmkay sigh . . . . .
2. Re:Call me skeptical by doroshjt · 2010-11-18 09:14 · Score: 1
  
  MySpace uses SQL Server, so unless this startup is bigger than MySpace, I think they are just doing it wrong.
3. Re:Call me skeptical by Squeebee · 2010-11-18 09:16 · Score: 5, Funny
  
  Agreed, we have massive sites serving millions of requests a day using Open Source relational databases and yet it seems everyone wants to use NoSQL because it's the hip new thing.
  Naturally I start thinking of this: http://xtranormal.com/watch/6995033
4. Re:Call me skeptical by MatthiasF · 2010-11-18 09:16 · Score: 1, Insightful
  
  In Cloud scenarios, a distributed relational database is cumbersome or even impossible to maintain. Hence why lots of web companies have moved over to NoSQL solutions tailored to their processes.
  
  So, you're describing centralized, local databases whereas the OP is focusing on decentralized, cloud databases.
5. Re:Call me skeptical by craftycoder · 2010-11-18 09:19 · Score: 3, Interesting
  
  My thoughts exactly. I have a couple hundred GB in an MS SQL database with extensive normalization and it is lightning fast. It's all about indexes and appropriate design.
6. Re:Call me skeptical by Ruke · 2010-11-18 09:23 · Score: 2, Insightful
  
  I think the real problem is that people are seeing inconsistencies in their growing systems, and looking to grow to a system that doesn't have inconsistencies. Which is basically impossible. It's not that the big players don't ever have inconsistent data - Amazon's Dynamo relies on reaching a quorum, rather than a totally consistent state. Rather, the big players have a much better idea of exactly how inconsistent their data can be, while still giving their system good performance.
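The Dynamo-style quorum rule is that a read of R replicas and a write to W replicas out of N must overlap when R + W > N, so a read always contacts at least one replica holding the latest write. The sketch below is simplified. It assumes plain integer versions and ignores sloppy quorums, hinted handoff and vector clocks:

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every possible read set of size r shares a node with every write set of size w."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


def read_latest(replicas, read_set):
    """Return the value with the highest version among the replicas contacted."""
    return max((replicas[i] for i in read_set), key=lambda vv: vv[0])[1]


# N=3 replicas holding (version, value); version 2 reached only nodes 0 and 1 (W=2).
replicas = [(2, "new"), (2, "new"), (1, "old")]
print(quorums_overlap(3, 2, 2))       # True: R + W = 4 > N = 3
print(quorums_overlap(3, 1, 2))       # False: a one-node read can miss the write
print(read_latest(replicas, (1, 2)))  # new
print(read_latest(replicas, (2,)))    # old (R=1 hit the stale replica)
```

Choosing R and W is exactly the "how inconsistent can we afford to be" knob: lowering either one speeds up that operation at the cost of possibly stale reads.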
7. Re:Call me skeptical by RobertM1968 · 2010-11-18 09:24 · Score: 4, Insightful
  
  Agreed... the biggest limitation I see with SQL (MySQL, DB2, Postgres anyway... I found plenty in MS SQL) is people who don't know how to lay out a database, people who don't know how to install and configure the server daemon(s), people who have no idea how to properly select appropriate hardware, and people who don't know how the heck to do a query (as a for instance, I worked on some code done by someone else, where on massive records, they were always selecting "*" instead of the needed or anticipated values. Big waste when one needs (by ID#) last and first name and selects a whole row instead, then wonders why it's not scaling upwards).
  
  --
  StarTrekPhase2 - The Five Year Mission Continues!
8. Re:Call me skeptical by hedpe2003 · 2010-11-18 09:28 · Score: 1
  
  ... Somehow I don't see a startup needing NoSQL unless they specialized in processing huge amounts of data...
  Can we pretend that he does - and actually offer some useful information? Thanks
  
  From the guy who knows nothing about NoSQL other than what wikipedia just provided him.
  
  --
  Comprehensive solutions via a competition of ideas like no other.
9. Re:Call me skeptical by PRMan · 2010-11-18 09:28 · Score: 2, Funny
  
  MySpace is also slower than maple syrup in January.
  
  --
  Peter predicted that you would "deliberately forget" creation 2000 years ago...
10. Re:Call me skeptical by suso · 2010-11-18 09:32 · Score: 1
  
  Naturally I start thinking of this: http://xtranormal.com/watch/6995033
  Thank you for posting that. I'm so sick of the NoSQL shit. Learn to design schemas.
11. Re:Call me skeptical by Cylix · 2010-11-18 09:34 · Score: 5, Funny
  
  I just select * from * and then sort it out with grep and cut.
  
  --
  "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
12. Re:Call me skeptical by Anonymous Coward · 2010-11-18 09:38 · Score: 1, Insightful
  
  The real problem is scale. Any SQL DB server will cope with most applications fine, but add live data on a public-facing site with a decent volume of users, and they'll crawl to a slow death. This is why non-crucial sites use denormalization and do the dastardly deed of data duplication to speed up their bad queries and suspect table design.
  So what's the solution? Very large and expensive boxes for the simplest method (no one likes this these days), and then lots of boxes performing certain tasks. Which has its own huge costs, because skilled people in this field are very few and far between, and are already working for Yahoo, Google and Fartybook.
13. Re:Call me skeptical by nschubach · 2010-11-18 09:42 · Score: 2, Insightful
  
  It's rather fast now that nobody uses it anymore.
  (sorry, I couldn't resist.)
  
  --
  Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
14. Re:Call me skeptical by Anonymous Coward · 2010-11-18 09:43 · Score: 2, Insightful
  
  I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases...
  
  Call me skeptical but there are companies out there with massive amounts of data in relational databases, if you as a setup are "constantly hitting limitations" you're either a very odd startup or using it very wrong.
  Agreed. My knee-jerk response once I saw that sentence in the summary was "No, you're not [hitting the limitations in what we can do with relational databases]. You're hitting the limits of what you know about performance tuning and scalability with the relational databases you have."
  NoSQL, BigTable, and Cassandra are designed for extremely fast key-value pair lookups over enormous datasets (as one poster puts it, > exabyte-sized.) With these solutions alone, you lose:
  a) ACID
  b) FK relations/semantic modeling
  which is huge. (If you don't know why losing ACID and FK relations is such a bad thing, you might as well stop here, hit the library for a good database textbook, read and understand it, then come back in 3-6 months and rephrase your question.)
  If you *really* have > exabyte-sized data in a table or two and you really are hitting the limits of what current RDBMS engines can provide (and if you haven't looked at DB2 or Oracle, maybe you should - their optimizers are better than Postgres or (laugh) MySQL), you'd probably want to work around (a) and (b) by using some sort of enterprise transaction management system (e.g. JTA if you're using Java EE), then incorporate the tables you need into NoSQL, Cassandra, or BigTable by providing middleware to interface with these hash stores that provides support for two-phase distributed commit and fakes the FK relationship to cross datastore boundaries.
  And if you think that doesn't sound too bad, think again: what I just described is a HUGE undertaking. Are you really sure you haven't exhausted all other options to stick with proven database technology that performs well up to exceptionally large-sized datasets? Maybe it's time to hire, you know, a real DBA - this type of analysis is what they get paid the big bucks for.
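To give a feel for the "two-phase distributed commit" piece of that middleware, here is a toy coordinator. It shows only the protocol shape: prepare everywhere, then commit everywhere only if every participant voted yes, otherwise abort everywhere. The class and function names are hypothetical and this is not the JTA API. It omits durable logging, timeouts and coordinator-crash recovery, which are precisely the parts that make a real implementation the HUGE undertaking described above:

```python
from typing import Dict, List


class Participant:
    """Toy resource manager: stages a write in prepare(), applies it in commit()."""

    def __init__(self, name: str, will_vote_yes: bool = True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.data: Dict[str, str] = {}
        self._staged: Dict[str, str] = {}

    def prepare(self, key: str, value: str) -> bool:
        # Phase 1: stage the change and vote (a real system would log it durably).
        if not self.will_vote_yes:
            return False
        self._staged[key] = value
        return True

    def commit(self) -> None:
        # Phase 2, success: make the staged change visible.
        self.data.update(self._staged)
        self._staged.clear()

    def abort(self) -> None:
        # Phase 2, failure: discard the staged change.
        self._staged.clear()


def two_phase_commit(participants: List[Participant], key: str, value: str) -> bool:
    votes = [p.prepare(key, value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


rdbms = Participant("rdbms")
kv = Participant("kv-store")
print(two_phase_commit([rdbms, kv], "order:1", "paid"))      # True
broken = Participant("kv-store-2", will_vote_yes=False)
print(two_phase_commit([rdbms, broken], "order:2", "paid"))  # False
print(rdbms.data)  # {'order:1': 'paid'}
```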
15. Re:Call me skeptical by nschubach · 2010-11-18 09:46 · Score: 1
  
  Are you sick of the NoSQL talk because you specialize in SQL and feel as if it's a competitor, because it's gained a lot of attention recently and happens to be talked about more than SQL, or is there some other reason for the sick feeling?
  (I do very light SQL development and have not touched a "NoSQL" solution, but I do not find myself sickened by people investigating alternatives.)
  
  --
  Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
16. Re:Call me skeptical by mini me · 2010-11-18 09:48 · Score: 0
  
  The big boys are using NoSQL databases because it is easier (read: cheaper) to scale than relational (SQL) databases. That does not mean relational databases cannot scale.
  The small startups are using NoSQL because there is, more and more, a push in the web app market to store data which does not fit into any schema. Several of the NoSQL options are very good at handling that kind of data, SQL, not so much.
  Yes, there are a handful of people that think they need a Facebook-scale database before they have even released their project into the world, but they are the exception. Most people are using NoSQL databases because they are a better fit for the job than a traditional SQL database. Those people will not argue that SQL cannot do the job, just that a NoSQL database can do the job better.
17. Re:Call me skeptical by Anonymous Coward · 2010-11-18 09:50 · Score: 0
  
  How is telling you that you are barking up the wrong tree not useful information about NoSQL? The reality is you really don't need NoSQL solutions unless you're an internet giant, and somebody setting up a NoSQL environment is going to hurt their performance more than help it and just generally make things harder for themselves. If you need NoSQL, you know you need it and you probably have the talent to implement it.
18. Re:Call me skeptical by nschubach · 2010-11-18 09:51 · Score: 1
  
  Is it people who don't know how to lay out a database, or is it that you need to know how to lay out a database so it fits their needs?
  I see a lot of hate around alternatives to SQL and most of them blame the design of data retention rather than accepting that there may be another way to achieve what is needed. It sounds to me like people trying to justify their job (which may not be necessary under a different model that doesn't need someone to "design" anything.)
  Honest question there...
  
  --
  Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
19. Re:Call me skeptical by ADRA · 2010-11-18 09:52 · Score: 1
  
  Wow, that was a great video. Thanks for the link.
  
  --
  Bye!
20. Re:Call me skeptical by thetoadwarrior · 2010-11-18 09:58 · Score: 1
  
  Have you seen how "fast" MySpace is? It's certainly no Google.
21. Re:Call me skeptical by Anonymous Coward · 2010-11-18 09:59 · Score: 0
  
  NoSQL is about sacrificing different parts of ACID for speed. It really is a matter of which ones you are OK with. If you do not want to give up any of the letters, then you are stuck with SQL.
  However, the dude hit all the 'you're doing it wrong' buttons: startup, post on Slashdot, jumping to the NoSQL buzzword. It could be anything from a badly written application, to a bad index, to badly laid out data. There are literally hundreds of ways to grind a SQL server to a halt. There are also just as many fixes for those bad things.
  What it comes down to is that many times people are afraid to test large datasets. 'Oh, that will never happen' is the rallying cry. So they do not even try realistic scenarios, just 'small' data sets. So all the interfaces 'work' but the data storage/retrieval is horrid.
  DB optimization is fairly straightforward. But if you want it done the 'easy' way, it usually costs 150 an hour and you hire a contractor who does it for a living.
  Also, in this case I would say he is trolling around for a 'swap out' that is just 'faster'. NoSQL does not get you that if your application is acting crappy. It is a matter of stopping the guessing and LOOKING at what the thing is doing. But many times devs like to speculate about what is wrong. He is actually looking for a justification for a rewrite of whatever he works on, but wants a swap out. Not going to happen. He would really be better off looking at why things are not performing. And if he is unwilling to do that, then NoSQL is not going to get him what he wants either.
22. Re:Call me skeptical by Anonymous Coward · 2010-11-18 10:06 · Score: 0
  
  Data consistency will always be the anchor on any system. Unless you don't care about data reliability and accuracy you'll always need systems to be either centralized or constantly in synch which means locking, etc.
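Here is a small illustration of the consistency guarantee being defended in this subthread: an atomic transfer in which both updates commit together or neither does. SQLite from Python's standard library stands in for any ACID RDBMS, and the `accounts` table is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both UPDATEs commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to dst was rolled back as well


print(transfer("alice", "bob", 30))   # True
print(transfer("alice", "bob", 500))  # False: would overdraw alice
print(dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name")))
# {'alice': 70, 'bob': 30}
```

Without the transaction, the failed transfer would have left bob credited with money that never left alice's account: the kind of inconsistency you don't want in an order tracking system.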
23. Re:Call me skeptical by Anonymous Coward · 2010-11-18 10:11 · Score: 0
  
  Sometimes it's about the difference between what works and what's better, rather than what is possible. A lot of things are possible; that doesn't mean they're actually a good way to do anything. That said, I know pretty much nothing about NoSQL, so I can't add anything more useful than my previous warning.
24. Re:Call me skeptical by Anonymous Coward · 2010-11-18 10:15 · Score: 0
  
  I love it when "big boys" try to pinch pennies by using NoSQL. As soon as they try to do more than small-scale testing (megabyte-sized DBs) they quickly find performance and concurrency going down the drain. Then they spend 10x the amount of money they thought they saved trying to get their cheapo system up to a usable speed running gigabyte-to-terabyte DBs, but it is always poor (if it works at all). They end up scrapping the whole mess and buying an RDB anyway.
25. Re:Call me skeptical by Brian Quinlan · 2010-11-18 10:15 · Score: 1
  
  Agreed, we have massive sites serving millions of requests a day using Open Source relational databases and yet it seems everyone wants to use NoSQL because it's the hip new thing.
  Naturally I start thinking of this: http://xtranormal.com/watch/6995033
  A million requests per day translates to about 11.6 requests per second. That's a pretty trivial amount of traffic. A massive site like Facebook is probably serving about 4 orders of magnitude more requests than that.
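The arithmetic behind that comparison, as a quick sketch. It assumes traffic is spread evenly over the day; real traffic peaks well above the daily average, so capacity planning has to use peak rather than mean load:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day: float) -> float:
    """Average requests per second, assuming an even spread over the day."""
    return requests_per_day / SECONDS_PER_DAY


print(round(average_rps(1_000_000), 2))  # 11.57
print(round(average_rps(1_000_000 * 10**4)))  # 115741, i.e. four orders of magnitude more
```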
26. Re:Call me skeptical by gfody · 2010-11-18 10:16 · Score: 1
  
  It's a sickening display of ignorance coming from people who are supposed to be professionals. Nobody takes issue with people investigating alternatives to SQL but SQL has come under heavy fire by NoSQL proponents and yes one can become very sick of hearing the same old fallacious arguments again and again.
  
  --
  
  bite my glorious golden ass.
27. Re:Call me skeptical by Anonymous Coward · 2010-11-18 10:18 · Score: 0
  
  I think it's more that there's so much hype for something that mostly just compensates for user incompetence at the expense of real world usage. I know I'm sick of hearing about it. If I had a legitimate use for it, I'd use it, but I certainly don't want to hear any more blind evangelism for it. NoSQL is a tool. Use it if it works for you and shut up about it, 'cause I don't want to hear you brag about your lack of RDBMS experience.
28. Re:Call me skeptical by Hoi Polloi · 2010-11-18 10:21 · Score: 1
  
  Good thing all my db's are massive, flat text files.
  
  --
  It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
29. Re:Call me skeptical by Anonymous Coward · 2010-11-18 10:22 · Score: 0
  
  "lightning fast"? How many Libraries of Congress per Jigawatt is that?
30. Re:Call me skeptical by vadim_t · 2010-11-18 10:25 · Score: 4, Informative
  
  A lot of people don't understand how a database really works, so they do it horribly wrong. As a result, it's dreadfully slow. So they go and use some key/value lookup system because "they're fast". There you often get one of two things:
  They still don't understand the problem, so they recreate it yet again. If you don't understand what's wrong with reading an entire table with a million records, and discarding all but 5 of them client-side, then replacing the SQL DB with a key/value system just isn't going to make things better.
  Or, they improve performance, but since they don't understand what ACID is for, they eventually end up with weird inconsistencies. In some cases this might be acceptable, but you really don't want to see it happening in an order tracking system.
  The sickening feeling people get is not because it's a competitor. In a large part it isn't a competitor, but a different class of system with different tradeoffs. The sickening feeling comes from seeing people not understand what they're doing, and then run towards the latest technology because it's what $BIG_COMPANY uses without understanding it any better, and generally making an even bigger mess.
  The performance of specialized solutions like key/value systems doesn't come from magic. They're not really new, and don't use anything very groundbreaking. They simply use different tradeoffs at the cost of sacrificing quite a lot of what is present in a RDBMS. It's important to understand first whether you can really afford to discard those things, because if you can't, it's either not going to work right, or you'll have to graft all that you removed on top of it anyway.
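Here is a sketch of the first failure mode above: "reading an entire table with a million records, and discarding all but 5 of them client-side". It uses SQLite from the standard library as a stand-in, with a hypothetical `events` table scaled down to 100,000 rows. Both functions return the same 5 rows, but the first drags every row to the client to do so:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, kind) VALUES (?, ?)",
    [(i % 20_000, "click") for i in range(100_000)],  # 5 events per user
)
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")


def events_client_side(user_id: int) -> list:
    """Anti-pattern: fetch all 100,000 rows, then keep 5 of them in Python."""
    rows = conn.execute("SELECT id, user_id, kind FROM events ORDER BY id").fetchall()
    return [row for row in rows if row[1] == user_id]


def events_server_side(user_id: int) -> list:
    """Let the database filter (using the index) so only matching rows are sent."""
    return conn.execute(
        "SELECT id, user_id, kind FROM events WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()


print(len(events_server_side(42)))  # 5
```

Swapping the SQL database for a key/value store does not fix `events_client_side`; it just moves the same full scan somewhere else.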
31. Re:Call me skeptical by Squeebee · 2010-11-18 10:26 · Score: 2, Funny
  
  Would you have preferred I have said bazillions?
32. Re:Call me skeptical by cratermoon · 2010-11-18 10:28 · Score: 1
  
  Could you be specific about which fallacious arguments you have in mind? Preferably, cite 3 different fallacies with multiple sources for each one.
33. Re:Call me skeptical by Natural Join · 2010-11-18 10:46 · Score: 5, Interesting
  
  The small startups are using NoSQL because there is, more and more, a push in the web app market to store data which does not fit into any schema.
  There is no such thing as "data which does not fit into any schema", just like there is no such thing as data which cannot be encoded into binary. All data necessarily has a schema. However much or little of the schema you may choose to model in your (SQL or other type of) schema is, like the rest of software engineering, a design tradeoff.
  The various NoSQL approaches do not solve the full generality of data management problems the way SQL databases do. They are narrower in scope, and as is generally the case, they can achieve better performance by virtue of doing less. They can be much faster with certain data access paths, but at a cost of the fact that other data access paths become prohibitive.
  The frustrating thing for many of us is that the NoSQL spin on data management is about where mainstream data management was in the 1960s. As the field matured, it learned many important lessons, all of which are now being tossed out the window by people saying "oh we don't need that" but of course, they just haven't needed it yet. As these problems become apparent to them, they will spend the next decades of their lives reinventing what the data management field figured out in the 80s and 90s. Until then, they'll be making beginner mistakes, like thinking that their data somehow doesn't fit into any schema.
34. Re:Call me skeptical by mini me · 2010-11-18 11:22 · Score: 1
  
  You can, of course, create a key/value table in a SQL database, but then you are just creating your own NoSQL database on top of SQL. Why wouldn't you use a database designed to store data in that format, in that case?
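For reference, the "key/value table in a SQL database" mentioned above can be sketched in a few lines. SQLite is used here only because it ships with Python, and the key naming scheme is hypothetical:

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# A generic key/value table: the database knows nothing about the structure
# of the values, which is exactly the trade-off being debated here.
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")


def put(key: str, value: str) -> None:
    with conn:  # commit each write
        conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))


def get(key: str) -> Optional[str]:
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None


put("user:1:name", "Ada")
put("user:1:name", "Ada Lovelace")  # overwrite, like a KV store's set
print(get("user:1:name"))           # Ada Lovelace
print(get("user:2:name"))           # None
```

The primary-key lookup is fast, but any question other than "get by key" (for example "all users named Ada") now means scanning and parsing values in application code, whether the store underneath is SQL or not.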
35. Re:Call me skeptical by 19thNervousBreakdown · 2010-11-18 11:24 · Score: 1
  
  The small startups are using NoSQL because there is, more and more, a push in the web app market to store data which does not fit into any schema.
  R'lyeh is apparently the new Silicon Valley.
  
  --
  <xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
36. Re:Call me skeptical by Estanislao Martínez · 2010-11-18 11:27 · Score: 1
  
  I worked on some code done by someone else, where on massive records, they were always selecting "*" instead of the needed or anticipated values. Big waste when one needs (by ID#) last and first name and selects a whole row instead - then wonders why it's not scaling upwards.
  Eh, I wonder if you're overstating the performance implications of that. Those are all row-oriented databases. Unless all of the columns your query needs are found in an index, it's going to have to read the whole row from disk anyway; the extra costs from the * then become (a) memory and CPU usage and (b) network bandwidth. In my experience, network bandwidth is usually not a big problem; memory and CPU usage can be an issue, but the big performance killers tend to be inefficient joins (because they don't scale linearly), while scalar stuff (which the * falls into) is usually cheap.
  
  --
  Are you adequate?
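The "unless all of the columns your query needs are found in an index" case can be checked directly. In this sketch, which uses SQLite as a stand-in and a hypothetical `people` table, the narrow query can be answered from a covering index alone. `SELECT *` still uses the index but then has to visit the table row for the remaining columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, first TEXT, last TEXT, bio TEXT)")
conn.execute("CREATE INDEX idx_people_last_first ON people (last, first)")


def plan(sql: str) -> str:
    # Column 3 of each EXPLAIN QUERY PLAN row is the readable detail text.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


narrow = plan("SELECT first, last FROM people WHERE last = 'Smith'")
wide = plan("SELECT * FROM people WHERE last = 'Smith'")
print(narrow)  # expected to report a COVERING INDEX: no table lookup needed
print(wide)    # index search plus a lookup into the table row for bio
```

So selecting only the needed columns matters most when it turns a row lookup into an index-only read; otherwise, as argued above, the difference is mostly memory, CPU and bandwidth.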
37. Re:Call me skeptical by Anonymous Coward · 2010-11-18 11:39 · Score: 0
  
  There are good reasons to use "NoSQL", but those aren't always the ones in play when the decisions are made. One thing I want to point out: for a startup, servers cost money. With many data types, patterns, usage scenarios, etc., relational is perfectly good, but in a competitive world, if NoSQL is faster for your scenario, then that's a cost reduction. Fewer servers means a smaller bill, and that's important for a startup. Still, I agree from my own experience that in most cases performance issues come down to gross incompetence, i.e. poor design.
38. Re:Call me skeptical by mini me · 2010-11-18 11:41 · Score: 1
  
  NoSQL is not about ACID. Several NoSQL databases are ACID compliant. NoSQL is about not using SQL to query a database. Some NoSQL databases are even relational, just like SQL.
39. Re:Call me skeptical by Monkeedude1212 · 2010-11-18 11:45 · Score: 1
  
  This made my day, thank you.
40. Re:Call me skeptical by Skal+Tura · 2010-11-18 12:04 · Score: 1
  
  Yup, someone simply made up a buzzword and it caught on.
  "NoSQL" or a key/value datastore is good for caching data, but I don't see much value beyond that if there are any relations.
  Thing is, most people are ignorant and clueless. Probably 90% of the code I see (and I see a lot of it) I discard as bad quality. Ironically, those with the most buzzwords about code quality in their marketing seem to have the lowest code quality in practical terms, and the most idiotic software architecture. And yes, I do mean Magento.
  Magento is a marvelous modern example of this: they claim all the hip buzzwords. MVC, EAV, Enterprise, Open Source etc. They are just that: features with checkmarks.
  MVC rules are broken left and right, and so are EAV rules. It's not truly open source either: the code is so shit it's clear it's just a gigantic push for consulting services, some features are paywalled etc.
  In the end, the cost of rolling out a new, quite standard web shop was over 4x that of a slightly modified osCommerce codebase (basically changed from the standard one to make changing the layout easier, nothing else), and not all features got completed.
  Things like payment gateways required thousands of lines of code for the most basic support, where the osCommerce counterpart module was under 200 lines. Hell, developing anything custom for it is a major PITA unless you are unfortunate enough to HAVE to work with Magento for a reeeeally long time.
  Also, it's quite slow, and its design is idiotic.
  Drawbacks included:
  - MVC layers were all mixed up; the view in particular could contain quite a bit of logic.
  - Forced JS for your customers
  - The "SEO" features make you rank worse (duplicate content issues all over the place, the forced JS etc.)
  - Completely stupid abstractions
  - EAV meaning for them: a shitload of useless abstraction
  - Abstractions, or simply hooks, were inconsistently spread all over the place in illogical locations
  - Uses totally "weird methods" for handling plugins etc. (quite well known software, but not used by anyone else for that task, and fault prone)
  - Plugin installation was likely to fail
  - Error handling: bad (at best) to a complete hindrance. Errors where execution should have stopped did not even yield an error message (even in the logs), while errors that were informative at best (such as appending text to a nonexistent string variable, $inexists .= 'foobar') caused a complete halt
  - Templates: by default they consist of tens of thousands of lines of CSS, mostly duplicate and irrelevant special cases that just reaffirm already applied styles. The minimum for a working layout is ~5k lines of CSS.
  - Templates: the CSS requires insane numbers of tiny special cases to work overall.
  - Templates: unless you used a reaaally specific rule, no matter how tiny the change, you were likely to break something else when you fixed this. This is because the default HTML is so crap, and rewriting it all is too big a task.
  - CSS: the difference between classes and IDs is misunderstood; generic class mismatches, inconsistent use and collisions.
  - The number of files in the system is simply stunning; get out the popcorn and sit back while the count finishes on a regular workstation. (I'm not saying those files should be merged, but some forethought should have gone into the design, so that instead of the 1-2 files that are actually necessary you aren't required to create 5+ directories and 20+ files.)
  Well, you get the feeling. It's a really good example of how not to code anything, and should be taught in schools as an example of what to avoid. Pretty much everything they've done is more or less wrong.
  Basically they took everything that is or has been Buzz and forced it all into a single system.
  Btw, that list was off the top of my head, and the last time I touched Magento was almost 1½ years ago. It's something I will never forget; it was that excruciatingly painful.
  
  --
  Pulsed Media Seedboxes
41. Re:Call me skeptical by Skal+Tura · 2010-11-18 12:07 · Score: 1
  
  Roflmao! "Push to save data that does not fit into any schema."
  Excuse me, if you cannot comprehend the data (no schema can be created), then let someone else do it who actually knows what they are doing.
  All data is just data: it has relations, it has types, it has patterns etc. There's no magical data that no one can comprehend well enough to put it into a DB schema.
  
  --
  Pulsed Media Seedboxes
42. Re:Call me skeptical by Skal+Tura · 2010-11-18 12:17 · Score: 2, Informative
  
  So you've not worked on anything like that, where actually someone knew how to make a relational database.
  Ty very much, but our DBs are running fine with over 100 million rows of almost purely textual data being searched (relational full text searches) at 500+ q/s, and double that in hits per sec, on a single modern server that still has plenty of free resources.
  OK, that doesn't change that much, but then we have this one thing which is over 100x the size, runs even heavier searches (exponentially more complex), updates almost constantly, and is used by the public from just 2 nodes; it has been designed for over 100 pageviews per sec.
  All of that runs on top of MySQL and standard hardware (no SSD, no gigantic amounts of RAM, no gigantic amounts of HDDs etc.).
  And the most expensive server was 5,500 EUR; the more complex setup uses ~3½k EUR blades.
  
  --
  Pulsed Media Seedboxes
43. Re:Call me skeptical by Skal+Tura · 2010-11-18 12:20 · Score: 1
  
  I use the so-called NoSQL stuff extensively, but for caching. The actual, real data and its relations are still stored in an RDBMS, which is accessed if there's no cache hit.
  NoSQL for the most part is just key/value pairs, nothing special.
  Try to map out a very complex data structure reliably in NoSQL (think 50+ interconnected one-to-many or many-to-many relations).
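  The setup described above, with the RDBMS as the system of record and a key/value store consulted first, is commonly called cache-aside (or lazy loading). This is a minimal sketch that assumes a plain Python dict in place of a real cache such as memcached and an in-memory SQLite database in place of the RDBMS. The `CacheAside` class, the `users` table and the key format are hypothetical.

```python
import sqlite3

class CacheAside:
    """Check the cache first; on a miss, read the RDBMS and fill the cache."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}      # stand-in for memcached/Redis
        self.db_reads = 0    # counts trips to the RDBMS

    def get_user(self, user_id):
        key = f"user:{user_id}"
        if key in self.cache:            # hit: no database access at all
            return self.cache[key]
        self.db_reads += 1               # miss: fall back to the system of record
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is not None:
            self.cache[key] = row        # populate for subsequent reads
        return row

    def update_user_name(self, user_id, name):
        # Write to the RDBMS first, then drop the stale cached copy.
        self.conn.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
        self.cache.pop(f"user:{user_id}", None)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

store = CacheAside(conn)
first = store.get_user(1)     # miss, reads the database
second = store.get_user(1)    # hit, served from the cache
store.update_user_name(1, "alicia")
third = store.get_user(1)     # miss again after invalidation
```

  The sketch invalidates on write rather than updating the cache in place. That keeps the RDBMS as the only source of truth, at the price of one extra miss after each update.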
  
  --
  Pulsed Media Seedboxes
44. Re:Call me skeptical by mini+me · 2010-11-18 12:35 · Score: 1
  
  Perhaps my wording was poor, but if you have knowledge of the structure of the data in advance, you are not thinking about the kinds of applications I am. Only the user knows what the structure of each record is going to be as they enter it.
  Yes, you can still do it with SQL, but it is very, very ugly. Why would you try to shoehorn the problem into SQL when databases exist that are designed for the job?
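  One common way to store user-defined fields in SQL is an entity-attribute-value (EAV) table. It also shows why the result gets ugly: every attribute becomes a row, reassembling a record means regrouping rows, and filtering on a field needs a self-join plus a cast, because values are stored as text. This is a sketch using Python's `sqlite3`, and the `record_field` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (record, attribute) pair; all values stored as text.
conn.execute(
    "CREATE TABLE record_field (record_id INTEGER, attr TEXT, value TEXT, "
    "PRIMARY KEY (record_id, attr))"
)
conn.executemany(
    "INSERT INTO record_field VALUES (?, ?, ?)",
    [
        (1, "title", "Pump A"), (1, "flow_rate", "40"),
        (2, "title", "Valve B"), (2, "color", "red"),
    ],
)

def load_record(conn, record_id):
    """Reassemble one user-defined record from its attribute rows."""
    rows = conn.execute(
        "SELECT attr, value FROM record_field WHERE record_id = ?", (record_id,)
    )
    return dict(rows.fetchall())

# "Titles of records whose flow_rate exceeds 30": one self-join per attribute
# involved, and an explicit CAST because the value column is text.
matches = conn.execute(
    "SELECT t.value FROM record_field AS t "
    "JOIN record_field AS f ON f.record_id = t.record_id "
    "WHERE t.attr = 'title' AND f.attr = 'flow_rate' "
    "AND CAST(f.value AS INTEGER) > 30"
).fetchall()
```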
45. Re:Call me skeptical by Klinky · 2010-11-18 13:57 · Score: 3, Insightful
  
  NoSQL is not just key-value lookups. Take a look at Redis or MongoDB, there are novel ideas in both of them & yes they do bring new things to the table. They are NOT memcache. I am also not sure people are "sacrificing a lot of what is present in an RDBMS" by choosing NoSQL over an RDBMS. I think your gripe is with people who don't know what the hell they're doing, but you project that griping on to NoSQL in general. There are some things that RDBMSs are really good at, there are some things RDBMSs aren't so great at. The huge majority of people in the NoSQL communities and the users of these solutions know that loading a million objects client-side and discarding all but 5 is stupid & no one would suggest that is a failing of either RDBMS or NoSQL solutions, but squarely on the user.
  I would say that NoSQL is closer to how people think about objects and their relations to each other. People don't easily boil their objects down into relational tables and how each of those tables should interact with the others. That takes skill & talent & can be a bit of a pain to dive into, which is why we have a bunch of ORM solutions adding another layer of cruft on top of RDBMSs. NoSQL basically gets rid of the ORM & the tables (though some still use table-like structures). For apps that would otherwise lean heavily on an ORM (a lot of web apps), that's great. For newbies who don't have years of designing, tweaking normalization, and sharding/partitioning under their belt, it can be easier to pick up. Some of the NoSQL solutions have clustering/horizontal scaling and/or replication built in, with no or very few schema/query changes required.
  So for some, NoSQL will be the better solution, and for others an RDBMS will be. I wouldn't knock either. Just because you can do something in an RDBMS doesn't mean it's better than a NoSQL solution, and vice versa.
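  Here is the object-versus-table point in concrete form. A document store such as MongoDB keeps an object and its children together as one nested document, while a normalized schema splits them across tables that a query (or an ORM) must join and reshape. This is a sketch using only the Python standard library, with `json` standing in for a document store; the order/line-item schema is hypothetical.

```python
import json
import sqlite3

# Document model: the whole object, children included, is a single value.
order_doc = {
    "order_id": 7,
    "customer": "alice",
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
}
stored = json.dumps(order_doc)     # what a document store would persist
restored = json.loads(stored)      # read back in one lookup, no join

# Relational model: the same data normalized into two tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE order_items (order_id INTEGER REFERENCES orders, sku TEXT, qty INTEGER)"
)
conn.execute("INSERT INTO orders VALUES (7, 'alice')")
conn.executemany("INSERT INTO order_items VALUES (7, ?, ?)", [("A1", 2), ("B2", 1)])

# Rebuilding the object takes a join plus reshaping: the work an ORM hides.
rows = conn.execute(
    "SELECT o.order_id, o.customer, i.sku, i.qty FROM orders AS o "
    "JOIN order_items AS i ON i.order_id = o.order_id "
    "WHERE o.order_id = 7 ORDER BY i.sku"
).fetchall()
rebuilt = {
    "order_id": rows[0][0],
    "customer": rows[0][1],
    "items": [{"sku": sku, "qty": qty} for _, _, sku, qty in rows],
}
```

  The trade-off reverses for questions that cut across objects, such as total quantity per SKU over all orders. SQL answers those with one `GROUP BY`, while a document layout has to visit every document.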
46. Re:Call me skeptical by Klinky · 2010-11-18 14:11 · Score: 1
  
  ...and your hatred of Magento has what to do with NoSQL?
47. Re:Call me skeptical by GWBasic · 2010-11-18 14:53 · Score: 1
  
  Somehow I don't see a startup needing NoSQL unless they specialized in processing huge amounts of data, in which case trying to make slashdot work on your core business seems stupid. But maybe I missed something...
  It depends on the kinds of queries and/or feature set you're trying to do. Don't assume that NoSQL is all about scalability, I chose MongoDB for my startup because we have a requirement that's very difficult to address in MySQL, but trivial in MongoDB.
  
  --
  No, I will not work for your startup
48. Re:Call me skeptical by Anonymous Coward · 2010-11-18 15:09 · Score: 0
  
  I see you belong to the NoAwk school...
49. Re:Call me skeptical by Anonymous Coward · 2010-11-18 15:19 · Score: 0
  
  +1 vadim_t! rgds SchemaCzar
50. Re:Call me skeptical by evanism · 2010-11-18 15:44 · Score: 1
  
  Oscommerce is no better, really. It has an incredibly irritating mix of OO and procedural php, a mash of templates and bizarre attempts at MVC and some code in the core procedures that is replicated 4 times for identical outputs. A very high WTF ratio on this one. I have spent the best part of a month getting it into shape. Re databases, the guys who designed it sure do love their normalisation!
  
  --
  Just bought a new quantum computer, but I'm uncertain how it works.
51. Re:Call me skeptical by ieatcookies · 2010-11-18 16:13 · Score: 1
  
  The amount of data is not always the concern; some companies have so many concurrent connections that the databases can't keep up. These situations are ideal for scaling horizontally using various methods, including caching on top of the database and being eventually consistent (to allow always-available reads, etc.). A very simple solution is to replicate your DB and send all reads to the slaves and all writes to the master; this helps spread the connections around. Software like Cassandra is an example of moving away from fully relational DBs for the sake of always-available data.
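  The read/write split described above can be sketched as a small router that sends writes to the master and rotates reads across replicas. This is a hypothetical illustration: `ReplicatedRouter` is not a real library, and connections are modeled as plain names. The keyword check is deliberately naive and ignores cases like `SELECT ... FOR UPDATE` or reads inside a write transaction.

```python
import itertools

class ReplicatedRouter:
    """Route statements: writes to the master, reads round-robin over replicas."""

    WRITE_VERBS = frozenset(
        {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}
    )

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def target_for(self, sql):
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._replicas)

router = ReplicatedRouter("db-master", ["db-replica-1", "db-replica-2"])
targets = [
    router.target_for("SELECT * FROM users WHERE id = 1"),
    router.target_for("UPDATE users SET name = 'bob' WHERE id = 1"),
    router.target_for("select count(*) from users"),
    router.target_for("SELECT 1"),
]
```

  One real-world wrinkle is replication lag. A read issued right after a write may land on a replica that has not applied the write yet, so applications often pin a session's reads to the master for a short while after it writes.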
52. Re:Call me skeptical by Anonymous Coward · 2010-11-18 16:30 · Score: 0
  
  Yeah... I agree... Take Slashdot or many others like it that serve up thousands or perhaps millions of requests per hour. Nearly all common solutions work well these days but you should be able to serve well over 1000 select queries per second. Even joins aren't usually very costly so my guess is that you are joining on multiple tables, taking the result set and feeding that into another query set.
  Seriously... think hard about your data. You need a full-time dedicated person to solve these problems. WoW uses SQL Server and it serves millions of requests per hour on each shard. If you can't match WoW, then you're doing it wrong.
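  The pattern guessed at above (run one query, then issue another query for each result row) is usually called the N+1 query problem. A single join typically returns the same answer in one round trip. This is a minimal sketch with Python's `sqlite3`. The `authors`/`posts` schema is hypothetical, and the query counters exist only to make the difference in round trips visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bo');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 2, 'second'), (3, 1, 'third');
""")

def n_plus_one(conn):
    """One query for the posts, then one extra query per post for its author."""
    queries = 1
    result = []
    posts = conn.execute("SELECT id, author_id, title FROM posts ORDER BY id").fetchall()
    for _, author_id, title in posts:
        queries += 1
        (name,) = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()
        result.append((title, name))
    return result, queries

def single_join(conn):
    """The same answer from one query; the database does the matching."""
    rows = conn.execute(
        "SELECT p.title, a.name FROM posts AS p "
        "JOIN authors AS a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
    return rows, 1

slow, slow_queries = n_plus_one(conn)
fast, fast_queries = single_join(conn)
```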
53. Re:Call me skeptical by Anonymous Coward · 2010-11-18 16:45 · Score: 0
  
  They also need to learn how to run, and more importantly, how to read an explain plan.
54. Re:Call me skeptical by Bodrius · 2010-11-18 17:27 · Score: 1
  
  This is all true, but ignores the fact that for a lot of applications and teams RDBMS were overkill in the first place, so they are hardly sacrificing anything by switching to NoSQL.
  It's the same reason a lot of people in the early dot-com days believed MySQL was awesome precisely because it was such a crappy RDBMS ('who needs transactions or referential integrity anyway? It just slows things down')... Arguably, with robust simpler storage now, there is more awareness of which facilities are sacrificed and which advantages the R in the acronym brings to the table when it is needed.
  
  --
  Freedom is the freedom to say 2+2=4, everything else follows...
55. Re:Call me skeptical by Anonymous Coward · 2010-11-18 18:10 · Score: 0
  
  >Ty very much, but our DBs are running fine with over 100million rows that's almost purely
  >textual data being searched (relational full text searches) and 500+ q/s
  With all due respect, that's a fraction of the scale that these massively distributed systems are aimed at. Imagine if your load was the next order of magnitude -- you had that much pressure just on your *indexes*.
  Most people can't even think of a problem that fits that solution, which is why the hype is really just that.
  On the other hand there are lots of applications that need datastores that are not a good fit for any conventional relational model, and plenty of things that can be done in the kind of brute-force parallel algorithm where MapReduce is a big win and huge distributed non-relational databases are perfectly suited.
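  The MapReduce model splits a job into three parts. A map step emits key/value pairs independently for each input, a shuffle groups the pairs by key, and a reduce step combines each key's values. Because map calls and reduce calls are independent of one another, a framework can spread them over many machines. Here is a single-process sketch of the classic word count in Python; no real framework is involved, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Emit (word, 1) for every word; each document can be mapped in parallel.
    return [(word, 1) for word in document.lower().split()]

def shuffle(pairs):
    # Group emitted values by key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Each key is reduced independently, so this step parallelizes too.
    return key, sum(values)

documents = ["the cat sat", "The dog sat down"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```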
56. Re:Call me skeptical by fishbowl · 2010-11-18 18:34 · Score: 1
  
  >Data consistency will always be the anchor on any system.
  There are applications where data consistency is completely off the list of requirements, while really fast, globally distributed lookups of huge collections of simple maps are the primary or only consideration.
  Some models fit an enterprise accounting or ERP type of system. Other models fit an exabyte sized store of full motion video. And then there are models appropriate for indexing signatures of that video for the purpose of being copyright police. There might even be a model that can accommodate all those requirements at the same time.
  Value judgment of the various data storage and processing solutions without clear notions of the application and required scale, is simply not a conversation we can have.
  
  --
  -fb Everything not expressly forbidden is now mandatory.
57. Re:Call me skeptical by weicco · 2010-11-18 18:41 · Score: 1
  
  If you don't understand what's wrong with reading an entire table with a million records, and discarding all but 5 of them client-side
  I think the more common case is to read an entire table with a million records and discard all but 5 of them server-side, just because whoever designed the DB didn't know squat about indexing. I've seen databases without a single clustered key. This forces a full table scan even if you return just a single record to the client.
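  The difference between a full table scan and an index lookup shows up directly in the query plan. This is a sketch with Python's `sqlite3`, with SQLite standing in for whatever RDBMS you run, and the `events` table is hypothetical. Other databases word their plans differently; MySQL's `EXPLAIN`, for example, shows an access type of `ALL` for a full table scan.

```python
import sqlite3

def plan(conn, sql):
    """Join the detail strings of SQLite's EXPLAIN QUERY PLAN into one line."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, customer_id INTEGER, payload TEXT)"
)

query = "SELECT payload FROM events WHERE customer_id = 42"
before = plan(conn, query)   # no index on customer_id: every row is examined (SCAN)

conn.execute("CREATE INDEX idx_events_customer ON events (customer_id)")
after = plan(conn, query)    # the index lets SQLite jump to matching rows (SEARCH)

print(before)
print(after)
```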
  
  --
  You don't know what you don't know.
58. Re:Call me skeptical by Tablizer · 2010-11-18 19:54 · Score: 1
  
  everyone wants to use NoSQL because it's the hip new thing.
  What usually happens is that somebody is planning on finding a new job in the hot new field, and convinces existing company that Hip New Thing is better than sliced bread. It's not, but at least you now have the experience and resume cred to move into the hot new field. The current company eventually gets the mess to work satisfactorily as it slowly matures, after cursing about you a hundred times a week.
  
  --
  Table-ized A.I.
59. Re:Call me skeptical by the_womble · 2010-11-18 20:35 · Score: 1
  
  I know that MongoDB is not durable (they recommend using replication for reliability), and that sounds like a substantial sacrifice. Do these DBs have equivalents to referential integrity and joins?
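  For comparison, referential integrity on the relational side means the database itself rejects a row that points at a nonexistent parent. This is a sketch with Python's `sqlite3`. SQLite leaves foreign-key enforcement off by default until `PRAGMA foreign_keys = ON` is issued on the connection, and the `members`/`reports` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enforcement is off unless enabled
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE reports (id INTEGER PRIMARY KEY, "
    "member_id INTEGER NOT NULL REFERENCES members(id))"
)
conn.execute("INSERT INTO members VALUES (1, 'pat')")
conn.execute("INSERT INTO reports VALUES (1, 1)")      # parent exists: accepted

try:
    conn.execute("INSERT INTO reports VALUES (2, 99)")  # there is no member 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

report_count = conn.execute("SELECT COUNT(*) FROM reports").fetchone()[0]
```

  A store without this guarantee pushes the check into application code. Concurrent writers can then still create an orphan between the check and the write.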
60. Re:Call me skeptical by keean · 2010-11-18 20:42 · Score: 1
  
  This is true: NoSQL must be better if you are doing the ORM thing. But that's not how you should use a relational database. Relational algebra (of which SQL is an implementation) is a programming language. You construct your data as relations (tables), then write the operations on them in relational algebra (SQL).

  So you can either use object modelling and an ORM/NoSQL, or relational modelling and relational algebra/SQL. Of the two, relational modelling is more powerful (it can model more situations), and relational algebra is a higher-level language (because you specify what you want done, not how to do it).
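  To illustrate the idea that relational algebra is a small language of operators over relations, here is a toy implementation of selection (σ), projection (π) and natural join (⋈) over Python lists of dicts. A comment shows the SQL that states the same query declaratively. This is a teaching sketch only, with hypothetical function names and data; a real engine chooses its own execution strategy instead of these nested loops.

```python
def select(relation, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """π: keep only the named attributes, dropping duplicate tuples."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

def natural_join(left, right):
    """⋈: combine tuples that agree on every attribute the relations share."""
    out = []
    for lt in left:
        for rt in right:
            shared = set(lt) & set(rt)
            if all(lt[a] == rt[a] for a in shared):
                out.append({**lt, **rt})
    return out

members = [{"member_id": 1, "name": "pat"}, {"member_id": 2, "name": "lee"}]
calls = [
    {"call_id": 10, "member_id": 1, "year": 2009},
    {"call_id": 11, "member_id": 2, "year": 2008},
    {"call_id": 12, "member_id": 1, "year": 2009},
]

# SQL: SELECT DISTINCT name FROM calls NATURAL JOIN members WHERE year = 2009
result = project(
    select(natural_join(calls, members), lambda t: t["year"] == 2009), ["name"]
)
```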
61. Re:Call me skeptical by __aatirs3925 · 2010-11-18 22:13 · Score: 1
  
  I was going to suggest something along the lines of multi-relational databases, but your idea solves the problem entirely (very neatly, may I add), and the query would probably take a few seconds. Simple is usually better.
62. Re:Call me skeptical by greap · 2010-11-18 23:41 · Score: 1
  
  You might choose an RDBMS in that scenario precisely because you don’t know the future. Solution architecture is about fighting fires that haven’t started, designing a system that doesn’t just fit the problem at hand but anticipates how that problem might mutate in the future or how other problems might come into play. If you are pretty damn sure the data will never have relationships that need to be expressed and enforced, and that the schema will remain that simple, then a NoSQL solution will probably be the best bet. IMHO it's fairly rare for this to be the case.
63. Re:Call me skeptical by GooberToo · 2010-11-19 01:46 · Score: 1
  
  Over the last twenty years, periodically, a new technology comes out which is to "bring the end of SQL", or at least offer a serious competitor. Early adopters start new projects or even migrate away from SQL to their new data store messiah. One to five years later, almost everyone quietly migrates back to SQL. When asked, they quietly admit they were idiots following a fad.
  Furthermore, EVERYONE I've spoken with who has considered a NoSQL-based solution clearly doesn't even understand what they are giving up in exchange for their newfound features. Meaning, everyone I've spoken with, which frankly isn't many, looks fully equipped to be migrating back to SQL in the next one to five years.
  At this point, there is absolutely no evidence we're not looking at yet another group of idiots following the latest anti-SQL fad. Not one bit. None. Nada. To be clear, there absolutely are cases where NoSQL technology has its place. SQL is not perfect and does not cover every use case well. But the vast, vast, vast majority of people will absolutely be better off with a SQL-based solution rather than following the latest, cyclic fad. And frankly, most people are not even remotely qualified to be making such decisions based on technological merit, which only adds to the pool of future migrants, who will likely blame everything but themselves.
64. Re:Call me skeptical by GooberToo · 2010-11-19 02:23 · Score: 1
  
  Yes, because assigning someone homework to cure your ignorance is always a good way to validate your ignorance. Brilliant!
65. Re:Call me skeptical by craftycoder · 2010-11-19 05:44 · Score: 1
  
  duh! 1.21. Who doesn't know that?
66. Re:Call me skeptical by Medievalist · 2010-11-19 06:20 · Score: 1
  
  I just select * from * and then sort it out with grep and cut.
  I just use dd to pipe the database partition through gawk.
67. Re:Call me skeptical by RobertM1968 · 2010-11-20 07:24 · Score: 1
  
  The real problem is scale. Any SQL DB server will cope with most applications fine, but add live data on a public-facing site with a decent volume of users, and they'll crawl to a slow death. This is why non-crucial sites use denormalization and do the dastardly deed of data duplication to speed up their bad queries and suspect table design.
  So what's the solution? Very large and expensive boxes for the simplest method (no one likes this these days), and then lots of boxes performing certain tasks. Which has its own huge costs, because skilled people in this field are very few and far between, and are already working for Yahoo, Google and Fartybook.
  Wow, you've got no experience in the database world at all, do you? Or perhaps your experience is with MSSQL and their other database "solutions"?
  It really doesn't take massive hardware to serve a ton of connections. There was a time when the patent database, which was (at that time) run by IBM, was on a few small (tiny compared to today's hardware) boxes running OS/2 and DB/2. On an ancient piece of hardware (quad 550MHz CPUs), we've done *INSERTS* (while reads, inserts and writes (modifies) are going on for the web clients) using only ONE CPU for the bulk inserts, at speeds in the high hundreds per second (so close it might as well be a thousand a second).
  Now... with the correct disk and SQL caching set up... or maybe with using a machine that isn't a dinosaur, we'd be able to easily quadruple that performance. Our newer box is: CPU=5x faster, Memory=4x faster, HDD=11x faster, SCSI controller cache=16x larger, SQL cache=4x larger (on memory that's 4x faster).
  I'm not sure what our limits are (actually, I have calculated estimates for such - we're at 15% load during peak usage), but I do know we haven't reached them yet - or even come close. As a matter of fact, we are sooooo far away that we haven't seen the need to migrate to the bigger, newer, more powerful server.
  Here's just a few things you need to learn (enough to get you started):
  -How to make a proper RDB
  -How to properly tune your server (DB, web server, disk cache, file system cache, etc)
  Any idea how many people I've found who think they know what they are doing, yet are TRIPLE caching things? That doesn't speed things up; it slows them down and wastes hardware resources!
  -Which hardware components are the vital ones to spend money on upgrading (is it disk? is it memory? is it CPU? is it a combination of 2 or more of those?)
  -How to do proper queries, merges and inserts
  -How to properly index columns (and which ones to index, and whether separate key columns make sense or waste space)
  -How to properly tune fulltext indices.
  -DON'T rely on a Microsoft product; otherwise, yes, you will need massive hardware. You have NO IDEA how many MS solutions we've replaced with MySQL to gain a hundred-fold increase in performance on the same hardware.
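  As one small example of the "proper inserts" item in the list above, committing once per batch rather than once per row avoids paying for a transaction (and, on disk, a sync) on every record. This is a sketch with Python's `sqlite3`. The `readings` table and the batch size of 1000 are hypothetical, and an in-memory database hides most of the per-commit cost that this technique avoids on a file-backed one.

```python
import sqlite3

def bulk_insert(conn, rows, batch_size=1000):
    """Insert rows in batches, using one transaction per batch."""
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # commits on success, rolls back the whole batch on error
            conn.executemany(
                "INSERT INTO readings (sensor, value) VALUES (?, ?)", batch
            )
        inserted += len(batch)
    return inserted

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)")
rows = [(f"s{i % 5}", i * 0.5) for i in range(2500)]
count = bulk_insert(conn, rows)
stored = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```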
  While we don't get too much traffic (only a few million requests a day per server), our servers generally run at 3% CPU utilization, and 98% cache hits on the disk.
  There are some good books out there to get you started... you may want to check some of them out.
  
  --
  StarTrekPhase2 - The Five Year Mission Continues!
68. Re:Call me skeptical by RobertM1968 · 2010-11-20 07:28 · Score: 1
  
  >Ty very much, but our DBs are running fine with over 100million rows that's almost purely
  >textual data being searched (relational full text searches) and 500+ q/s
  With all due respect, that's a fraction of the scale that these massively distributed systems are aimed at. Imagine if your load was the next order of magnitude -- you had that much pressure just on your *indexes*.
  Most people can't even think of a problem that fits that solution, which is why the hype is really just that.
  On the other hand there are lots of applications that need datastores that are not a good fit for any conventional relational model, and plenty of things that can be done in the kind of brute-force parallel algorithm where MapReduce is a big win and huge distributed non-relational databases are perfectly suited.
  Table indexes can be cached, you know that, right? That's called learning how to tune your DB, and it requires knowing how to change the config file. And of course, knowing what hardware you need to allow for DB tuning. For instance, one wouldn't assign a 1GB cache to indexes on a machine with 1GB of RAM.
  
  --
  StarTrekPhase2 - The Five Year Mission Continues!
69. Re:Call me skeptical by RobertM1968 · 2010-11-20 07:37 · Score: 1
  
  Is it people that don't know how to lay out a database or that you need to know how to lay out a database so it does fit with their need?
  I see a lot of hate around alternatives to SQL and most of them blame the design of data retention rather than accepting that there may be another way to achieve what is needed. It sounds to me like people trying to justify their job (which may not be necessary under a different model that doesn't need someone to "design" anything.)
  Honest question there...
  No hate here for non-SQL solutions... Most people buy some crappy (for Enterprise level stuff) tool that makes their databases and have no clue why the things are so slow.
  My point is, much of what people THINK *SQL isn't suited for is simply BULL. The reality is that those people aren't suited to using *SQL because they don't know what they are doing.
  That in no way means other solutions aren't better for other problems/situations.
  Example... one of our clients had some "database programmer" (heh) who wrote some software for them. A few hundred requests a day on a small database. Report queries would take (depending on the report) 5-20 MINUTES. The solution isn't getting a non-SQL solution. The solution is that "programmer" should never ever ever ever do anything database related again. Five times the data, and we've got the slowest report running at under 9 seconds... and most of that (6+ seconds) is actually time formatting the output (1 second) and sending the data to the client and the client rendering and displaying the data (5 seconds - that time drops considerably on their one non-ancient client computer). That leaves 3 seconds of actual time being used for BOTH the database queries AND the server side calculations of the data retrieved.
  See the point? Nothing at all against non-SQL stuff. But the problem is, many people who simply have no clue how to implement anything database related think the non-SQL stuff is some magic bullet to fix their own deficiencies. Too many people on the various message boards who are discussing this stuff think they need it for their very tiny, very lightly accessed DB-driven stuff. Honestly, if they can't handle a few hundred or even a few thousand requests a day on very small tables, do you really think one of these non-SQL solutions will help them? Or do you think that, with such a tremendous lack of knowledge on their part, such a solution will instead make things even worse for them?
  
  --
  StarTrekPhase2 - The Five Year Mission Continues!
70. Re:Call me skeptical by RobertM1968 · 2010-11-20 07:55 · Score: 1
  
  I worked on some code done by someone else, where on massive records, they were always selecting "*" instead of the needed or anticipated values. Big waste when one needs (by ID#) last and first name and selects a whole row instead - then wonders why it's not scaling upwards.
  Eh, I wonder if you're overstating the performance implications of that. Those are all row-oriented databases. Unless all of the columns your query needs are found in an index, it's going to have to read the whole row from disk anyway; the extra costs from the * then become (a) memory and CPU usage and (b) network bandwidth. In my experience, network bandwidth is usually not a big problem; memory and CPU usage can be an issue, but the big performance killers tend to be inefficient joins (because they don't scale linearly), while scalar stuff (which the * would fall into) is usually cheap.
  Nope, not overstating anything.
  Let's say it's an ambulance company database that's used to calculate their LOSAP points for the year... that requires calculations from EVERY data parameter input, since everything a member does goes towards their LOSAP points. In that data are things like their PCRs (Patient Care Reports). Each PCR may have .1MB (not counting scans) of data associated with it. Let's say there are 200 members and 24,000 PCRs. Now... let's assume the server has 2GB of RAM. That's 2.4GB of data to read just from the PCR tables alone if one loads all columns. Or a twenty minute report. Even stepping through record by record (due to the overhead of 24,000 individual read requests JUST for the PCR data - even when using the same DB connection).
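  Here is the data volume in that scenario, worked out from the comment's own figures: 0.1 MB per PCR (excluding scans), 24,000 PCRs, and an assumed 2 GB of server RAM.

```latex
24{,}000\ \text{PCRs} \times 0.1\ \tfrac{\text{MB}}{\text{PCR}} = 2{,}400\ \text{MB} = 2.4\ \text{GB} > 2\ \text{GB of RAM}
```

  Loading every column of every PCR therefore means moving more data than the assumed RAM can hold at once.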
  Now, selectively loading the 8 small data columns needed from the PCR table, and doing the same with all of the other tables for all of the other parameters needed to calculate LOSAP on the other hand reduces the report to no more than 3 seconds, including data calculations - and that's with loading EVERY needed dataset into server memory for the scripts that make the report.
  Yes, my example was only one of numerous things to consider... but it is the example I've given because it's the biggest no-brainer that anyone who works with a database should know; while yours are a little more complex and may not be understood by the people who think they are database programmers.
  And yes, my example is a real world example. There were a lot of other issues we ran into as well... we scrapped the entire old system and replaced it from the ground up because we ran into so many poor design choices (some of which directly in line to what you mention). On some of the databases (yes, databases), data was spread across 20 tables - for SMALL datasets, while on others (like the PCR data) it was all glommed into one table. 127 (yes, ONE HUNDRED TWENTY SEVEN) tables in total, across multiple databases... we've dropped that down to one database with 9 tables (only 6 of which are actual data used for stuff like LOSAP, while others are for "incidental" data used by the data entry and access rights system).
  
  --
  StarTrekPhase2 - The Five Year Mission Continues!
71. Re:Call me skeptical by Estanislao+Martínez · 2010-11-22 12:23 · Score: 1
  
  I worked on some code done by someone else, where on massive records, they were always selecting "*" instead of the needed or anticipated values. Big waste when one needs (by ID#) last and first name and selects a whole row instead - then wonders why it's not scaling upwards.
  Eh, I wonder if you're overstating the performance implications of that. Those are all row-oriented databases. Unless all of the columns your query needs are found in an index, it's going to have to read the whole row from disk anyway; the extra costs from the * then become (a) memory and CPU usage and (b) network bandwidth. In my experience, network bandwidth is usually not a big problem; memory and CPU usage can be an issue, but the big performance killers tend to be inefficient joins (because they don't scale linearly), while scalar stuff (which the * would fall into) is usually cheap.
  Nope, not overstating anything.
  Let's say it's an ambulance company database that's used to calculate their LOSAP points for the year... that requires calculations from EVERY data parameter input, since everything a member does goes towards their LOSAP points. In that data are things like their PCRs (Patient Care Reports). Each PCR may have .1MB (not counting scans) of data associated with it. Let's say there are 200 members and 24,000 PCRs. Now... let's assume the server has 2GB of RAM. That's 2.4GB of data to read just from the PCR tables alone if one loads all columns. Or a twenty minute report. Even stepping through record by record (due to the overhead of 24,000 individual read requests JUST for the PCR data - even when using the same DB connection).
  Your explanation doesn't narrow down at all the cause of the speedup that you're experiencing. If that 2.4 GB case refers to the data in one table, the database, unless the query and schema fit some narrow conditions (e.g., all of the 8 columns your query wants are stored in the same index), is still reading the 2.4 gig of data. This is because all of the data for each row is stored together on disk; you can't (normally) just read the 8 columns you want.
  There is at least one more factor missing from your explanation (which, to be frank, I find hopelessly vague) that's just as essential to explain the speedup you're seeing. One (wild) guess: you're doing some large joins, the database needs to materialize intermediate join results, and losing the stars means those intermediate result sets become a lot smaller. Another (also wild) guess: your application is using the database as a dumb data store, pulling rows one by one from the DB and processing them individually in the application server. That's inevitably going to be very slow.
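  The second guess above describes a common anti-pattern: pulling rows one at a time and aggregating in the application. It contrasts sharply with a set-based query. Below is a minimal sqlite3 sketch; the pcr table and its member_id and points columns are hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pcr (id INTEGER PRIMARY KEY, member_id INTEGER, points INTEGER)")
conn.executemany("INSERT INTO pcr (member_id, points) VALUES (?, ?)",
                 [(m % 200, 1) for m in range(24_000)])

def points_row_by_row(conn):
    """Anti-pattern: one query per row, aggregation done in the application."""
    totals = {}
    ids = [r[0] for r in conn.execute("SELECT id FROM pcr")]
    for pcr_id in ids:  # 24,000 separate queries
        member_id, points = conn.execute(
            "SELECT member_id, points FROM pcr WHERE id = ?", (pcr_id,)).fetchone()
        totals[member_id] = totals.get(member_id, 0) + points
    return totals

def points_set_based(conn):
    """Let the database aggregate everything in a single statement."""
    return dict(conn.execute(
        "SELECT member_id, SUM(points) FROM pcr GROUP BY member_id"))
```

  With in-process SQLite, the gap is modest. Against a networked database server, each of those 24,000 queries also pays a network round trip.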
  
  --
  Are you adequate?
72. Re:Call me skeptical by RobertM1968 · 2010-11-22 12:50 · Score: 1
  
  Your explanation doesn't narrow down at all the cause of the speedup that you're experiencing. If that 2.4 GB case refers to the data in one table, the database, unless the query and schema fit some narrow conditions (e.g., all of the 8 columns your query wants are stored in the same index), is still reading the 2.4 gig of data. This is because all of the data for each row is stored together on disk; you can't (normally) just read the 8 columns you want.
  There is at least one more factor missing from your explanation (which, to be frank, I find hopelessly vague) that's just as essential to explain the speedup you're seeing. One (wild) guess: you're doing some large joins, the database needs to materialize intermediate join results, and losing the stars means those intermediate result sets become a lot smaller. Another (also wild) guess: your application is using the database as a dumb data store, pulling rows one by one from the DB and processing them individually in the application server. That's inevitably going to be very slow.
  Let's see... let's say we want Select pcr.dispatch_date, pcr.member_number, member_info.member_name (from) where pcr.dispatch_date (is in 2009) order by pcr.dispatch_date...
  Loading the full dataset indeed loads a very tiny amount of information, and the memory used by both the SQL server and the script is minimal.
  Now, same thing with Select * (from) where pcr.dispatch_date (is in 2009) order by pcr.dispatch_date...
  Oddly, it seems to load over 2GB of data - as opposed to a hundred megs.
  Get it now? Been there, done that, tested it, and watched (in the previous "programmer's" version) the machine start churning the disk frantically as it spilled what no longer fit in physical RAM into virtual memory. Gee, I wonder how the hard disk performs when pretending to be RAM? [sarcasm]They're about the same speed, right?[/sarcasm]
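  Here is a minimal sqlite3 sketch of the difference the post describes. It uses a hypothetical pcr table with a large narrative column, with row counts and sizes scaled down. It roughly measures the data each query hands back to the application:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member_info (member_number INTEGER PRIMARY KEY, member_name TEXT)")
conn.execute("CREATE TABLE pcr (id INTEGER PRIMARY KEY, dispatch_date TEXT, "
             "member_number INTEGER, narrative TEXT)")
conn.executemany("INSERT INTO member_info VALUES (?, ?)",
                 [(n, f"Member {n}") for n in range(200)])
# 1,000 hypothetical 2009 PCRs, each carrying ~10 KB of narrative text.
conn.executemany(
    "INSERT INTO pcr (dispatch_date, member_number, narrative) VALUES (?, ?, ?)",
    [(f"2009-{(i % 12) + 1:02d}-01", i % 200, "x" * 10_000) for i in range(1_000)])

WHERE_2009 = """FROM pcr JOIN member_info USING (member_number)
                WHERE pcr.dispatch_date BETWEEN '2009-01-01' AND '2009-12-31'
                ORDER BY pcr.dispatch_date"""
NARROW = ("SELECT pcr.dispatch_date, pcr.member_number, member_info.member_name "
          + WHERE_2009)
STAR = "SELECT * " + WHERE_2009

def bytes_returned(sql):
    """Rough size of the result set as seen by the application."""
    return sum(len(str(value)) for row in conn.execute(sql) for value in row)

print(bytes_returned(NARROW), bytes_returned(STAR))
```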
  
  --
  StarTrekPhase2 - The Five Year Mission Continues!
73. Re:Call me skeptical by RobertM1968 · 2010-11-22 12:53 · Score: 1
  
  Another (also wild) guess: your application is using the database as a dumb data store, pulling rows one by one from the DB, processing them individually in the application server. That's inevitably going to be very slow.
  No, our application does not do that. We read the whole needed dataset, nothing less (and definitely nothing more).
  
  --
  StarTrekPhase2 - The Five Year Mission Continues!
74. Re:Call me skeptical by HappyDrgn · 2010-11-24 10:48 · Score: 1
  
  There are certainly many cases where non-relational systems have advantages as application layers that complement standard relational databases. These usually involve frequently read data that does not need to be queried at a granular level, such as session data or geographical mapping tables. Some good complementary examples include memcache, Redis, or even Ruby's Starling. I use many of these in my applications. Honestly, MySQL would probably work there, but these other solutions provide performance and cost advantages that simply cannot be overlooked. I've used some of them, like Starling, simply to cache data on disk that does not change often, and I've used Redis lists to store map data.
  IMO it's often easy to say SQL will work, so use that, but it's not always the best solution. Sure, you can get it to scale; I've used it at massive petabyte scale without much issue... but for smaller, frequently accessed data sets, do you really want to invest in the kind of hardware required to make SQL run well, or will an in-memory store on commodity hardware work as well or better? Sometimes you have massive data coming in where neither SQL nor NoSQL is going to help you, and Hadoop or another map-reduce-style solution is more appropriate.
  It generally comes down to these questions: what type of data are you storing, how much data will there be, how are you going to use that data, and what latency do you require for reads and writes? Until those are well defined, you really are shooting in the dark on solutions to store and access it. IMO this is the major issue most startups have: no one defined the data strategy. They just build without consciously examining what the business needs are, short and long term.
75. Re:Call me skeptical by maraist · 2010-11-27 02:32 · Score: 1
  
  Not sure what you're saying. Why do you suggest relational models support more situations? You cannot model recursive situations effectively. You cannot model hierarchical data structures, at least not ones with cycles. The join syntax itself is very verbose, and when there are significant numbers of indexes, the number of possible join strategies grows exponentially (if you had 200 tables joined in a single query, with each table utilizing 4 indexes, you'd have a nearly impossible-to-optimize query). Yes, this is an odd query, but only because RDBMSs do not support this style of data traversal; many systems would crap out at 1 to 4KB of SQL syntax. Not to mention that the locking overhead would practically serialize access to the DB (yes, I know, why the hell would a non-transactional read cause locks... because joins simply suck in most RDBMS implementations).
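  The combinatorial blow-up described above can be given a rough number with a simplified count. This sketch assumes the optimizer considers only left-deep join orders (n! of them) and picks one of k access paths per table independently (k^n). Real optimizers prune heavily, using dynamic programming, heuristics, or genetic search (PostgreSQL's GEQO, for example). So treat this as an upper-bound illustration, not a model of any specific planner.

```python
import math

def naive_plan_count(n_tables, access_paths_per_table):
    """Upper-bound illustration: left-deep join orders (n!) times
    one independent access-path choice per table (k**n)."""
    return math.factorial(n_tables) * access_paths_per_table ** n_tables

for n in (4, 10, 200):
    count = naive_plan_count(n, 4)
    # Print the order of magnitude rather than a several-hundred-digit integer.
    print(f"{n:>3} tables: about 10^{len(str(count)) - 1} candidate plans")
```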
  
  Compare that to an OODBMS like Objectivity, where joins are replaced with 64-bit foreign pointers to virtual addresses in possibly alternate storage spaces. More importantly, its query syntax replaces a join with a single dot, which is very familiar to object-oriented systems, including ORM layers. So "SELECT person.father.mother.daughters[0].siblings[0].employer.company.website.contact.phone FROM person WHERE id=?" is a legitimate query. It is FAR more expressive than if each step were a separate table join. Each traversal requires a potentially uncached pair of page lookups: one for the virtual mapping table, and one for the actual disk block. Compare that to traditional index-based foreign keys, which require log_base256(n) cold disk hits per join. Both OODBMS and RDBMS support B-tree and hash-map symbolic indexing (e.g. login name, range searches, etc.). But for simple graph traversal, OODBMS is just hard to beat.
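  The comparison above can be made concrete. Assume a fan-out of roughly 256 keys per page (the post's figure; real fan-out depends on key and page size). A B-tree lookup then touches about ceil(log_256 n) pages, versus the two page lookups claimed for a direct object reference. This small sketch uses integer arithmetic to avoid floating-point rounding in the logarithm:

```python
def btree_levels(n_keys, fanout=256):
    """Smallest depth d with fanout**d >= n_keys, i.e. ceil(log_fanout(n_keys))."""
    depth, capacity = 1, fanout
    while capacity < n_keys:
        depth += 1
        capacity *= fanout
    return depth

POINTER_LOOKUPS = 2  # mapping-table page + data page, per the post's description

for n in (10_000, 10_000_000, 10_000_000_000):
    print(f"{n:>14,} rows: B-tree ~{btree_levels(n)} pages, pointer {POINTER_LOOKUPS}")
```

  In practice, the upper levels of a frequently used B-tree usually stay cached, so the cold-disk hits per join are often fewer than this count.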
  
  And NoSQL solutions are really all about documents in general: complex data structures that may or may not be hierarchical, yet have schema validation support (see the Voldemort JSON data store, or CouchDB). Depending on the schema, new inserts can extend the schema on the fly. Alternatively, via DML statements, you can enforce that all NEW requests use a new schema, while leaving old records with a previously well-defined schema. The document would have to retain its schema id. Certainly an RDBMS could do this as well, though most are optimized to support highly structured rectangular data, with only nullability supporting 'optional' schema additions.
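  The per-document schema id described above can be sketched in a few lines. Each document records the schema version it was written under, and new writes must satisfy the current version. Older documents are still validated against their own version. The schema format here is a hypothetical dict of required field names and types, not Voldemort's or CouchDB's actual mechanism.

```python
SCHEMAS = {
    1: {"name": str, "email": str},
    2: {"name": str, "email": str, "phone": str},  # v2 adds a required field
}
CURRENT_VERSION = 2

def validate(doc):
    """Check a document against the schema version it claims to use."""
    schema = SCHEMAS.get(doc.get("_schema"))
    if schema is None:
        return False
    return all(isinstance(doc.get(field), typ) for field, typ in schema.items())

def insert(store, doc):
    """New writes must use the current schema; existing documents are left alone."""
    doc = {**doc, "_schema": CURRENT_VERSION}
    if not validate(doc):
        raise ValueError("document does not match current schema")
    store.append(doc)

store = [{"_schema": 1, "name": "Ann", "email": "ann@example.com"}]  # legacy v1 doc
insert(store, {"name": "Bob", "email": "bob@example.com", "phone": "555-0100"})
```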
  
  I do very much like the set nature of RDBMS. For large, complex cross-table index-based queries (e.g. I need an index from tables A, B, C that are not their primary/foreign keys), RDBMSs support some pretty damn complex capabilities. I'm specifically thinking of Postgres, which offers:
  - hash joins of 5 separate queries, each with their own index;
  - covering indexes, where you don't even need to access the main row to get the result (which is essentially what most NoSQL solutions do);
  - synthetic function-output indexes (indexes on the output of functions instead of the data itself);
  - conditional indexes, where you fine-tune which rows you know you'll search on (create index job_state on jobs(state) where state not in ('COMPLETED','ARCHIVED')).
  Most NoSQL solutions are nowhere near ready to support these complex search optimizations. Things like CouchDB do allow you to have lazy indexes with user-defined functions, but I think they require indexable data on every row.
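  SQLite 3.8.0 and later also support the conditional (partial) index from the post, so it is easy to try with Python's sqlite3. The minimal sketch below uses a hypothetical jobs table layout. SQLite only uses a partial index when it can prove the query's WHERE clause implies the index's WHERE clause, so the query repeats the index condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT, payload TEXT)")
# Only live jobs are indexed; the (typically huge) finished backlog is skipped.
conn.execute("CREATE INDEX job_state ON jobs(state) "
             "WHERE state NOT IN ('COMPLETED','ARCHIVED')")
conn.executemany("INSERT INTO jobs (state, payload) VALUES (?, ?)",
                 [("COMPLETED", "...")] * 1000 + [("RUNNING", "a"), ("QUEUED", "b")])

QUERY = ("SELECT id, state FROM jobs "
         "WHERE state NOT IN ('COMPLETED','ARCHIVED') AND state = 'RUNNING'")
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(row[3])  # shows which index, if any, the planner chose
running = conn.execute(QUERY).fetchall()
```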
  
  But I don't see most of these advantages as being specific to RDBMS; they're just a matter of maturity. HBase and Cassandra are still in their infancy, with not even official releases yet from what I remember.
  
  --
  -Michael
76. Re:Call me skeptical by keean · 2010-11-27 04:01 · Score: 1
  
  Basically "SQL", or relational algebra, is a programming language. All the arguments are against weaknesses in the implementation, not problems with the language itself. NoSQL seems better, but it implements only part of the language, not all of it. If you try to implement SQL or a relational programming language on top of "NoSQL" databases, you will find there are features missing. Eventually NoSQL will need to implement the complete language and will become relationally complete, but the result will probably not be as elegant.
  
  As for the join example, you compare a table join against a single lookup... not the same thing. How many seeks would Objectivity need to find the phone number for the employers of all aunts (on the father's side) of every person in the database (which is what a join is for)? And as for the virtual addresses, Oracle (for example) can use raw disk access to optimise the layout of data on the disk to minimise head movements when executing common queries. It can do this because relational algebra hides all details of the implementation from the user, allowing the database code more freedom to choose how it stores and processes the data.
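  The "for every person" question above is exactly the kind of set-at-a-time query a join expresses in one statement. The minimal sqlite3 sketch below uses a hypothetical person/employer schema. It takes "paternal aunt" to mean a female sibling of one's father who shares the same father.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employer (id INTEGER PRIMARY KEY, phone TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, sex TEXT,
                     father_id INTEGER REFERENCES person(id),
                     employer_id INTEGER REFERENCES employer(id));
INSERT INTO employer VALUES (1, '555-0101'), (2, '555-0102');
INSERT INTO person VALUES
  (1, 'Grandpa',  'M', NULL, NULL),
  (2, 'Dad',      'M', 1,    NULL),
  (3, 'Aunt Jo',  'F', 1,    1),
  (4, 'Aunt Bea', 'F', 1,    2),
  (5, 'Kid',      'F', 2,    NULL);
""")

# For every person: father -> father's sisters (same father, female) -> employer phone.
AUNT_PHONES = """
SELECT p.name, aunt.name, e.phone
FROM person AS p
JOIN person AS dad  ON dad.id = p.father_id
JOIN person AS aunt ON aunt.father_id = dad.father_id
                   AND aunt.id <> dad.id AND aunt.sex = 'F'
JOIN employer AS e  ON e.id = aunt.employer_id
ORDER BY p.name, aunt.name
"""
rows = conn.execute(AUNT_PHONES).fetchall()
```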
CAP is fine by Anonymous Coward · 2010-11-18 09:12 · Score: 0

Translattice is not consistent... it is eventually consistent ...
Is it a technical or a budget problem? by ducomputergeek · 2010-11-18 09:12 · Score: 4, Insightful

In my past 12 years working at consultancies and startups, I've seen this a few times. It's usually not a technical hurdle; it's a "We can't solve this problem within our budget" problem. It gets solved either by hiring an expert at performance tuning your DB of choice, or by moving from certain DBs to real databases that could handle the work, like MSSQL, DB2, Oracle, or in some cases Teradata if dealing with data warehousing.
I've worked around some very large database installs in my day. Every time the scaling question came up, it was solvable with RDBMSs, but the solution wasn't cheap.

--
"The problem with socialism is eventually you run out of other people's money" - Thatcher.
1. Re:Is it a technical or a budget problem? by Cylix · 2010-11-18 09:37 · Score: 1
  
  There are a few other players in the field besides Teradata, but when you move to that format there is nothing that would be associated with the word cheap.
  However, by the time you reach that level, the amount of data in storage usually makes it very obvious.
  In some scenarios, we have avoided going to those rather massive solutions by really digging down and seeing if we really needed to store everything.
  
  --
  "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
2. Re:Is it a technical or a budget problem? by PRMan · 2010-11-18 09:47 · Score: 3, Interesting
  
  My experience is that there is a lot you can do that is very cheap.
  One time, I walked into a mortgage company (I'm a developer, not a DBA) and they were complaining that they couldn't run a required government report breaking down their fee codes because it would time out after 2 minutes. The table had millions of records. I looked at the table and immediately noticed that they didn't have an index on fee code, which the report was trying to sort and total by. I told the manager that I would add an index on the fee code column after hours and run the report. He wasn't sure it would work so he said, "Go ahead and add it now."
  I added the index (which took about 30 seconds) and ran the report again. It finished in 45 seconds.
  I looked at the report. Whoever wrote it for them was concatenating strings all over the place. Millions of them. I switched the app to StringBuilder using a search-and-replace.
  I ran the report again. 8 seconds. In less than an hour I took a report that wasn't finishing in 2 minutes down to 8 seconds. That wasn't expensive for them and it wasn't hard to do.
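  The StringBuilder fix has a direct Python analogue: collect the pieces in a list and join them once, instead of growing a string in a loop. With immutable strings, repeated concatenation can copy the accumulated text on every step, which makes the loop quadratic. Here is a minimal sketch with a hypothetical fee-code report formatter:

```python
def report_concat(rows):
    """Anti-pattern: repeated concatenation, as the original report did."""
    out = ""
    for fee_code, total in rows:
        out += f"{fee_code}\t{total}\n"
    return out

def report_join(rows):
    """StringBuilder-style: accumulate the pieces, join once at the end."""
    parts = [f"{fee_code}\t{total}\n" for fee_code, total in rows]
    return "".join(parts)

rows = [(f"FEE{i % 50:03d}", i) for i in range(100_000)]
```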
  At another client, they were complaining about database slowness and the DBA wasn't having much luck fixing it. They fired him and asked me to look at it. I simply recorded a profiler log (a little slower for that day, but it was already dog slow, so who would notice), found the longest-running and most common queries, then searched the source code repository and rewrote them. Many of these queries were cross joins, were missing indexes on the joined fields, or had other really obvious problems. One was doing a data conversion on every record instead of converting the passed-in input once. It took me about 2-3 days to solve massive slowness problems. At the end, the employees were saying, "I'm glad they finally bought a new database server." This was at one of the country's largest mortgage companies, with tens of millions of records in the database. And the fixes should have been brain-dead obvious to anyone with a few years of SQL experience.
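  The last problem mentioned, converting every stored value instead of converting the input once, is usually called a non-sargable predicate. Wrapping the indexed column in a function hides it from the index. The minimal sqlite3 sketch below compares the two query plans; the loans table and idx_loan_no index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (id INTEGER PRIMARY KEY, loan_no INTEGER, borrower TEXT)")
conn.execute("CREATE INDEX idx_loan_no ON loans(loan_no)")
conn.executemany("INSERT INTO loans (loan_no, borrower) VALUES (?, ?)",
                 [(n, f"borrower {n}") for n in range(10_000)])

def plan(sql, params):
    """Join the EXPLAIN QUERY PLAN detail strings into one line."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

user_input = "4242"  # arrives as text, e.g. from a web form

SLOW = "SELECT * FROM loans WHERE CAST(loan_no AS TEXT) = ?"  # converts every row
FAST = "SELECT * FROM loans WHERE loan_no = ?"                # converts the input once
slow = plan(SLOW, (user_input,))
fast = plan(FAST, (int(user_input),))
print(slow)  # a full scan of loans
print(fast)  # a search using idx_loan_no
```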
  
  --
  Peter predicted that you would "deliberately forget" creation 2000 years ago...
3. Re:Is it a technical or a budget problem? by Hoi+Polloi · 2010-11-18 10:28 · Score: 2, Insightful
  
  I wish most tuning efforts only required fixing glaring index issues. You eventually find yourself dealing with large dbs with all the basic tuning done and now they want to get app X to return in 8 secs instead of 10. Then you go down the rabbit hole of initialization params, hints, etc. Sadly design considerations are almost always off the plate at this point.
  
  --
  It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
4. Re:Is it a technical or a budget problem? by lakeland · 2010-11-18 10:46 · Score: 1
  
  Interesting that you list only commercial DBs, do you have any trouble using postgres on very large databases?
5. Re:Is it a technical or a budget problem? by doofusclam · 2010-11-18 10:50 · Score: 2, Insightful
  
  There are a few other players in the field besides Teradata, but when you move to that format there is nothing that would be associated with the word cheap.
  However, by the time you reach that level, the amount of data in storage usually makes it very obvious.
  In some scenarios, we have avoided going to those rather massive solutions by really digging down and seeing if we really needed to store everything.
  In a previous job at the start of my career, my company bought a Teradata system that came with the requisite sharp-suited consultant, who told us how to lay out the DB schema.
  This being Teradata, hashed indexes were in vogue, so it was lightning fast.
  Until the day they realised the users mainly did substring searches, which don't really work on a hashed index. Table scans aplenty = unhappy users.
  It doesn't mean a RDBMS is bad, it means that technology misapplied always sucks.
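  The reason a hashed index fails substring searches can be shown with a dict standing in for the hash index (a sketch, not Teradata's implementation). Equality probes hash straight to a bucket. A substring match cannot be hashed, though, so every key has to be examined, which is the table-scan case.

```python
from collections import defaultdict

class HashIndex:
    """Toy hash index: exact key -> list of row ids."""
    def __init__(self, rows, column):
        self.buckets = defaultdict(list)
        for row_id, row in enumerate(rows):
            self.buckets[row[column]].append(row_id)

    def lookup_equal(self, value):
        """Expected O(1): hash the value, go straight to its bucket."""
        return self.buckets.get(value, [])

    def lookup_substring(self, fragment):
        """Hashing the fragment says nothing about keys containing it: scan them all."""
        return sorted(rid for key, rids in self.buckets.items()
                      if fragment in key for rid in rids)

rows = [{"name": "SMITHSON"}, {"name": "GOLDSMITH"}, {"name": "JONES"}]
idx = HashIndex(rows, "name")
```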
6. Re:Is it a technical or a budget problem? by ducomputergeek · 2010-11-18 12:00 · Score: 2, Informative
  
  I like PostgreSQL a lot. We now use it as the database behind all of our company's software and the systems we deploy to clients. It's overkill for our point-of-sale product, but it's fast and stable. But PostgreSQL has lacked some features, which made deploying it for very large databases unattractive. Three gaps kept it out of the running: lack of built-in clustering, lack of hot standby, and no vendor that could support both hardware and software under one roof (and could be sued if shit hit the fan). PostgreSQL 9 just addressed two of these drawbacks.
  That last criterion was probably the single biggest factor for these organizations. The area where I went to college and got my first jobs out of school had a lot of AS/400s. Three major Fortune 1000 companies used DB/400, all the colleges used them, and all the local hospitals used them. IBM had an office in the town of 150k people staffed with about 50 AS/400 techs, most of whom worked on site at the organizations that had 200-500 AS/400s. (The estimated total number of AS/400s in the area at the time was something like 1,500.)
  
  --
  "The problem with socialism is eventually you run out of other people's money" - Thatcher.
7. Re:Is it a technical or a budget problem? by Skal+Tura · 2010-11-18 12:22 · Score: 1
  
  "We can't solve this by adding new hardware" is a technical problem.
  Throwing hardware at a problem is usually not the right choice, and the pain of solving it technically now pays dividends for the rest of the software's lifecycle.
  
  --
  Pulsed Media Seedboxes
8. Re:Is it a technical or a budget problem? by Skal+Tura · 2010-11-18 12:29 · Score: 1
  
  Things get interesting when you have already done all that. You've run benchmark after benchmark for days on end and even tried to change the design, but you're still missing the last 2-5%, the customer rejects the work, and no new hardware is allowed.
  What to do then is a real trick. Well, I finally managed to get more performance out of it, only for someone else to break it a little later without any trace in version control of how it was broken.
  
  --
  Pulsed Media Seedboxes
you're doing something wrong by Surt · 2010-11-18 09:13 · Score: 4, Insightful

"I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. "
Relational databases scale to pretty amazing heights. The notion that you are hitting some limit of relational databases at a startup stretches the imagination. I mean, really, you've already hit exabyte data sizes? That's typically where relational starts to struggle.
You really need to define your problem with much greater specificity to get a valuable answer.

--
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
1. Re:you're doing something wrong by Stradenko · 2010-11-18 09:23 · Score: 2, Insightful
  
  Relational databases scale to pretty amazing heights
  Horizontally?
2. Re:you're doing something wrong by camperdave · 2010-11-18 09:30 · Score: 3, Insightful
  
  Relational databases scale to pretty amazing heights. The notion that you are hitting some limit of relational databases at a startup stretches the imagination. I mean, really, you've already hit exabyte data sizes? That's typically where relational starts to struggle.
  You really need to define your problem with much greater specificity to get a valuable answer.
  Given that the title of the story is "Horizontal Scaling of SQL Databases?" the notion that relational databases are able to scale to pretty amazing heights is irrelevant.
  
  You really need to define your problem with much greater specificity to get a valuable answer.
  That's definitely true. It may be, in fact, that an RDBMS is not what is needed at all.
  
  --
  When our name is on the back of your car, we're behind you all the way!
3. Re:you're doing something wrong by C_Kode · 2010-11-18 09:32 · Score: 1
  
  Some people use sharding to scale horizontally.
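  In its simplest form, application-level sharding routes each row to one of N databases by a stable hash of its key. In the minimal sketch below, in-memory SQLite databases stand in for separate servers, and the table and key names are hypothetical. It uses zlib.crc32 because Python's built-in hash() for strings is randomized per process.

```python
import sqlite3
import zlib

class ShardedStore:
    """Route each customer_id to one of N shards by a stable hash."""
    def __init__(self, n_shards):
        self.shards = [sqlite3.connect(":memory:") for _ in range(n_shards)]
        for conn in self.shards:
            conn.execute("CREATE TABLE accounts (customer_id TEXT PRIMARY KEY, plan TEXT)")

    def shard_for(self, customer_id):
        return self.shards[zlib.crc32(customer_id.encode()) % len(self.shards)]

    def put(self, customer_id, plan):
        self.shard_for(customer_id).execute(
            "INSERT OR REPLACE INTO accounts VALUES (?, ?)", (customer_id, plan))

    def get(self, customer_id):
        row = self.shard_for(customer_id).execute(
            "SELECT plan FROM accounts WHERE customer_id = ?", (customer_id,)).fetchone()
        return row[0] if row else None

store = ShardedStore(n_shards=4)
for i in range(100):
    store.put(f"cust-{i}", "pro" if i % 3 == 0 else "basic")
```

  With a plain modulo, changing the number of shards remaps most keys. Schemes such as consistent or rendezvous hashing limit that movement.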
4. Re:you're doing something wrong by mlyle · 2010-11-18 09:48 · Score: 3, Informative
  
  And that's what Translattice does, actually: for the database part of the system, we transparently shard large tables behind the scenes. We then figure out how to store them on the available computing resources, taking into account historical usage patterns and administrators' policies on how data must be stored (for redundancy and compliance purposes). A different population of nodes is used to store each shard, and the redundancy is effectively loosely coupled. So when a failure or partition occurs, the work involved in re-establishing redundancy is fairly shared over all nodes. This provides linear scalability for many workloads and better redundancy properties. As a side benefit, it can also position data closer to where it's consumed.
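  The post doesn't describe Translattice's actual placement logic in any more detail. A common generic technique for putting each shard's replicas on a different set of nodes is rendezvous (highest-random-weight) hashing. The sketch below rests on that assumption, with hypothetical node and shard names:

```python
import hashlib

def weight(node, shard):
    """Deterministic pseudo-random score for a (node, shard) pair."""
    digest = hashlib.sha256(f"{node}:{shard}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def replicas_for(shard, nodes, copies=3):
    """Pick the `copies` highest-scoring nodes; each shard gets its own ranking."""
    return sorted(nodes, key=lambda n: weight(n, shard), reverse=True)[:copies]

nodes = [f"node{i}" for i in range(8)]
placement = {s: replicas_for(s, nodes) for s in (f"shard{j}" for j in range(16))}
```

  Each shard ranks the nodes independently. When a node fails, the replacement copies for its shards therefore land on many different nodes rather than on one standby. That matches the "fairly shared" re-replication described above.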
  When it comes time to access the data, the query planner in our database figures out how to efficiently dispatch the query to the minimal necessary population of nodes, introducing map and reduce steps to provide for data reduction and efficient execution.
  All of the table storage is directly attached to the nodes, eliminating much of the need for a storage area network and scaling beyond where shared-disk database clusters can go.
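  The map and reduce steps mentioned above can be illustrated with a scatter-gather aggregate. Each shard computes a partial result locally (map), and a coordinator merges the partials (reduce), so only small summaries cross the network. This is a generic sketch with in-memory SQLite shards and a hypothetical orders table, not Translattice's planner. It decomposes averages into SUM and COUNT because averaging the per-shard averages would be wrong.

```python
import sqlite3

def make_shard(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

shards = [
    make_shard([("east", 10), ("west", 5)]),
    make_shard([("east", 30), ("east", 20)]),
    make_shard([("west", 15)]),
]

def map_step(conn):
    """Runs on each shard: local partial aggregates only."""
    return conn.execute(
        "SELECT region, SUM(amount), COUNT(*) FROM orders GROUP BY region").fetchall()

def reduce_step(partials):
    """Runs on the coordinator: merge partial sums and counts, then derive the average."""
    totals = {}
    for part in partials:
        for region, s, c in part:
            ps, pc = totals.get(region, (0, 0))
            totals[region] = (ps + s, pc + c)
    return {region: s / c for region, (s, c) in totals.items()}

avg_by_region = reduce_step(map_step(conn) for conn in shards)
```

  Averaging the per-shard averages for "east" would give (10 + 25) / 2 = 17.5 instead of the correct 20.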
5. Re:you're doing something wrong by Anonymous Coward · 2010-11-18 10:21 · Score: 1, Funny
  
  You really need to define your problem with much greater specificity to get a valuable answer.
  The real problem is he lied on his resume, has no idea what he's really talking about, and now they're asking about it at his job...
6. Re:you're doing something wrong by Civil_Disobedient · 2010-11-18 10:41 · Score: 1
  
  You really need to define your problem with much greater specificity to get a valuable answer.
  The OP said they were using NoSQL. That alone explains everything.
  Solution (to the OP, not the parent who clearly understands what they're talking about): go learn how to use relational databases properly. Normalize your data. Nine times out of ten, if you're repeating information in multiple tables, you're doing something wrong. DO NOT USE BUSINESS KEYS. Surrogate keys only. Why? Because you do not own a crystal ball.
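  The surrogate-key advice can be shown in a few lines. References point at a meaningless integer id, and the business key is just a unique column. So changing the business key touches one row instead of every referencing table. This sqlite3 sketch uses a hypothetical schema with an email address as the business key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,      -- surrogate key: never changes
    email TEXT NOT NULL UNIQUE      -- business key: may change
);
CREATE TABLE invoice (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    amount      INTEGER NOT NULL
);
INSERT INTO customer (email) VALUES ('old@example.com');
INSERT INTO invoice (customer_id, amount) VALUES (1, 100), (1, 250);
""")

# The customer changes their email: one UPDATE, no cascading key rewrites.
conn.execute("UPDATE customer SET email = 'new@example.com' WHERE id = 1")
total = conn.execute("""
    SELECT SUM(i.amount) FROM invoice AS i JOIN customer AS c ON c.id = i.customer_id
    WHERE c.email = 'new@example.com'""").fetchone()[0]
```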
7. Re:you're doing something wrong by Surt · 2010-11-18 12:51 · Score: 1
  
  Admittedly, a poor choice of words, but yes.
  
  --
  "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
8. Re:you're doing something wrong by Surt · 2010-11-18 12:52 · Score: 2, Informative
  
  I meant heights of performance and size, but admittedly, that was a poorly chosen phrase. But yes, you can scale SQL very wide.
  
  --
  "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
9. Re:you're doing something wrong by Chriscypher · 2010-11-18 16:39 · Score: 1
  
  "I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. "
  Relational databases scale to pretty amazing heights. The notion that you are hitting some limit of relational databases at a startup stretches the imagination. I mean, really, you've already hit exabyte data sizes? That's typically where relational starts to struggle.
  You really need to define your problem with much greater specificity to get a valuable answer.
  Duh!
  Maybe this guy's startup company specializes in exabyte database optimization... ... which explains why they are constantly struggling with relational database limitations !!!!
  
  --
  "You have liberated me from thought."
10. Re:you're doing something wrong by hesaigo999ca · 2010-11-19 01:37 · Score: 1
  
  I think these posts sometimes get on here just to raise the question, so that those who don't know much can learn more; this might be one of those posts... Notice how the poster never replies to questions about his needs or specs. We are left giving advice of every kind without ever knowing whether the poster is hearing our suggestions, or only the rest of /. interested in this topic....
  I hate when they do this on /. I would rather be clustering!
11. Re:you're doing something wrong by Anonymous Coward · 2010-11-19 11:34 · Score: 0
  
  No need to go exabyte.
  Typical relational databases perform badly when you have a few hundred GB of data and you start doing warehousing-type stuff on it.
  Not every RDBMS is good for data warehousing.
  Scaling on decision-support systems is also very different from scaling OLTP systems. People used to do OLTP will usually say: 100GB isn't big, you should be able to scale way past 100GB with RDBMS X - but fail to consider workload. Your workload is not my workload. Performing the kind of complex data analysis decision support algorithms have to perform on relatively small dbs (in OLTP terms) is bound to kill most RDBMS, because that kind of analysis isn't friendly on OLTP scaling techniques.
  You have solutions for that. Greenplum for instance can handle those kinds of loads, and NoSQL is very popular simply because it has that kind of scaling built-in at no (extra) cost. Only extra development cost, since you have to compensate for all the features you lost vs RDBMS - but developers are already used to compensating at the application level like that.
Every thought of AppEngine by Anonymous Coward · 2010-11-18 09:14 · Score: 0

Let Google worry about it. Pricing is stupid cheap.
Look at your code by Anonymous Coward · 2010-11-18 09:15 · Score: 0

I used to work at a managed service provider and we often had clients complain that the SQL Server was slow or did not scale. 99 times out of 100 the issue was that their code was horribly inefficient. Either it was eating up connections or executing inefficient queries thousands of times more than necessary.
It's often hard to convince the developers that their code is bad, but if you do some profiling, capture the most frequent queries, and show them the results, that may help.
If in fact the code is behaving and you are still having trouble scaling, here are a few hints:
1. See if there is some caching that you can do on the application tier
2. Reorganize and index your data structure to optimize for the queries that you find are inefficient
3. Separate the database logically onto separate servers.
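  The usual fix for code that is "eating up connections" is a bounded connection pool. It reuses a fixed set of connections instead of opening one per request. The minimal sketch below uses queue.Queue, with sqlite3 connections standing in for real server connections. Most drivers and frameworks ship their own pool, which you'd normally use instead.

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Fixed-size pool: callers block (up to `timeout`) when all connections are busy."""
    def __init__(self, size, factory):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._idle.get(timeout=timeout)  # raises queue.Empty if exhausted
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return it, even on error

# check_same_thread=False lets pooled connections be handed to other threads.
pool = ConnectionPool(size=2, factory=lambda: sqlite3.connect(":memory:", check_same_thread=False))

with pool.connection() as conn:
    result = conn.execute("SELECT 1 + 1").fetchone()[0]
```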
Re:XML Go Diagonal by JustOK · 2010-11-18 09:23 · Score: 1

If it was really good, it would create itself, if it hasn't already.

--
rewriting history since 2109
What company? by MeanMF · 2010-11-18 09:23 · Score: 0, Flamebait

Please post the name of your company so we can learn more about what kind of data you're storing and what kind of issues you are seeing. And so we can avoid using your services until you hire somebody competent. Thanks.
1. Re:What company? by jlusk4 · 2010-11-18 09:51 · Score: 3, Insightful
  
  Geez, you guys. There's a real person behind the question. Do you HAVE to be an asshole?
2. Re:What company? by Anonymous Coward · 2010-11-18 11:35 · Score: 0
  
  Yes.
Wow by mlyle · 2010-11-18 09:24 · Score: 5, Informative

I didn't expect we'd be on Slashdot just yet. I'm Michael Lyle, CTO and cofounder of Translattice.
With regards to the original submitter's question, we'd love to talk to him. How much we can help, of course, depends on the specific scenario he's hitting.
What we've built is an application platform constituted from identical nodes, each containing a geographically decentralized relational database, a distributed (J2EE compatible) application container, and distributed load balancing and management capabilities. Massive relational data is transparently sharded behind the scenes and assigned redundantly to the computing resources in the cluster, and a distributed consensus protocol keeps all of the transactions in flight coherent and provides ACID guarantees. In essence, we allow existing enterprise applications to scale out horizontally while keeping the benefits of the existing programming model for transactional applications, by letting computing resources from throughout an organization combine to run enterprise workloads.
Current stacks are really complicated, multi-vendor, and require extensive integration/custom engineering for each application install. We're striving to create a world where massively performing infrastructure can be built from identical pieces.
1. Re:Wow by Cylix · 2010-11-18 09:39 · Score: 4, Insightful
  
  He posted to slashdot.... do you really think he can afford you?
  
  --
  "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
2. Re:Wow by joib · 2010-11-18 09:49 · Score: 1
  
  So you're claiming ACID; IOW you are saying your system provides consistency as per the definition used in CAP?
  How do you deal with network partitions? That is, per the CAP theorem, if you have C, is your system CA or CP?
  Thanks,
3. Re:Wow by joib · 2010-11-18 09:53 · Score: 1
  
  Replying to myself, TFA contains some info about this. Hey, this is slashdot, who has time to read TFA?
4. Re:Wow by Crimey+McBiggles · 2010-11-18 09:53 · Score: 1, Interesting
  
  The problem with identical pieces is that, in order for them to be interoperable among myriad applications, they must be very small, and there must be a great number of them. Not one business operates in a manner that is identical to another. If relational databases aren't solving the problem, it is more than likely due to poor data structure. The main difference NoSQL presents to a novice database administrator is that NoSQL promotes key-value pairs. This is no different from what exists in a relational database, except that in an RDBMS the admin is allowed, and often compelled, to create tables with multiple fields. More tables with fewer fields is the solution in either case.
  
  --
  Crimey
5. Re:Wow by aclarke · 2010-11-18 09:56 · Score: 1
  
  I think it's hiding behind the giant "I think everything is a conspiracy" badge I just awarded you.
  
  --
  
  www.clarke.ca
6. Re:Wow by Squeebee · 2010-11-18 09:59 · Score: 5, Funny
  
  Congratulations, you just won Slashdot's buzzword bingo, please collect your prize at the cashier window in the back of the hall.
7. Re:Wow by mlyle · 2010-11-18 10:00 · Score: 4, Interesting
  
  The short answer is, CA/CP/AP on a transaction-by-transaction basis depending on application requirements. Also of note: network delay is effectively a special "partition", requiring an engine that can have massive workloads in flight and reconcile/order non-commutative changesets in a distributed fashion.
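  One common way systems offer per-operation consistency trade-offs is quorum replication, as in Cassandra's ONE/QUORUM/ALL levels. With N replicas, a read of R replicas is guaranteed to overlap the latest write of W replicas whenever R + W > N. This is a generic sketch, not a description of Translattice's protocol:

```python
def quorum(n):
    """Majority of n replicas."""
    return n // 2 + 1

def is_strongly_consistent(n, r, w):
    """Every read set intersects every write set iff R + W > N (pigeonhole)."""
    return r + w > n

def tolerates_failures(n, r, w):
    """How many replicas can be down while both reads and writes still succeed."""
    return n - max(r, w)

N = 3
for name, r, w in [("ONE/ONE", 1, 1), ("QUORUM/QUORUM", quorum(N), quorum(N)), ("ONE/ALL", 1, N)]:
    print(f"{name:14} consistent={is_strongly_consistent(N, r, w)} "
          f"survives {tolerates_failures(N, r, w)} down")
```

  ONE/ONE stays available with two replicas down but may return stale data. ONE/ALL is consistent, but any single replica failure blocks writes. In effect, each operation chooses its own point on the availability-versus-consistency trade-off.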
8. Re:Wow by Anonymous Coward · 2010-11-18 14:32 · Score: 0
  
  Ok, so you work through the CAP theorem to make the system work as a whole. Makes sense to decide which two of C, A & P you give priority based on application needs.
  Now, if we assume that still_sick's issue isn't just an inefficient DB design or inefficient transaction processing (or perhaps a transaction delay / bandwidth limit if the database is distributed) -- all of which are probably more likely, you'll have your work cut out for you demonstrating to still_sick that your solution handles all the things we take for granted in other DB options.
  Nonetheless, I am intrigued as well (and wondering if you've tried or succeeded in patenting any portions).
9. Re:Wow by Anonymous Coward · 2010-11-28 22:42 · Score: 0
  
  Can you use the database stuff without all the J2EE bloat?
Have you tried Perl? by goombah99 · 2010-11-18 09:25 · Score: 1

Perl seems to work well for me. You may want to try it.

--
Some drink at the fountain of knowledge. Others just gargle.
Justification for new toys? by StuartHankins · 2010-11-18 09:25 · Score: 5, Insightful

The post is so vaguely worded, I imagine the author is merely trying to find some justification to purchase some new toys. "See, Slashdot people think this is a good idea!"

I agree with most of the posts so far -- if you're truly hitting a limit, you are most likely doing something wrong. Hire an outside DBA to make recommendations if you don't have the resources in-house. I strongly suspect this is the real issue.
Another Sales Pitch Posing As A ( +1, Helpful ) by Anonymous Coward · 2010-11-18 09:25 · Score: 0

story. It's pretty obvious from "I was intrigued by the most recent entry, about Translattice, which purports to provide many of the same scaling advantages for SQL databases."
Yours In Akademgorodok,
Kilgore Trout
Hire a DBA by knavel · 2010-11-18 09:29 · Score: 1

Most of the time, when someone says they're having trouble scaling their database, it's a case of a developer that has an incorrectly configured database. Installing MySQL is easy, but configuring it is VERY difficult. That's why you need a person with very specialized knowledge to properly configure a database for efficiency or throughput or whatever you're going for in your specific case.
It's like saying that anyone can go to a hardware store, buy some lumber, and nail it together into a rudimentary shelter, but if you want a *house*, something that will weather the elements and keep you warm, comfortable and secure, you need to hire a professional carpenter.
We're using MongoDB in production by josef.salyer · 2010-11-18 09:31 · Score: 1

Set up and scaling has been really easy in comparison to similar MySQL clusters I have set up previously.
hbase is an option to NoSQL and Cassandra. by ooglek · 2010-11-18 09:31 · Score: 3, Informative

I recently read that someone moved their large operation from Cassandra to HBase, the database that runs on top of Hadoop's distributed file system (HDFS). http://hbase.apache.org/
HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.
HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase includes:
* Convenient base classes for backing Hadoop MapReduce jobs with HBase tables
* Query predicate push down via server side scan and get filters
* Optimizations for real time queries
* A high performance Thrift gateway
* A REST-ful Web service gateway that supports XML, Protobuf, and binary data encoding options
* Cascading, hive, and pig source and sink modules
* Extensible jruby-based (JIRB) shell
* Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX
HBase 0.20 has greatly improved on its predecessors:
* No HBase single point of failure
* Rolling restart for configuration changes and minor upgrades
* Random access performance on par with open source relational databases such as MySQL

--
TossableDigits.com: Temporary Phone Numb
What you should really be doing... by ADRA · 2010-11-18 09:40 · Score: 2, Funny

Is to write better queries, I mean how hard can it be:
select * from (select * from A,B,C,D,E,F,G WHERE A.ID=B.AID(+) AND B.ID=C.BID(+) AND C.ID=D.CID(+) AND D.ID=E.DID(+) AND E.ID=F.EID(+) AND F.ID=G.FID(+) order by F.name ASC) where F.name='zzzzz'
Everything will work out, I swear.

--
Bye!
1. Re:What you should really be doing... by Nerdfest · 2010-11-18 10:17 · Score: 3, Insightful
  
  I think I've seen SQL written by you before. I realize your post is a joke, but I see people aliasing bad table names down to even less readable single letters. It's a maintenance nightmare. Treat SQL like a language and write it so it's readable and maintainable. It even frequently helps when you're trying to resolve performance problems ... they're much easier to spot in well written SQL.
2. Re:What you should really be doing... by ErikZ · 2010-11-18 11:28 · Score: 4, Funny
  
  I could do that, but your tears are delicious.
  
  --
  Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
3. Re:What you should really be doing... by Anonymous Coward · 2010-11-20 08:23 · Score: 0
  
  someone has seen MapNocc I see.
Relational DB limitation or app design limitation? by Arawak · 2010-11-18 09:40 · Score: 1

Are you sure you are hitting a limitation of the RDBMS or a limitation in the way your services are built? I'm just a little skeptical that a SaaS startup is already hitting limits with what you "can do with relational databases". How many hundred terabytes are you talking about here?
Usually when I hear this I see a PHP application which hits the database synchronously for every request. Or worse, a Java/Python/Ruby/.NET/whatever application built like it was a PHP app.
Lyle Can Do Anything Better Than You by Anonymous Coward · 2010-11-18 09:43 · Score: 0

http://thedailywtf.com/Articles/Lyle-Can-Do-Anything-Better-Than-You.aspx
*SCNR*, don't take it personally :)
Is this a slashvertisement or so? by guruevi · 2010-11-18 09:45 · Score: 1

What limits are you hitting? And why are you mentioning only one of the many solutions to your problem, and one which is probably mighty expensive compared to the others?
If you're genuinely hitting a limit, you're doing it wrong. You're probably not Google so most likely you're having issues scaling your proprietary and expensive SQL database (Oracle, MSSQL) but don't want to buy more $10-20k licenses. Most likely you can fix it by simply throwing better and more hardware at it (SSD, more hard drives and RAM) and while you're at it changing to a cheaper database solution (MySQL or PostgreSQL) which you can scale further for less money.

--
Custom electronics and digital signage for your business: www.evcircuits.com
1. Re:Is this a slashvertisement or so? by cheesedog · 2010-11-18 10:34 · Score: 1
  
  Google isn't the only company in the world that has to deal with petabytes of data. It's also not the only company that has to deal with incredibly large volumes of structured data.
  I speak from experience, son. Your relational DB can't handle successful internet-scale loads, no matter how many awesome dbas you hire, and no matter how much money you fork over to Oracle.
MySQL scales just fine. by poptix_work · 2010-11-18 09:50 · Score: 4, Interesting

I work with some very high traffic sites, storing large data sets (100GB+).
Depending on the application (if it allows for separate write-only/read-only database configurations), we'll have a master-master replication setup, then a number of slaves hanging off each MySQL master. In front of all of this is haproxy*, which performs TCP load balancing between all slaves and all masters. Slaves that fall behind the master are automatically removed from the pool to ensure that clients receive current data.
This provides:
* Redundancy
* Scaling
* Automatic failover
The whole NoSQL movement is as bad as the XML movement. I'm sure it's a great idea in some cases, but otherwise it's a solution looking for a problem.
(*) http://haproxy.1wt.eu/
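The "remove lagging slaves from the pool" rule can be sketched in a few lines. This is a minimal Python illustration, not poptix's actual configuration: the 5-second `max_lag` threshold and the replica names are assumptions, and the lag values are taken to be MySQL's `Seconds_Behind_Master` (from `SHOW SLAVE STATUS`), which is NULL when replication isn't running. One common way to wire such a rule into haproxy is an HTTP health-check endpoint that applies it; that wiring isn't shown here.

```python
import itertools

def healthy_replicas(lag_by_replica, max_lag=5):
    """Return replicas whose replication lag is known and within max_lag seconds.

    lag_by_replica maps a replica name to its Seconds_Behind_Master value;
    None means replication is stopped or broken, so that replica is excluded.
    """
    return sorted(
        name for name, lag in lag_by_replica.items()
        if lag is not None and lag <= max_lag
    )

def round_robin(pool):
    """Cycle through the healthy pool, roughly what a TCP balancer does."""
    if not pool:
        raise RuntimeError("no replica is current; send reads to a master")
    return itertools.cycle(pool)

# Hypothetical lag readings, in seconds, for four read-only slaves.
status = {"slave-a": 0, "slave-b": 2, "slave-c": 40, "slave-d": None}
pool = healthy_replicas(status)
picker = round_robin(pool)
print(pool)                              # ['slave-a', 'slave-b']
print([next(picker) for _ in range(3)])  # ['slave-a', 'slave-b', 'slave-a']
```

Keeping the lag check separate from the balancing makes the policy easy to tune: a stricter `max_lag` trades read capacity for fresher data.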

--
Just because you disagree doesn't make it offtopic or flamebait.
1. Re:MySQL scales just fine. by cheesedog · 2010-11-18 10:32 · Score: 1, Informative
  
  100GB+ is not a large dataset.
2. Re:MySQL scales just fine. by NoName+Studios · 2010-11-18 10:54 · Score: 1
  
  When you say high traffic, what are the actual numbers? I am curious because I see what you are saying about the number of MySQL boxes for data throughput and replication in your setup. Then I realize we serve several high traffic sites off one MySQL box.
3. Re:MySQL scales just fine. by dkf · 2010-11-18 11:53 · Score: 1
  
  100GB+ is not a large dataset.
  A dataset is large when the quickest way of getting it across the country involves Fedexing a box of harddrives.
  
  --
  "Little does he know, but there is no 'I' in 'Idiot'!"
4. Re:MySQL scales just fine. by clockwise_music · 2010-11-18 12:19 · Score: 1
  
  The whole NoSQL movement is as bad as the XML movement. I'm sure it's a great idea in some cases, but otherwise it's a solution looking for a problem.
  Excellent quote.
  
  Timothy, you're asking the wrong question. "Is anyone using this system in production?" bzzzz, wrong. The correct question is "What systems are people _using_ in production?"
5. Re:MySQL scales just fine. by poptix_work · 2010-11-18 12:20 · Score: 1
  
  By 'high traffic' I mean sites pushing in excess of 100Gbit/s. The sites could function fine with 2x Dell PowerEdge R710 (quad-core Xeon E5520 2.266GHz, 16GB RAM, 6x 147GB SAS, PERC 6/i), but we require redundancy and failover capacity.
  
  --
  Just because you disagree doesn't make it offtopic or flamebait.
6. Re:MySQL scales just fine. by Anonymous Coward · 2010-11-19 04:35 · Score: 0
  
  I work with some very high traffic sites, storing large data sets (100GB+).
  Sorry, 100GB+ isn't considered large data sets these days.
  anyways back on topic:
  I have seen some very interesting NoSQL setups, but they are all for cases where people are willing to give up something for speed. For a lot of these massive data sets (hundreds of TB and beyond) that need sub-5-millisecond response times at hundreds of thousands of requests a second, "eventually consistent" or "mostly right" might be a trade-off they are willing to make.
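  To make "eventually consistent" and "mostly right" concrete: one common reconciliation rule is last-writer-wins, which Cassandra, for example, uses for conflicting writes to the same column, based on timestamps. Below is a minimal Python sketch; the `Version` type and replica names are hypothetical, and real systems differ in how they generate timestamps and exchange state between replicas.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Version:
    timestamp: int  # supplied by the writer (wall clock or logical clock)
    node: str       # tie-breaker so every replica picks the same winner
    value: str

def merge(a, b):
    """Last-writer-wins: keep the version with the highest (timestamp, node)."""
    return max(a, b, key=lambda v: (v.timestamp, v.node))

# Two replicas accept writes independently, e.g. during a network partition.
replicas = {"us": Version(5, "us", "title A"), "eu": Version(6, "eu", "title B")}
print(replicas["us"].value, "/", replicas["eu"].value)  # title A / title B

# Anti-entropy: replicas exchange state and converge on the same winner.
winner = merge(replicas["us"], replicas["eu"])
replicas = {name: winner for name in replicas}
print(winner.value)  # title B
```

  Note that the losing write ("title A") disappears without any error being raised, which is exactly the kind of trade-off described above.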
Look at columnar... by Anonymous Coward · 2010-11-18 09:53 · Score: 0

depending on what specifically you're trying to do, it may be the way to go.
Look at ParAccel.
FIFY by Anonymous Coward · 2010-11-18 09:53 · Score: 1, Insightful

"I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what [OUR OUTSOURCED INDIAN DEVELOPERS] can do with relational databases."
FIXED IT FOR YOU MY PRETTY LITTLE OPERATIONS MANAGER. (Just using all caps to make you feel more at home)
MS SQL Server has Horizontal Partitioning by danparker276 · 2010-11-18 10:06 · Score: 1

Save the headaches and just use SQL Server
1. Re:MS SQL Server has Horizontal Partitioning by lakeland · 2010-11-18 10:49 · Score: 1
  
  So does every other database nowadays, even MySQL.
2. Re:MS SQL Server has Horizontal Partitioning by Anonymous Coward · 2010-11-18 21:39 · Score: 0
  
  Before you scale horizontally, buy some SSDs.
Voltdb by Anonymous Coward · 2010-11-18 10:21 · Score: 1, Informative

Have you looked at VoltDB? http://www.voltdb.com
My 2 cents.
1. Re:Voltdb by Anonymous Coward · 2010-11-18 10:49 · Score: 0
  
  From what I read, there are limits to the size of your DB.
InfiniDB? by Anonymous Coward · 2010-11-18 10:27 · Score: 0

Depending on your intended application this may help: http://www.infinidb.org/
Losers mock by cheesedog · 2010-11-18 10:31 · Score: 0, Flamebait

You guys posting that traditional relational databases can handle the load of internet scale applications kill me. You mock this guy who has a legit problem that everyone who has ever run an internet scale technology is very familiar with.
NoSQL isn't some passing fad invented by high school kids.
Luckily, most of you will probably never discover that fact for yourselves, because you'll never have experience with a successful internet-scale architecture. Relational DBs are just fine for internal "enterprisey" apps, or for your hobby website that drives an astounding 1200 page views/month, or for your failed attempt at launching a web service that only ever garners 300,000 users, so you can continue to delude yourselves that there just isn't a problem here, and SQL is the only skillset you'll ever need.
For the elite few who actually achieve success, you'll totally know where the OP is coming from. Intimately. And you'll either be very glad that there is a path (hadoop, cassandra, mongodb, etc) to migrate to that solves your problems, or you'll be very glad that you started with one of those solutions in the first place.
fast and extremely scalable by bhcompy · 2010-11-18 10:42 · Score: 1

The fastest DB I've ever used is based on the PICK OS/DB. Reality is its retail name now (essentially an emulator with an API for *nix/Windows). The military used it for inventory tracking, and various companies still use it today for a great many things. ADP uses it for extremely large databases with tons of history for accounting, financials, inventory, etc. Even very old systems with 20+ years of data are very responsive and quick (these systems run Digital Unix 4 on Alpha processors). Pick/Reality is a hashfile-oriented multivalue database. Wikipedia has a pretty good explanation, and I believe Northgate Systems markets Reality today.
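To make "hashfile-oriented multivalue" concrete, here is a minimal Python sketch of how a Pick-style record is laid out: one record per key in a hashed file, with repeating groups stored inside the record rather than in a child table. The delimiter characters (254, 253 and 252 for the attribute, value and subvalue marks) follow the usual Pick convention; the INVOICES data and field layout are hypothetical, and a Python dict merely stands in for the hashed file.

```python
# Pick-style delimiters: attribute mark, value mark, subvalue mark.
AM, VM, SVM = chr(254), chr(253), chr(252)

def parse_record(raw):
    """Split a raw multivalue record into attributes -> values -> subvalues."""
    return [
        [value.split(SVM) for value in attribute.split(VM)]
        for attribute in raw.split(AM)
    ]

# A dict stands in for the hashed file: records are fetched directly by key.
invoices = {
    "INV1001": AM.join([
        "ACME CORP",                    # attribute 1: customer
        VM.join(["WIDGET", "GADGET"]),  # attribute 2: part numbers (multivalued)
        VM.join(["10", "3"]),           # attribute 3: quantities, parallel to parts
    ]),
}

record = parse_record(invoices["INV1001"])
customer = record[0][0][0]
lines = list(zip((v[0] for v in record[1]), (int(v[0]) for v in record[2])))
print(customer, lines)  # ACME CORP [('WIDGET', 10), ('GADGET', 3)]
```

Because an invoice's line items live inside the invoice record, reading the whole invoice is a single keyed lookup rather than a join, which is one reason such systems can stay fast on old hardware.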
HandlerSocket by Anonymous Coward · 2010-11-18 10:52 · Score: 0

http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html
https://github.com/ahiguti/HandlerSocket-Plugin-for-MySQL
Before migrating to NoSQL... by DiogoBiazus · 2010-11-18 10:56 · Score: 1

Take a look at SQL alternatives. There's an interesting PostgreSQL use case which uses only open source tools to achieve a good horizontal scaling solution. This post tells a little about how they did it: http://highscalability.com/skype-plans-postgresql-scale-1-billion-users The post also says that there is a very similar approach using MySQL.
TokuDB by keean · 2010-11-18 10:56 · Score: 1

Have you looked at TokuDB for MySQL (http://tokutek.com)?
You're hitting the 4th normal form limit by crovira · 2010-11-18 11:18 · Score: 1

You're going to stay stuck there (and keep getting progressively worse) until the designers of your database start to implement 5th normal form.
That means taking into account the relationships between data elements and implementing them as something other than aggregated tuples.
The aggregation problem is getting worse as you try to implement new relationships.
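The comment doesn't show what 5NF looks like, so as background only: a relation calls for a 5NF decomposition when it has a join dependency, meaning it can be rebuilt losslessly from three (or more) projections but not from any two. Below is a minimal Python sketch using made-up supplier/part/project data (not from the OP's system), with sets of tuples standing in for tables.

```python
# Made-up facts: supplier s supplies part p to project j.
SPJ = {("S1", "P1", "J2"), ("S1", "P2", "J1"), ("S2", "P1", "J1"), ("S1", "P1", "J1")}

# The three binary projections.
SP = {(s, p) for s, p, _ in SPJ}
PJ = {(p, j) for _, p, j in SPJ}
JS = {(j, s) for s, _, j in SPJ}

# Joining only two projections (SP and PJ on part) invents a row never stored.
two_way = {(s, p, j) for s, p in SP for p2, j in PJ if p == p2}
# Joining the third projection (JS on project and supplier) filters it back out.
three_way = {(s, p, j) for s, p, j in two_way if (j, s) in JS}

print(sorted(two_way - SPJ))  # [('S2', 'P1', 'J2')]
print(three_way == SPJ)       # True
```

Storing SP, PJ and JS as three separate tables therefore loses nothing, whereas splitting this data into any two of them would produce spurious rows on rejoin.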

--
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
General answer to a vague question by Scin · 2010-11-18 11:22 · Score: 1

The real question is *what* you are doing with relational databases and what you are trying to accomplish. There is rarely a hip new buzzword that magically solves all your problems.
Are you doing important financial calculations, or anything else that requires ACID? If the answer is yes, then you may want to investigate sharding and figure out ways to safely split your data among multiple databases while still ensuring that each important transaction can be performed on a single system.
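The "split the data, but keep each important transaction on one system" idea can be sketched by routing every row for a given customer to the same shard, so that a transfer between that customer's accounts is an ordinary local transaction. This is a minimal Python illustration under stated assumptions: the customer ID is the shard key, there are four shards, in-memory SQLite databases stand in for separate database servers, and the table and function names are hypothetical.

```python
import hashlib
import sqlite3

N_SHARDS = 4  # assumption: a fixed shard count; real systems must also plan for resharding

def shard_for(customer_id):
    """Pick a shard with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha1(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % N_SHARDS

# One in-memory SQLite database per shard stands in for separate servers.
shards = [sqlite3.connect(":memory:") for _ in range(N_SHARDS)]
for db in shards:
    db.execute(
        "CREATE TABLE accounts (customer TEXT, name TEXT, balance INTEGER,"
        " PRIMARY KEY (customer, name))"
    )

def _db(customer_id):
    return shards[shard_for(customer_id)]

def open_account(customer_id, name, balance):
    db = _db(customer_id)
    with db:  # commits on success, rolls back if an exception escapes
        db.execute("INSERT INTO accounts VALUES (?, ?, ?)", (customer_id, name, balance))

def balance_of(customer_id, name):
    row = _db(customer_id).execute(
        "SELECT balance FROM accounts WHERE customer = ? AND name = ?",
        (customer_id, name),
    ).fetchone()
    return row[0]

def transfer(customer_id, src, dst, amount):
    """Move money between two accounts of one customer as a single local transaction."""
    db = _db(customer_id)
    with db:
        db.execute(
            "UPDATE accounts SET balance = balance - ? WHERE customer = ? AND name = ?",
            (amount, customer_id, src),
        )
        if balance_of(customer_id, src) < 0:
            raise ValueError("insufficient funds")  # the with-block rolls back the debit
        db.execute(
            "UPDATE accounts SET balance = balance + ? WHERE customer = ? AND name = ?",
            (amount, customer_id, dst),
        )

open_account("cust-42", "checking", 100)
open_account("cust-42", "savings", 0)
transfer("cust-42", "checking", "savings", 30)
print(balance_of("cust-42", "checking"), balance_of("cust-42", "savings"))  # 70 30
```

The design choice is the shard key: as long as a transaction only touches one customer's rows, it never needs cross-server coordination. Transfers between customers on different shards would need distributed coordination, which is what this layout tries to avoid.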
If you are serving up read heavy web content that doesn't need any fancy transactions or specific versions you can easily set up an intermediary cache with something like memcache, or with many of the NoSQL storage systems available (which typically act like memcache with disk backing and persistence). This might mean that you have your primary source in a SQL database or perhaps just the portions that require ACID and then you periodically sync changes from your SQL to memcache or your NoSQL system.
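A minimal Python sketch of the read-side cache described above (a pattern often called cache-aside): check the cache, fall back to the SQL database on a miss, and keep the result for a limited time. A plain dict stands in for memcached, and the class name, TTL and loader function are assumptions for illustration. This version refreshes entries lazily through a TTL plus explicit invalidation rather than through a periodic sync job; in both cases SQL stays the source of truth.

```python
import time

class CacheAside:
    """Cache in front of a slower source of truth (class and parameter names are hypothetical)."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self.load_from_db = load_from_db  # called on a miss, e.g. a SQL SELECT
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value); a dict stands in for memcached
        self.db_reads = 0

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]  # fresh hit: no database work
        self.db_reads += 1
        value = self.load_from_db(key)  # miss or expired: read from SQL
        self.entries[key] = (self.clock() + self.ttl, value)
        return value

    def invalidate(self, key):
        """Call after updating the row in SQL so the next read reloads it."""
        self.entries.pop(key, None)

# Usage with a fake clock so expiry is deterministic.
now = [0.0]
articles = {"home": "<h1>Hello</h1>"}  # stands in for a SQL table
cache = CacheAside(articles.__getitem__, ttl_seconds=30, clock=lambda: now[0])
cache.get("home")
cache.get("home")
print(cache.db_reads)  # 1
now[0] = 31.0
cache.get("home")
print(cache.db_reads)  # 2
```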
If you are doing lots of writes you may want to consider a system of local storage per node, whether it be SQL, NoSQL or something else and then aggregate and synchronize that at a later point for reporting or some sort of centralized usage.
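The "write locally, aggregate later" approach is easiest when the writes can be combined with an order-independent operation such as adding up counters. Below is a minimal Python sketch under that assumption; the class, node and metric names are hypothetical, and an in-memory list stands in for each node's local store.

```python
from collections import Counter

class LocalNodeStore:
    """Per-node append-only log; writes never wait on other nodes (hypothetical class)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.events = []  # stands in for the node's local database or file

    def record(self, metric, amount=1):
        self.events.append((metric, amount))

def aggregate(stores):
    """Later batch step: merge every node's log into one central report."""
    totals = Counter()
    for store in stores:
        for metric, amount in store.events:
            totals[metric] += amount
    return dict(totals)

web1, web2 = LocalNodeStore("web1"), LocalNodeStore("web2")
web1.record("signup")
web1.record("page_view", 3)
web2.record("page_view", 2)
print(aggregate([web1, web2]))  # {'signup': 1, 'page_view': 5}
```

Because addition is commutative, the aggregation step can process nodes in any order and still produce the same report. Data that needs cross-node constraints, such as uniqueness or account balances, doesn't fit this pattern as neatly.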
SQL and NoSQL are just tools, not a religious philosophy. If you need to drive a screw, don't use a hammer. The same goes for using or not using a relational database. I find that most businesses need a relational database somewhere in their technology stack. This is because businesses ultimately deal with things that relate to making money... and dealing with money properly often requires ACID features and transactions.
Lastly, remember that disks are slow and memory is fast. It sounds silly, but people seem to forget where their data is coming from; on just about any slow web application, the bottleneck is the disk.
Ouch... by Anonymous Coward · 2010-11-18 11:32 · Score: 0

Luckily, we won't become so condescending, I hope ;-)
Really, NoSQL, MongoDB and Cassandra sound neat. However, I've yet to encounter a design where they are truly superior. Still, they would be neat to use in the right project.
Answer... by TheSync · 2010-11-18 11:35 · Score: 1

MSSQL...Oracle....
Terracotta? by Anonymous Coward · 2010-11-18 11:51 · Score: 0

I've been working on a highly scalable system where we have managed to alleviate the load on the database to near zero and hence save some big bucks on Oracle licensing.
The application is java based using an ORM (hibernate) with L2 caching. The cache is distributed using an open source technology called terracotta. The performance we have achieved is through the roof.
www.terracotta.org
build it custom by MichaelKristopeit191 · 2010-11-18 12:03 · Score: 1

look at the specific requirements of your systems and build a custom solution.
The question is so uselessly phrased by obarthelemy · 2010-11-18 12:16 · Score: 1

that I have trouble imagining you have any kind of skill for tackling the issue.
You do realize you give us ZERO info on what the problem is, but do push a very specific (if not fringe) approach to your non-question?
If your problem-solving skills are on a par with your problem-describing skills, you're in for hard times.

--
The Cloud - because you don't care if your apps and data are up in the air.
CRAP theorem explained... by Anonymous Coward · 2010-11-18 12:23 · Score: 0

Relational databases do not scale when the person(s) working on schema and systems design have not put sufficient effort into resolving the underlying flow/concurrency hotspots inherent in their design.
At the end of the day design trumps selection of underlying data management technology when viewed at scale.
Plenty of that around here (not all here though) by Anonymous Coward · 2010-11-18 12:52 · Score: 0

See subject. Like other sites, this one has a-holes with nothing better to do than harass or bother others, just because their lives are f'd up, so they want others' lives to be as well. (This doesn't go for all /.'ers; there are some truly nice folks here and greatly informative posters too, but again there's also what you noticed: a-holes. Most of them are probably hormonally imbalanced teens, I'd guess, but others are just losers who want others to feel as they do: miserable. Hilariously, that kind always wonders why their lives are so messed up. If they could only look at their negative attitude from someone else's point of view, I think they'd understand.) Anyway, if the poster of the original article reads this, I am certain he will understand that this site, like any other, has its share of permanently f'd up a-holes who post here regularly to try to harass and annoy others.
No offence intended, but... by Anonymous Coward · 2010-11-18 12:57 · Score: 0

Seriously, I mean I'm used to the EDITORS not reading the articles, but at least the SUBMITTERS used to read them. Then this one said "Doesn't this violate the CAP theorem?" Hmm, on the first page of the blog, it says "... It's a distributed relational SQL database that supports eventual consistency..." Eventual consistency means NOT CAP.. sheesh.
Argh-stories by Steeltoe · 2010-11-18 13:03 · Score: 1

Heh, just wait until you try SugarCRM.. Reading your post made me realize there are other projects out there with the _exact_ same flaws / annoyances as Sugar, love it or hate it.
I'm sure Sugar is nice if you have 6-12 months to spend poring over all the code and designing new modules and layout / DB. But for _efficiency_ it's a mirage in the desert of hopeless "open source" projects, which in reality are paywalled spaghetti monsters.
Same with Typo3.
Don't get me wrong. It CAN work. I've been on successful projects with these. Just don't underestimate the complexity of already-written code, especially when it requires you to commit to SO MUCH.. I mean, who else uses TypoScript or that internal MVC templating engine? Yeah right, nobody. Guess why...
It's a nightmare.

--
http://www.debunkingskeptics.com/
This is important... by Anonymous Coward · 2010-11-18 13:05 · Score: 0

This has struck me numerous times of late. One of the problems of SQL culture and conflict is the idea that schemas are defined by a separate caste of user and/or that they are fixed.
There are application domains where end users need to be involved in schema creation/maintenance, and many SQL methodologies have failed here. Too many applications and/or "mapping layers" do not grant schema management rights to end users, and people go off to badly reinvent RDBMS when they really ought to just take the RDBMS out of its glass room.
If you use RDBMS properly, at any point during the "runtime" lifecycle of an application where there is a need for a new type of relation, you just create a new table! There is no reason this should be more difficult than inserting rows into existing tables, except traditionally narrow-minded programming tools.
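As a minimal sketch of "just create a new table" at runtime, here is a Python/SQLite version. The function name, the choice to type every user-defined column as TEXT, and the identifier whitelist are assumptions for illustration. The whitelist matters because SQL parameters can bind values but not table or column names, so names coming from end users have to be validated before they are spliced into DDL.

```python
import re
import sqlite3

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def create_relation(db, name, columns):
    """Create a user-defined table at runtime (hypothetical helper).

    Identifiers can't be bound as SQL parameters, so each one is checked
    against a strict pattern before being inserted into the DDL string.
    """
    for ident in [name, *columns]:
        if not IDENTIFIER.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    cols = ", ".join(f"{c} TEXT" for c in columns)
    with db:
        db.execute(f"CREATE TABLE IF NOT EXISTS {name} (id INTEGER PRIMARY KEY, {cols})")

# An end user decides they need to track a new kind of relation.
db = sqlite3.connect(":memory:")
create_relation(db, "equipment_loans", ["borrower", "item"])
with db:
    db.execute(
        "INSERT INTO equipment_loans (borrower, item) VALUES (?, ?)",
        ("alice", "projector"),
    )
print(db.execute("SELECT borrower, item FROM equipment_loans").fetchall())
# [('alice', 'projector')]
```

Names that pass the pattern but collide with SQL keywords would still be rejected by SQLite itself with an error, so the check is a safety floor rather than full validation.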
Rick Cattell's work on scalable datastores by MoxFulder · 2010-11-18 13:19 · Score: 5, Informative

I recently came across Rick Cattell's site which addresses just the questions you're asking.
Rick Cattell has written an excellent comparison guide of horizontally scalable datastores of different types (RDBMS as well as a variety of NoSQL systems).
Cattell has also written an academic paper with database expert Mike Stonebraker, which weighs the system design factors required to make a datastore scalable.
Executive summary of Cattell's work: although NoSQL may be a huge fad, the things that make a datastore scalable can be implemented in SQL RDBMS systems as well. Also, implementing do-it-yourself ACID in NoSQL systems is extremely difficult and error-prone, and is a significant advantage of most RDBMS systems. Stonebraker is the author of VoltDB, which is an open-source RDBMS designed for horizontal scalability, but they give a very fair and thorough look at competing datastores as well.

--
My bicyles
1. Re:Rick Cattell's work on scalable datastores by ThePhilips · 2010-11-18 23:37 · Score: 1
  
  the things that make a datastore scalable can be implemented in SQL RDBMS systems as well.
  
  The CACM paper is an interesting read, though unsurprisingly it starts with "shared-nothing scalability" and later moves on to "avoid multi-node operations".
  So what is the point of using an ACID-capable RDBMS in a horizontally scaled deployment when you are literally advised (if not forbidden, given "shared-nothing") not to take advantage of ACID?
  I'm a tad disappointed: from the parent post's description, I had expected to see a major breakthrough in distributed transaction processing performance, an atomic commit across several nodes in the cluster that doesn't suck. But all I see between the lines is the old "don't do it."
  
  --
  All hope abandon ye who enter here.
Greenplum by Anonymous Coward · 2010-11-18 13:22 · Score: 0

What about something like Greenplum (http://www.greenplum.com)? It is supposed to be able to optimize a query (optimize...I hate that word) and then execute it in parallel among multiple machines.
Me too by Steeltoe · 2010-11-18 13:25 · Score: 1

If people are getting inconsistencies, they're not following good CS practices in the first place. Any proper course on the subject will teach algorithms and database-design, among other subjects that alleviate problems before they arise. This is nothing new, but has been maturing for the last 30-40 years.
I don't think people will automatically "get it", by just doing an exam on the subject. However, respect and a bit of dedication for such knowledge is required, to further refine one's skills.
I believe people get inconsistencies in the first place because they are _not_ trained properly, or because their brains are unable to comprehend sound principles. That, or the tight timelines, which are expected to become tighter and tighter until people are out of jobs..
"NoSQL" practices have a place too, but not for most projects. If you're using NoSQL to "save time" by not thinking about schemas, you're most probably inviting a world of hurt later, when inconsistencies, vertical scaling and immature support/reporting tools catch up with you. The more valuable the information in the system, the more hurt.
Of course, information on Facebook and its equivalents has almost zero value, and doesn't really need to be that consistent or up-to-date. Although it has interesting scaling challenges, Facebook is hardly the standard to meet for most serious projects out there.
This is coming from someone who has reviewed numerous "nosql" solutions but has not yet found anything more compelling than RDBMSs for general projects. I would very much like to use them for "fun", but seem unable to give up on so many sound practices just to squeeze a bit more juice out of the system. Often, I find parallelizing the task gets the job done just as quickly with a "relational" solution, and with fewer headaches and support nightmares later..
Just being able to use some mature support-tools is enough to make my decision. Hopefully, "nosql" will mature and become a viable alternative. Right tool for the right job and all that..

--
http://www.debunkingskeptics.com/
KX Systems by Anonymous Coward · 2010-11-18 15:14 · Score: 0

KDB+ - http://kx.com/ will handle it.
Application dependent. by fishbowl · 2010-11-18 15:34 · Score: 1

There's no universal general purpose answer to your question, especially not with the level of detail you have posted. To suggest otherwise would be irresponsible.
For some applications, db scaling is easy. For others it may require some enormously complicated considerations about things like indexing and transactions.

--
-fb Everything not expressly forbidden is now mandatory.
Re:Relational stuff scales - not around the world! by mikehoskins · 2010-11-18 15:54 · Score: 2, Interesting

Can you shard the same SQL data store in Chicago, London, and Tokyo? Not with standard SQL databases, unless you write your own complicated replication techniques or pay through the nose. (See CAP Theorem).
Yes, the company I work for has expressed the world-wide SQL database need, so this is not just a thought experiment.
Have you heard of GemFire/GemStone, VoltDB, or Xeround?
If you can get rid of the SQL requirement, try XML (or another format) on Amazon's S3, or try one of the NoSQL databases, such as MongoDB, Riak, or CouchDB.
All of the above scale horizontally, most even scale in a geographically diverse environment.
It's true, SQL is getting old, time to re-think by Anonymous Coward · 2010-11-18 16:00 · Score: 0

Yeah, SQL will show limits at some point. You'll hear otherwise from people who think it'll work for you because there are about 1000x as many people who are experts in SQL vs. NoSQL. I am an expert in NoSQL, and suggest you look at MarkLogic, MongoDB or something similar, depending on whether you want a document store, a key/value store or something else. Be aware that your architecture and lessons learned will be different, so some consulting will be a good idea.
In particular, development and CPU costs of all that relational mapping just doesn't support all business processes anymore.
Let's put it this way -- Facebook and Google ain't runnin' on Oracle.
1. Re:It's true, SQL is getting old, time to re-think by fishbowl · 2010-11-18 19:09 · Score: 1
  
  >Let's put it this way -- Facebook and Google ain't runnin' on Oracle.
  Last time I checked, Facebook was still running on MySQL with Cassandra, where Cassandra's role is basically a dimension cache. Considering their recent growth spurt that may have changed.
  Google BigTable/GFS is a poor fit for a lot of applications.
  What I really want to know is what Google uses internally for HR, Accounting, and facilities management software (and questions like that seem far more relevant to the typical small business -- your startup does NOT have the web traffic or DB write load of Google or Facebook, sorry).
  
  --
  -fb Everything not expressly forbidden is now mandatory.
Stonebraker's VoltDB (new & open source) by geoffrobinson · 2010-11-18 17:14 · Score: 1

http://voltdb.com/
This database project tries to eliminate a lot of the bottlenecks that cause poor scalability. Here is a talk Stonebraker gave (requires registration): http://voltdb.com/content/mike-stonebraker-sql-urban-myths-webinar-recording
From what I know of it, it puts everything into memory and uses a cluster to distribute the load. It's more interesting than my description, but people should really look up the product.

--
Except for ending slavery, the Nazis, communism, & securing American independence, war has never solved anything.
Re:Relational DB limitation or app design limitati by fishbowl · 2010-11-18 18:54 · Score: 1

>I'm just a little skeptical that a SaaS startup is already hitting limits with what you "can do with relational
>databases".
Or if they are, then it will be relatively easy to find people willing to help out with the cash situation.

--
-fb Everything not expressly forbidden is now mandatory.
Use brains not products by Anonymous Coward · 2010-11-18 19:25 · Score: 0

If you really use the right hardware, the right database server parameters, the right indexes, the right queries and the right isolation level, you shouldn't need to worry about performance. Try reading some database performance books and then applying the knowledge.
Anyway, have you tried to use any cache library for your software?
Re:XML Go Diagonal by davester666 · 2010-11-18 20:30 · Score: 1

XML is THIS CLOSE to becoming sentient...

--
Sleep your way to a whiter smile...date a dentist!
Ebay isn't Internet scale? by butlerm · 2010-11-18 20:44 · Score: 1

I guess if you are building a business bigger than eBay, then relational databases may not do the trick anymore. If you lack imagination anyway.
See here for more information. And eBay is not the only one. I wouldn't put mission critical data on a garden variety noSQL database unless I really really hated my customers and planned to go out of business fast.
Garden variety NoSQL is great for data you can mostly throw away and no one is going to sue you as a result. Facebook, Twitter, Google. Perhaps not so much for financial transactions. If it leaked out that a bank was using a typical NoSQL system, neither particularly consistent nor particularly durable, for transaction data, a run on the bank would soon ensue, at least from anyone advised by anyone who had a clue.
1. Re:Ebay isn't Internet scale? by cheesedog · 2010-11-19 06:05 · Score: 1
  
  eBay's multi-petabyte relational database is massively sharded. Once you partition, you lose most of your "relational" advantages anyway. This is how virtually all "large databases" end up, and once you go there, you are essentially nosql anyway (but with a much hairier and harder-to-incrementally-scale architecture).
  It's for exactly this reason that smart players recognize that a traditional relational database approach doesn't really buy you anything in the eventual case when you need to scale. It's why google/facebook/et al have pioneered nosql approaches, and it's why Amazon uses Dynamo for their shopping cart app (and many others) instead of oracle/sqlserver/postgres.
2. Re:Ebay isn't Internet scale? by butlerm · 2010-11-19 07:30 · Score: 1
  
  It depends on your data. If you don't really care about ACID all that much, or are willing to do it manually, or are storing immense quantities of largely read only data then "NoSQL" is fine. There are a number of areas where it is not fine.
  Serious relational databases have distributed transaction coordination, so you can atomically commit transactions on multiple databases. When "NoSQL" databases add that sort of thing, then perhaps accounting or ERP system developers will be able to use them without committing professional suicide.
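  The usual protocol behind atomically committing a transaction on multiple databases is two-phase commit: every participant first prepares and votes, and only if all vote yes does the coordinator tell them to commit. Below is a toy, in-memory Python sketch of the protocol's shape; the `Participant` class is hypothetical, and the sketch leaves out the coordinator's durable decision log, timeouts and crash recovery, which are what make real implementations hard.

```python
class Participant:
    """One database in a distributed transaction (toy, in-memory, hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a constraint failure or lock timeout
        self.state = "idle"

    def prepare(self):
        # Phase 1: a real database would durably write its changes here, then vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Commit on every participant or on none of them."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:                  # phase 2: everyone voted yes
            p.commit()
        return True
    for p in participants:                      # phase 2: at least one voted no
        p.rollback()
    return False

print(two_phase_commit([Participant("orders_db"), Participant("ledger_db")]))  # True
dbs = [Participant("orders_db"), Participant("ledger_db", can_commit=False)]
print(two_phase_commit(dbs), [p.state for p in dbs])  # False ['aborted', 'aborted']
```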
  To say nothing of the vastly superior query capability of nearly any relational database, which is reason #1 why most large line-of-business systems cannot be written (or rewritten) using NoSQL without more than doubling the development effort required. Nobody is going to write something like PeopleSoft or SAP using a "NoSQL" database anytime soon.
  What is much more likely to happen is that horizontally scaling relational databases will become much more common, and much more cost effective. There isn't _any_ principled reason why a relational database cannot be designed to run just as effectively as any NoSQL database, under similar constraints. It is just that people like Oracle and IBM currently want to charge the GDP of a small country to do so.
SSDs are your friend. by Anonymous Coward · 2010-11-18 21:15 · Score: 0

Buy some SSDs. OCZ's RevoDrive X2 can do 120k IOPS for just $3-4/GB. RAID a couple of them together for 240k IOPS, which will fix almost any bottleneck in your database. Very few applications need that much performance.
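The 240k figure is what two drives at the quoted 120k IOPS each give you under ideal striping (RAID 0). Here is a back-of-the-envelope sketch, assuming perfectly linear scaling. Real arrays land below this bound because of controller, bus and queue-depth limits. The function names are illustrative only.

```python
def ideal_stripe_iops(drives: int, iops_per_drive: int) -> int:
    """Upper bound on random IOPS when striping (RAID 0) across identical drives.

    Assumes perfectly even distribution and no controller/bus bottleneck.
    """
    return drives * iops_per_drive


def drives_needed(target_iops: int, iops_per_drive: int) -> int:
    # Ceiling division: smallest drive count whose ideal total meets the target
    return -(-target_iops // iops_per_drive)
```

For example, `ideal_stripe_iops(2, 120_000)` gives 240,000, and hitting 250,000 would take three drives even in the ideal case.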
Re:XML Go Diagonal by JustOK · 2010-11-18 22:05 · Score: 1

it's still just a parsing fancy

--
rewriting history since 2109
DB2 9.8 pureScale by Anonymous Coward · 2010-11-18 22:34 · Score: 0

If you want horizontal scalability with near linear progression, take a look at DB2 pureScale. This scales (in the first release) to 128 member nodes, and the demonstrations at this level of scaling are very impressive. Oh, and you don't need to keep making changes to your application to get this level of scaling - unlike Oracle RAC (which can't really climb much past 8 nodes anyway without dying a horrible death from the law of diminishing returns).
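One common way to reason about "diminishing returns" as nodes are added is Neil Gunther's Universal Scalability Law, where throughput on N nodes is N / (1 + alpha(N-1) + beta N(N-1)). Here alpha models contention and beta models coherency cost (nodes keeping locks and caches in sync). The sketch below is illustrative only. The alpha and beta values in the usage note are made up, not measurements of RAC or pureScale.

```python
def usl_throughput(n: int, alpha: float, beta: float, single_node: float = 1.0) -> float:
    """Universal Scalability Law: throughput on n nodes relative to one node.

    alpha: contention (fraction of work that serializes)
    beta: coherency cost (pairwise cross-node synchronization)
    """
    return single_node * n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def peak_nodes(alpha: float, beta: float, max_n: int = 256) -> int:
    # Node count (within 1..max_n) that maximizes modeled throughput
    return max(range(1, max_n + 1), key=lambda n: usl_throughput(n, alpha, beta))
```

With hypothetical values alpha = 0.05 and beta = 0.01, modeled throughput peaks at 10 nodes and then declines. With beta = 0 it keeps rising all the way to the search limit. Near-linear scaling to 128 members therefore implies that the cross-node coherency cost has been driven very close to zero.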
You can have both No+SQL - though only GPL by Anonymous Coward · 2010-11-18 23:30 · Score: 0

I dunno what Translattice does - no source available.
The infrastructure description on its website looks kind of similar to Askemos.org
(which is GPLed).
Recently I wrote this note, which shall eventually become a tutorial of sorts.
I hope it will give you an idea of whether or not this could be for you:
using SQL
(So far I'm working in two environments with it: a) public (customer + ours) websites run from a mixture of Linux and FreeBSD on a mixture of 32-bit Intel, 64-bit Intel and ARM; and b) a 5-node Seagate DockStar network in a suitcase. And without breaking CAP: 3/2n+1 nodes required.)
Oh, that's easy... by Robert+Zenz · 2010-11-18 23:59 · Score: 1

I think you should use Mauve, it has the most RAM.
Startups should probably scale vertically by lastrogue · 2010-11-19 02:38 · Score: 1

If you're just a startup, I would suggest (without knowing the size of the client base using these databases) that you should probably scale your database vertically. If you have application servers or web servers that connect to that SQL database, you would want those application servers scaled horizontally in a load-balanced environment. But in every scenario I can think of, horizontally scaling a database would have a detrimental effect on performance.
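The split described above (many stateless application servers behind a load balancer, all talking to one big database box) can be sketched as a simple round-robin picker. This is a hypothetical illustration. The backend names are made up, and in practice the balancing is done by a dedicated load balancer rather than application code.

```python
import itertools


class RoundRobinBalancer:
    """Spread requests over stateless app servers that share one vertically scaled database."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        # Cycle endlessly through the backends in order
        self._cycle = itertools.cycle(list(backends))

    def pick(self) -> str:
        return next(self._cycle)
```

Because the app tier keeps no state of its own, adding a fourth server is just another entry in the list. The single database behind them is the part you scale up with more RAM, disks and cores.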
RAM and DISK by mr_java66 · 2010-11-19 03:43 · Score: 0

Have you tried more RAM? How about 15-disk arrays? I take care of 10 TB (yes, tera) with good old SQL Server 2005 on 6 machines. 4 million books on 25 marketplaces means 50,000,000 prices/listings that must be worked on constantly. Full/diff backups every day, tran logs every 15 minutes. Oh, and as for hardware, even Dell has stuff more than twice as strong as what I'm using. Sure, >$100K in licenses, but don't for one second try to tell me about LIMITS. Go double digit on your T, reprocess most of it every couple of days, and then talk to me about how big you is.
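With daily full/differential backups plus a log backup every 15 minutes (96 log backups a day), a point-in-time restore needs a specific chain: the latest full before the target, the latest differential after that full, then each log backup after the differential up to the first one that covers the target. The sketch below is a simplified, hypothetical model of that selection logic, not SQL Server's own tooling.

```python
from datetime import datetime, timedelta


def restore_chain(backups, target):
    """Pick the backups needed to restore to `target` (point-in-time).

    backups: list of (kind, taken_at) tuples, kind in {"full", "diff", "log"}.
    Simplified model: a diff is relative to the latest full before it, and a
    log backup covers all activity up to its taken_at time.
    """
    fulls = [b for b in backups if b[0] == "full" and b[1] <= target]
    if not fulls:
        raise ValueError("no full backup before target")
    full = max(fulls, key=lambda b: b[1])
    chain = [full]
    diffs = [b for b in backups if b[0] == "diff" and full[1] < b[1] <= target]
    if diffs:
        chain.append(max(diffs, key=lambda b: b[1]))
    base_time = chain[-1][1]
    logs = sorted((b for b in backups if b[0] == "log" and b[1] > base_time),
                  key=lambda b: b[1])
    for log in logs:
        chain.append(log)
        if log[1] >= target:  # this log backup covers the target time
            break
    return chain
```

For a midnight full, a noon differential and 15-minute log backups, restoring to 12:40 needs the full, the diff and the 12:15, 12:30 and 12:45 log backups. The differential is what keeps the log replay short.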
It's all about provisioning by utoddl · 2010-11-19 08:11 · Score: 1

I think he's talking about a different problem altogether.
We have about 40,000 people, any one of whom may want a web site. They all have space on our campus storage. Provisioning our web servers for all of them is just a matter of
<Directory "/home/*/public_html">
AllowOverride [some options]
Options [some other options] ....
</Directory>
All they have to do is put some html pages in place.
Now, if some subset of those users wants to put a MySQL-backed blog or some other low-traffic app in their HTML space, they've got to stand up their own MySQL, or talk to a DBA and do a lot of hand-holding. This doesn't scale horizontally -- lots of users doing basically the same thing. You can't say
<Directory "/home/*/public_dbspace">
[appropriate database defaults]
AllowOverride [some other defaults] ...
</Directory>
and if the users put the right stuff in their ~/public_dbspace, they get database service. We're not talking about high performance or large data. We're talking about a large number of mostly very small users being provisioned with very little intervention on our part.
If you think about it, web servers and databases _in_this_application_ do the same thing: they provide highly specialized interfaces to data in specifically provided files. There's no inherent reason why providing the M in a LAMP stack should be any harder than the A, but configuration for the masses is clearly easier for the latter.
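The "presence of the directory is the request" idea can be approximated with a periodic provisioning job. This is a hypothetical sketch. SQLite stands in for MySQL so the example is self-contained, and the `site.db` filename and `meta` table are made up. A real MySQL setup would instead create a per-user schema, account, grants and quota.

```python
import sqlite3
from pathlib import Path


def provision_user_databases(home_root: str, dirname: str = "public_dbspace") -> list:
    """Create one database per user who has opted in by creating ~/<dirname>.

    Mirrors the <Directory "/home/*/public_html"> convention: the existence of
    the directory is the whole request. Returns the users provisioned this run.
    """
    provisioned = []
    for dbspace in sorted(Path(home_root).glob(f"*/{dirname}")):
        if not dbspace.is_dir():
            continue
        db_file = dbspace / "site.db"
        if db_file.exists():
            continue  # already provisioned on an earlier run
        conn = sqlite3.connect(db_file)
        # Touch the schema so the file is created on disk
        conn.execute("CREATE TABLE IF NOT EXISTS meta (k TEXT PRIMARY KEY, v TEXT)")
        conn.commit()
        conn.close()
        provisioned.append(dbspace.parent.name)
    return provisioned
```

Run from cron, this is idempotent. Users who have not created the directory are ignored, and users who already have a database are skipped.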
Mint.com talk by Anonymous Coward · 2010-11-19 09:15 · Score: 0

I recently attended an excellent Mint.com talk about scaling beyond 2,000,000 users. Their setup: Apache/Spring/Hibernate with a MySQL backend.
They made some interesting tweaks to get the multiple MySQL boxen to cope with the load, but they basically come down to 'customised sharding'. Each new user's data (all of it) was set to store on a random (I think) MySQL box in the cluster. When the user logged in, they were connected to the correct MySQL db (on one of many MySQL servers) and their data was retrieved and loaded fast using some tricks with varchar keys (apparently, in MySQL, they load all together in concurrent blocks).
This provides sufficient data access speed for 2,000,000 users. I don't have the number of MySQL servers handy, but from memory it was fewer than 20.
The talk was very impressive and heartening, as we also deploy a similar architecture (except we love PostgreSQL instead of MySQL) and would love to have this particular user problem.
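The per-user sharding described above (pick a server at signup, look it up at login, then talk only to that server) amounts to directory-based sharding. This is a minimal hypothetical sketch. The DSN strings are made up, the directory here is just an in-memory dict (in production it must itself be durable and replicated), and the varchar-key trick from the talk is not modeled.

```python
import random


class UserShardDirectory:
    """Maps each user to the one MySQL server holding all of that user's data (hypothetical)."""

    def __init__(self, shard_dsns, rng=None):
        self.shard_dsns = list(shard_dsns)
        self.assignment = {}  # user_id -> index into shard_dsns
        self.rng = rng or random.Random()

    def register(self, user_id):
        # Signup path: place the new user on a randomly chosen server
        if user_id in self.assignment:
            raise ValueError(f"user {user_id} already registered")
        self.assignment[user_id] = self.rng.randrange(len(self.shard_dsns))
        return self.shard_dsns[self.assignment[user_id]]

    def shard_for(self, user_id):
        # Login path: one lookup, then connect to that server only
        return self.shard_dsns[self.assignment[user_id]]
```

Unlike hashing, an explicit directory lets you add a server (new signups can start landing there) or move a single user without remapping everyone else.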