Digg Says Yes To NoSQL Cassandra DB, Bye To MySQL
donadony writes "After Twitter, now it's Digg who's decided to replace MySQL and most of their infrastructure components and move away from LAMP to another architecture called NoSQL that is based on Cassandra, an open source project that develops a highly scalable second-generation distributed database. Cassandra was open sourced by Facebook in 2008 and is licensed under the Apache License. The reason for this move, as explained by Digg, is the increasing difficulty of building a high-performance, write-intensive application on a data set that is growing quickly, with no end in sight. This growth has forced them into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead."
Cassandra is basically a sloppy implementation of UniVerse and related products. Why sloppy? Because the idea of a separate file access for each column sucks - use a union or struct as necessary, people!
In other news, Cassandra developers are celebrating the fact that their database is now used to store the largest amount of worthless information in history.
Negative moral value of force outweighs the positive value of good intentions.
Reddit also recently switched to Cassandra.
I imagine with the continual growth of these social networks, high performance DB methodologies will experience tremendous growth, and perhaps even paradigm shifts in the way we logically think and design database architectures. Instead of this flat 2D table mentality, imagine n-dimensional matrices of data, scaling dimensions instead of table and rowcounts.
I bet if you converted Facebook to this n-dimensional 'table' model, and did a couple inner-joins and unions, you could rip space-time wide-open!
'We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.' RPF
Or away from MySQL? There is a difference.
From the Digg blog - http://about.digg.com/node/564
"And if that doesn't sound like a big enough challenge, we're replacing most of our infrastructure components and moving away from LAMP."
Cassandra Linux Apache PHP?
creation science book
The sad thing is that Monty's MySQL fanboys will blame this on Oracle when in reality the move to Cassandra (or other NoSQL databases) is what a lot of web sites should be doing regardless of who holds the MySQL reins.
Why not lighttpd?
And why wouldn't a relational database system be perfect for Facebook?
If you mod me down, I will become more powerful than you can imagine....
I too have a site running on MySQL and I am thinking of switching.
Can anyone tell me if there is any "comparison chart" listing the various features / usability of the various OSS DB packages available so I can make a better educated decision?
Please help !
Thank you !
Muchas Gracias, Señor Edward Snowden !
100% of hosting companies do not have Twitter, Facebook, Reddit, or Digg as their clients. It's a different market. MySQL does have a competitor in this space called PostgreSQL. It's pretty good. Pretty much every hosting company I would consider doing business with also offers it. But again, PostgreSQL wouldn't have saved the day for these companies; they've reached a different sector of the market due to their enormous scale.
Well.. maybe. Or Maybe not. But Definitely not sort of.
Will Slashdot switch?
You couldn't even be bothered to read up on what ANT actually was, could you...
"Ant is a Java-based build tool. In theory, it is kind of like Make, without Make's wrinkles and with the full portability of pure Java code."
I know exactly what ant is. I use it on a regular basis. I was pointing out that cassandra uses Java to at least some extent, which is disgusting (which is proven by the fact that jdk is part of the dependencies for it with apt).
Well, I don't know too many people who program in C and use Ant. And a glance at the FAQ implies it's Java-based (it talks about the JVM a bit).
I guess Cassandra just isn't really targeted at the market segment where the overhead of a JVM would make much of a difference, even if it would make redundancy easier.
The World Wide Web is dying. Soon, we shall have only the Internet.
MongoDB is another "NoSQL" solution. You can still have LAMP. I think they do a disservice to the LAMP stack when lumping it in with their issues with MySQL (unless of course they really are getting rid of Linux, Apache and PHP too).
So what's the advantage of switching?
I have a policy of if it ain't broke don't fix it
These slides present a balanced and comprehensive overview of the current state of free databases. Whether you're in the NoSQL camp or not, they're worth reading.
That said, here's my take:
It's currently fashionable to replace MySQL with some "NoSQL" database or other. This trend is driven by two factors:
I haven't seen any consideration from potential "NoSQL" adopters of the benefits of using a good relational database like PostgreSQL. There's a world of difference between it and MySQL, and condemning all relational database systems because of bad experiences with MySQL is like condemning all sandwiches because McDonalds once made you sick. In giving up RDBMSes entirely, these developers lose quite a bit of safety, flexibility, and convenience. It's a huge over-reaction.
This field should not be about following trends, though unfortunately, that's how most people choose which technologies to use: it should be about choosing the best tool for the job. And I believe that in the vast majority of cases, the advantages conferred by a relational system --- enforced integrity, interoperability based on SQL, query flexibility, storage flexibility --- make an RDBMS the best choice for almost any job. If you need sloppier semantics for some cases (for example, "eventual consistency"), you can layer that on top of a robust RDBMS.
MySQL has never been a good example of a relational database; the underlying implementation is limited. It's MySQL that is the problem here, not relational databases.
I suspect that it is not the relational model at fault here, but the lack of creativity and competence in implementing a relational database technology. MySQL has perhaps never been a particularly scalable platform; it has a number of severe limitations and does not seem to be designed with a lot of thought for a distributed environment. Its developers seem to have developed it for small-scale web pages, and have been notorious for leaving out many advanced features, and thus have limited its effectiveness to small, low-powered pages.
It's all in the implementation: it's not the relational database model that needs fixing, it is the underlying implementations.
Don't be too quick to put Java down.. it's slower but it scales fairly well.
On a related note, Reddit's performance and reliability has dropped off significantly since switching to Amazon's "Cloud", and dropped off even further after this switch to Cassandra.
The constant 503 errors, plus horrendous load times when it does manage to work, have driven me and many others away from Reddit. That's why I'm posting here on Slashdot.
Cloud hosting is a stupid idea for anything beyond a blog getting 10 hits per day. All the talk about scalability is pure bunk. I mean, even with the extensive knowledge and infrastructure of Amazon, the Reddit site is slow (and it wasn't like that before they switched).
But since most developers model their domains Object Oriented, why is MySql the default choice for any small application? Why not a document database or a native oo one?
If you're trying to run a site on a $15/month hosting account, then no, this is probably not for you. But if you're at the stage where MySQL isn't able to handle all the data you're throwing at it, then chances are you won't care about the extra few MB of memory that the Java runtime requires.
> But if you're at the stage where MySQL isn't able to handle all the data you're throwing at it...
...it's time to move up to PostgreSQL.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
The relational model is consistent and easy to work with. It's easy to specify constraints that describe what the data should look like, and to allow several applications to interact with the data. It's also easier to optimize a database when you can describe discrete queries instead of directly following links from program code as you would in a navigational/object/document/etc. database.
Furthermore, application data models aren't all that object-oriented. Most of the time, the manipulated data types (say, "story", "post", and "user") fall into well-defined categories that correspond well to rows in a table. The few mismatches are easily dealt with in application code.
Sure, using an object database might be "easier" for the first 15 minutes, but you'll kick yourself when you have to manipulate it in any kind of sophisticated fashion.
/etc/init.d/cassandra stop /etc/init.d/cassandra start
free -m
-/+ buffers/cache: 213 1259
free -m
-/+ buffers/cache: 308 1164
Note that memory usage increases by roughly 100MB (213MB to 308MB used), and that's immediately after installing it.
Sites with such large volume could easily benefit from a low memory usage configuration.
This isn't your grandfather's JVM.
These days, Java is quite fast and efficient, and there are even a lot of different alternative VMs you can try. Sure, startup time isn't the best, and Swing is still a lumbering, over-engineered, ill-fitting albatross: but these problems don't matter for server applications.
IMHO, the best part is that you can write programs that run on the JVM in a dialect of Lisp and interact seamlessly with other code on the JVM.
And even after that stop it left 100 tasks in 2 processes running jsvc. This on a server that used to only have 50 processes at a time (with both apache and mysql running with about 10 processes each).
Am I the only one who frowns at this moniker?
First, it creates a false premise where people need to pick "SQL" versus "no SQL", while many real-world systems intelligently combine relational and non-relational data storage for their needs. There is no conflict.
Second, there's nothing wrong with SQL as a language in particular, and in fact many of the "noSQL" engines are starting to support and extend basic SQL queries, instead of reinventing their own query language for the same purpose.
I suppose "lessRDBMSabuse" was less catchy...
There is this thing, it's called archiving. Sounds like another example of software developers pretending to be DBA's, if you ask me.
Bullshit. Languages don't scale: programs do.
Writing a program in Java makes it scalable in the same way that painting a car red makes it fast. The JVM is quite good these days, but don't make up advantages that don't exist.
1999 called. They want their bitching about Java back.
so does ASP.net and C#.
Totally!!! It scales from slow to glacial with no effort at all!!!
I'm sorry, but Java still doesn't compare to C, and those differences *especially* apply to high load server applications.
You'll have to live with it. "It scales", and it scales great in "the cloud". I know, I cringe too.
Java the language isn't scalable on its own.. there's no magic scaling technology built into the JVM.. but the general Java "culture" tends to (in my opinion) achieve at least medium scalability.
When judging a language, you _have_ to look at the culture around it. These days nothing is 100% custom built.. a sizable project is going to import a wide variety of third-party libraries. The general attitude of the community is going to determine how suitable these libraries are for whatever scale you will be using them at.
Same as how languages like perl on their own don't produce unmaintainable code.. it's the perl "write once, read never" culture that leads to so much unreadable code.
---
Databases Feed @ Feed Distiller
No DBM will save the day if you aren't very careful about how you design your usage of very large databases.
In the case of SQL, that would be things like Schema, choice of Index columns, views, stored procs, joins, SQL statement contents.
In some cases, the performance of a SQL statement can be horrible, but can be rewritten in a different way to answer the same question but have stellar performance.
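To make the rewrite point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The users/posts tables and their contents are made up for illustration. Both queries answer the same question (posts per user); which one runs faster depends entirely on the engine, the indexes and the data, so treat this as a demonstration of equivalence, not a benchmark.

```python
import sqlite3

# In-memory database with a hypothetical users/posts schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT);
    CREATE INDEX posts_user_id ON posts (user_id);
    INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO posts (user_id, body) VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

# Form 1: a correlated subquery, logically evaluated once per user row.
correlated = """
    SELECT u.name,
           (SELECT COUNT(*) FROM posts p WHERE p.user_id = u.id) AS n
    FROM users u
    ORDER BY u.name
"""

# Form 2: the same question as one outer join plus one aggregation.
joined = """
    SELECT u.name, COUNT(p.id) AS n
    FROM users u
    LEFT JOIN posts p ON p.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY u.name
"""

rows_correlated = conn.execute(correlated).fetchall()
rows_joined = conn.execute(joined).fetchall()
print(rows_correlated == rows_joined)
```

In SQLite you can prefix either statement with EXPLAIN QUERY PLAN to see how the engine intends to execute it; other databases have their own equivalents.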
Not necessarily true - simple things like the standard algorithm for matrix multiplication could easily scale if most matrix operations are done on explicit and well defined matrices rather than by someone who re-invented the wheel. True, we should distinguish between the language and the library (it seems more of the library is built into Java than with C, but that is natural as Java is a much younger language). Still there are certain tendencies that one develops in programming in different languages that can alter performance and even internally, and of course the nature of garbage collection can always affect things.
In my experiences developing applications in both the business and gaming industries, most applications beyond a simple cookbook app/crappy blog are highly object oriented. How else can you explain the wealth of approaches like ORM mappers, the repository and active record patterns, etc ? They are just patches on the relational model to make them friendly to application code. If your domain objects are consistently flat, you are probably doing something wrong. I for one do not want to use an API with Address1 - Address5 string properties. What you just listed as story, post, etc are all just objects, usually with nesting. Relational databases suck at dealing with complex object hierarchies, hence all the joins just because object A has a collection of object Bs which contain an object C.
Can you please define what a sophisticated fashion means? Unless you are a DBA and love SQL/config work, it is far easier to write constraints using an object database. You simply use the same validation and rules you should already be using in your application. If you rely on your database alone to enforce things like required fields, atomicity, etc, then you have failed at creating a good application and likely are ripe for exploits, security holes, bad data, etc anyway. It is true that relational DBs provide certain easy facilities, but any decent Object Database provides most if not more of these same constructs in another form through its API. For instance, most object databases I have used provide some sort of transactional data structure that supports far more types of locking and concurrency/conflict management than any relational DBs I have ever seen. Further, since most object databases are defined and consumed in the languages you develop with, the sophistication is limited to the language. I'd say you can do a lot more in Smalltalk than SQL for instance.
If you're referring to querying, apparently you've never queried in Smalltalk, C# with LINQ, LISP, or even just using lambdas in python or ruby. Querying using the actual object is typically far easier than writing a SQL query. These days it is becoming increasingly rare that someone rolls all their own queries in your average app anyway (see ORMs). You'll often end up with something like an ORM translating some things from the UI into a boat load of queries, then you'll have to go and find fixes for the ORM to avoid making the application grind to a halt due to all the chatter. Although a lot of that is often the function of UI elements, ultimately there is a lot of overhead created by patching the relational and object disconnect.
I am wondering how you think going from relational back to objects, even flat ones, is somehow easier and more consistent. You're adding an extra language, more layers, and more configuration/management for what gain? Object databases hold records for things like throughput for transactions, data population, etc. The performance thing is a myth of the past. I'd say the stumbling block if anything is simply bad developers. An RDBMS does add somewhat of an idiot proof layer, but really in the end you just end up with even crappier code in other spots.
Finally, you mention that discrete queries are easier to optimize. I again must disagree. If you want discrete queries, you could describe each query on an object with another object. This is exactly what any good developer should be doing with an ORM anyway. For instance, you could use the specification pattern with the repository pattern to describe and issue your queries, object db or rdbms. Secondly, instead of some crappy tools from the maker of the RDBMS, using an object DB I now have the full facilities of the language to do performance optimization, profiling, logging, etc rather than what a vendor provides. MSSQL provides some great tools for example, but most other DBs while nice implementation wise, provide horrific tool chains.
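For anyone who hasn't run into the specification and repository patterns, a minimal sketch in Python follows. Every name here (Story, Spec, StoryRepository, by_author, min_diggs) is hypothetical, and the repository is backed by a plain list; a real one would translate the specification into whatever its object DB or RDBMS understands.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Story:
    title: str
    author: str
    diggs: int


class Spec:
    """A query described as an object: wraps a predicate over Story."""

    def __init__(self, predicate: Callable[[Story], bool]):
        self._predicate = predicate

    def is_satisfied_by(self, item: Story) -> bool:
        return self._predicate(item)

    def __and__(self, other: "Spec") -> "Spec":
        # Combine two specifications into one that requires both.
        return Spec(lambda item: self.is_satisfied_by(item)
                    and other.is_satisfied_by(item))


def by_author(name: str) -> Spec:
    return Spec(lambda s: s.author == name)


def min_diggs(n: int) -> Spec:
    return Spec(lambda s: s.diggs >= n)


class StoryRepository:
    """In-memory stand-in for a store; callers only ever pass a Spec."""

    def __init__(self, stories: Iterable[Story]):
        self._stories = list(stories)

    def find(self, spec: Spec) -> List[Story]:
        return [s for s in self._stories if spec.is_satisfied_by(s)]


repo = StoryRepository([
    Story("Cassandra at Digg", "ann", 250),
    Story("Why I like Make", "ann", 12),
    Story("PostgreSQL tips", "bob", 400),
])
popular_by_ann = repo.find(by_author("ann") & min_diggs(100))
print([s.title for s in popular_by_ann])
```

The point of the design is that each query is a discrete, nameable object, so it can be logged, profiled or swapped out without touching the calling code.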
It is true there are some problems an RDBMS is good for, but your post comes off like someone who has never really use
Thanks for the comprehensive reply.
ORMs are syntactic sugar for the underlying database operations. It's possible to bypass them when you need SQL's full power and access the same data store.
So create a table of addresses and use foreign keys to connect them to whatever other table you'd like. Since when does a relational structure require a garbage schema like your example? But surely you know all that.
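A minimal sketch of that layout, using Python's built-in sqlite3 module. The customers/addresses tables, column names and sample rows are invented for illustration; the only point is that a customer can have any number of addresses without Address1 through Address5 columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE addresses (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (id),
        line        TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO addresses (customer_id, line) VALUES (?, ?)",
    [(1, "1 Home St"), (1, "2 Office Rd"), (1, "3 Cabin Ln")],
)


def addresses_for(customer_id):
    """Return every address line for one customer, oldest first."""
    rows = conn.execute(
        "SELECT line FROM addresses WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    )
    return [line for (line,) in rows]


customer_addresses = addresses_for(1)
print(customer_addresses)
```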
But doesn't that then preclude accessing the same data set from programs written in other languages? The beauty of SQL is that it's language-agnostic.
You also make several points relating to toolchains and testing: sure, some databases have better tools than others. But we're talking about differences between models, not differences between particular tools.
So much horseshit in just one slide deck. No matter what you do, unless you have at least a hundred machines at your disposal, Hadoop won't be faster than a single box grep from SSDs. LucidDB is excruciatingly slow for all but tiniest datasets. I've tried a good half dozen "solutions" from this slide deck (including Aster), and other than Postgres all of them suck ass, more or less. If you see ANYTHING other than Nutch with Hadoop as a backend, head for the hills right away.
It's kind of like Make, but with a lot more XML
Climate Progress - Hell and High Water
MySQL sucked for many years but is getting better with each release. It was never designed to be a full RDBMS.
In Japan people use PostgreSQL and I am surprised that it's not common among geeks. Many ISPs now offer it as well as MySQL. The problem is that the trendy word is NoSQL, and mostly non-database programmers are promoting the movement due to bad experiences of trying to learn MySQL to do things that are very complicated.
It is very easy to switch your existing code to PostgreSQL if you used SQL-compliant code in languages such as PHP. Triggers, views, stored procedures, and the ability to self-repair after a power failure make PostgreSQL an easier platform to develop for.
http://saveie6.com/
Languages with namespaces scale better than languages without namespaces.
http://michaelsmith.id.au
Java is a whole platform that is scalable. It's not just about using identifiers and objects but using the vast APIs. Some would say Java is even an OS as it has its own I/O, threads, etc.
I suppose you could write your own threading and processes code but most Java developers just use what's built into the API.
http://saveie6.com/
Try putting petabytes of data on SSDs and let me know how that works out for you.
Which is more expensive, a few extra machines or developer time? (I'm assuming a solution that scales properly here; you can write scalable solutions in any language.)
HAND.
Search has nothing to do with "relational". And SQL is a query language and also has nothing to do with how well/badly a given model (relational/whatever) scales.
You are talking out of your ass.
Moving from LAMP to CLAP sounds like a new STD stack for open source development
Many thanks for the explanation ! :)
Muchas Gracias, Señor Edward Snowden !
Okay, I keep hearing about these noSQL solutions, but I can't find a single example!
For example, how do you do some SELECT with couple of JOINS? How do you do SUM over GROUP of things etc. ... or for that matter how one creates table?
And, yes, I have searched for Hadoop and others but all I get are these odd pages with no examples.
I'm probably too damn much of an idiot for these NoSQL solutions, since I can't find a good tutorial for converting a SQL app to one of these.
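Since nobody has posted an example, here is a rough sketch of the general idea in Python. This is NOT Cassandra's API or Hadoop's; the store is a plain dict standing in for a key-value store, and put, sum_by and the order documents are all made up. It only illustrates the usual trade: there is no CREATE TABLE, the "join" is done at write time by copying the customer's fields into each order, and SUM ... GROUP BY becomes a loop in application code.

```python
from collections import defaultdict

# Hypothetical key-value "store": key -> document (a plain dict).
store = {}


def put(key, doc):
    """Write one document under a key. No schema is declared anywhere."""
    store[key] = doc


# Denormalized documents: customer data is duplicated into every order,
# so reading an order never needs a JOIN against a customers table.
put("order:1", {"customer": "alice", "customer_city": "Oslo", "total": 30})
put("order:2", {"customer": "bob", "customer_city": "Lima", "total": 12})
put("order:3", {"customer": "alice", "customer_city": "Oslo", "total": 8})


def sum_by(field, value_field):
    """Application-side equivalent of SELECT field, SUM(value_field) ... GROUP BY field."""
    totals = defaultdict(int)
    for key, doc in store.items():
        if key.startswith("order:"):
            totals[doc[field]] += doc[value_field]
    return dict(totals)


totals_by_customer = sum_by("customer", "total")
print(totals_by_customer)
```

Real systems differ a lot in how much of this they do for you, so check the documentation of whichever store you are actually considering.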
The 'n' stands for 'Not' and the 'o' stands for 'Only', so it's wrong to read it as NO SQL, it should be seen as Not Only SQL. I.o.w.: not a move away from sql, but exploring other options besides SQL
Never underestimate the relief of true separation of Religion and State.
What large scale website is running some retarded JVM and Java? I can tell you because they are all slow as hell.
If you get to the point where mysql won't be able to manage all that data then Java overhead would be probably way more than few megabytes.
There seems to be this angry pushback from a core of dedicated SQL programmers, acting as if someone had insulted their tin god and wanted to invalidate their lives' work. Not at all. All that has been developing is the realization that RDBMS's are not the best fit for all applications, and that other storage schemes might have a better impedance match with the needs of a particular design. RDBMS's are still robust and reliable and useful for (maybe most) applications. Only some apps' data does not fit nicely into rows and columns. And you should design your code around the data, not try to morph the data to your software.
You just don't get it. Yeah, anyone can use the threading code built INTO java. It doesn't stop anyone from writing a pile of shit, or a masterpiece. Nor does it prevent someone from creating the worst layout in JPA, or the best one.
Great, a number of sites have switched to Cassandra, that's an interesting social benchmark. What about some real engineering benchmarks? I'd like to consider Cassandra but where is the objective data?
Cassandra's data model page states that "Cassandra is much, much faster at writes than relational systems". Great, so how about some comparative data? There is a slide show on the main Cassandra page with a snippet of data about read latency. Reads range from 7 ms to 44 ms. That's pretty anemic in the RDBMS world. There is a statement that writes are limited by network bandwidth.
There is also a presentation from IBM that shows reads ranging from 25 to 900 ms, but with no write data. The fact that read latency gets worse (increases) by a factor of 2 or more when you go from a 3 node to 6 node Cassandra cluster would seem to be worrisome on the surface.
The Facebook Engineering Notes presentation has almost nothing quantitative (only two sentences have numbers) and nothing is documented about read or write performance.
Java is a whole platform that is scalable. It's not just about using identifiers and objects but using the vast APIs. Some would say Java is even an OS as it has its own I/O, threads, etc.
OMFG! The amount of fanboyism is amazing...
Java libraries may be good but they IN NO WAY make a program 'automatically scale'
You can't just write a non-trivial program in Java and have it automatically scale horizontally.
Don't take away from the merits of Cassandra's developers by saying it was easy because of Java.
how long until
I think the entire platform is the issue, on language level you get threading,vast concurrency apis etc...
on platform level you get cloud features like distribution over multiple servers in realtime, transactional locking over an entire cloud etc...
It really depends on which features of the stack you choose but the scaling features of Java and JEE are phenomenal without too much effort.
Some other bloke in another part of the discussion said, Third, PostgreSQL has excellent performance, and PostgreSQL does, in fact, scale horizontally
Can't say I know which of you is right.
Uh - these DBs have been around for years and years. C-Tree, Raima, and other DBs are non-SQL and FAST. No DB server involved, so high concurrency wasn't a good idea. We used them for individual per-user DBs. Also, they are fine for write-once, read-many data needs.
I had 10+yrs using them before I was introduced to SQL-based DBs. Back then, MySQL, msql weren't mature enough to trust any data. The only other options were the expensive SQL vendor DBs. Those didn't work for our world-wide royalty free software distribution requirements. We started with Raima (before Velocis) and ended up migrating to C-Tree. BLAZINGLY FAST doesn't describe how fast it was. I think Raima could have been fast too, but that part of the program was written by an engineer, not a CS artist. The Engineer left our team and the CS guy rewrote everything in C-Tree. I think it costs 20x less that way too.
Bullshit. Languages don't scale
If that language is PHP then you are correct, it doesn't scale. Which is why people have to make heavy use of memcached and similar. Before you argue with me wannabe coders, how many thousand hits per second does your site get? That's what I thought. Facebook etc took the more hardware, less dev time approach and that's how they scaled with PHP. It's pretty pathetic.
Java all the way.
No idea where you got that particular piece of misinformation. :)
Amazon, eBay, Gmail and Google Search to name a few.
If you rely on your database alone to enforce things like required fields, atomicity, etc, then you have failed at creating a good application
At some point, I don't want to create a good application. I want to create a good model for the business, and then create applications that interact with that model. If I enforce constraints only in an application, someone else will forget to enforce the constraints when she writes a different application written in a different language sitting on the same database. So I enforce some constraints, such as not null or foreign key, in the table definitions. In trickier cases, I write test case reports to make sure no other application interacting with the same database has made it less than consistent. Where I denormalize to work around limitations in the RDBMS's query optimizer *cough*MySQL subqueries with GROUP BY*cough*, I put the denormalized columns into a separate table whose name ends with "cache", with a table comment pointing to which module of which app is responsible for maintaining it.
ORMs are not syntactic sugar. When reading code, yes, but they are doing a lot of work in the background and therefore are far more than sugar. They are often the first place to look when you have a slow, data-driven application. It is true you can bypass them as you mention, but that negates all the benefits. Instead you are left with tedious programming tasks that are some of the most error-prone and boring parts of the application. I also don't think you answered how they are not patches on the relational model. The truth is often for serious applications, you will have to bypass an ORM, and then what have you gained? Why not use an object database and build an application with a consistent API? Building exceptions around an ORM only leads to code that is hard to test, debug, and read.
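A minimal sketch of what "constraints in the table definitions" buys you, using Python's built-in sqlite3 module. The users/posts schema and the try_insert helper are hypothetical. The rejection comes from the database itself, so it applies to any application in any language that writes to the same tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (id),
        body    TEXT NOT NULL
    );
    INSERT INTO users (id, name) VALUES (1, 'alice');
""")


def try_insert(user_id, body):
    """Attempt one insert; report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO posts (user_id, body) VALUES (?, ?)",
                (user_id, body),
            )
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert(1, "hello"),    # valid row
    try_insert(1, None),       # violates NOT NULL on body
    try_insert(99, "orphan"),  # violates the foreign key to users
]
print(results)
```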
Regarding the address fields, I think you missed the point. When you get those fields into code, you are left with two choices. The first choice is having Address1-5 as I mentioned. The second choice is having a 1:many or many:many object. Neither make sense in application terms. What would make more sense for example is Customer.Addresses collection for example, or something else more specific to the objects. My point being not that relational databases cause a garbage structure because of normalization, but rather cause bad objects because relational tables do not match the structure of object oriented languages and simple application logic.
I don't believe object databases preclude access to other languages. If you want other languages to access the data, you can simply build web services on top of the data. IMO, for key functionality in anything but a small application, you should be doing this anyways regardless of using SOA or another architecture. It is easy enough to build a generic service layer for example that simply serializes objects to XML, JSON, or SOAP without actually having to build a sophisticated service layer. This is exactly what some CRM software today does to allow people to program against the internals (ironically, on top of a relational store). The database is not a great integration point and there is a lot of hard evidence that shows this. It is a tempting and easy one, but most likely two applications have different requirements. If you try to make your data a jack of all trades, you will likely fail at everything. Data warehousing is a good example here because without wildly warping the schema, doing reports on the relational model becomes nearly impossible.
Finally, the point about different tools is that languages tend to have far better tools and more of them than databases. It eliminates having to deal with the burden that some systems place on the developer due to their poor tool chains. Frankly, there is not a DB tool chain that can compete with a language tool chain for debugging, optimization, development environment, and consistency. MS SQL as I mentioned is the closest, and I'm guessing how people around here feel about that.
With respect, I am getting the feeling you have never programmed a serious application against a relational model. There is evidence with ORMs, design patterns, and the wealth of other kinds of databases that the relational model is not good at solving many problems. The fact is that most real applications that aren't some crappy ruby on rails demo are highly object oriented and require modeling the domain as it exists, not to match a database. Read up on domain driven design and that approach alone should make the points clear, and in particular some of the benefits of using proper objects. Your application and your data are two different things. The application should not morph to the data and vice-versa. Object databases solve most of this problem, but even in those cases at some point you will need to do things like use DTOs and presentation objects to get away from the disconnect between data, domain, and presentation. The fact is I can't remember the last time I simply had an RDBMS that only consisted of simple relationships that didn't necess
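As a rough illustration of the "generic service layer that simply serializes objects" idea, here is a sketch in Python using only the standard library. The Customer and Address classes and the to_json helper are hypothetical, and the HTTP part is left out entirely; this only shows a nested object (a customer with an addresses collection) turned into JSON that a client in any language can read.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Customer:
    name: str
    addresses: List[Address] = field(default_factory=list)


def to_json(obj) -> str:
    """Generic serializer: works for any dataclass, nested ones included."""
    return json.dumps(asdict(obj), sort_keys=True)


payload = to_json(Customer("Ada", [Address("1 Home St", "Oslo")]))
print(payload)
```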
Another example of technically knowledgeable people picking a really bad name.
If you want to create a good model for your business, why not write an API? An API is a lot better as an integration point. You can then build libraries you can reference and your model will always be consistent. Data and model are two different things often as well. You probably need more than one layer to properly enforce things and expose the correct functionality across applications. Maybe you're not aware, but object databases do offer the data to multiple channels, giving you the objects themselves as the model. I think that's a lot better than some crappy relational column store. Most also offer integration for multiple languages, for example Gemstone now supports Java, C, C++, Smalltalk, and Ruby. You could build more if you wanted, but I think services is the way to go anyway. Pick the language that deals with the Object Database best, then build your UI or other layers in the language of your choice (ideally the same, but not necessarily).
Different languages require different database drivers, adapters, ORMs, code, etc. This opens the door to a myriad of mistakes and inconsistencies. The constraints you speak of might be honored, but you can't possibly make all your DB constraints as smart and bullet proof as application constraints. This is again less of a data store issue and more of a design issue. You could solve this by building what I speak of above on your relational store. My point is just that the relational store is not a crutch for bad programming and not a place to enforce business logic or otherwise. I understand the need for data logic, but the data store itself should be protected by most of these things long before the data ever reaches it. Is it better if my app tells me it is the wrong data type or the db spits out some ridiculous error that I then have to propagate back to the application anyway? Then I also have to hope that everything was wrapped in transactions because god forbid (MySQL especially) the database should try to perform an operation and then fail half way through, requiring someone to manually untangle the mess. If a bad object makes it in, just delete the object (equiv in most implementations to dereferencing a pointer). Simple, effective, and easy to be atomic and consistent.
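For readers wondering what "wrapped in transactions" looks like in practice, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table and the transfer function are hypothetical. The second UPDATE fails a CHECK constraint halfway through the operation, and the transaction rolls the first UPDATE back, so nothing needs untangling by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO accounts (name, balance) VALUES ('a', 10), ('b', 0);
""")


def transfer(src, dst, amount):
    """Move amount from src to dst as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this drives src below zero, the CHECK constraint fails
            # and the credit to dst above is undone as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer("a", "b", 25)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```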
Your example of implementing a cache with some table comments exactly proves my point. That is error prone, far from consistent, and ensures some idiot will overlook your careful work. This is less likely in an object database as what you get out already has everything you need. It is true that if you have more layers, you'd run into the same problem, but again, an API will solve this. Simply reference the API or web services and you're good to go across multiple applications. Is it any wonder some of the largest web sites are integrating via service APIs and not directly the database? You could have multiple stores, shards, whatever, and a service will hide that complexity.
Although it's not an Object Database, look at Azure Tables or even the relational SQL Data Services from Microsoft. I think it's a crap implementation, but if it demonstrates one thing, it is that integration via an API possibly built on services is the way to go to build massively scalable databases and provide integration points.
The term shared-nothing architecture dates back to 1986. The concept goes back further; for example, the Teradata RDBMS dates back to 1976-1983.
So ignorance about database technologies and products is one issue, and the one hardest to excuse. There are however other issues that are more understandable:
The existing solutions are proprietary, and web startups tend to prefer open source solutions.
The existing solutions are biased toward OLAP, analytics and data mining, i.e., taking large volumes of data and analyzing large sets of it at a time. Some of the "NoSQL" products are built for this (e.g., Hadoop + HDFS), but there are others that are more aimed toward simple transactional processing.
So it really is the case that there do not exist good relational products that tackle something like Digg, Facebook or Twitter, which want to use commodity hardware in geographically dispersed locations. However, it is also the case that the "NoSQL movement" that has sprung up to fill this gap has a combination of ignorance and animosity towards the relational model, and are just not thinking the problem through.
Are you adequate?
http://www.firebirdsql.org/
Firebird is a relational database offering many ANSI SQL standard features that runs on Linux, Windows, and a variety of Unix platforms. Firebird offers excellent concurrency, high performance, and powerful language support for stored procedures and triggers. It has been used in production systems, under a variety of names, since 1981.
Slower than what? C++, definitely, in most scenarios. PHP and other scripting languages? It runs circles around them.
Name one Web 2.0 application that is able to properly manage multiple MySql? I am aware of only one (http://novaquantum.com) but the point is that MySql is not gaining any ground on 2.0 frontend!