Moving From CouchDB To MySQL
itwbennett writes "Sauce Labs had outgrown CouchDB and too much unplanned downtime made them switch to MySQL. With 20-20 hindsight they wrote about their CouchDB experience. But Sauce certainly isn't the first organization to switch databases. Back in 2009, Till Klampaeckel wrote a series of blog posts about moving in the opposite direction — from MySQL to CouchDB. Klampaeckel said the decision was about 'using the right tool for the job.' But the real story may be that programmers are never satisfied with the tool they have."
Of course, then they say things like: "We have a TEXT column on all our tables that holds JSON, which our model layer silently treats the same as real columns for most purposes. The idea is the same as Rails' ActiveRecord::Store. It’s not super well integrated with MySQL's feature set — MySQL can’t really operate on those JSON fields at all — but it’s still a great idea that gets us close to the joy of schemaless DBs."
And in another three years they will switch to whatever is the coolest up-and-coming storage solution. Incompetent developers will always be incompetent developers.
If you want to hire twentysomething engineers who have party during the week and have gadgets falling out of their pockets, go with CouchDB or Cassandra or MongoDB.
If you'd rather have people in their 30's and 40's with families, then go with MySQL or Oracle.
PostgreSQL 9.2 beta improves scalability, adds JSON
http://www.h-online.com/open/news/item/PostgreSQL-9-2-beta-improves-scalability-adds-JSON-1573815.html
You can get json support using the PLV8 extension http://code.google.com/p/plv8js/wiki/PLV8
or altenatively you can use the hstore data type.
This makes me want to avoid your suggestion. If you presently don't have the Larry problem it's best avoided.
Help stamp out iliturcy.
MongoDB is Webscale. MySQL is not Webscale, because it uses joins. SQL also has impetus mismatch.
But the real story may be that programmers are never satisfied with the tool they have.
Ah typo
But the real story may be that programmers don't know how to store data
They many not know because no one knows the business needs, but more often because they have no idea what they're doing WRT to data storage.
IT training tends to cover data manipulation pretty well "how to add two numbers'
IT training gets shakey on data structures "So, in junior level class we will talk about data structures, which is too bad because you've already developed at least two years of bad habits first"
IT training tends to pretty much skip data storage "In a senior level class, you might talk about scalability, maybe in an optional class. Or maybe you'll take a semester of cobol instead"
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
It seems to be a knee jerk reaction amongst a lot of developers and designers that as soon as your app starts requiring persistent data beyond ini values a database is needed. Why? For large but simply structured data something like json or XML or even a flat csv file is perfectly adequate. Performance can be an issue during searches but if for example you have a fixed record size with key sorted data then finding a given key is simple (binary chop or similar).
It seems to me that reaching for a DB is the easy way out taken by a lot of oders and they end up paying for it in maintanability, bugs and support.
MySQL is not Webscale, because it uses joins.
Then how does a non-webscale database power popular web sites such as Wikipedia and Slashdot? If you don't do joins in the database, you'll probably end up doing the equivalent of joins (using one value as the key in another table) in your application.
do they have an XML field type? MS SQL Server does [...] which allows you to essentially keep the table schema-less but still allows you to perform complex queries on the contained data.
But how does it index the data in the XML or JSON fields? How does it, say, tell an element containing a number from an element containing text? Does it act like SQLite, which is dynamically typed (and thus can store text in any field) but can be told to prefer to compare and index certain columns as numbers, dates, text with Unicode collation, or binary data?
with big data comes great responsibility.
PostgreSQL 9.2 (now in beta) includes native JSON fields:
http://www.h-online.com/open/news/item/PostgreSQL-9-2-beta-improves-scalability-adds-JSON-1573815.html
It's also available as an extension for the current 9.1 release:
http://people.planetpostgresql.org/andrew/index.php?/archives/255-JSON-for-PG-9.2-...-and-now-for-9.1!.html
http://www.glasswings.com/
I hadn't read the article this was based on before, thanks for the laugh. I encourage others to Google "webscale" :^)
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
In NoSQL systems such as MongoDB and CouchDB, what do you call the operation where you retrieve one document, pull an identifier out of that document, and use that identifier as the key to retrieve another document?
Hop into the wayback machine and fire up any flavor of PICK. The database where schema is applied on use, not on storage. No length limits on fields and very fast on old hardware (really fast on new). Storing bits of xml and code are no problem. And for those users who simply must have SQL, many versions will support that too (UniData and UniVerse are two examples). It's not cool, not new, but it does work.
MongoDB is Webscale. MySQL is not Webscale, because it uses joins.
I see this comment a lot and I don't get it. Somehow Map/Reduce is considered *webscale* and joins are not. But I can implement a join with Map/Reduce so why don't the webscale databases provide join functionalty that is built on top of Map/Reduce and save us all the trouble of reinventing the wheel?
We're using a four node Riak cluster, in a production environment spanning two datacenters, and have zero downtime because we can lose two nodes (for example, a whole datacenter) and still read/write. Maybe CouchDB can do that; maybe it can't. Maybe our setup isn't even the best. But, as many people have already pointed out, there is a time and a place for a NoSQL solution, and you can have "near zero" (or, so far for us, zero) downtime. Here's the catch: We are using Secondary Indexing and MapReduce, in Riak, but for searchable analytics we fire off a stripped dataset to a smaller, less-redundant MySQL store.
Urban Airship went PostgreSQL to MongoDB to Cassandra to PostgreSQL. http://wiki.postgresql.org/images/7/7f/Adam-lowry-postgresopen2011.pdf
It's a good presentation because they're in love with none of them and are moving for specific reasons each time, handling different issues. It's not coders chasing the new hotness.
Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
So they love NoSQL but had a bad experience with CouchDB. And the solution is... move away from NoSQL??? Ridiculous. CouchDB is certainly NOT the only solution out there, and they all have their strengths and weaknesses.
I've recently switched from MS SQL Server to RavenDB http://www.ravendb.net/ and I am never going back! Many of the things they found wrong with CouchDB and MongoDB are superior in RavenDB.
If all your application is ever going to do is read and write to fixed sized record structured data with little relational (or any) attributes then COBOL will suit you fine as that's what it was designed for. Unfortunatly those sorts of apps are few and far between these days, but in its ever decreasing niche COBOL is still good.
Yes it makes sense up to a point , but it starts to suffer from the law of diminishing returns and at some point having to do complicated multi-table joins actually slows down your queries so much that it becomes simpler and faster to suffer duplicate data than normalise to the Nth degree.
I read this entire thread wondering if "Webscale" is really some kind of valid term or not. I'm not convinced that it has any valid meaning whatsoever.
I'm god, but it's a bit of a drag really...
... the joy of schemaless DBs.
You mean working with a file system and not using a DB at all, not needing to pay a DBA, not dealing with corrupted databases, not using arcane tools, etc.?
I jest, but not entirely. Clearly there are purposes for which databases are the right toold for the job. I'm most definitely not convinced that big blob storage is one of them.
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
Clues to the source of some of Sauce Labs' problems can be gleamed from their list of Maintenance headaches:
Oh please doing an HTTP GET periodically is tricky ??? No it's not. With couchdb if your database is very dynamic you should either index periodically and very frequently. This creates a quantified and controlled performance demand on the server. Ideally read servers should never be the write server and replication should be filtered. Using Couchdb naively will lead to failure. Don't use javascript views, python views are 3-4 x faster, erlang views are 7-10 faster. Used in the right way and following the many tips that you can get from the Couchdb community will make Couchdb not just a great database but a great application platform.
And yes 1.2 is a great improvement.
Currently I'm using coucdb, mongodb, and MySQL all in one high profile project handling terabytes of data and millions of hits. Each has it's use. When it comes to reliability and performance all three DBs are NOT my problem.
Yes, traditional joins run in a single process but AFAIK there no reason why a join could not be distributed across many processes by implementing the join with Map/Reduce. Right?
It's a quote.
I see this comment a lot and I don't get it.
Here you go. Now you'll get why it's funny, and not serious.
a majority of our unplanned downtime was due to CouchDB issues
Nowhere on the CouchDB home page is reliability even mentioned. And that's the real issue. Developing a reliable database system is a difficult design and programming task. It requires real software engineering. The hacks who write PHP and use JSON aren't up to a job like that. The "aw, we'll fix it in the next release" attitude doesn't cut it in databases.
They do cross-browser testing. Right? I can't really figure out what they sell or why I would want it. The real question, "Why haven't I started a mismanaged company with ambiguous products?"
Sig. Sig. Sputnik
Typing is just another constraint, like foreign keys, and various other domain constraints. I cannot see any valid argument for having foreign keys but not type constraints. It jest seems bizarre to me. It's not like they could be optional or anything.
They are optional. It appears you can enforce static typing for a column with constraints like CHECK(typeof(x)='integer'). I'd give more details, but the document that Wikipedia cites about such constraints is a printed publication of which I happen not to own a copy.
In what other language would this statement compile without error:
PERFORM makemoney UNTIL rich.
(Note the the full stop at the end)
And someone with experience in the entity-relationship modeling that underlies a relational database schema is likely to see everything as "a lot of related data". For example, a Slashdot comment is related to a parent (a story or another comment), the user who posted it, and the moderations done to that comment.
So the thing is, traditional joins (on, say, Postgres or MySQL) aren't blocking operations. You can run more than one at a time. MapReduce (as well as writes, any aggregation, and any use of JavaScript) are blocking operations on Mongo. They block the entire mongo process. The MapReduce case gets around this with a bit of cooperative multitasking (yielding every few hundred or thousand rows), but writes, aggregation, and other use of javascript do not. So there's already a much bigger need to distribute MapReduce on Mongo than there is to distribute a JOIN on an SQL database.
Plus, MapReduce on Mongo is painfully slow, so you'll need to break things down into really small partitions to scale at all. Aggregating 800,000 documents (group by + sum) took me about 20 seconds with Mongo using the existing aggregation framework (which is universally credited with being faster than the MapReduce case). Porting the whole thing over to an SQL database allowed me to a.) not block the entire freaking process and b.) run the query in about 800ms.
So, sure, we could have partitioned the data and spread it across multiple nodes. Maybe that would have been faster (but you can only run MapReduce across multiple nodes, using the existing aggregation framework you can only operate on one node). Dunno. But it would have been a lot more expensive since we would have required more hardware to accomplish the same thing that an SQL database is optimized for.
The reason that "they" mock JOINs is because you simply can't do that efficiently with Mongo.
The revolution will be mocked
That's an informative reply, thanks for taking the time to post it.
I thought the article referred to here particularly interesting because it puts forth the idea that, essentially, NoSQL and Relational DB's are just two ways of looking at the same thing, and thus are both equally valid. Right now they are used situationally (at best) but I think you will see a convergence at some point with something "new" that can do both equally well using a common core and different wrappers around it.
-- Senior Software Engineer, Attorney appearance services, locallawyerapp.com.
It is kind of a weird word, isn't it? Almost as shemales.
like not having to declare varchar sizes versus text versus CLOB versus BLOB.
What's the difference between a text column and a character large object column? MySQL doesn't appear to have distinct types for the two.
They're all stored and indexed and searchable in the same way (not that you'd want to index any sort of LOB.)
If CLOB is the same thing as text, then why wouldn't you want to support a full-text index on a CLOB? That said, I tend to write my own full-text indexing tools to work around the limitation in MySQL's full-text index: no words shorter than four characters unless you're root, no common English words unless you're root, etc.
XML field type [in] MS SQL Server [...] allows you to essentially keep the table schema-less
shutdown -p now wrote:
it also permits you to specify an XML Schema for the field
But what's the advantage of XML with a schema over just parsing the XML in the application and putting each value in a column? Perhaps I'm having trouble seeing it because my mind is trapped in a relational paradigm.