Canonical Drops CouchDB From Ubuntu One
rsk writes "Since the Ubuntu One desktop synchronization service was launched by Canonical, it has always been powered by CouchDB, a popular document-oriented NoSQL data store with a powerful master-master replication architecture that runs in many different environments (servers, mobile devices, etc.). John Lenton, senior engineering manager at Canonical, announced that Canonical would be moving away from CouchDB due to a few unresolvable issues Canonical ran into in production with CouchDB and the scale/requirements of the Ubuntu One service. Instead, says Lenton, Canonical will be moving to a custom data storage abstraction layer (U1DB) that is platform agnostic as well as datastore agnostic, using the native datastore on the host device (e.g. SQLite, MySQL, API layers, 'everything'). U1DB will be complete at some point after the 12.04 release."
Our structured data sync service is CouchDB, except for tomboy notes. Syncing files is a completely separate stack.
I dropped Ubuntu due to unresolvable issues with the way that they handle desktop environment migration. I'm liking Mint much better. I hope others are able to manage or migrate. Ubuntu is otherwise a very nice OS.
It would be interesting to hear more from Canonical about what specific issues they ran into. They say that they worked with "the company behind CouchDB." While Couchbase is one company "behind" the project, CouchDB itself is an Apache project. Did they reach out to the Apache project itself? Also, why build something completely new rather than provide patches to existing software? I'm sure they had good reasons, but I'd like to hear some more details about what did and didn't work for them.
Bradley Holt
From the first days of Ubuntu One, before we were even in Ubuntu, we've
had a structured data storage sync service based around CouchDB.
For the last three years we have worked with the company behind CouchDB
to make it scale in the particular ways we need it to scale in our
server environment. Our situation is rather unique, and we were unable
to resolve some of the issues we came across. We were thus unable to
make CouchDB scale up to the millions of users and databases we have in
our datacentres, and furthermore we were unable to make it scale down to
be a reasonable load on small client machines.
Because of this, we are turning off most of our CouchDB-related
efforts. The contacts, notes and playlists databases will continue to
exist on our servers to support the related services, but direct
external access to the underlying databases will be shut off. Any other
databases will be deleted from our servers entirely.
For these same three years we have created and maintained desktopcouch,
which is a desktop service (and related library) to access CouchDB more
conveniently. Because we are no longer going to pursue CouchDB, we will
no longer be developing desktopcouch; in fact, if anybody wants to take
over, we'll be happy to work with you to make that official. For the
upcoming 12.04 the Ubuntu One packages will not depend on desktopcouch
nor couchdb in any way, and we'd recommend the distribution seriously
consider whether they want to continue having the package in main,
especially if no maintainer shows up.
Because we still believe there is a lot of value to our users in the
service we wanted to offer based on CouchDB, we're building something
new, based on what we've learned. It's very small, merely a layer of
abstraction and the definition of an API that will allow us and others
to build what is needed on top of existing tools. We're calling it U1DB
for now, until it comes of age. If you're interested and technically
inclined you can follow our progress on lp:u1db; unfortunately our
timing and resources are such that we can only promise the reference
python implementation will be ready in time for 12.04, and thus 12.04
will ship without Ubuntu One having a solid story around synchronizing
arbitrary structured data.
Thank you for reading.
https://lists.ubuntu.com/archives/ubuntu-desktop/2011-November/003474.html
I'm sorry about that misstatement; thank you for the correction.
People use ubuntu one?
You did read the same thing I did, right? They *tried* using someone else's solution, and the solution did not fit their needs. If the existing solutions don't fit your needs, what else can you do other than roll your own? I guess you could drop the service/product altogether and just call it a day, but that doesn't sound like a great business model.
If CouchDB doesn't work, there are at least six other competing solutions that will tell you how they are a better fit for your needs than all the others. Cassandra, Hadoop/HBase, MongoDB, and more. If one doesn't work for you, you can waste as much time as you like trying the others!
I too dropped Ubuntu for Mint. Much happier now.
Huh? They are swapping their database abstraction layer so they can more easily use other backends like MySQL, SQLite, etc. instead of being tied to only CouchDB. How exactly is that a bad thing? They aren't "rolling their own" since they want to use already existing databases.
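For what it's worth, here is a rough sketch of what a "datastore agnostic" layer can look like. This is purely an illustration, not U1DB's actual API: the names DocumentStore, SQLiteStore and MemoryStore, and the put/get methods, are all made up for this example. The point is only that application code talks to one small interface while the backend underneath can be swapped.

```python
import copy
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class DocumentStore(ABC):
    """Hypothetical backend-neutral interface (NOT the real U1DB API)."""

    @abstractmethod
    def put(self, doc_id: str, doc: dict) -> None:
        """Store or overwrite a JSON-serializable document."""

    @abstractmethod
    def get(self, doc_id: str) -> Optional[dict]:
        """Return the document, or None if it does not exist."""


class SQLiteStore(DocumentStore):
    """Backend that keeps documents as JSON text in a SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS docs "
            "(id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def put(self, doc_id: str, doc: dict) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
                (doc_id, json.dumps(doc)),
            )

    def get(self, doc_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT body FROM docs WHERE id = ?", (doc_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


class MemoryStore(DocumentStore):
    """Backend that keeps documents in a plain dict (e.g. for tests)."""

    def __init__(self) -> None:
        self._docs = {}

    def put(self, doc_id: str, doc: dict) -> None:
        self._docs[doc_id] = copy.deepcopy(doc)

    def get(self, doc_id: str) -> Optional[dict]:
        doc = self._docs.get(doc_id)
        return copy.deepcopy(doc) if doc is not None else None
```

Code written against DocumentStore does not care which backend it gets. Note that this sketch says nothing about sync, which is the hard part.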
Or they could use MySQL, SQLite, etc. instead like the summary mentions?
I wonder what became so difficult about syncing data that it has to be re-invented all the time?
I was happy using tools like rsync, diff and unison for a long time, until the moment when even Linux desktop software became too posh to store its data in files.
Now every piece of software uses a different database; at one time even Amarok used a MySQL backend. What is better about this than just putting the data in a file? Or at least making this file the Single Point Of Truth? If you need the database for speed, you can check if the file changed since the last time and then update the database from the file's contents. But simple files have been syncing and merging and everything perfectly for ages. Now it seems like every piece of software needs its own syncing service.
Is there any reason for this, except branding the most simple things (like copying a file), or making money with cloud storage?
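The "file is the Single Point Of Truth, database is just a cache" idea above can be sketched roughly like this. It is a minimal illustration under my own assumptions, not how Amarok or any other program actually does it: the refresh_cache function, the table names, and the one-entry-per-line file format are all made up. It uses the file's modification time and size to decide whether the cache is stale.

```python
import os
import sqlite3


def refresh_cache(conn: sqlite3.Connection, path: str) -> bool:
    """Reload the cached lines of `path` if the file changed.

    Returns True if the cache was rebuilt, False if it was still fresh.
    The file stays authoritative; the database is only a speed-up.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meta "
        "(path TEXT PRIMARY KEY, mtime_ns INTEGER, size INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lines "
        "(path TEXT, lineno INTEGER, text TEXT)"
    )

    st = os.stat(path)
    row = conn.execute(
        "SELECT mtime_ns, size FROM meta WHERE path = ?", (path,)
    ).fetchone()
    if row == (st.st_mtime_ns, st.st_size):
        return False  # same mtime and size as last time: cache is fresh

    # File is new or changed: rebuild the cached rows from its contents.
    with open(path, encoding="utf-8") as fh:
        entries = [(path, n, line.rstrip("\n")) for n, line in enumerate(fh, 1)]

    with conn:  # one transaction, so readers never see a half-built cache
        conn.execute("DELETE FROM lines WHERE path = ?", (path,))
        conn.executemany("INSERT INTO lines VALUES (?, ?, ?)", entries)
        conn.execute(
            "INSERT OR REPLACE INTO meta VALUES (?, ?, ?)",
            (path, st.st_mtime_ns, st.st_size),
        )
    return True
```

Since the file itself never depends on the database, rsync or unison can keep syncing it as before, and the cache rebuilds itself on the next call. Comparing mtime and size can miss an edit that preserves both; hashing the contents would be stricter but slower.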
Do those other systems provide for arbitrary peer-to-peer data exchange/sync networks? Last I checked, Couch was the only product in the NoSQL lineup that provided robust support for distributed data networks. Maybe I'm wrong or out of date. Most of the others, when I looked, were able to sync data, but only as controlled sync among known peers, much the way MySQL and Postgres handle things. That is not true/messy master-master replication among disorganized nodes spread around the internet.
This being the open-source world, you work with them to improve it, or worst case fork it to add the functionality you need. It's hard to say exactly what's going on without more details, but it seems like that should be easier and better than reimplementing it all from scratch.
I am trolling
There are far more than 6 other vendors who will be glad to tell me how they are better and fit my needs better than the other guys ... and much like you, I'd be an idiot if I thought any of them had a clue.
Just because someone thinks their stuff kicks all ass for every situation doesn't make it actually true.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Cassandra can do it but it doesn't have the ability to deal with arbitrarily large binary objects.
NO, NO, NO! Don't ever "roll your own" if there is something feasible already available. Too often you will end up making the same mistakes others did, possibly as long ago as 30+ years. Whenever possible take advantage of others' experience (and blood, sweat, and tears) by using something existing so you don't do a poor job reinventing the wheel. And the advantage of open source is you can take someone else's work and build around it.
OK, I'll get off my soap box.
putting the 'B' in LGBTQ+
why does it have to be NoSQL?
It doesn't - you could accomplish this with NNTP if you put enough work into it (we looked at that possibility for our implementation). Certainly you could do this with a SQL system or an rsync file distribution with custom file indices. I'm just saying that in my experience couchdb gives you more out of the box to accomplish this type of set up than anything else I've found. You can get further faster with this approach if you have the kinds of requirements I was describing, IMO.
Of course I'm open to something else if it's better for this, but up to now I haven't seen anything that comes close for this, including other NoSQL databases.
Thanks - good tip on Cassandra. Riak has some capabilities that are close too. I can say that trying to do couch replication with large binary objects across unreliable networks (that is, not in the same data center/peer network) is probably not a good idea anyway, even though the spec does support it.
Canonical would be moving away from CouchDB due to a few unresolvable issues
Of course they'd want to drop CouchDB. It's clearly not web scale.
I'll second that. I needed to reinstall Linux on my notebook and after having fits with the most recent Ubuntu release, I was very much ready to give Linux Mint a try.
But, on all of my single user machines (desktop, notebook, netbook), I always use root-on-LVM-on-crypt (using LUKS for encryption) for my hard drive setup. This way, everything except for the small boot partition is encrypted. It works great and I've personally found the performance hit to be negligible. I even have my file server set up this way and it does much more I/O than any of my other machines without any noticeable problems.
Anyway, both Debian and Ubuntu support this type of configuration directly in their installers. Ubuntu also supports (or did, at least) a per-user home directory encryption. I gave it a try once, but I didn't care for it and it even broke simple things, like using 'du' to see how much space was being used by a file or directory.
Unfortunately, Linux Mint does not support this configuration. I found this to be rather odd, since Mint is based on Ubuntu, which supports it, and Mint Debian edition is based on Debian which also supports it. I found a forum thread where somebody had managed to get it to work, but it seemed like an awful lot of hoops to jump through. In the end, I just went with Debian/testing and called it a day. Hopefully the Mint people will add this ability soon because I'd really like to give it a try.
Elrond, Duke of URL
"This is the most fun I've had without being drenched in the blood of my enemies!"-Sam&Max
Lotus Notes does (maybe better). If you ignore the hate about its user interface, the server component (Lotus Domino) is very robust and scalable as a NoSQL provider.
Actually, Damien (author of CouchDB) is a former Lotus engineer and modeled his creation on the Notes Storage File (NSF) structure.
Search RapidShare and MegaUpload!
Yeah - that's a good point. Damien specifically modeled Couch off of the good parts of Lotus Notes distribution (remember when IBM kept saying Lotus Notes isn't really an email product - it just does email as a side effect? I now know why they were saying that). That said, it would be hard for my project to choose closed-licensed COTS when an OSS alternative like couch exists, even recognizing that Lotus is almost certainly many times more robust and better engineered than couch for all these purposes. Thanks for pointing this option out.
I think Canonical suffers a nasty case of Not Invented Here.
While not open source, it is now very competitively priced. You might check out the licensing terms of the new IBM XWork server, which is nothing but Domino server with a fixed annual cost of around $2K: