Domain: git-scm.com
Stories and comments across the archive that link to git-scm.com.
Comments · 54
-
I'm Still Fuzzy on NoSQL
I'm still fuzzy on what NoSQL is supposed to be and what it is supposed to bring to the table.
From what I've understood, it's basically a common banner for various different databases that all share the common property of not being relational databases and not providing ACID guarantees.
If so, it seems to me that the whole NoSQL vs. RDBMS debate is about a false dichotomy. There are some applications where a relational database is the right tool for the job, and there are some where a relational database is not the right tool for the job. In some of those latter cases, one of the NoSQL databases may be the right thing.
This is nothing new. Non-relational databases have been used on Unix for a long time, and are even a standard part of POSIX (see for example the manpage for dbm_open). It's also long been known that, for example, Berkeley DB can be a lot faster than an RDBMS, as long as your application doesn't make use of all the features an RDBMS provides. Lots of programs don't even use one of these database systems, but invent their own custom format. Git is a very successful example of this.
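To make the dbm point concrete, here is a minimal sketch of a plain key-value store using Python's standard `dbm` module. This is an illustration only: the module picks whichever DBM-style backend is available on the system, which is not necessarily the POSIX ndbm interface behind dbm_open, and the `roundtrip` helper and the `user:N` keys are made up for the example.

```python
import dbm
import os
import tempfile


def roundtrip(path):
    """Write two records to a DBM-style file, then read them back.

    Hypothetical helper for illustration; keys and values are plain bytes,
    with no schema, no joins, and no SQL.
    """
    # "c" opens the database for read/write, creating it if needed.
    with dbm.open(path, "c") as db:
        db[b"user:1"] = b"alice"
        db[b"user:2"] = b"bob"

    # Reopen read-only and do a lookup plus a membership test.
    with dbm.open(path, "r") as db:
        return db[b"user:1"], b"user:3" in db


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(roundtrip(os.path.join(tmp, "users")))
```

Lookups by key are about all you get, which is exactly the trade being described: fewer features than an RDBMS in exchange for less machinery.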
To me, it seems that what we are seeing here is loads of people who had learned to use relational databases for all their storage needs discovering that there are other ways to store data, and that one of those methods may work better than an RDBMS for a particular application. Well, yes. Does that surprise anyone? It sure doesn't surprise me. Does it mean that RDBMSes are now useless? Not at all. Does it mean you should use a non-relational storage system where this makes more sense? Of course! Now, can we please get back to work? I don't see the point of having a holy war over whether RDBMS or NoSQL is better, when common sense says that they both have their uses.
-
Re:Has there never been a non-cloud data loss?
Heck, I know folks who've lost entire well-known (hobbyist) web-portals some years back due to provider server failures. It was a harsh lesson for those involved. So much for the provider's backup policies. The real solution is to have multiple copies of the data, ideally in different formats. For example, when I was in grad school the University had (for the time) a huge email installation, basically full email hosting for the entire institution. The server and storage spec was excellent -- a big SAN-like dual storage array that could handle failures at multiple levels, including one entire half of the storage system. Turns out they got hit by a nasty filesystem corruption bug, which nuked the whole array. Oops. Their bacon was saved because they also had regular verified tape backups (IIRC, it took many, many weeks to fully restore archived mail to the cluster).
These problems really have little to do with the computing models involved. There's a misperception that the "cloud" provides some sort of data robustness beyond what mere mortals can accomplish, but the reality is that valuable data just needs more copies. Perhaps their backup strategies are layered and awesome, but you never really know where the weak links are. One remote service provider really only ever counts as one copy. And so it's useful to consider a service like GitHub. The fundamental model of the service is to encourage folks to share and copy their data around, because that's a prime goal of the supporting software: git. If a git-based service goes down, there should be many copies of the repository data, and the various users will regroup, republish, and move on. No single user has to be overly conscious of maintaining lots of backups, because copying is the basic working model.
There's a lesson there for those of us working in software: design for subversive backup, where critical data is backed up/synced/secured as a normal part of day-to-day workflow. Make sure that failure in any one point doesn't induce the others to similarly fail or become corrupt. Think through and verify the recovery schemes. Imagine that it's your data going down the tubes...
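As one way to "verify the recovery schemes", a rough sketch of checking that every backup copy of a file still matches the original by comparing SHA-256 digests. The function names and the idea of passing a list of copy paths are assumptions for illustration, not something any particular backup tool does.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_copies(original, copies):
    """List the copies that are missing or differ from the original.

    Hypothetical helper: an empty result means every copy is intact.
    """
    expected = sha256_of(original)
    return [
        str(copy)
        for copy in copies
        if not Path(copy).is_file() or sha256_of(copy) != expected
    ]
```

Run something like this on a schedule rather than on the day you need the restore; a copy that has never been checked only counts as a hope.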
-
Re:free software and open source
I actually have it on good authority that Microsoft employees are a bunch of GITs...
Try the veal!
-
Different systems for different files
What system do you use to manage your home directories, and how have they worked for you for managing small files (e.g. dot configs) and large (gigabyte binaries of data) together?
I don't know that managing them *together* is all that useful. What I have been doing (and what I think is a more flexible way to manage stuff) is to divide the contents of the home directory into independent 'projects' (e.g. financial documents, stuff for work, source code of my website, project X, project Y, my photo collection...) and manage each project separately in a way that suits the kind of file being stored. For a directory of small files that are frequently updated, Git is a great way to go. For synchronizing and backing up large collections of large files (like an MP3 or photo collection) you might try something like ContentCouch (disclaimer: I wrote this tool).