Slashdot Mirror


Open Source Moving in on the Data Storage World

pararox writes "The data storage and backup world is one of stagnant technologies and cronyism. A neat little open source project, called Cleversafe, is trying to dispell of that notion. Using the information dispersal algorithm originally conceived of by Michael Rabin (of RSA fame), the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data. The software is also very scalable, allowing you to run your own backup grid on a single desktop or across thousands of machines."

6 of 169 comments (clear)

  1. Think RAID5, only way better by El+Cubano · · Score: 4, Interesting

    Using the information dispersal algorithm originally conceived of by Michael Rabin (of RSA fame), the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data.

    It seems like this can be tuned to provide varying levels of fault tolerance. According to the abstract (I don't have an ACM web account, and I couldn't find the full text), it seems like I can take a file and make it so that any four chunks can be used to rebuild the file. I can then take those chunks and distribute them eight times to different machines. Thus, five of the eight machines would have to be rendered inoperable before I were unable to retrieve my data.

    If I understand it correctly, then this is really slick.

    1. Re:Think RAID5, only way better by dracken · · Score: 4, Interesting

      Rabin's algorithm relies on a nifty trick. If you take a k dimensional vector and store the dot product with k orthogonal vectors then the vector can be reconstructed using just the dot product. This is a fancy way of saying any point on the x-y plane can be located if you have the x-coordinate and y-coordinate. However, if you take a k dimensional vector and compute the dot product with l mutually orthogonal vectors (where l > k), then any k dot products are enough to reconstruct the original vector.

      Rabin has shown how to come up with l vectors of which k are mutually orthogonal.

  2. Rar + Par + BitTorrent? by DigitalRaptor · · Score: 4, Interesting

    This sounds like Rar, Par, and BitTorrent got merged in some freak transporter accident...

    Par files (for use with QuickPar, etc) are great, saving all sorts of extra posting on binary newsgroups.

    --
    Lose Weight and Feel Great with Isagenix
  3. You mean Shamir, not Rabin by Anonymous Coward · · Score: 5, Interesting

    While R in RSA stands for Ron Rivest, it is Adi Shamir (S of RSA) you have in mind. He came up with a wonderful secret sharing scheme which allows a bunch of folks or computers to keep pieces of secret in such a way that no N of them have any idea what the secret is, even if they collude. OTOH N+1 of them can easily figure out the secret. RSA can help you keep important secrets safe this way: if the owner is OK, the secret cannot be recreated; if the owner quits or dies, all-important secret holders can recover his password and unencrypt critical company data. And if a couple of them cannot participate, you still can get your secret back.

    Even more amazingly Shamir's secret sharing scheme allows computing math functions, such as digital signatures, without ever recovering secret keys. This is called threshold cryptography, some of you may be interested to learn about its many wonders. Shamir rocks and so is threshold crypto!

  4. Virtual file server -- was a program for old Macs by dfloyd888 · · Score: 5, Interesting

    In the early 90s, a company made a virtual file server for networked Macs. Each client Macintosh had a file on its hard drive, and when a request was made through the driver, a number of Macs were contacted, and files were read and written to in a fairly load balanced fashion. I'm pretty sure it used some decent (think single DES) encryption at the time too, so someone couldn't just dig through the server's file on their Mac's hard disk and glean important data. It also added some redundancy, so if a Mac or two wasn't up on the network, it wouldn't kill the virtual Appleshare folder.

    By chance, anyone remember this technology? I have no idea what happened to it, but it would be a blockbuster open source app if done today, and was platform independant. If done right, one could create data brokerage houses, where people could buy and sell storage space, and also reliability, where space on a RAID or server array would be of higher value than space on a laptop that is rarely on the Internet.

  5. Sounds familiar. Like my master's thesis. by Saturn49 · · Score: 4, Interesting

    This can be done quite easily with Reed-Solomon coding. In fact, you don't need the majority of the nodes, but simply an arbitrary N set of nodes, with an arbitrary M nodes as redundancy. N=1 and M=1 is basically RAID1. N = n and M = 1 is simply RAID5, N=n and M=2 is RAID 6.

    In fact, I wrote a RSRaid driver for Linux for my thesis and did some performance testing on it. I'll save you the 30 pages and just tell you that the algorithm is far too CPU intensive to scale up very well for fileserver use (my original intent,) but I did conclude it could be used as a backup alternative to tape. Hmmmm.

    Direct Link
    Google Cache
    Please forgive the double brackets, I fought witH Word and lost.
    Contact me if you'd like to play with the code. I never did any reconstruction code, but the system did work in a degraded state, and was written for the Linux 2.6 kernel.