Slashdot Mirror


Open Source Moving in on the Data Storage World

pararox writes "The data storage and backup world is one of stagnant technologies and cronyism. A neat little open source project, called Cleversafe, is trying to dispell of that notion. Using the information dispersal algorithm originally conceived of by Michael Rabin (of RSA fame), the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data. The software is also very scalable, allowing you to run your own backup grid on a single desktop or across thousands of machines."

6 of 169 comments (clear)

  1. Editors, please note! by Anonymous Coward · · Score: 5, Informative

    Editors please note!

    Editors, please note that there is some incorrect information in this post. Firstly, the original concept of the IDA was designed by Shamir of RSA fame, not Rabin.

    Also note that the Cleversafe IDA is a custom algorithm, and is only similar to Shamir's initial concept.

  2. The 'R' stands for Rivest, not Rabin by Durindana · · Score: 5, Informative



    While Michael Rabin was inventor of the Rabin cryptosystem in 1979, it was Ronald Rivest, Adi Shamir and Len Adleman behind RSA two years earlier.

  3. Not a new idea by D3viL · · Score: 5, Informative

    so it's sort of like parchive http://parchive.sourceforge.net/ which is software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data

  4. You mean Shamir, not Rabin by Anonymous Coward · · Score: 5, Interesting

    While R in RSA stands for Ron Rivest, it is Adi Shamir (S of RSA) you have in mind. He came up with a wonderful secret sharing scheme which allows a bunch of folks or computers to keep pieces of secret in such a way that no N of them have any idea what the secret is, even if they collude. OTOH N+1 of them can easily figure out the secret. RSA can help you keep important secrets safe this way: if the owner is OK, the secret cannot be recreated; if the owner quits or dies, all-important secret holders can recover his password and unencrypt critical company data. And if a couple of them cannot participate, you still can get your secret back.

    Even more amazingly Shamir's secret sharing scheme allows computing math functions, such as digital signatures, without ever recovering secret keys. This is called threshold cryptography, some of you may be interested to learn about its many wonders. Shamir rocks and so is threshold crypto!

  5. Virtual file server -- was a program for old Macs by dfloyd888 · · Score: 5, Interesting

    In the early 90s, a company made a virtual file server for networked Macs. Each client Macintosh had a file on its hard drive, and when a request was made through the driver, a number of Macs were contacted, and files were read and written to in a fairly load balanced fashion. I'm pretty sure it used some decent (think single DES) encryption at the time too, so someone couldn't just dig through the server's file on their Mac's hard disk and glean important data. It also added some redundancy, so if a Mac or two wasn't up on the network, it wouldn't kill the virtual Appleshare folder.

    By chance, anyone remember this technology? I have no idea what happened to it, but it would be a blockbuster open source app if done today, and was platform independant. If done right, one could create data brokerage houses, where people could buy and sell storage space, and also reliability, where space on a RAID or server array would be of higher value than space on a laptop that is rarely on the Internet.

  6. Notes from lead Cleversafe designer by mengland · · Score: 5, Informative

    (This is a repost from an earlier part of the thread so that I can get these comments on the toplevel.)

    Hello-

    I am the lead designer of the first Cleversafe dispersed-storage system (aka a grid-storage software system) and am one of the project's co-founders. The Cleversafe system never stores a complete copy of the data in any one place (or "grid node" in our terminology). At most 1/11th of the file data--we call it a file "slices"--is stored at any one grid node in a "scrambled" (i.e., non-contiguous), compressed, and encrypted/signed fashion. The grid _never_ stores more than one copy of the data on the grid, and that one copy is never stored all in the same place--it's dispersed using an optimized information-dispersal algorithm that we created but has similar properties to the previously-published info-dispersal algorithms (IDAs).

    If a grid node and its associated content--i.e., the user's file slices on that node--are ever completely compromised (firewall comes down, all encryption and scrambling is cracked, etc), then the cracker acquires at most 1/11th (one-eleventh) of the data users data.

    Further, if any half (or at least 5 out of any 11) of the grid nodes are for any reason destroyed or otherwise unavailable, all of the user's data is still accessible. This is done by generating a "coded" file slice for every data slice that we store on the node, and regenerating missing file slices from down nodes by pumping the available data and coded slices through our info-dispersal algorithms (which are all open-sourced, by the way) that are executed on the client side or when the grid "self heals" for destroyed nodes.

    The system can also be implemented in a cost-effective fashion. The grid system can sustain so many concurrent, per-node outages that the availability/uptime requirements for each node are minimal. Also, the grid-node servers need not support much processing capability, for the client offloads much of the work from the servers.

    We feel this system provides a powerful combination of reliability, scalability, economy, and security.

    The hardest part of the design, imo, is to be able to reliably track all of these file slices across a large and heterogeneous set of grid-node machines housing these info-dispersed file slices. We designed the grid meta-data system from the ground up to do this and to be capacity-expandable, performance-scalable, and easily serviceable. More details for the open-source flavor of the grid-software design can be found here:
    http://wiki.cleversafe.org/Grid_Design [cleversafe.org]

    There's much more that I can say about this system; I plan to add additional comments to this thread as more questions and comments arise. I'm sure there are new comments I have yet to read, for they're coming in pretty quickly...

    I also encourage further discussion at our newly-created web forums: http://forums.cleversafe.org/ [cleversafe.org]
    Mailing lists (that will be synchronized with the web forums) will also be available at cleverafe.org in the near future.

    -Matt
    Cleversafe project lead