Slashdot Mirror


A Move to Secure Data by Scattering the Pieces

uler writes "The NY Times has an article about an interesting new open source storage project. Unlike data storage mechanisms today that work 'by making multiple copies of data,' the Cleversafe software takes an 'approach based on dispersing data in encrypted slices.' It's an elegant solution and one that's been a long time coming: the software uses algorithmic techniques known by mathematicians since the 70's. Adi Shamir (of RSA) first wrote of information dispersal in his 1979 paper 'How to Share a Secret (pdf).'"

14 of 141 comments

  1. Hmmm.... by Anonymous Coward · · Score: 4, Insightful

    PAR? PAR2?

    1. Re:Hmmm.... by Disoculated · · Score: 4, Informative

      Whoever marked this as offtopic was a little quick on the gun. I believe the coward is referring to PAR files, a method of breaking up data and reassembling it commonly used on newsgroups.

  2. Wasn't this Al Gore's idea? by andrewman327 · · Score: 5, Insightful
    Although the goal was different, this is in the spirit of the creation of the Internet. DARPAnet was designed to scatter information to maintain communications. To use a different example, it reminds me of RAID.


    With all of this encryption technology, people still need to remember basic security tips. Use good passwords ("password" could be cracked very quickly even with 128-bit AES), maintain physical security (hardware keyloggers can find out about the manifesto you're writing before you even save the file) and use common sense.


    Before you all ask, yes it does run Linux. The company was actually at Linuxworld.

    --
    Information wants a fueled airplane waiting at the hangar and no one gets hurt.
  3. Windows ME: Most Secure OS Ever? by nick_davison · · Score: 5, Funny

    Storing data in random locations, often garbled beyond all recognition?

    Clearly Windows ME's memory -l-e-a-k-s- management made it the most secure OS ever. If only they had some way of reconstructing that data when you wanted it back again.

  4. Number 1 of 4 by Red+Flayer · · Score: 4, Funny

    This concept just adds another layer

    --
    "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
  5. Freenet? by BigZaphod · · Score: 4, Interesting

    Isn't this basically what freenet does? It encrypts the data into chunks and spreads it around all over the place.

    I was working on a p2p system that worked in a similar manner. I was even thinking of repurposing it for the sake of doing online backups - but frankly the bandwidth just doesn't seem to be there yet to do that sort of thing in a practical manner. That, and I got bored with the project... (but nevermind that). :-)

    1. Re:Freenet? by mrogers · · Score: 5, Informative

      Freenet uses forward error correction, which guarantees that the original data can be reconstructed given a sufficient number of pieces. Shamir's information dispersal algorithm makes the additional guarantee that nothing can be learned about the original data unless you have enough pieces to reconstruct it.
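
The threshold property described above can be sketched in a few lines of Python. This is only an illustration of the polynomial scheme from "How to Share a Secret", not code from Freenet or Cleversafe; the names `split_secret`, `recover_secret` and `PRIME` are made up for the example, and the secret is assumed to be an integer smaller than the prime.

```python
import secrets

# Field modulus (assumption for this sketch): 2**127 - 1 is a Mersenne prime.
PRIME = 2**127 - 1

def split_secret(secret, k, n):
    """Split an integer secret into n shares; any k of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 from k (x, y) shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Calling `split_secret(s, 6, 11)` and passing any six of the returned shares to `recover_secret` gives back `s`. With five or fewer shares, every candidate secret remains equally consistent with what you hold, which is the extra guarantee that plain forward error correction does not make.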

  6. 6 of 11 by Bonker · · Score: 4, Informative

    After RTFA, it occurs to me that this is mostly a research project. The goals (and downloadables) include libraries that allow PCs to mount a distributed encrypted filesystem and others.

    In a business example where you know that you can ultimately control the sites where you're storing your partial data, this would be a very good thing.

    For the single user attempting to secure his information by using the existing network, there are some downsides. 6 of 11 slices of the data are needed to reconstruct the whole. Therefore, if a party intent on obtaining secret data obtains the majority of the servers, he has the data.

    Also, if a disaster wipes out the majority of the servers, leaving five or fewer of the eleven, the data is gone.
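
To put rough numbers on the 6-of-11 trade-off, here is a small Python sketch. It assumes every server is up (or compromised) independently with the same probability, which real outages and real attackers will not respect; `survival_probability` is a name invented for this example.

```python
from math import comb

def survival_probability(n, k, p_up):
    """Probability that at least k of n servers are up.

    Assumption: each server is up independently with probability p_up.
    """
    return sum(
        comb(n, i) * p_up**i * (1 - p_up) ** (n - i)
        for i in range(k, n + 1)
    )
```

`survival_probability(11, 6, p)` is the chance the data is still readable when each server is up with probability `p`. The same formula, read with `p` as the chance that any one server is compromised, gives the chance an attacker collects the six slices he needs. At `p = 0.5` both come out to exactly one half.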

    This is a very, very important concept for business storage, but I have to wonder if it scratches any geek itches not already soothed by Truecrypt and Par2.

    --
    The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!
  7. Number 2 of 4 by Red+Flayer · · Score: 4, Funny

    See Comment 15948676

    Of complexity, but also adds

    --
    "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
  8. Number 3 of 4 by Red+Flayer · · Score: 4, Funny

    See Comment 15948695

    Another layer of inefficiency and

    --
    "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
  9. Like mnet? by haeger · · Score: 4, Insightful
    If I'm not mistaken, this was one of the goals with the (now dead?) mnet project.
    From what I remember they split up data into multiple pieces, encrypted it and distributed it over a number of nodes, with some redundancy in it. If you know Python and are interested in p2p I'm sure there's a lot to be learned from that project.

    .haeger

    --
    You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison
  10. Number 4 of 4 by Red+Flayer · · Score: 5, Funny

    See comment 15948718

    an increased risk of loss of data.

    Burma Shave.

    --
    "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
  11. Re:aaaaaaaaaarrrrrrrgggggggggghhhhhhh! by Red+Flayer · · Score: 4, Interesting
    It's '70s not 70's.
    Not really -- it should be '70s' in all likelihood. The first apostrophe is to represent the missing "19", the second is to denote the possessive that is implied. The term "the 1970's" is a shortening of "the years of the decade we call the 1970s," or "the 1970s' years."

    This gets messy, however, since the word 'years' is implied, and to say during the '70s' will make people wonder which 70 seconds you're talking about, and why it needs to be encapsulated with apostrophes -- is it an idiomatical 70 seconds? Kinda like the Biblical '40 days'?

    For that matter, if you really want to get pedantic, what's the use of referencing the 70s at all if you're not going to bother denoting the scale? I mean, surely not mentioning that it's AD (or CE) is going to confuse people using other calendars... more so than misusing an apostrophe, right?

    Along the same lines, it's just horrific that they'd abbreviate the decade anyway, how are we to know that the writer didn't intend the 1870s, or the 2070s even, if he happens to be living backwards in time?

    Bah, there are grammatical rules, and it's great if everyone follows them, but really, it makes no difference if he spelled it 70's, '70s, or seventies (which is the proper spelling, btw).
    --
    "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
  12. Notes from the Cleversafe lead developer by mengland · · Score: 5, Informative

    (Fyi: this link to the New York Times article bypasses any need to login/register with the nytimes.com website.)

    I'm the Cleversafe Dispersed Storage software-development project leader. I work with Chris Gladwin (mentioned in the New York Times article) as a fellow manager at Cleversafe.

    I offer some comments below to help outline some of the unique aspects of the Cleversafe technology.

    Encryption is not dispersal. Cleversafe provides both, and then some. The Cleversafe Dispersed Storage software disperses any "datasource" (typically a file) into several slices (our current software uses 11 slices in an 11-lose-any-5 scheme; future versions may use additional schemes with "wider" slice sets). Additionally, our software also encrypts, compresses, scrambles, and signs the datasource content, but we are not trying to reinvent the wheel: other software technologies exist to do these things, and we leverage them extensively.
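
For readers wondering what an "11-lose-any-5" scheme can look like, here is a toy Python sketch of dispersal by polynomial interpolation over a small prime field, in the spirit of Reed-Solomon coding. It is not the Cleversafe IDA, whose internals are not described here; `disperse`, `reconstruct`, `_interpolate` and the field size `P = 257` are all choices made for this example. It provides redundancy only and does no encryption, compression or signing.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def disperse(data, k=6, n=11):
    """Return n slices (lists of ints in [0, P)), each len(data)/k
    symbols long after zero-padding data to a multiple of k."""
    padded = data + bytes(-len(data) % k)
    slices = [[] for _ in range(n)]
    for off in range(0, len(padded), k):
        # The k data bytes of this block sit at x = 0..k-1.
        points = list(enumerate(padded[off:off + k]))
        for x in range(n):
            slices[x].append(_interpolate(points, x))
    return slices

def reconstruct(available, length, k=6):
    """Rebuild the original bytes from any k slices.

    available maps slice index -> slice; length is the original size.
    """
    if len(available) < k:
        raise ValueError("not enough slices to reconstruct")
    chosen = sorted(available.items())[:k]
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        points = [(x, s[pos]) for x, s in chosen]
        out.extend(_interpolate(points, x) for x in range(k))
    return bytes(out[:length])
```

With the defaults, `disperse(data)` yields eleven slices and `reconstruct({i: slices[i] for i in some_six_indices}, len(data))` returns `data` for any six indices. In this sketch slices 0 through 5 are simply the data bytes themselves, which is one more reason a real system would encrypt before dispersing.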

    We found that a bigger challenge than creating or managing dispersal algorithms was to make the entire storage system work regardless of the dispersal algorithm used (and we design the system to be dispersal-scheme agnostic). The meta-data management system and many other things took us far longer to implement than the Cleversafe IDA. It's not hard to use Reed-Solomon, or some other algorithm, on a single file or a small set of files and disperse the slices by hand onto several different systems (or use variants of this like the 3-piece secret story with Amy, Bob, and Charlie mentioned above). It's much harder to manage this across an entire file system (with hundreds of thousands of files--or many more depending on the file system) for an unlimited number of file systems from all the various users, to be stored across a heterogeneous set of an unlimited number of geographically dispersed, commodity storage nodes in a completely decentralized way with no dependence on the original source of the data (e.g., you could sledgehammer your laptop and not lose any data that's stored on our grid/storage service). (I apologize for that run-on sentence.)

    Further, dispersed-storage systems do not require replication. (Dispersal systems may replicate data for performance purposes, if at all, depending on the application/configuration/installation/context.) If a system replicates entire copies of the data (be they encrypted or not) then it, by (our) definition, is not a dispersed-storage system. So a continual question I have when I evaluate other systems: do they replicate the data in whole or not? Most systems replicate.
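
A back-of-the-envelope comparison shows why the distinction matters for raw storage. The Python sketch below assumes each of the n slices is 1/k the size of the original (true of textbook dispersal schemes, not confirmed here for Cleversafe) and ignores metadata, compression and encryption overhead; both function names are invented for the example.

```python
from fractions import Fraction

def replication_overhead(tolerated_losses):
    """Raw storage used, as a multiple of the original size, when whole
    copies are kept: surviving f lost copies takes f + 1 copies."""
    return Fraction(tolerated_losses + 1)

def dispersal_overhead(n, k):
    """Raw storage used by an n-slice, any-k-reconstruct scheme.
    Assumption: each slice is 1/k the size of the original."""
    return Fraction(n, k)
```

To survive the loss of any five storage nodes, whole-copy replication needs `replication_overhead(5)`, i.e. six times the original size, while `dispersal_overhead(11, 6)` is 11/6, or about 1.83 times.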

    Cleversafe is not the first to present a dispersal system, but we like to think we are the first to make it broadly usable by people and inter-operable with other systems. See our cmdline client (which will soon have continuous-backup and XML-programmable policy management), our Dispersed Storage API, our dsgfs file system, a soon-to-be released GUI client, and future "connectors" (what we call the applications that leverage our technology) to come, all available at http://www.cleversafe.org.

    A side note: "revision management" is built into the Cleversafe system to address what I call "soft" failures (accidental deletes, application failures, etc) vs. "hard" failures (hard disk crashes) as well as archival requirements.

    I believe that the concept of "dispersed storage" will eventually change how the world thinks about storage systems--regardless of whether or not these are Cleversafe-based systems (I think Cleversafe presents the best such system, but I of course am biased).