A Move to Secure Data by Scattering the Pieces
uler writes "The NY Times has an article about an interesting new open source storage project. Unlike data storage mechanisms today that work 'by making multiple copies of data,' the Cleversafe software takes an 'approach based on dispersing data in encrypted slices.' It's an elegant solution and one that's been a long time coming: the software uses algorithmic techniques known by mathematicians since the 70's. Adi Shamir (of RSA) first wrote of information dispersal in his 1979 paper 'How to Share a Secret (pdf).'"
After RTFA, it occurs to me that this is mostly a research project. Its goals (and downloads) include libraries that let PCs mount a distributed, encrypted filesystem, among others.
In a business example where you know that you can ultimately control the sites where you're storing your partial data, this would be a very good thing.
For the single user attempting to secure his information using the existing network, there are some drawbacks. Any 6 of the 11 slices are enough to reconstruct the whole. Therefore, if a party intent on obtaining secret data controls a majority of the servers, he has the data.
Also, if a disaster wipes out a majority of the servers, leaving five or fewer of the eleven, the data is gone.
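Both risks come down to the same binomial tail: the chance that at least 6 of the 11 servers are in some state (alive, or in an attacker's hands). A minimal Python sketch, assuming each server survives or is compromised independently with the same probability, which real, correlated failures won't satisfy:

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(at least k of n independent servers are in a given state),
    where each server is in that state with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

if __name__ == "__main__":
    # Data stays recoverable if at least 6 of 11 servers survive.
    for p_up in (0.5, 0.9, 0.99):
        print(f"per-server survival {p_up}: recoverable with prob {prob_at_least(6, 11, p_up):.6f}")
    # An attacker can read the data if they hold at least 6 of 11 servers.
    for p_owned in (0.1, 0.3):
        print(f"per-server compromise {p_owned}: exposed with prob {prob_at_least(6, 11, p_owned):.6f}")
```

At p = 0.5 the result is exactly one half, because the 6-of-11 threshold sits just past the midpoint; the same threshold that protects against an attacker also sets how many servers a disaster can take out.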
This is a very, very important concept for business storage, but I have to wonder whether it scratches any geek itch not already soothed by TrueCrypt and Par2.
Lotus Notes doesn't really do this on purpose, but its system architecture combined with user behavior (users want local copies of everything) provides a rudimentary capability for recovering data from widely distributed stores like the ones you describe. I once watched a client restore all the data on a Lotus Notes server after a catastrophic loss (a short chain of events meant that no backups could be recovered after a RAID filesystem failed, and many GB of data were gone). They put out a call to their user community for the replicated copies of the various "database" thingies that lived on users' laptops and desktops. They were able to recover all of the data that anyone cared about, anyway, if not literally all of it.
Ross Anderson of the Computer Security Group at Cambridge University wrote a paper called "The Eternity Service". It has seen several implementation attempts, as well as some reworks of the design. The primary difference is that in the Eternity Service you had no idea what data you were holding, nor did you have access to the keys. This new design seems to give the user more control and granularity. Given the new encryption laws proposed in the UK, I'm not sure this is a good idea.
Freenet uses forward error correction, which guarantees that the original data can be reconstructed given a sufficient number of pieces. Shamir's information dispersal algorithm makes the additional guarantee that nothing can be learned about the original data unless you have enough pieces to reconstruct it.
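A minimal sketch of Shamir's threshold scheme in Python, assuming a single integer secret and a fixed prime field (both illustrative choices, not anything Freenet or Cleversafe prescribes). The secret is the constant term of a random polynomial of degree k-1, and any k shares recover it by Lagrange interpolation at x = 0. With k-1 or fewer shares, every candidate secret is equally consistent with what you hold, which is the extra guarantee plain FEC does not give.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this

def split_secret(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Shamir sharing: the secret is the constant term of a random
    degree-(k-1) polynomial f; share i is the point (i, f(i))."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]

def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation of the shares, evaluated at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

if __name__ == "__main__":
    shares = split_secret(123456789, k=6, n=11)
    print(recover_secret(shares[:6]))   # any 6 of the 11 shares suffice
    print(recover_secret(shares[5:]))
```

Note that each share is as large as the secret itself, so this buys secrecy at the cost of storage.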
Is it, though?
According to Lynne Truss's Eats, Shoots & Leaves:
Having said which, 1980s clearly makes more sense. It's a plural, not a possessive, innit?
Not quite, but the coding scheme that makes CDs and DVDs resistant to dust and scratches works much like that. Large blocks have an error-correcting code appended, and then the bits of the data plus the error-correcting code are rearranged and spread widely across the block. So when you lose a contiguous run of bits, you can reconstruct it from data distributed across the block.
It's a good error-correction scheme, but it's not exactly new: every CD player in the world has it. CDs aren't encrypted (there's no key, just a well-known algorithm), but you could mix encryption in if you wanted. That wouldn't help the error recovery, though.
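A toy block interleaver in Python showing why spreading the bits helps. It is a deliberate simplification of the cross-interleaved Reed-Solomon code (CIRC) that CDs actually use; the grid size and the "each row is one codeword that can repair one erasure" framing are assumptions for illustration.

```python
def interleave(symbols: list, rows: int, cols: int) -> list:
    """Store symbols row by row (each row ~ one codeword), transmit column by column."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(stream: list, rows: int, cols: int) -> list:
    """Undo interleave(): put each transmitted symbol back in its row."""
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = stream[i]
            i += 1
    return out

def erasures_per_row(symbols: list, rows: int, cols: int) -> list:
    """Count lost symbols (None) in each row/codeword."""
    return [sum(s is None for s in symbols[r * cols:(r + 1) * cols]) for r in range(rows)]

if __name__ == "__main__":
    data = list(range(24))
    sent = interleave(data, 4, 6)
    for i in range(9, 13):          # a scratch wipes out 4 consecutive symbols
        sent[i] = None
    print(erasures_per_row(deinterleave(sent, 4, 6), 4, 6))  # burst spread thin
    plain = list(data)
    for i in range(9, 13):          # the same scratch without interleaving
        plain[i] = None
    print(erasures_per_row(plain, 4, 6))  # burst concentrated in one codeword
```

With interleaving, a burst no longer than the number of rows touches each codeword at most once, so a weak per-row code can still repair it; without interleaving, the same burst can overwhelm a single codeword.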
This is so not-new it's not even funny. I've already seen FreeNet and MNet mentioned as precursors, which is appropriate. Dozens of other P2P "filesystems" (in quotes because I don't believe it's truly a filesystem unless it's fully integrated into the OS) and block-level data stores have done this. Probably the one that most thoroughly examined the inherent tradeoffs, and that's most directly based on Shamir's IDA work, is PASIS at CMU. Presenting Cleversafe as the first to move in this direction is an insult to those who have gone before.
Whoever marked this as offtopic was a little quick on the trigger. I believe the coward is referring to PAR files, a method of breaking up data and reassembling it that is commonly used on newsgroups.
The Information Dispersal Algorithm is due to Michael Rabin. Shamir's secret-sharing algorithm uses a similar idea (it's essentially the same as Rabin's algorithm, except that the data is padded with random gibberish).
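A minimal sketch of the dispersal side in Python: k-of-n splitting in the spirit of Rabin's IDA, using polynomial evaluation (a Reed-Solomon-style matrix) over the prime field GF(257). The field choice, the function names, and the zero-padding are illustrative assumptions; practical implementations usually work in GF(2^8) so that every slice symbol fits in one byte. Unlike Shamir shares, each slice is only about 1/k the size of the input.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def _interpolate(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def disperse(data: bytes, k: int, n: int) -> list[list[int]]:
    """Split data into n slices of ceil(len(data)/k) symbols each.
    Each group of k bytes is treated as a polynomial's values at x = 1..k;
    slice i stores that polynomial's value at x = i (i = 1..n)."""
    padded = data + b"\x00" * (-len(data) % k)
    slices = [[] for _ in range(n)]
    for g in range(0, len(padded), k):
        group = list(zip(range(1, k + 1), padded[g:g + k]))
        for i in range(1, n + 1):
            slices[i - 1].append(_interpolate(group, i))
    return slices

def reconstruct(available: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k slices, keyed by slice index 1..n."""
    if len(available) < k:
        raise ValueError("need at least k slices")
    chosen = sorted(available.items())[:k]
    out = bytearray()
    for g in range(len(chosen[0][1])):
        points = [(x, slc[g]) for x, slc in chosen]
        out.extend(_interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])  # drop the zero padding

if __name__ == "__main__":
    msg = b"dispersed storage demo!"
    slices = disperse(msg, k=6, n=11)
    survivors = {i + 1: slices[i] for i in (0, 3, 5, 7, 9, 10)}  # lose any 5
    print(reconstruct(survivors, k=6, length=len(msg)))
```

This layout is systematic (the first k slices hold the plaintext bytes), so on its own it provides no secrecy; encryption, or Shamir-style random padding, has to be layered on separately.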
Here's a reg-free link.
Courtesy of the New York Times Link Generator.
(FYI: this link to the New York Times article bypasses any need to log in to or register with the nytimes.com website.)
I'm the Cleversafe Dispersed Storage software-development project leader. I work with Chris Gladwin (mentioned in the New York Times article) as a fellow manager at Cleversafe.
I offer some comments below to help outline some of the unique aspects of the Cleversafe technology.
Encryption is not dispersal. Cleversafe provides both, and then some. The Cleversafe Dispersed Storage software disperses any "datasource" (typically a file) into several slices (our software currently uses 11 slices in an 11-lose-any-5 scheme; future versions may use additional schemes with "wider" slice sets). Our software also encrypts, compresses, scrambles, and signs the datasource content, but we are not trying to reinvent the wheel: other software technologies exist to do these things, and we leverage them extensively.
We found that a bigger challenge than creating or managing dispersal algorithms was building an entire storage system that works regardless of the dispersal algorithm used (we design the system to be dispersal-scheme agnostic). The metadata management system and many other components took us far longer to implement than the Cleversafe IDA. It's not hard to apply Reed-Solomon, or some other algorithm, to a single file or a small set of files and disperse the slices by hand onto several different systems (or to use variants of this, like the 3-piece secret story with Amy, Bob, and Charlie mentioned above). It's much harder to manage this across an entire file system (with hundreds of thousands of files, or many more depending on the file system), for an unlimited number of file systems from all the various users, stored on a heterogeneous set of an unlimited number of geographically dispersed commodity storage nodes, in a completely decentralized way with no dependence on the original source of the data (e.g., you could sledgehammer your laptop and not lose any data that's stored on our grid/storage service). (I apologize for that run-on sentence.)
Further, dispersed-storage systems do not require replication. (Dispersal systems may replicate data for performance purposes, if at all, depending on the application, configuration, installation, or context.) If a system replicates entire copies of the data (encrypted or not), then by (our) definition it is not a dispersed-storage system. So a question I keep asking when evaluating other systems is: do they replicate the data in whole or not? Most systems replicate.
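The storage cost of avoiding whole-copy replication is easy to quantify. A back-of-the-envelope Python sketch, assuming each copy or slice sits on its own node and ignoring metadata, signatures, compression, and encryption overhead: surviving f lost nodes takes f + 1 full copies under replication, but only (k + f)/k times the data under k-of-n dispersal.

```python
from fractions import Fraction

def replication_overhead(tolerated_losses: int) -> Fraction:
    """Whole copies: surviving any f node losses needs f + 1 copies."""
    return Fraction(tolerated_losses + 1)

def dispersal_overhead(k: int, tolerated_losses: int) -> Fraction:
    """k-of-n dispersal with n = k + f: each slice is 1/k of the data."""
    return Fraction(k + tolerated_losses, k)

if __name__ == "__main__":
    f = 5  # the 11-lose-any-5 case
    print("replication:", float(replication_overhead(f)), "x")
    print("dispersal (k=6):", round(float(dispersal_overhead(6, f)), 2), "x")
```

For the 11-lose-any-5 scheme this works out to 6x raw storage for replication versus 11/6 (about 1.83x) for dispersal.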
Cleversafe is not the first to present a dispersal system, but we like to think we are the first to make it broadly usable by people and interoperable with other systems. See our cmdline client (which will soon have continuous-backup and XML-programmable policy management), our Dispersed Storage API, our dsgfs file system, a soon-to-be-released GUI client, and future "connectors" (what we call the applications that leverage our technology), all available at http://www.cleversafe.org.
A side note: "revision management" is built into the Cleversafe system to address what I call "soft" failures (accidental deletes, application failures, etc.), as opposed to "hard" failures (hard disk crashes), as well as archival requirements.
I believe that the concept of "dispersed storage" will eventually change how the world thinks about storage systems--regardless of whether or not these are Cleversafe-based systems (I think Cleversafe presents the best such system, but I of course am biased).