Domain: cleversafe.org
Stories and comments across the archive that link to cleversafe.org.
Comments · 26
-
Re:Information Dispersal Algorithms
Unfortunately there don't seem to be many tools or libraries available, so you have to implement the IDA yourself and this requires a bit of math.
Actually there's a company called Cleversafe that's building an open source, large scale storage system that's based on IDAs like Reed Solomon. I haven't tested the beta versions, but it definitely sounds promising.
-
Re:Is there a distributed file storage system
http://cleversafe.org/
I did some benchmarking of it for a project a while ago. While it is slow and still appears to be in its infancy, the product does work. -
Cleversafe sounds like a possible solution
http://www.cleversafe.org/ It's open source, dispersed storage, encrypted, redundant... seems like it's worth giving a try. I haven't used it personally but it has been around a while. The more machines using it, the better a solution it is from what I can tell. The Windows support may be the big question... but the project seems worth keeping an eye on.
-
Options to check out...
NovaBACKUP (PC World Best Buy; offers tape encryption)
http://www.novastor.com/
Cleversafe (GPL'd)
http://www.cleversafe.org/
Genie Backup Manager
http://www.genie-soft.com/products/gbm/default.html?AfID=13778
SyncBack (freeware)
http://www.2brightsparks.com/downloads.html
EMC Insignia Retrospect (formerly Dantz Retrospect; PC Magazine Editor's Choice)
http://www.emcinsignia.com/products/
-
Notes from the Cleversafe lead developer
(Fyi: this link to the New York Times article bypasses any need to login/register with the nytimes.com website.)
I'm the Cleversafe Dispersed Storage software-development project leader. I work with Chris Gladwin (mentioned in the New York Times article) as a fellow manager at Cleversafe.
I offer some comments below to help outline some of the unique aspects of the Cleversafe technology.
Encryption is not dispersal. Cleversafe provides both, and then some. The Cleversafe Dispersed Storage software disperses any "datasource" (typically a file) into several slices (our current software uses 11 slices in an 11-lose-any-5 scheme; future versions may use additional schemes with "wider" slice sets). Additionally, our software encrypts, compresses, scrambles, and signs the datasource content, but we are not trying to reinvent the wheel: other software technologies exist to do these things, and we leverage them extensively.
We found that a bigger challenge than creating or managing dispersal algorithms was to build the entire storage system, regardless of the dispersal algorithm used (and we design the system to be dispersal-scheme agnostic). The metadata management system and many other things took us far longer to implement than the Cleversafe IDA. It's not hard to use Reed-Solomon or some other algorithm on a single file or a small set of files and disperse the slices by hand onto several different systems (or use variants of this, like the 3-piece secret story with Amy, Bob, and Charlie mentioned above). It's much harder to manage this across an entire file system (with hundreds of thousands of files, or many more depending on the file system), for an unlimited number of file systems from all the various users, stored on a heterogeneous set of an unlimited number of geographically dispersed commodity storage nodes, in a completely decentralized way, with no dependence on the original source of the data (e.g., you could sledgehammer your laptop and not lose any data that's stored on our grid/storage service). (I apologize for that run-on sentence.)
Further, dispersed-storage systems do not require replication. (Dispersal systems may replicate data for performance purposes, if at all, depending on the application/configuration/installation/context.) If a system replicates entire copies of the data (be they encrypted or not) then it is, by (our) definition, not a dispersed-storage system. So a continual question I have when I evaluate other systems: do they replicate the data in whole or not? Most systems replicate.
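To put rough numbers on the replication-versus-dispersal difference, here is a small sketch of the storage arithmetic for the 11-lose-any-5 scheme. It assumes an ideal erasure code in which each slice is 1/6 the size of the original; that slice size is an assumption for illustration and is not stated above, and the function names are hypothetical.

```python
from fractions import Fraction

def replication_overhead(failures_tolerated: int) -> Fraction:
    # Surviving f lost copies requires f + 1 complete copies of the data.
    return Fraction(failures_tolerated + 1)

def dispersal_overhead(total_slices: int, failures_tolerated: int) -> Fraction:
    # Assumption: an ideal code where any (n - f) slices rebuild the data
    # and each slice is 1/(n - f) of the original size.
    threshold = total_slices - failures_tolerated
    return Fraction(total_slices, threshold)

n, f = 11, 5
print(replication_overhead(f))   # 6    (six full copies)
print(dispersal_overhead(n, f))  # 11/6 (a bit under two times the original)
```

Under those assumptions, tolerating five lost nodes costs six times the original size with whole-copy replication, versus 11/6 of the original size with dispersal.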
Cleversafe is not the first to present a dispersal system, but we like to think we are the first to make it broadly usable by people and inter-operable with other systems. See our cmdline client (which will soon have continuous-backup and XML-programmable policy management), our Dispersed Storage API, our dsgfs file system, a soon-to-be-released GUI client, and future "connectors" (what we call the applications that leverage our technology) to come, all available at http://www.cleversafe.org.
A side note: "revision management" is built into the Cleversafe system to address what I call "soft" failures (accidental deletes, application failures, etc) vs. "hard" failures (hard disk crashes) as well as archival requirements.
I believe that the concept of "dispersed storage" will eventually change how the world thinks about storage systems--regardless of whether or not these are Cleversafe-based systems (I think Cleversafe presents the best such system, but I of course am biased). -
Cleversafe vs. copying and other projects
In reading through a lot of the posts, I thought it might be useful to elaborate on how Cleversafe compares to current copy-based data storage systems as well as previous projects using similar techniques for data storage and communications...
Effectively all digital data storage in use today works by making copies of data and redundant copies of data with the use of parity bits when stored on a RAID array. Cleversafe does not store a copy of the original data and definitely does not store copies of data. Cleversafe 'disperses' data, which is different from copying data. Original data files are turned into a set of 'dispersed files' or 'slices' -- each of which contains too little information to be useful on its own. These slices are then stored in different locations. On the current Cleversafe test grid, each file is dispersed into 11 slices, each of which is stored by a separate storage hosting provider in a separate geographic location, as shown at http://www.cleversafe.org/wiki/Cleversafe_Research_Storage_Grid.
In order to ensure ultra-high availability, the dispersal algorithms are designed in such a manner that any majority of these slices can be used to perfectly recreate all the original data. This technique is similar to methods often employed in data communications where data is broken up into some number of packets by the sender in such a manner that the receiver does not need all the packets to recreate all the original data.
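For readers who want to see the "any majority of slices" property concretely, here is a minimal sketch of a 6-of-11 dispersal using polynomial interpolation over the prime field GF(257). This is a textbook-style construction chosen for illustration, not the Cleversafe algorithm; the names `disperse`, `restore`, and `lagrange_eval` are hypothetical. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial (mod P) through the (xi, yi) points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def disperse(data: bytes, n: int = 11, k: int = 6):
    """Return n slices (lists of ints in 0..256); any k of them rebuild data."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices = [[] for _ in range(n)]
    for start in range(0, len(padded), k):
        block = padded[start:start + k]
        # The k data bytes sit at x = 1..k; slice s stores the value at x = k + 1 + s.
        points = [(x + 1, b) for x, b in enumerate(block)]
        for s in range(n):
            slices[s].append(lagrange_eval(points, k + 1 + s))
    return slices

def restore(available: dict, length: int, k: int = 6) -> bytes:
    """Rebuild the data from any k slices, given as {slice_index: slice}."""
    chosen = sorted(available.items())[:k]
    if len(chosen) < k:
        raise ValueError("need at least k slices")
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        points = [(k + 1 + s, sl[pos]) for s, sl in chosen]
        out.extend(lagrange_eval(points, x) for x in range(1, k + 1))
    return bytes(out[:length])

message = b"dispersed storage"
slices = disperse(message)
survivors = {i: slices[i] for i in (0, 3, 4, 7, 9, 10)}  # five slices lost
assert restore(survivors, len(message)) == message
```

Each slice here holds one field element per six input bytes, so it is about one sixth the size of the original. The sketch only demonstrates recoverability; it makes no confidentiality claim and leaves out the encryption, compression, and signing steps.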
Over the past 25 years, a number of projects have looked at storing data using information dispersal or similar techniques. Many of these projects have used Reed Solomon or similar encoding / decoding techniques, including OceanStore, PAR/PAR2 and others. The Cleversafe project is not only developing algorithms for information dispersal, but is also creating a complete system to enable the benefits of Dispersed Storage to be practically used on a generally-available-to-everyone scale. So in addition to creating new, computationally-efficient algorithms for information dispersal, the Cleversafe project includes:
- a metadata management system for managing files stored on the grid
- grid management tools, including 'rebuilder' processes that enable the grid to 'self-heal'
- interfaces to enable dispersed storage to work in various existing environments, including a general API, a command line interface, a file system interface (which we began demonstrating at Linux World last week) and an upcoming GUI interface
- integration with existing methods for encryption
- live dispersed storage grids running on nodes operated by various storage hosting companies in various locations
- etc.
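As a rough illustration of what a 'rebuilder' process might do, the sketch below regenerates one lost slice from six survivors in a toy 6-of-11 polynomial scheme over GF(257). The scheme, the toy data, and the names `evaluate`, `interpolate_at`, and `rebuild_slice` are assumptions made for the example; the actual Cleversafe rebuilder is not described here. It needs Python 3.8 or later.

```python
P = 257  # prime field large enough to hold every byte value

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial (lowest-order coefficient first) mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def interpolate_at(points, x):
    """Value at x of the unique polynomial (mod P) through the given points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def rebuild_slice(survivors: dict, missing_x: int, k: int = 6):
    """Regenerate the slice at missing_x from any k survivors keyed by x."""
    chosen = sorted(survivors.items())[:k]
    if len(chosen) < k:
        raise ValueError("too few surviving slices to rebuild")
    length = len(chosen[0][1])
    return [interpolate_at([(x, sl[pos]) for x, sl in chosen], missing_x)
            for pos in range(length)]

# Toy grid: 11 slices, each holding two values; every position is a
# degree-5 polynomial, so any 6 slices determine the remaining ones.
polys = [[7, 1, 0, 4, 2, 9], [3, 3, 8, 0, 1, 5]]
grid = {x: [evaluate(c, x) for c in polys] for x in range(1, 12)}
lost = grid.pop(4)                     # simulate a failed storage node
assert rebuild_slice(grid, 4) == lost  # the survivors regenerate it exactly
```

The point of the sketch is that a lost slice can be recomputed from the surviving slices alone, without going back to the original source of the data.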
So, the focus of Cleversafe is to build on the previous work in dispersed storage (which has mainly been academic research) to create a practical and complete open source system to better store the world's data.
Chris Gladwin -
Re:Aimed at who ?
Also, for what it's worth:
The Cleversafe devs internally use the Windows binaries quite a bit (our current build process uses the CodeBlocks IDE--see the CodeBlocks project/.cbp file in the repo; we haven't totally ported the gnu-make-based process to mingw/msys yet, in part due to problems with mingw/msys gnu make); we might even have more test-usage history with the Windows client than we do with the Linux client.
However, the reason we chose not to release the Windows binary had little to do with the technical issues, and much more to do with legal ones. Our open-source project is released under the GPL v2. The code uses the OpenSSL and Xerces-C (an XML parser) libraries. We cannot distribute either OpenSSL or Xerces-C source or binaries in our binary or source distributions because their respective licenses are GPL incompatible, even though we can distribute them separately (although our current windows-mingw builds statically link the openssl libs into the executable, which is an additional hurdle to overcome). The packaging and installation for Windows binaries made it more difficult to handle this separation than it is on Linux (and other future Unix/BSD systems), so we just decided to get the release out there sooner rather than later and not wait for our Windows package/installation management to get done.
Bottom line: we are fully "plugged in" to Windows systems, and you should see a Windows binary release in the near future.
-Matt -
Dispersal is not encryption; Cleversafe uses both
Anonymous writer writes:
Recovery volumes for various archival utilities have been around a long time. This is just the first time that I know of where they use the RSA algorithm instead of an older algorithm.
To be clear:
Dispersal is not encryption. (Cleversafe uses both.)
While we (Cleversafe) do use public-private key methods to encrypt the data/content, this is still a separate operation from the data *dispersal*.
Moreover, if the content encryption is somehow cracked/broken (and public-private key encryption can be broken), the cracker acquires at most 1/11th (in our current IDA scheme) of "scrambled"/non-contiguous data.
This is the major reason why we feel that our system provides unique, security-and-privacy-based value over encryption-only based systems. If the encryption breaks, you still can't get the data. (And of course, we use the encryption mechanism, too.)
Note that a different RSA key can be used to encrypt each file Slice (ie, for each Cleversafe "Pillar," as per our terminology for our grid design) such that if a cracker breaks one slice/Pillar's key, they still have to break the key for other Pillars (and there are 11 total Pillars in the current IDA scheme)...*in addition* to the "toplevel" key we use to encrypt the file before it's sliced/dispersed. Note: we don't have this post-dispersed-encryption feature in our current alpha4.1.3 code (we only encrypt the toplevel file before it's compressed and dispersed), but we believe it will not be difficult to add.
Also: we will be signing each slice as well, for data-integrity purposes to prevent both malicious and non-malicious data change/vandalism. This also will be a feature added in the near term.
One can read more about the open-source flavor of the Cleversafe grid design.
-Matt
ps: I encourage interested parties to continue discussions at http://forums.cleversafe.org/ (as well as on soon-to-be-available email lists that will be synchronized with these forums). -
Dispersal is not encryption; Cleversafe uses both
Anonymous writer writes:
Recovery volumes for various archival utilities have been around a long time. This is just the first time that I know of where they use the RSA algorithm instead of an older algorithm.
To be clear:
Dispersal is not encryption. (Cleversafe uses both.)
While we (Cleversafe) do use public-private key methods to encrypt the data/content, this is still a separate operation from the data *dispersal*.
Moreover, if the content encryption is somehow cracked/broken (and public-private key encryption can be broken), the cracker acquires at most 1/11th (in our current IDA scheme) of "scrambled"/non-contiguous data.
This is the major reason why we feel that our system provides unique, security-and-privacy-based value over encryption-only based systems. If the encryption breaks, you still can't get the data. (And of course, we use the encryption mechanism, too.)
Note that a different RSA key can be used to encrypt each file Slice (ie, for each Cleversafe "Pillar," as per our terminology for our grid design) such that if a cracker breaks one slice/Pillar's key, they still have to break the key for other Pillars (and there are 11 total Pillars in the current IDA scheme)...*in addition* to the "toplevel" key we use to encrypt the file before it's sliced/dispersed. Note: we don't have this post-dispersed-encryption feature in our current alpha4.1.3 code (we only encrypt the toplevel file before it's compressed and dispersed), but we believe it will not be difficult to add.
Also: we will be signing each slice as well, for data-integrity purposes to prevent both malicious and non-malicious data change/vandalism. This also will be a feature added in the near term.
One can read more about the open-source flavor of the Cleversafe grid design.
-Matt
ps: I encourage interested parties to continue discussions at http://forums.cleversafe.org/ (as well as to soon-to-be-available email lists that will synchronized with these forums). -
Comparing Cleversafe IDA algorithms with others
More notes on our IDAs compared with others:
The Cleversafe information dispersal algorithms (IDAs) were designed to provide real-time performance when storing and retrieving large amounts of data (gigabytes, petabytes and above). Previous algorithms, like Rabin, Shamir and Reed-Solomon, are very effective at storing smaller amounts of data (kilobytes), but their computational overhead, which is proportional to the square of the data block size or greater, makes them poorly suited to quickly dispersing/restoring larger amounts of data. The Cleversafe algorithms encode AND decode data with a computational overhead that is linearly proportional to the size of the data blocks. Specifically, for an 11-node grid with a threshold level of 6, the Cleversafe encoding algorithms require 5 operations per byte to encode data. For decoding on this dispersed storage grid, the Cleversafe algorithms require 4 operations per byte more than 99% of the time and no more than 13 operations per byte in rare cases.
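To make the quoted figures concrete, here is a small arithmetic sketch in Python. It only restates the numbers from the paragraph above; the one assumption is that the "rare cases" are treated as exactly 1% of decodes, which turns the result into an upper bound on the average decode cost. The function names are made up for the example.

```python
from fractions import Fraction

ENCODE_OPS_PER_BYTE = 5           # quoted above (11 nodes, threshold 6)
DECODE_OPS_COMMON = 4             # quoted for more than 99% of decodes
DECODE_OPS_WORST = 13             # quoted ceiling for the rare cases
RARE_FRACTION = Fraction(1, 100)  # assumption: treat "rare" as exactly 1%

def encode_ops(num_bytes: int) -> int:
    """Total encode operations; grows linearly with the data size."""
    return ENCODE_OPS_PER_BYTE * num_bytes

def expected_decode_ops_per_byte() -> Fraction:
    """Upper bound on the average decode cost per byte under the 1% assumption."""
    return (1 - RARE_FRACTION) * DECODE_OPS_COMMON + RARE_FRACTION * DECODE_OPS_WORST

print(encode_ops(1 << 30))                    # operations to encode 1 GiB
print(float(expected_decode_ops_per_byte()))  # 4.09
```

Under that assumption the average decode cost is at most 4.09 operations per byte, so the rare 13-operation cases barely move the average.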
Another Cleversafe contributor, Chris Gladwin, developed our IDAs. For more info:
http://wiki.cleversafe.org/Turbo_IDA_Technology
One can also read an Excel spreadsheet (found in the above wiki page) and C++ source code that represents the "guts" of our 11-Pillar IDA code module.
For more info about Cleversafe contributors:
http://wiki.cleversafe.org/Cleversafe_Contributors
You can see Chris and me at the bottom of the page, which is ordered with the most-recent contributor listed first.
-Matt England
ps: We are finishing up our project announcement at this week's MySQL User's Conference, where we drew significant interest. We have engaged some MySQL core developers regarding integrating their technology with ours. -
Notes from lead Cleversafe designer
(This is a repost from an earlier part of the thread so that I can get these comments on the toplevel.)
Hello-
I am the lead designer of the first Cleversafe dispersed-storage system (aka a grid-storage software system) and am one of the project's co-founders. The Cleversafe system never stores a complete copy of the data in any one place (or "grid node" in our terminology). At most 1/11th of the file data--we call it a file "slice"--is stored at any one grid node in a "scrambled" (i.e., non-contiguous), compressed, and encrypted/signed fashion. The grid _never_ stores more than one copy of the data, and that one copy is never stored all in the same place--it's dispersed using an optimized information-dispersal algorithm that we created but that has similar properties to the previously-published info-dispersal algorithms (IDAs).
If a grid node and its associated content--i.e., the user's file slices on that node--are ever completely compromised (firewall comes down, all encryption and scrambling is cracked, etc.), then the cracker acquires at most 1/11th (one-eleventh) of the user's data.
Further, if nearly half of the grid nodes (up to any 5 of the 11) are for any reason destroyed or otherwise unavailable, all of the user's data is still accessible. This is done by generating a "coded" file slice for every data slice that we store on the node, and regenerating missing file slices from down nodes by pumping the available data and coded slices through our info-dispersal algorithms (which are all open-sourced, by the way), which are executed on the client side or when the grid "self heals" for destroyed nodes.
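The "lose any 5 of 11" property can be sketched with a textbook Rabin/Reed-Solomon-style dispersal over a small prime field. This is explicitly not the Cleversafe algorithm (the posts describe that as a separate, optimized design) and it leaves out the compression, scrambling and encryption steps; the field size 257 and the names `disperse`, `restore` and `interpolate` are assumptions chosen for the illustration. Slices 1 to 6 carry the raw data bytes and slices 7 to 11 carry coded values.

```python
P = 257        # prime just above the byte range, so bytes 0..255 are field elements
N, K = 11, 6   # 11 slices, any 6 of which rebuild the data (figures from the post)

def interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def disperse(data: bytes):
    """Split data into N slices: 1..K hold the raw bytes, K+1..N hold coded values."""
    padded = data + bytes(-len(data) % K)   # zero-pad to a multiple of K
    slices = {x: [] for x in range(1, N + 1)}
    for i in range(0, len(padded), K):
        points = list(zip(range(1, K + 1), padded[i:i + K]))
        for x in range(1, N + 1):
            slices[x].append(interpolate(points, x))
    return slices

def restore(available, length):
    """Rebuild the original bytes from any K surviving slices."""
    if len(available) < K:
        raise ValueError("need at least %d slices" % K)
    chosen = sorted(available)[:K]
    out = bytearray()
    for row in range(len(available[chosen[0]])):
        points = [(x, available[x][row]) for x in chosen]
        out.extend(interpolate(points, x) for x in range(1, K + 1))
    return bytes(out[:length])

message = b"dispersed storage"
slices = disperse(message)
survivors = {x: s for x, s in slices.items() if x not in (1, 3, 5, 8, 11)}  # lose 5 nodes
print(restore(survivors, len(message)))
```

Because coded values can reach 256, the sketch keeps slices as lists of integers rather than bytes; a production scheme would typically work in GF(2^8) to avoid that.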
The system can also be implemented in a cost-effective fashion. The grid system can sustain so many concurrent, per-node outages that the availability/uptime requirements for each node are minimal. Also, the grid-node servers need not support much processing capability, for the client offloads much of the work from the servers.
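A back-of-the-envelope way to see why per-node uptime requirements can be modest: if the data stays reachable whenever at least 6 of the 11 nodes are up, overall availability is a binomial tail. The sketch below assumes node outages are independent and that every node has the same uptime, neither of which the post claims; `grid_availability` is a made-up name for the example.

```python
from math import comb

def grid_availability(node_uptime: float, n: int = 11, k: int = 6) -> float:
    """Probability that at least k of n independent nodes are up."""
    return sum(
        comb(n, up) * node_uptime**up * (1 - node_uptime) ** (n - up)
        for up in range(k, n + 1)
    )

for uptime in (0.90, 0.95, 0.99):
    print(f"node uptime {uptime:.2f} -> data reachable {grid_availability(uptime):.6f}")
```

Correlated failures (a shared network link or power feed, for instance) would make the real figure lower than this model suggests.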
We feel this system provides a powerful combination of reliability, scalability, economy, and security.
The hardest part of the design, imo, is to be able to reliably track all of these file slices across a large and heterogeneous set of grid-node machines housing these info-dispersed file slices. We designed the grid meta-data system from the ground up to do this and to be capacity-expandable, performance-scalable, and easily serviceable. More details for the open-source flavor of the grid-software design can be found here:
http://wiki.cleversafe.org/Grid_Design
There's much more that I can say about this system; I plan to add additional comments to this thread as more questions and comments arise. I'm sure there are new comments I have yet to read, for they're coming in pretty quickly...
I also encourage further discussion at our newly-created web forums: http://forums.cleversafe.org/
Mailing lists (that will be synchronized with the web forums) will also be available at cleversafe.org in the near future.
-Matt
Cleversafe project lead -
Re:redundancy = your secret is safe (with us)
Hello-
I am the chief designer of the Cleversafe dispersed-storage system (aka a grid-storage software system) and am one of the project's co-founders. The Cleversafe system never stores a complete copy of the data in any one place (or "grid node" in our terminology). At most 1/11th of the file data--we call it a file "slice"--is stored at any one grid node in a "scrambled" (i.e., non-contiguous), compressed, and encrypted/signed fashion. The grid _never_ stores more than one copy of the data, and that one copy is never stored all in the same place--it's dispersed using an optimized information-dispersal algorithm that we created but that has similar properties to the previously-published info-dispersal algorithms (IDAs).
If a grid node and its associated content--i.e., the user's file slices on that node--are ever completely compromised (firewall comes down, all encryption and scrambling is cracked, etc.), then the cracker acquires at most 1/11th (one-eleventh) of the user's data.
Further, if nearly half of the grid nodes (up to any 5 of the 11) are for any reason destroyed or otherwise unavailable, all of the user's data is still accessible. This is done by generating a "coded" file slice for every data slice that we store on the node, and regenerating missing file slices from down nodes by pumping the available data and coded slices through our info-dispersal algorithms (which are all open-sourced, by the way), which are executed on the client side or when the grid "self heals" for destroyed nodes.
The system can also be implemented in a cost-effective fashion. The grid system can sustain so many concurrent, per-node outages that the availability/uptime requirements for each node are minimal. Also, the grid-node servers need not support much processing capability, for the client offloads much of the work from the servers.
We feel this system provides a powerful combination of reliability, scalability, economy, and security.
The hardest part of the design, imo, is to be able to reliably track all of these file slices across a large and heterogeneous set of grid-node machines housing these info-dispersed file slices. We designed the grid meta-data system from the ground up to do this and to be capacity-expandable, performance-scalable, and easily serviceable. More details for the open-source flavor of the grid-software design can be found here:
http://wiki.cleversafe.org/Grid_Design
There's much more that I can say about this system; I plan to add additional comments to this thread as more questions and comments arise. I'm sure there are new comments I have yet to read, for they're coming in pretty quickly...
I also encourage further discussion at our newly-created web forums: http://forums.cleversafe.org/
Mailing lists (that will be synchronized with the web forums) will also be available at cleversafe.org in the near future.
-Matt -