Slashdot Mirror


A Move to Secure Data by Scattering the Pieces

uler writes "The NY Times has an article about an interesting new open source storage project. Unlike data storage mechanisms today that work 'by making multiple copies of data,' the Cleversafe software takes an 'approach based on dispersing data in encrypted slices.' It's an elegant solution and one that's been a long time coming: the software uses algorithmic techniques known by mathematicians since the 70's. Adi Shamir (of RSA) first wrote of information dispersal in his 1979 paper 'How to Share a Secret (pdf).'"
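Shamir's scheme from that paper hides the secret as the constant term of a random polynomial of degree k - 1 and hands out points on that polynomial as shares. Any k points determine the polynomial by interpolation. The following is a minimal Python sketch. It assumes the secret is an integer below the prime 2**127 - 1, which is an arbitrary choice of field. The function names are hypothetical, and this is not Cleversafe's code.

```python
import secrets

# Field modulus: a Mersenne prime, chosen here only because it is large
# enough for the example secrets (assumption, not from the paper).
PRIME = 2**127 - 1

def split_secret(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares (x, y); any k of them reconstruct it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in the field")
    if not 1 <= k <= n < PRIME:
        raise ValueError("need 1 <= k <= n")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        # Evaluate the polynomial at x with Horner's rule, mod PRIME.
        y = 0
        for c in reversed(coeffs):
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 from distinct shares."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

shares = split_secret(123456789, k=6, n=11)
print(recover_secret(shares[:6]) == 123456789)
```

Any six of the eleven shares rebuild the secret. Interpolating through only five shares yields a value that, by Shamir's argument, says nothing about the secret.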

141 comments

  1. Hmmm.... by Anonymous Coward · · Score: 4, Insightful

    PAR? PAR2?

    1. Re:Hmmm.... by Disoculated · · Score: 4, Informative

      Whoever marked this as offtopic was a little quick on the trigger. I believe the coward is referring to PAR files, a method of breaking up data and reassembling it that is commonly used on newsgroups.

    2. Re:Hmmm.... by Anonymous Coward · · Score: 1, Insightful

      PAR/PAR2 takes data, chunks it up, and adds RAID-like redundancy. Add a layer of compression for space savings, and these files can be scattered around the net (in fact, they already are), and the original data can be reconstructed from a subset of the data, even if the subset is damaged (within calculable limits, of course).

      (wow, unintentional FP even...)
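PAR2 itself uses Reed-Solomon codes, which can repair many missing blocks. The simplest RAID-like stand-in is a single XOR parity block, and that can repair exactly one missing block. Below is a minimal Python sketch under that simplification. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(data: bytes, n_chunks: int) -> List[bytes]:
    """Split data into n_chunks equal blocks (zero-padded) plus one XOR parity block."""
    size = -(-len(data) // n_chunks)  # ceiling division
    padded = data.ljust(size * n_chunks, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(n_chunks)]
    return chunks + [reduce(xor_bytes, chunks)]

def repair(pieces: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing block (marked None) by XOR-ing the survivors."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can repair only one loss")
    repaired = list(pieces)
    if missing:
        survivors = [p for p in pieces if p is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired

data = b"PAR-style redundancy demo"
pieces = add_parity(data, 4)
pieces[2] = None  # simulate one lost block
restored = repair(pieces)
print(b"".join(restored[:-1])[:len(data)] == data)
```

The original length has to be kept alongside the blocks so the zero padding can be trimmed.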

  2. Doesn't FreeNet do this? by mb10ofBATX · · Score: 2, Interesting


    I've been out of the freenet loop for a long time, but I thought I remembered reading in its documentation a few years ago that it did this same kind of encrypting and dispersing chunks of data.

  3. Wasn't this Al Gore's idea? by andrewman327 · · Score: 5, Insightful
    Although the goal was different, this is in the spirit of the creation of the Internet. DARPAnet was designed to scatter information to maintain communications. To use a different example, it reminds me of RAID.


    With all of this encryption technology, people still need to remember basic security tips. Use good passwords ("password" could be cracked very quickly even with 128-bit AES), maintain physical security (hardware keyloggers can find out about the manifesto you're writing before you even save the file) and use common sense.


    Before you all ask, yes it does run Linux. The company was actually at Linuxworld.

    --
    Information wants a fueled airplane waiting at the hangar and no one gets hurt.
    1. Re:Wasn't this Al Gore's idea? by Detritus · · Score: 2, Insightful
      Although the goal was different, this is in the spirit of the creation of the Internet. DARPAnet was designed to scatter information to maintain communications.

      Cite? From what I've read about the original Arpanet, it was designed to allow the sharing of computer resources and data among DoD researchers. It wasn't designed to be a failure-tolerant network, although DARPA funded quite a bit of research in that area.

      --
      Mea navis aericumbens anguillis abundat
    2. Re:Wasn't this Al Gore's idea? by zacronos · · Score: 1
      I thought that was common knowledge that they wanted to allow sharing of resources in a failure-tolerant way -- after all they didn't want to become reliant on a communication and collaboration technology that could be easily disrupted in wartime. That's just common sense.

      Since you demand a citation -- from the textbook Understanding Computers: Today & Tomorrow, 10th edition, by Charles S. Parker, page 365 (under Evolution of the Internet), emphasis mine:
      One objective of the ARPANET project was to create a computer network that would allow researchers located in different places to communicate with each other. Another objective was to build a computer network capable of sending data over a variety of paths to ensure that network communications could continue even if part of the network was destroyed, such as in a nuclear attack or by a natural disaster.
      If you prefer an online source, take a look at this "Internet basics" site.
    3. Re:Wasn't this Al Gore's idea? by Mister+Whirly · · Score: 1

      Sharing data was the function of the Arpanet that the academics were most interested in. The military was interested for different reasons. Remember, this was developed when the cold war was raging full force and the US was worried about a Soviet strike knocking out our military communication grid. It was designed to be failure-tolerant so that if one hub was taken out, communication could still happen thanks to its packet-switching method of transferring data.
      To this day most military/government information is shared over private LANs and not over the internet in general.

      --
      "But this one goes to 11!"
    4. Re:Wasn't this Al Gore's idea? by Detritus · · Score: 1
      I thought that was common knowledge that they wanted to allow sharing of resources in a failure-tolerant way -- after all they didn't want to become reliant on a communication and collaboration technology that could be easily disrupted in wartime. That's just common sense.

      You need a better textbook. The idea that the Arpanet was designed to be a survivable network is a particularly persistent myth.

      It was from the RAND study that the false rumor started claiming that the ARPANET was somehow related to building a network resistant to nuclear war. This was never true of the ARPANET, only the unrelated RAND study on secure voice considered nuclear war. However, the later work on Internetting did emphasize robustness and survivability, including the capability to withstand losses of large portions of the underlying networks.

      A Brief History of the Internet, version 3.32
      Barry M. Leiner, Vinton G. Cerf, David D. Clark, Robert E. Kahn, Leonard Kleinrock, Daniel C. Lynch, Jon Postel, Larry G. Roberts, Stephen Wolff
      http://www.isoc.org/internet/history/brief.shtml

      See also:

      Charles Herzfeld on ARPAnet and Computers:

      ARPAnet - A Network for Sharing Computer Resources

      Why was the ARPAnet started? Most of the early "history" on the subject is wrong. As Director of ARPA at the time, I can tell you our intent. The ARPAnet was not started to create a Command and Control System that would survive a nuclear attack, as many now claim. To build such a system was clearly a major military need, but it was not ARPA's mission to do this; in fact, we would have been severely criticized had we tried. Rather, the ARPAnet came out of our frustration that there were only a limited number of large, powerful research computers in the country, and that many research investigators who should have access to them were geographically separated from them.

      http://inventors.about.com/library/inventors/bl_Charles_Herzfeld.htm

      --
      Mea navis aericumbens anguillis abundat
    5. Re:Wasn't this Al Gore's idea? by zacronos · · Score: 1

      Well damn. I guess that's another example of when "common knowledge" is wrong. And the worst part is that I've been perpetuating it in the "Introduction to Computing" course I teach (which is why I had that textbook on hand).

    6. Re:Wasn't this Al Gore's idea? by Anonymous Coward · · Score: 0

      Al Gore just wrote about it in pieces that appeared in several magazines sometime in the late 1980s, but he is not the creator of this. The technology was already there; it just needed an easier method of access. DARPA envisioned the TCP/IP protocol with no centralized hub, so that if one node was destroyed by either a bomb or a cable cut, the traffic would be re-routed in another direction and network traffic flow would not be impeded. ARPAnet was the base for the TCP/IP protocol, and it was what he saw at the University of Tennessee while he was a US Senator. However, the government and DOD used the ARPAnet as a tool to learn what worked and what didn't for their own network, and ARPAnet eventually became the "Internet" as we know it. As anyone who has been a network or system administrator knows, there is much left to be desired and much to fix in IPv4. IPv6 is supposed to fix most of these problems, but no one is willing to fork out for the core infrastructure.

  4. An important detail by slapyslapslap · · Score: 1

    From the article: Cleversafe is significant because it is an open-source project -- that is, the technology will be freely licensed, enabling others to adopt the design to build commercial products. This could be a very important OSS tool.

  5. The RIAA is so going to hate this... by Anonymous Coward · · Score: 0

    Think about it: The storage provider doesn't know what he's storing and the user doesn't need any incriminating data on his machine. It's a DRM nightmare...

  6. Windows ME: Most Secure OS Ever? by nick_davison · · Score: 5, Funny

    Storing data in random locations, often garbled beyond all recognition?

    Clearly Windows ME's memory -l-e-a-k-s- management made it the most secure OS ever. If only they had some way of reconstructing that data when you wanted it back again.

    1. Re:Windows ME: Most Secure OS Ever? by Anonymous Coward · · Score: 0

      Yes, Windows is certainly an homage to unintentionally secure hashing.

    2. Re:Windows ME: Most Secure OS Ever? by Anonymous Coward · · Score: 0

      It's good that Firefox is now following those hallowed traditions, with its own version of "memory -l-e-a-k-s- management."

  7. ...like network RAID? by xxxJonBoyxxx · · Score: 2, Informative
    The Project uses information dispersal algorithms (IDAs(TM)) to separate data into 11 unrecognizable DataSlices(TM) and distribute them, via secure Internet connections, to 11 storage locations throughout the world, creating a storage grid. With dispersed storage, transmission and storage of data is inherently private and secure. No single entire copy of the data is in one location, and only 6 out of the 11 nodes need to be available in order to perfectly retrieve the data.
    ...like network RAID? The site needs spellchecking - badly - but the encryption seems to be based on a key derived after you do some kind of RSA public/private key sign on.
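The quoted 6-of-11 figure implies a simple trade-off. The back-of-the-envelope sketch below assumes a Rabin-style dispersal in which each slice is about 1/k of the original size, since the quoted text does not give slice sizes. Under that assumption, tolerating n - k lost slices costs n/k times the original storage. Plain replication, by contrast, needs n - k + 1 full copies to survive the same number of losses. The function name is hypothetical.

```python
def dispersal_profile(k: int, n: int) -> dict:
    """Fault tolerance and storage overhead of a k-of-n dispersal scheme,
    compared with full replication surviving the same number of losses.
    Assumes each slice is 1/k of the data (Rabin-style IDA)."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    tolerated_losses = n - k              # any k of the n slices suffice
    dispersal_overhead = n / k            # n slices, each 1/k of the data
    replicas_needed = tolerated_losses + 1  # full copies for the same tolerance
    return {
        "tolerated_losses": tolerated_losses,
        "dispersal_overhead": dispersal_overhead,
        "replication_overhead": float(replicas_needed),
    }

print(dispersal_profile(6, 11))
```

For k = 6 and n = 11, this gives 5 tolerated losses at about 1.83x storage, versus 6x for replication.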
    1. Re:...like network RAID? by Gary+W.+Longsine · · Score: 2, Funny

      Interesting. Why 11?
      "It's one louder."

      --
      If you mod me down, I shall become more powerful than you could possibly imagine.
    2. Re:...like network RAID? by hey · · Score: 1

      They don't need spell checking they simply need to add in the version of their
      site where the spelling in "incorrect" in the complementary ways which result in
      conventually correct English.

    3. Re:...like network RAID? by HoboMaster · · Score: 1

      It's because 11 is one more securer than 10.

      --
      Remember kids, tin foil doesn't work, so use LeadHat.
    4. Re:...like network RAID? by jimktrains · · Score: 1

      Is it like RAID? Can one remote location be reconstructed from the other 10?

      --
      "You will do foolish things, but do them with enthusiasm." - S. G. Colette
  8. Number 1 of 4 by Red+Flayer · · Score: 4, Funny

    This concept just adds another layer

    --
    "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    1. Re:Number 1 of 4 by the+phantom · · Score: 1

      This got modded troll? This is one of the greatest /. posts of all time!

    2. Re:Number 1 of 4 by Red+Flayer · · Score: 1

      Why thanks, but I think the early troll modder mistook it for a fp troll, the mod was slapped on before the joke was completed...

      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    3. Re:Number 1 of 4 by the+phantom · · Score: 1

      Indeed, you are probably correct.

    4. Re:Number 1 of 4 by complete+loony · · Score: 1

      That looks like

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
  9. I had a similar idea in the mid 90s by GMontag · · Score: 0, Redundant

    When I was mobilized by the Army to develop a database for Personnel/HR use in the mid 90s, I thought of something similar for data backup. Was not really thinking of it as a security system, more like an 'insurance' system.

    Problem was, I did not know enough about developing systems like that, nor did I know enough about getting the idea in front of the people who could make it happen.

    The basics were when users in the field made queries the returned data would be stored for some period of time and a separate server would record who had what and be able to retrieve the data in case the backups were destroyed or inaccessible.

    The main thing was that if it were recently downloaded data then it was more relevant than older data, which could wait to be reconstructed but newly queried records were more important to current operations.

    Also, since the data was scattered about, it would be of less interest to a party wanting to grab info about soldiers.

    Obviously the idea needed more thought by more brains than mine.

  10. Freenet? by BigZaphod · · Score: 4, Interesting

    Isn't this basically what freenet does? It encrypts the data into chunks and spreads it around all over the place.

    I was working on a p2p system that worked in a similar manner. I was even thinking of repurposing it for the sake of doing online backups - but frankly the bandwidth just doesn't seem to be there yet to do that sort of thing in a practical manner. That, and I got bored with the project... (but never mind that). :-)

    1. Re:Freenet? by mrogers · · Score: 5, Informative

      Freenet uses forward error correction, which guarantees that the original data can be reconstructed given a sufficient number of pieces. Shamir's information dispersal algorithm makes the additional guarantee that nothing can be learned about the original data unless you have enough pieces to reconstruct it.

    2. Re:Freenet? by Xenna · · Score: 1

      Funny, I've been thinking of doing the same. P2P encrypted backups. Shouldn't be too difficult. I have a healthy suspicion towards systems that try to be too smart. I don't want my pieces scattered over various systems.

      For me, a system that allows me and a buddy to back up each other's data without getting access to it would be ideal. Too much work to do myself, though, and apparently you weren't going to do it for me, so I got myself a colo machine instead and use it as an rsnapshot server as well as an openvpn hub.

      It's an idea whose time has come. ADSL upload speeds get somewhat in the way, but my rsnapshots seem to work fine as long as I don't suddenly decide to move my entire music or photo trees to a different location. ;)

      I tried freenet once, it felt like going back to the times of 14.4k modems :(

      X.

    3. Re:Freenet? by Anonymous Coward · · Score: 0
      P2P encrypted backups. Shouldn't be too difficult.

      Yes, cryptography is always easy to get right, so that part should be a breeze.

    4. Re:Freenet? by Anonymous Coward · · Score: 0

      Yes, this is similar to Freenet. And almost exactly what FreenetFS does.

    5. Re:Freenet? by rplacd · · Score: 1

      Funny, I've been thinking of doing the same. P2P encrypted backups.

      I give you... All My Data. Its distant cousin (Mnet) is still around, but sorta moribund.

    6. Re:Freenet? by tomjen · · Score: 1

      Yes it is - just download the OpenSSL source code and use the already developed source. Takes no time and we all know how good the OpenBSD is with regards to security.

      --
      Freedom or George Bush
    7. Re:Freenet? by Panaflex · · Score: 1

      Sorry.. but dream on.

      There are a lot of issues besides just "use openssl." Granted, that's a GREAT way to get started, but there are a lot of issues to take care of, like secure buffer management and usage, protocols, and key management, to name a few.

      --
      I said no... but I missed and it came out yes.
    8. Re:Freenet? by Xenna · · Score: 1

      That's exactly what I would have used, of course. The cryptographic difficulties are further simplified because we only require a symmetrical encryption algorithm like Blowfish or Rijndael. The other side should *not* be able to decrypt, so key distribution is no issue.

      X.

    9. Re:Freenet? by Xenna · · Score: 1

      I get it, you're trying to insult me into writing it ;)

      That strategy has worked with me before, but sorry, not this time. I have too many jobs already. :(

      X.

    10. Re:Freenet? by AndroSyn · · Score: 1
      Yes it is - just download the OpenSSL source code and use the already developed source. Takes no time and we all know how good the OpenBSD is with regards to security.


      Again...the only thing OpenSSL has in common with OpenBSD is the word Open. That is it. OpenSSL is not developed by the OpenBSD people at all. I wish people would stop saying this...
    11. Re:Freenet? by Xenna · · Score: 1

      Well it's so easy to get one letter wrong:

      http://www.openssh.org/

      That's what makes programming such a hassle ;)

      X.

  11. aaaaaaaaaarrrrrrrgggggggggghhhhhhh! by Anonymous Coward · · Score: 2, Funny

    It's '70s not 70's.

    1. Re:aaaaaaaaaarrrrrrrgggggggggghhhhhhh! by Anonymous Coward · · Score: 0

      ...and while we're on the subject, it's lose not loose (unless you're talking morals). But do ./ers actually care about accurate communication?

    2. Re:aaaaaaaaaarrrrrrrgggggggggghhhhhhh! by joebutton · · Score: 2, Informative
      It's '70s not 70's

      Is it, though?

      According to Lynne Truss's Eats, Shoots & Leaves:

      Until quite recently, it was customary to write "MP's" and "1980's" - and in fact this convention still applies in America. British readers of The New Yorker who assume that this august publication is in constant ignorant error when it allows "1980's" evidently have no experience of how that famously punctilious periodical operates editorially.

      Having said which, 1980s clearly makes more sense. It's a plural, not a possessive, innit?

    3. Re:aaaaaaaaaarrrrrrrgggggggggghhhhhhh! by Red+Flayer · · Score: 4, Interesting
      It's '70s not 70's.
      Not really -- it should be '70s' in all likelihood. The first apostrophe is to represent the missing "19", the second is to denote the possessive that is implied. The term "the 1970's" is a shortening of "the years of the decade we call the 1970s," or "the 1970s' years."

      This gets messy, however, since the word 'years' is implied, and to say during the '70s' will make people wonder which 70 seconds you're talking about, and why it needs to be encapsulated with apostrophes -- is it an idiomatical 70 seconds? Kinda like the Biblical '40 days'?

      For that matter, if you really want to get pedantic, what's the use of referencing the 70s at all if you're not going to bother denoting the scale? I mean, surely not mentioning that it's AD (or CE) is going to confuse people using other calendars... more so than misusing an apostrophe, right?

      Along the same lines, it's just horrific that they'd abbreviate the decade anyway, how are we to know that the writer didn't intend the 1870s, or the 2070s even, if he happens to be living backwards in time?

      Bah, there are grammatical rules, and it's great if everyone follows them, but really, it makes no difference if he spelled it 70's, '70s, or seventies (which is the proper spelling, btw).
      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    4. Re:aaaaaaaaaarrrrrrrgggggggggghhhhhhh! by Anonymous Coward · · Score: 0

      Your "during the '70s'" did not make me think of a 70-second period. I assumed the time frame 1970-1979. As you stated, referencing a decade is a common thing, and I assume a majority of people would assume the same thing I did; almost no one would assume a 70-second period, even if no further context was given. When speaking the same exact thing, there are no quotes or apostrophes, and no one gets confused there either. You need to look at the context something like that is used in. If a bunch of technicians are working on the response time for a piece of equipment and one says or writes "the lag time is in the 70s," everyone else involved would assume the unit they are working with. You and I do not know if that is 70 seconds, 70 msec, or 70 microseconds, but they do. If I was playing WoW and said my pings are in the 70s, most of the people I was speaking to would know what I meant. Yes, we should always use full descriptions and state the units, but there are times and areas where much of this can be assumed and it is still proper use of the language. I think the overall context of the sentence would influence your interpretation of the meaning far more than any correct or incorrect use of an apostrophe would.

      I was reading slashdot today. We all know what that means but my grandmother does not. Should we all be stating that I was using a computer and at the internet site with the address of http://slashdot.org/ and reading the content?

    5. Re:aaaaaaaaaarrrrrrrgggggggggghhhhhhh! by Pollardito · · Score: 1
      Along the same lines, it's just horrific that they'd abbreviate the decade anyway, how are we to know that the writer didn't intend the 1870s, or the 2070s even, if he happens to be living backwards in time?
      well, from context one could assume that "the software uses algorithmic techniques known by mathematicians since the 70's" is not referring to knowing something "since" the future, and it's pretty rare to run across data storage algorithms from the 1870s or earlier
    6. Re:aaaaaaaaaarrrrrrrgggggggggghhhhhhh! by Red+Flayer · · Score: 1
      well, from context one could assume that "the software uses algorithmic techniques known by mathematicians since the 70's" is not referring to knowing something "since" the future, and it's pretty rare to run across data storage algorithms from the 1870s or earlier
      Sure, and from context, one could assume that 70's means '70s. That's the point I was making -- pedantry based on the apostrophe is wasted, since the context makes what the author intended absolutely clear.
      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    7. Re:aaaaaaaaaarrrrrrrgggggggggghhhhhhh! by Anonymous Coward · · Score: 0
      ...the second is to denote the possessive that is implied.
      [Emphasis Mine]

      Umm - No. The 1970s (note the lack of an apostrophe) denotes the years between 1970 and 1979 inclusive; that is to say, it is a set of years. As such, this is a plural form, not a possessive, so the apostrophe is not appropriate. Your later reference to the seventies in your closing is quite accurate in that it is not the "seventy's".

  12. 6 of 11 by Bonker · · Score: 4, Informative

    After RTFA, it appears that this is mostly a research project. The goals (and downloadables) include libraries that allow PCs to mount a distributed encrypted filesystem, among others.

    In a business example where you know that you can ultimately control the sites where you're storing your partial data, this would be a very good thing.

    For the single user attempting to secure his information by using the existing network, there are some drawbacks. 6 of 11 slices of the data are needed to reconstruct the whole. Therefore, if a party intent on obtaining secret data gains control of a majority of the servers, he has the data.

    Also, if a disaster wipes out the majority of the servers, leaving five or fewer of the eleven, the data is gone.

    This is a very, very important concept for business storage, but I have to wonder if it scratches any geek itches not already soothed by Truecrypt and Par2.

    --
    The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!
    1. Re:6 of 11 by darkmeridian · · Score: 0

      I believe that distributing sensitive information over different servers is a great idea in security in that it allows variety in platforms. Some of the nodes can run Linux, others BSD, and some can run Windows. The servers would be in physically disparate places on different power grids/internet lines. You don't have to put all your eggs in one basket. A hacker would have to compromise many more operating systems in order to get your information. The same is true of any flaw that could kill your data: it has to run across different locations and platforms. Hey. Security through diversity!

      --
      A NYC lawyer blogs. http://www.chuangblog.com/
    2. Re:6 of 11 by Anonymous Coward · · Score: 0

      Which is better than storing it on only one server, then you have a 6/11 chance of losing all the data.
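Bonker's two failure modes are both threshold events. This sketch makes the strong assumption that nodes fail, or are compromised, independently of one another. Under that assumption, each failure mode is a binomial tail. The per-node probabilities below are made up purely for illustration, and the function names are hypothetical.

```python
from math import comb

def prob_at_least(m: int, n: int, p: float) -> float:
    """P(at least m of n independent nodes are in some state), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))

def dispersal_risks(k: int, n: int, p_up: float, p_compromised: float) -> tuple[float, float]:
    """Return (P(data retrievable), P(attacker holds >= k slices)) for k-of-n."""
    retrievable = prob_at_least(k, n, p_up)       # at least k slices reachable
    exposed = prob_at_least(k, n, p_compromised)  # attacker can reconstruct
    return retrievable, exposed

# Hypothetical figures: each node is up 99% of the time and is
# independently compromised with probability 10%.
retrievable, exposed = dispersal_risks(6, 11, 0.99, 0.10)
print(retrievable, exposed)
```

With those assumed numbers, retrievability comes out well above that of a single 99%-available server. The chance that an attacker holds six or more slices comes out well under 1%. Correlated events, such as one disaster or one exploit hitting many nodes at once, break the independence assumption and make both figures worse.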

  13. Number 2 of 4 by Red+Flayer · · Score: 4, Funny

    See Comment 15948676

    Of complexity, but also adds

    --
    "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    1. Re:Number 2 of 4 by complete+loony · · Score: 1

      15952723
      a great way

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
  14. As it applies to my sex life by sjonke · · Score: 1

    In this paper we show how to divide data D into n pieces in such a way that D is easily reconstructable
    from any k pieces, but even complete knowledge of k - 1 pieces reveals absolutely no information about D

    I use this approach in my sex life, however, rather than obscuring information about D, even knowing one "piece" p reveals way more information than I'd like to have out there. Hell, ever since k-1 got a page on myspace, every potential n+1 knows about me before we even get started.

    --
    --- What?
  15. dispersion, section one by thrillseeker · · Score: 1

    I'm ...

  16. Distributed Pointers Too? by G4from128k · · Score: 1

    I can only hope that this scheme includes distributed storage of the pointers to all the fragments, too. Distributed data is only as reliable as the metadata that record where the data fragments are located. If the user of the system loses their only copy of the map to their fragments, the data is lost. If, on the other hand, each fragment also includes encrypted pointers to a few other fragments, then decrypting any fragment lets one bootstrap recovery of the entire network of fragments (a good thing if you want reliability, maybe less desirable for those seeking security).
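This bootstrap idea is essentially a graph traversal. The minimal sketch below uses a hypothetical in-memory store that maps fragment IDs to (payload, pointer list) pairs. The encryption of the pointers proposed in the comment is left out for brevity.

```python
from collections import deque

# Hypothetical store: fragment id -> (payload, ids of a few other fragments).
Store = dict[str, tuple[bytes, list[str]]]

def recover_all(store: Store, start_id: str) -> dict[str, bytes]:
    """Starting from one known fragment, follow embedded pointers
    breadth-first and collect every fragment reachable from it."""
    found: dict[str, bytes] = {}
    queue = deque([start_id])
    while queue:
        frag_id = queue.popleft()
        if frag_id in found or frag_id not in store:
            continue  # already visited, or fragment lost/unreachable
        payload, pointers = store[frag_id]
        found[frag_id] = payload
        queue.extend(pointers)
    return found

ring = {"a": (b"1", ["b"]), "b": (b"2", ["c"]), "c": (b"3", ["a"])}
print(sorted(recover_all(ring, "b")))
```

A chain of single pointers is only as strong as its weakest fragment. With "b" deleted, starting from "a" recovers only "a". That is why each fragment would need to point to several others.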

    --
    Two wrongs don't make a right, but three lefts do.
  17. Number 3 of 4 by Red+Flayer · · Score: 4, Funny

    See Comment 15948695

    Another layer of inefficiency and

    --
    "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    1. Re:Number 3 of 4 by complete+loony · · Score: 1

      15952732
      to get lots

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
  18. Like mnet? by haeger · · Score: 4, Insightful
    If I'm not mistaken, this was one of the goals with the (now dead?) mnet project.
    From what I remember they split up data into multiple pieces, encrypted it and distributed it over a number of nodes, with some redundancy in it. If you know Python and are interested in p2p, I'm sure there's a lot to be learned from that project.

    .haeger

    --
    You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison
    1. Re:Like mnet? by Jim+McCoy · · Score: 1
      Actually, mnet was based on mojonation, which used Rabin's IDA for splitting the data in a distributed filesystem. While I created the mojonation architecture (and can actually say "been there, done that, printed the t-shirts...") I can't actually claim precedence on the idea -- the real first ideas for this space came from the Intermezzo system and also from Mark Lillibridge's work at DEC.

      The current incarnation of these ideas can be seen in the Allmydata service, which uses Tornado/Raptor codes (very advanced forward error-correction) to split up the data. There are several problems with using Rabin's IDA for this purpose, ones which bit us in the ass several times and which I think might end up causing problems for this system...
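Rabin's IDA differs from Shamir's secret sharing in a useful way. In Shamir's scheme every share is as large as the secret, but with Rabin's IDA each piece is only about 1/k of the data. The cost is secrecy: a single piece is a linear combination of the data bytes, so it is not hidden on its own. The minimal sketch below shows one common textbook construction. It uses Vandermonde rows over GF(257), evaluation points 1..n, and Gauss-Jordan inversion for decoding. The prime, the evaluation points, and the function names are assumptions of this sketch. Symbols are kept as Python ints because 256 is a valid field value. The sketch does not reproduce the specific problems Jim mentions.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def ida_split(data: bytes, k: int, n: int) -> list[tuple[int, list[int]]]:
    """Disperse data into n pieces of ~len(data)/k symbols; any k rebuild it."""
    if not 1 <= k <= n < P:
        raise ValueError("need 1 <= k <= n < 257")
    padded = data + b"\0" * (-len(data) % k)
    groups = [padded[i:i + k] for i in range(0, len(padded), k)]
    pieces = []
    for x in range(1, n + 1):
        # Vandermonde row (1, x, ..., x^(k-1)) mod P; distinct x values
        # make any k such rows linearly independent.
        row = [pow(x, j, P) for j in range(k)]
        pieces.append((x, [sum(r * c for r, c in zip(row, g)) % P for g in groups]))
    return pieces

def invert_mod(matrix: list[list[int]]) -> list[list[int]]:
    """Gauss-Jordan inversion of a square, invertible matrix over GF(P)."""
    size = len(matrix)
    aug = [row[:] + [int(i == j) for j in range(size)] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def ida_join(pieces: list[tuple[int, list[int]]], k: int, length: int) -> bytes:
    """Rebuild the first `length` bytes from any k pieces with distinct x values."""
    if len(pieces) < k:
        raise ValueError("need at least k pieces")
    chosen = pieces[:k]
    inverse = invert_mod([[pow(x, j, P) for j in range(k)] for x, _ in chosen])
    out = bytearray()
    for g in range(len(chosen[0][1])):
        ys = [values[g] for _, values in chosen]
        # Each row of the inverse recovers one original byte of this group.
        out.extend(sum(a * y for a, y in zip(row, ys)) % P for row in inverse)
    return bytes(out[:length])

data = b"any 6 of 11 slices will do"
pieces = ida_split(data, 6, 11)
print(ida_join(pieces[5:], 6, len(data)) == data)
```

A production system would more likely work in GF(2^8) so that every piece stays byte-sized.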

  19. dispersion, section two by thrillseeker · · Score: 1

    ... unsure ...

  20. but. but, but... by Anonymous Coward · · Score: 2, Funny

    Secure Data by Scattering the Pieces

    You mean to tell me that all those hours of defragging my HD's on Windows 98 were actually a waste of time?? ;-)

  21. Pick up the Pieces by pete-classic · · Score: 1

    Sure, this will work until someone comes up with an Average White Band exploit. Then it's useless.

    -Peter

  22. Number 4 of 4 by Red+Flayer · · Score: 5, Funny

    See comment 15948718

    an increased risk of loss of data.

    Burma Shave.

    --
    "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    1. Re:Number 4 of 4 by Anonymous Coward · · Score: 0

      that was clever and hilarious!

    2. Re:Number 4 of 4 by complete+loony · · Score: 1

      15952743
      of funny mods.

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    3. Re:Number 4 of 4 by Chacham · · Score: 1

      Burma shave.

      heh!

  23. dispersion, section three by thrillseeker · · Score: 1

    ... of ...

  24. Don't understand the "IDA" trademark either... by xxxJonBoyxxx · · Score: 1
    information dispersal algorithms (IDAs(TM))
    I'm not sure what they think they're trademarking here. "IDA" is an abbreviation used by many other things, some of them overlapping computer or storage technology. "Information dispersal algorithms" is a term already in common usage. (Just do a search for either term...)
  25. dispersion, section four by thrillseeker · · Score: 2, Funny

    ... the ...

  26. I thought of this a few years ago by greg_barton · · Score: 2, Interesting

    I thought about a system to do this a few years ago, but with a little twist: distribution of the pieces would be via computer virus. The pieces would be stored in users' computers, but more importantly in intrusion logs of "secure" systems as well. Retrieval would be a social act, kind of like a treasure hunt. "Hey, geeks of the world, there's this important information out there. Go figure out how to get it!"

    This system could be used for high profile secrets, like government whistle-blower data and the like. Storage would be secret and nearly undetectable because of all the other virus noise. Retrieval would be highly public by necessity, both to make retrieval possible and to publicize the contents of the data.

    1. Re:I thought of this a few years ago by Ginger+Unicorn · · Score: 1

      that sounds like that episode of star trek where they discovered a message hidden in the DNA of all humanoid species, that explained why all the aliens in star trek look like humans with little bits of rubber glued to their faces.

      --
      (1.21 gigawatts) / (88 miles per hour) = 30 757 874 newtons
  27. dispersion, section five by thrillseeker · · Score: 2, Funny

    ... novelty.

    1. Re:dispersion, section five by Anonymous Coward · · Score: 0

      Congratulations, you ripped off another poster that did a better job. You suck.

  28. You invented Lotus Notes by Gary+W.+Longsine · · Score: 2, Informative

    Lotus Notes doesn't really do this on purpose, but an artifact of the system architecture and user behavior (they want local copies of everything) seems to combine to provide a rudimentary capability of data recovery from widely distributed stores such as you describe. I observed a client once restore all the data on a Lotus Notes server following a catastrophic data loss (short chain of events meant no backups could not be recovered following the loss of a RAID filesystem, many GB of data gone). They put a call out to their user community for the replicated copies of the various "database" thingies which existed on user laptops and desktop systems. They were able to recover all of the data that anyone cared about, anyway, if not actually all of it.

    --
    If you mod me down, I shall become more powerful than you could possibly imagine.
    1. Re:You invented Lotus Notes by GMontag · · Score: 1

      See? You are one of the brains I could have used for this!

      Very informative.

    2. Re:You invented Lotus Notes by Anonymous Coward · · Score: 0
      short chain of events meant no backups could not be recovered following the loss...
      So... all backups could be recovered?

      I'm sorry, but what's the problem there?
    3. Re:You invented Lotus Notes by Gary+W.+Longsine · · Score: 1

      Heh. Yes, you correctly identified my grammar error. Naturally I meant either, "backups could not be" or "no backups could be". Sunrise over turquoise mountains, Disk kaput. Backups kaput. Llama taboot taboot. : )

      --
      If you mod me down, I shall become more powerful than you could possibly imagine.
  29. Transposition ciphers by redelm · · Score: 1
    Bah, I'm not interested. Moving data around is just another form of transposition cipher. Proven-good cipher systems use both transposition and substitution, preferably on compressed data.

    This only works if the distance between the moved elements is greater than the attacker can cross. Not much different than sending reset passwds unencrypted through emails.

  30. new implementation of an old idea by bingbong · · Score: 3, Informative

    Ross Anderson of the Computer Security Group at Cambridge University wrote a paper called the Eternity Service. It has had a few different attempts at implementation, as well as some reworks in terms of design. The primary difference is that in the Eternity Service you had no idea what data you held, nor did you have access to the keys. This new concept/design seems to give the user more control and granularity. Given the new proposed encryption laws in the UK, I'm not sure this is a good idea.

    --
    "Omnis tuus capsa sunt inesse nos"
  31. Re:dispersion, section six (final) by Anonymous Coward · · Score: 0

    Man, that sucks. 1 minute later on the initiation of the joke, 7 minutes later on the completion... and you get a redundant mod. I guess that's the way the cookie crumbles.

  32. FTA: by Anonymous Coward · · Score: 0

    But he had been reading histories of early encryption research, and he saw a germ of an idea in the work of cryptographers who kept information secure by dividing it into pieces and dispersing it.
    Germ?

    1. Re:FTA: by Anonymous Coward · · Score: 0
      But he had been reading histories of early encryption research, and he saw a germ of an idea in the work of cryptographers who kept information secure by dividing it into pieces and dispersing it.
      Germ?
      Yup, germ.

      Sometimes words have multiple meanings, sometimes the meaning in use (as with this time) is the most proper use of the word. Google "define: $WORDWITHUNFAMILIARUSAGE" next time and you'll be doing yourself a favor.
    2. Re:FTA: by Anonymous Coward · · Score: 0

      Or it could be that the GP is using the word in an incredulous manner. As in "He saw a "germ of an idea" in earlier research?! That "germ" is more like a raging infection of bacterial growth that has been thriving for many a year now."

  33. I don't see whats so new by Alistar · · Score: 2, Interesting

    I've been doing something like this for years.

    First I would encrypt the original file, split it up into 10-100 pieces, encrypt those, hide them in other files, encrypt those, then store them in random locations around the internet either by emailing a piece to a webmail or uploading to a server somewhere, posting the binary or hex sequence to a forum, things like that.

    Heck, sometimes I'd repeat the encrypt/split/hide process several times, or even make the last step hidden too. Yes, I realize anyone with any computer talent could find a file hidden in another one, but it keeps it out of plain sight.
    I also remove any identifiable information on what order the pieces go in, I rely on myself to remember. Or leave clues elsewhere.
    I'll admit sometimes it takes like 3 days to gather and assemble them if I need them, though.

    I use it for things that are better off gone forever than being leaked.

    1. Re:I don't see whats so new by Anonymous Coward · · Score: 0

      Damn kids and your new-fangled algorithms. Back when I was your age, we set fire to the drive if we wanted to erase our data, and we liked it that way!

    2. Re:I don't see whats so new by LaughingCoder · · Score: 1

      I use it for things that are better off gone forever than being leaked.

      You mean like the details of your encryption/splitting/hiding algorithm?

      --
      The more you regulate a company, the worse its products become.
    3. Re:I don't see whats so new by Nerdfest · · Score: 2, Funny

      Osama?

    4. Re:I don't see whats so new by Alistar · · Score: 1

      Heh,

      While, for one, the individual steps may not be perfectly secure they are certainly far more complex and involve several expert and natural language systems.
      But besides that, I figure if you can find the pieces, put them together in the right order (several times) and decrypt them, then my hat's off to you and I deserve whatever I get for my arrogance in my security.

    5. Re:I don't see whats so new by Anonymous Coward · · Score: 0

      A,
      Maybe I misunderstood the GP,... but I thought he was implying that any "arrogance" probably applied as much to the material you chose to splicrypt (TM,... nah) rather than the actual method.

      I'll admit to being curious as to what you wrote that you felt the desire/need to go through more than a single encryption,... but I may be showing what a fool I am in not knowing enough to hide from trouble,... or not knowing something someone else might really want to know.

      I'll live with the curiosity,... and ignorance,... I hope.

      regards,
      g

    6. Re:I don't see whats so new by version2 · · Score: 1

      Don't want those Vince Foster memos to fall into the wrong hands now do we?!

  34. Ignorance is no excuse by megaditto · · Score: 1

    Ignorantia legis neminem excusamus That's Latin for "ignorance of the law is no excuse": a principle recognized by the U.S. Courts.

    Now consider what happens when the RIAA figures out that every Linux user may store copyrighted tunes in their /dev/random
    (Put a million computers to cat /dev/random for 100 years, and who knows what they'll come up with...)

    Homework: test how long it takes for your /dev/urandom to come up with the beginning of Star Wars' music score: DDDGDCBA

    --
    Obama likes poor people so much, he wants to make more of them.
    1. Re:Ignorance is no excuse by growse · · Score: 1

      You jest!

      I'm actually going to go do this. I'll do Shakespeare quotes, music tunes, Futurama quotes. Any others?

      --
      There is nothing interesting going on at my blog
    2. Re:Ignorance is no excuse by psmears · · Score: 3, Informative
      Ignorantia legis neminem excusamus [sic] That's Latin for "ignorance of the *law* is no excuse"
      (emphasis mine) That legal principle only means that you can't murder someone, and then claim “...but I didn’t know that was against the law” and hope to be let off. In order to commit a crime, you have to have mens rea (to use yet more Latin)—that is, a “guilty mind”—so storing data that you genuinely didn't know was illegal isn’t a crime. And in this case, there are plenty of possible, perfectly legal reasons for wanting to store encrypted data (for example, there’s a huge market for offsite backup for corporations, who tend to want to keep their stuff secret), so the storage providers are probably OK (disclaimer: the above is about criminal law, civil law is a bit murkier, IANAL etc). On the other hand, it would be very hard to be certain that no trace of the illegal data remained on the client machine (especially if you don’t want to type a twenty-word passphrase every time you want to listen to a new track ;-)
    3. Re:Ignorance is no excuse by megaditto · · Score: 1

      How about "but she told me she was 21" as a defence for anybody who genuinely didn't know the girl was 13?

      GPP was also a joke, by the way.

      --
      Obama likes poor people so much, he wants to make more of them.
    4. Re:Ignorance is no excuse by megaditto · · Score: 1

      No, it's not as farfetched as one might think. For example, the score "DDDGDCBA" can be stored in an int (32 bit with some space left over), so randomly generating that is rather trivial. Of course you can also treat the tune as an ASCII string, but generating that will take much longer.

      With English prose, a good assumption is 2 bits per letter.
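The bit-count argument above can be checked with a few lines of Python. This is a minimal sketch: the 3-bit code for the note names A to G and the function name `pack` are illustrative assumptions, and it assumes a uniformly random source, so a specific 24-bit pattern takes on the order of 2^24 draws to appear.

```python
NOTES = "ABCDEFG"  # hypothetical 3-bit code: A=0, B=1, ..., G=6


def pack(tune: str) -> int:
    """Pack a string of note names into one int, 3 bits per note."""
    value = 0
    for note in tune:
        value = (value << 3) | NOTES.index(note)
    return value


tune = "DDDGDCBA"
bits = 3 * len(tune)  # 8 notes * 3 bits = 24 bits, fits easily in a 32-bit int
packed = pack(tune)
print(bits, hex(packed))

# Expected number of uniform random draws of that width before an exact match
print(f"~{2**bits:,} draws as packed notes vs ~{2**(8 * len(tune)):,} as 8 ASCII bytes")
```

Treating the tune as ASCII text costs 8 bits per note instead of 3, which is why the string version takes so much longer to turn up.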

      --
      Obama likes poor people so much, he wants to make more of them.
    5. Re:Ignorance is no excuse by Anonymous Coward · · Score: 0

      That legal principle only means that you can't murder someone, and then claim "...but I didn't know that was against the law" and hope to be let off. In order to commit a crime, you have to have mens rea (to use yet more Latin)--that is, a "guilty mind"--so storing data that you genuinely didn't know was illegal isn't a crime.

      mens rea isn't required for strict liability offences. In the US, direct copyright infringers are strictly liable, but contributory infringers (like your ISP) are not. You may be able to use the mens rea argument for data slices that are part of someone else's infringing copy, but not for your own data.

      (IANAL, but the mens rea issue came up in the news a lot when the RIAA wanted it to apply to service providers)

  35. Do not mod up by Anonymous Coward · · Score: 2, Informative
    1. Re:Do not mod up by thePowerOfGrayskull · · Score: 1

      Wow. This copy-pasting seems to be a habit for TSP.

  36. The problem... by Fulkkari · · Score: 2, Interesting

    The problem with this idea is bandwidth and speed. You think your broadband is fast, but if you have to download 27 gigabytes of photos, music and stuff, it won't be exactly fast on an 8 Mbps DSL line, let alone 1 Mbps or less. You might wait a couple of hours, but you won't wait a couple of days.

    Okay. So you tell me that the amount of available bandwidth will increase? But so will the amount of data that needs to be backed up, and it will grow faster than the bandwidth. Think of homemade movies. You can already fill up your average drive in no time. What do you do when you get an HD camera?

    Although the idea isn't a new one, I think it is still neat. It might work for some stuff, but I don't see this becoming mainstream with technologies like Time Machine coming to the end-users.
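The download times in this comment are easy to check. A rough sketch, assuming decimal units (1 GB = 8,000 megabits) and that the full line rate is available with no protocol overhead; `transfer_hours` is a made-up helper name:

```python
def transfer_hours(gigabytes: float, megabits_per_second: float) -> float:
    """Ideal transfer time in hours (decimal units, no protocol overhead)."""
    megabits = gigabytes * 8_000
    return megabits / megabits_per_second / 3600


for rate in (8, 1):
    print(f"27 GB at {rate} Mbps: {transfer_hours(27, rate):.1f} hours")
```

Under those assumptions 27 GB takes about 7.5 hours at 8 Mbps and about 60 hours (two and a half days) at 1 Mbps, which matches the comment's hours-versus-days point.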

    --
    I demand the Cone of Silence!
    1. Re:The problem... by imsabbel · · Score: 1

      Not to mention that you have to upload it, too, which is usually a order of magnitude slower.

      Plus you have to upload it more than once (a LOT more than once if you want to be sure) to avoid embarrassing "the important last piece of my backup was on the old 486 of a hobo that got thrown away" situations.
      Learning from normal P2P, if you want to get it back after a year, there should be at least a 10-20 factor of redundancy.
      Which leads to another point: those redundancy of course is incredibly wasteful. Just imagine if everybody used 10Gbyte for backups, he would need to provide 100Gbyte for others to keep up the ratios...

      --
      HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
    2. Re:The problem... by Fulkkari · · Score: 1
      Not to mention that you have to upload it, too, which is usually a order of magnitude slower.

      True. Uploading will be very slow, and depending on the system you might need to upload the same data more than once. However, uploading backups would not be as big a priority for users as restoring them. It could happen slowly in the background all the time. Once all the data, say 80 GB, is uploaded, you would only need to send the changes. If you changed an average of 1 GB of your data per week, 512 kbps upstream would be enough to send the changes to one destination without interrupting your web surfing. So it would be possible, but unless your files are just plain text, the backups would most likely lag behind the local copy, and constant updating might interfere with your work (or entertainment).

      those redundancy of course is incredibly wasteful. Just imagine if everybody used 10Gbyte for backups, he would need to provide 100Gbyte for others to keep up the ratios...

      True. This would be a limiting factor in distributing large backups to other clients. The network would almost need to run on professional servers with dedicated resources, if you really wanted it to be redundant. Why they should choose this instead of the usual RAID is beyond me.
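The upstream estimate in this reply can be checked the same way. A sketch assuming decimal units, a fully usable 512 kbps link and a single backup destination; the helper names are hypothetical:

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def average_kbps(gigabytes: float, seconds: float) -> float:
    """Average upstream rate (kbit/s, decimal units) needed to move `gigabytes` in `seconds`."""
    return gigabytes * 8e6 / seconds


def days_to_upload(gigabytes: float, kbps: float) -> float:
    """Days needed to push `gigabytes` through a `kbps` link at full rate."""
    return gigabytes * 8e6 / kbps / 86_400


weekly = average_kbps(1, SECONDS_PER_WEEK)
print(f"1 GB/week needs about {weekly:.1f} kbps ({weekly / 512:.1%} of 512 kbps)")
print(f"Initial 80 GB at 512 kbps: about {days_to_upload(80, 512):.1f} days")
```

On those assumptions 1 GB per week averages only about 13 kbps, a few percent of a 512 kbps line, while the initial 80 GB upload would take roughly two weeks.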

      --
      I demand the Cone of Silence!
  37. Sorry, old news by inviolet · · Score: 1
    Adi Shamir (of RSA) first wrote of information dispersal is his 1979 paper 'How to Share a Secret (pdf).'

    Oh come on, a paper?

    Everyone knows that if you want to share a secret, you just tell it to a -- eh, never mind. :P

    --
    FATMOUSE + YOU = FATMOUSE
  38. That's how CDs work - distributed data by Animats · · Score: 2, Informative

    Not quite, but the coding scheme that makes CDs and DVDs resistant to dust and scratches works much like that. Big blocks have an error correcting code appended, and then the bits of the data plus error correcting code are rearranged and spread widely across the block. So when you lose a contiguous set of bits, you can replace it by using data distributed across the block.

    It's a good error correction scheme, but it's not exactly new. Every CD player in the world has this. CDs aren't encrypted (there's no key, just a well-known algorithm), but you could mix encryption in if you wanted. This wouldn't help the error recovery.
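A toy illustration of the interleaving idea described above. It is not the actual CD scheme (CIRC pairs two Reed-Solomon codes with a more elaborate interleave), and no error-correcting code is attached here; it only shows how a contiguous burst gets spread across several codewords. Function names are illustrative:

```python
def interleave(codewords: list[list[int]]) -> list[int]:
    """Write codewords as rows of a table, then transmit it column by column."""
    length = len(codewords[0])
    return [cw[i] for i in range(length) for cw in codewords]


def deinterleave(stream: list[int], depth: int) -> list[list[int]]:
    """Inverse of interleave() for `depth` equal-length codewords."""
    return [stream[row::depth] for row in range(depth)]


depth, length = 4, 8
codewords = [[row * 100 + col for col in range(length)] for row in range(depth)]
stream = interleave(codewords)
for pos in range(10, 16):  # a scratch destroys 6 consecutive symbols
    stream[pos] = -1       # -1 marks a damaged symbol
received = deinterleave(stream, depth)
errors_per_codeword = [row.count(-1) for row in received]
print(errors_per_codeword)
```

A burst of 6 consecutive damaged symbols leaves at most 2 damaged symbols in any one codeword, which a per-codeword error-correcting code could then repair.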

  39. Ancient by Salamander · · Score: 2, Informative

    This is so not-new it's not even funny. I've already seen FreeNet and MNet mentioned as precursors, which is appropriate. Dozens of other P2P "filesystems" (in quotes because I don't believe it's truly a filesystem unless it's fully integrated into the OS) and block-level data stores have done this. Probably the one that most thoroughly examined the inherent tradeoffs, and that's most directly based on Shamir's IDA work, is PASIS at CMU. Presenting Cleversafe as the first to move in this direction is an insult to those who have gone before.

    --
    Slashdot - News for Herds. Stuff that Splatters.
    1. Re:Ancient by NittanyTuring · · Score: 1
      Probably the one that most thoroughly examined the inherent tradeoffs, and that's most directly based on Shamir's IDA work, is PASIS at CMU.
      PASIS is not based on IDA. PASIS is not a mechanism for data dispersal. It is a family of storage protocols that make efficient use of data dispersal mechanisms. Any mechanism that satisfies the m-of-n condition (where of n data fragments, m are necessary to reconstruct the original data item) can be used. This can be mirroring, striping, erasure coding, IDA, and anything else researchers will cook up in the future.

      PASIS combines data dispersal with checksumming to deal not only with lost data, but also with bad data. Additionally, it can be tuned to address different fault models, such as fail-stop (where machines that go down never come back up), or synchronous timing (where working machines are guaranteed to respond within a predefined timeout period).

      The motivation behind PASIS is to expose all options and possibilties. Few questions are actually answered. It shows you all the knobs and switches, but you still have to dial in the right setting. Any practical implementation of a distributed storage system (RAID, Google FS, Cleversafe, etc) has to answer these questions.
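To make the m-of-n condition and the checksumming concrete, here is a toy 2-of-3 scheme (two halves plus an XOR parity fragment) with a SHA-256 fingerprint per fragment, so a fragment that is present but wrong is rejected rather than trusted. This is an illustrative sketch with hypothetical function names, not PASIS's actual protocol:

```python
import hashlib


def _xor(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))


def split_2_of_3(data: bytes) -> list[bytes]:
    """Toy 2-of-3 dispersal: two halves plus their XOR parity."""
    if len(data) % 2:
        data += b"\0"  # pad to even length; the caller keeps the true size
    half = len(data) // 2
    a, b = data[:half], data[half:]
    return [a, b, _xor(a, b)]


def fingerprints(fragments: list[bytes]) -> list[str]:
    """SHA-256 of each fragment, kept separately from the fragments themselves."""
    return [hashlib.sha256(f).hexdigest() for f in fragments]


def reconstruct(fragments, expected, size):
    """Rebuild from any 2 fragments that are present AND match their checksum."""
    good = {i: f for i, f in enumerate(fragments)
            if f is not None and hashlib.sha256(f).hexdigest() == expected[i]}
    if 0 in good and 1 in good:
        a, b = good[0], good[1]
    elif 0 in good and 2 in good:
        a, b = good[0], _xor(good[0], good[2])
    elif 1 in good and 2 in good:
        a, b = _xor(good[1], good[2]), good[1]
    else:
        raise ValueError("fewer than 2 intact fragments")
    return (a + b)[:size]


data = b"lost and corrupted"
frags = split_2_of_3(data)
sums = fingerprints(frags)
frags[0] = b"?" + frags[0][1:]  # silently corrupted: fails its checksum
print(reconstruct(frags, sums, len(data)))
```

Mirroring or a wider erasure code would slot into the same shape; only the split and rebuild steps change.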
    2. Re:Ancient by Salamander · · Score: 1
      The motivation behind PASIS is to expose all options and possibilties.
      ...which is exactly what I would want from a research project, and why I cited it. When I saw it at a PDL open house as a representative from EMC I was very impressed with how thoroughly some of those tradeoffs had been examined, and couldn't have cared less that it was nowhere near being a completely functional system. Producing a completely functional system is what I was doing, in a commercial context, though it turned out that my employer was too short-sighted to make use of the result. Even then there are plenty of precedents for what Cleversafe is doing, though I'd say GoogleFS is not as close as HiveCache (based on MojoNation, as is MNet) or Mango's Medley (which I also worked on). I apologize if you feel I misrepresented PASIS, which I meant to praise, but I stand by my point that the field is full of both commercial and academic precedents for what has been presented here as though it were innovative. Everything I see on their site indicates that it's a pretty straightforward synthesis of existing ideas.
      --
      Slashdot - News for Herds. Stuff that Splatters.
  40. A clever, efficient approach by meese · · Score: 1

    While secret sharing is cool, one of its primary drawbacks is that it's usually built using asymmetric crypto (as in, based on number theoretic assumptions and the like). That means it's potentially quite slow. Ross Anderson wrote a paper on a cool alternative which uses only symmetric primitives to achieve the same result. (In fact, he's able to build a lot of different things by combining symmetric primitives in the right way.)

  41. The Judge by wonkavader · · Score: 1

    I'm a little reminded of the Judge from Buffy. Pieces scattered around the world. For security. This seems like a better application of the technique.

  42. Sharing a secret in the offline world by davidwr · · Score: 3, Interesting

    A friend taught me this. The secret in his case was a proprietary industrial process.

    You take the secret and divide it into 3 pieces. You have a team of 3 people to each carry or memorize two of the 3 pieces.

    Amy carries pieces 1 and 2
    Bob carries pieces 2 and 3
    Charlie carries pieces 3 and 1

    If any one of them is compromised by bribery or other means, 1) the information is not lost and 2) the enemy has only an incomplete picture of what is going on.

    This can be extended to more people to achieve greater redundancy or less exposure:

    More redundancy: 4 people with 4 pieces, each person knows 3 elements. Any 2 of 4 people needed to put the pieces together.

    Less exposure: 4 people with 4 pieces, each knows 2 elements. Any 3 of 4 people needed to put the pieces together. Loss of 1 person exposes 1/2 of the total secret.

    There's no reason to stop with 4 people and 4 pieces.

    Think of this as RAID for human-knowledge.
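The redundancy and exposure figures in this comment can be checked by brute force. This sketch assumes the cyclic assignment used in the Amy/Bob/Charlie example (person i holds pieces i, i+1, ..., wrapping around); the function names are hypothetical:

```python
from itertools import combinations


def cyclic_assignment(n_people: int, pieces_each: int) -> list[set[int]]:
    """Person i holds pieces i+1, i+2, ... (pieces numbered 1..n_people, wrapping around)."""
    return [{(i + j) % n_people + 1 for j in range(pieces_each)} for i in range(n_people)]


def analyse(n_people: int, pieces_each: int) -> tuple[int, float]:
    """Smallest group size that ALWAYS covers every piece, and the share one person leaks."""
    holders = cyclic_assignment(n_people, pieces_each)
    everything = set(range(1, n_people + 1))
    k = next(size for size in range(1, n_people + 1)
             if all(set().union(*group) == everything
                    for group in combinations(holders, size)))
    return k, pieces_each / n_people


for n, each in [(3, 2), (4, 3), (4, 2)]:
    k, leaked = analyse(n, each)
    print(f"{n} people, {each} pieces each: any {k} suffice; one person leaks {leaked:.0%}")
```

This reproduces the comment's numbers: any 2 of 3, any 2 of 4 and any 3 of 4 people suffice, with one person leaking 2/3, 3/4 and 1/2 of the pieces respectively. Here `k` is the group size that always suffices; in the 4-person, 2-piece case some pairs (the holders of {1, 2} and {3, 4}) can already reconstruct.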

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    1. Re:Sharing a secret in the offline world by MultisSanguinisFluit · · Score: 1

      Works well for passwords, by the way, since it can be extended to many characters. The problem is that you are giving away some information. In this example, where you need 2/3, one may be able to make enough sense of the whole to make the missing pieces irrelevant. In the case of passwords, you wind up having to brute force a lot less.
      As several commenters noted, however, there are secret-sharing techniques that do not allow any information to be gleaned from possessing k-1 parts of the secret, where k is the minimum number of parts required to reconstruct the whole.
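For comparison, a minimal sketch of Shamir's threshold scheme from the 1979 paper. The secret is the constant term of a random polynomial of degree k-1 over a prime field, each share is one point on it, and any k points recover the constant term by Lagrange interpolation, while k-1 points are consistent with every possible secret. The prime 2^127 - 1 and the function names are arbitrary choices for illustration; this is not hardened code, and secrets must be encoded as integers below the prime.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime; arbitrary choice, secrets must be smaller


def make_shares(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Return n points on a random degree-(k-1) polynomial whose constant term is the secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, f(x)) for x in range(1, n + 1)]


def recover(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0; needs at least k distinct shares."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


shares = make_shares(123456789, k=3, n=5)
print(recover(shares[:3]), recover([shares[0], shares[2], shares[4]]))
```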

      --
      > get tea
      No Tea: dropped.
  43. Performance and Backups? by Anonymous Coward · · Score: 0

    This will only move forward if they can make it perform as well as traditional disk access, and what do they plan for a backup strategy?

  44. Pfft security schmerity by ET_Fleshy · · Score: 1
    This concept just adds another layer of complexity, but also another layer of inefficiency and an increased risk of data loss.
    Juu b33n h4x3d bY el1t35, f001

    ~Teh Def1c4t05S~
  45. Shamir? No by pedantic+bore · · Score: 3, Informative

    The Information Dispersal Algorithm is due to Michael Rabin.
    Shamir's secret-sharing algorithm uses a similar idea (it's
    essentially the same as Rabin's algorithm, except that the
    data is padded with random gibberish).
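A sketch of the dispersal side, in the spirit of Rabin's IDA: each group of k data bytes defines a polynomial over GF(257), and each of the n slices stores that polynomial's value at one extra point, so a slice holds about 1/k of the data and any k slices recover it. Rabin's formulation uses n vectors of which every k are linearly independent; polynomial evaluation (a Vandermonde-style matrix) is one common way to get them. The field size, point numbering and function names are assumptions for illustration, and the k = 6, n = 11 demo merely mirrors the Cleversafe figures mentioned elsewhere in this thread.

```python
P = 257  # prime just above 255, so every byte is a field element


def _interpolate(points: list[tuple[int, int]], x: int) -> int:
    """Value at x of the unique polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def disperse(data: bytes, k: int, n: int) -> dict[int, list[int]]:
    """Split data into n slices of about len(data)/k symbols each; any k slices suffice.

    Slice symbols are ints in 0..256, not bytes, because GF(257) has 257 elements.
    """
    padded = data + bytes(-len(data) % k)      # zero-pad to a multiple of k
    slices = {x: [] for x in range(k, k + n)}  # slice ids are evaluation points k..k+n-1
    for start in range(0, len(padded), k):
        group = [(j, padded[start + j]) for j in range(k)]  # f(j) = data byte j
        for x in slices:
            slices[x].append(_interpolate(group, x))
    return slices


def reassemble(slices: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k slices."""
    chosen = list(slices.items())[:k]
    out = bytearray()
    for g in range(len(chosen[0][1])):
        points = [(x, values[g]) for x, values in chosen]
        out.extend(_interpolate(points, j) for j in range(k))
    return bytes(out[:length])


data = b"any 6 of these 11 slices rebuild this text"
slices = disperse(data, k=6, n=11)
survivors = {x: s for x, s in slices.items() if x % 2 == 0}  # 5 of the 11 slices lost
print(len(slices[6]), reassemble(survivors, k=6, length=len(data)))
```

Shamir's scheme would instead put one secret value and k-1 random values into each polynomial, which is the "random gibberish" difference described above: it hides the data from anyone holding fewer than k shares, at the cost of each share being as large as the secret.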

    --
    Am I part of the core demographic for Swedish Fish?
  46. Crypto system for human rights watchers by wwphx · · Score: 1

    This was several years ago, but I read a paper, I believe on Slashdot, about a crypto system intended for people like human rights observers working in the field. Basically you would write up your report, call up this program, pass your report to it, and the program would write it in crypto to uninitialized blocks of the file system so that it appeared to be random noise.

    The concept was that the watcher's laptop was likely to be inspected when they left the country. The inspectors wouldn't find anything since they wouldn't know how this program was started, much less the keys required to make it work.

    Does anyone else remember this? I've searched Slashdot with zero success, I even emailed Bruce Schneier but he hadn't heard of this.

    --
    When you sympathize with stupidity, you start thinking like an idiot.
    1. Re:Crypto system for human rights watchers by andrewman327 · · Score: 1

      You're thinking of Steganography. Somewhere in here is the story you mentioned.

      --
      Information wants a fueled airplane waiting at the hangar and no one gets hurt.
    2. Re:Crypto system for human rights watchers by wwphx · · Score: 1

      No, it wasn't steganography, at least in the conventional useage of the term. Basically when you installed your *nix distro, you would not partition the entire disk. This program would write your files to the uninitialized portion of the disk. Since the data was written outside of the partitioned area, it would not be seen in a casual search. Additionally, the data was encrypted in such a way as to appear no different than uninitialized drive space.

      --
      When you sympathize with stupidity, you start thinking like an idiot.
    3. Re:Crypto system for human rights watchers by Anonymous Coward · · Score: 0

      Additionally, the data was encrypted in such a way as to appear no different than uninitialized drive space.

      It was encrypted as zeros?

    4. Re:Crypto system for human rights watchers by epee1221 · · Score: 1
      No, it wasn't steganography, at least in the conventional useage of the term. Basically when you installed your *nix distro, you would not partition the entire disk. This program would write your files to the uninitialized portion of the disk.

      I'm a little fuzzy on terms here. Remind me why this isn't steganography?
      --
      "The use-mention distinction" is not "enforced here."
    5. Re:Crypto system for human rights watchers by wwphx · · Score: 1

      Steganography is normally hiding a message within a file. It can be detected without too much difficulty (the file doesn't compress as much, etc.) even if you can't retrieve the message. This system did not do this; you wouldn't even see the files, since they were in the background noise of the disk clutter, for want of a better term.

      --
      When you sympathize with stupidity, you start thinking like an idiot.
    6. Re:Crypto system for human rights watchers by richie2000 · · Score: 1

      a crypto system intended for people like human rights observers working in the field.

      That would be Rubberhose.

      It scares me that Bruce said he didn't know about it. That means he doesn't want anyone to know. Please tell my kids to be good to their mom and that I love them.

      --
      Money for nothing, pix for free
    7. Re:Crypto system for human rights watchers by epee1221 · · Score: 1
      Steganography is normally hiding a message within a file.

      Ok, there's the clarification I needed.
      --
      "The use-mention distinction" is not "enforced here."
  47. Re:Shamir? No by mrogers · · Score: 1

    Oops, need to update my literature review in that case. Thanks!

  48. I...welcome...encrypting by Pasquina · · Score: 1

    for...our...overlords

  49. reg-free link to NYT article by Anonymous Coward · · Score: 1, Informative

    Here's a reg-free link.

    Courtesy of the New York Times Link Generator.

  50. Cleversafe vs. copying and other projects by cgladwin · · Score: 1

    In reading through a lot of the posts, I thought it might be useful to elaborate on how Cleversafe compares to current copy-based data storage systems as well as previous projects using similar techniques for data storage and communications...

    Effectively all digital data storage in use today works by making copies of data, plus redundant copies of data through parity bits when stored on a RAID array. Cleversafe does not store a copy of the original data and definitely does not store copies of data. Cleversafe 'disperses' data, which is different from copying data. Original data files are turned into a set of 'dispersed files' or 'slices', each of which contains too little information to be useful on its own. These slices are then stored in different locations. On the current Cleversafe test grid, each file is dispersed into 11 slices, each stored by a separate storage hosting provider in a separate geographic location, as shown at http://www.cleversafe.org/wiki/Cleversafe_Research_Storage_Grid.

    In order to ensure ultra-high availability, the dispersal algorithms are designed in such a manner that any majority of these slices can be used to perfectly recreate all the original data. This technique is similar to methods often employed in data communications where data is broken up into some number of packets by the sender in such a manner that the receiver does not need all the packets to recreate all the original data.

    Over the past 25 years, a number of projects have looked at storing data using information dispersal or similar techniques. Many of these projects have used Reed Solomon or similar encoding / decoding techniques, including OceanStore, PAR/PAR2 and others. The Cleversafe project is not only developing algorithms for information dispersal, but is also creating a complete system to enable the benefits of Dispersed Storage to be practically used on a generally-available-to-everyone scale. So in addition to creating new, computationally-efficient algorithms for information dispersal, the Cleversafe project includes:

    - a metadata management system for managing files stored on the grid

    - grid management tools, including 'rebuilder' processes that enable the grid to 'self-heal'

    - interfaces to enable dispersed storage to work in various existing environments, including a general API, a command line interface, a file system interface (which we began demonstrating at Linux World last week) and an upcoming GUI interface

    - integration with existing methods for encryption

    - live dispersed storage grids running on nodes operated by various storage hosting companies in various locations

    - etc.

    So, the focus of Cleversafe is to build on the previous work in dispersed storage (which has mainly been academic research) to create a practical and complete open source system to better store the world's data.

    Chris Gladwin

    --
    Cleversafe is improving how the world stores its data. Join us at www.cleversafe.org.
  51. Like Linus said... by rice_burners_suck · · Score: 1
    It's like Linus Torvalds, creator of the free operating system Linux, said once:

    "Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it." So this idea is similar to some degree.

    1. Re:Like Linus said... by Anonymous Coward · · Score: 0

      Linus who?

      I think you just lost about 90% of the slashdot audience there... you said he wrote a program or something?

  52. Notes from the Cleversafe lead developer by mengland · · Score: 5, Informative

    (Fyi: this link to the New York Times article bypasses any need to login/register with the nytimes.com website.)

    I'm the Cleversafe Dispersed Storage software-development project leader. I work with Chris Gladwin (mentioned in the New York Times article) as a fellow manager at Cleversafe.

    I offer some comments below to help outline some of the unique aspects of the Cleversafe technology.

    Encryption is not dispersal. Cleversafe provides both, and then some. The Cleversafe Dispersed Storage software disperses any "datasource" (typically a file) into several slices (our current software uses 11 slices in an 11-lose-any-5 scheme; future versions may use additional schemes with "wider" slice sets). Additionally, our software encrypts, compresses, scrambles, and signs the datasource content, but we are not trying to reinvent the wheel: other software technologies exist to do these things, and we leverage them extensively.

    We found that a bigger challenge than creating or managing dispersal algorithms was building the entire storage system, regardless of the dispersal algorithm used (and we design the system to be dispersal-scheme agnostic). The meta-data management system and many other things took us far longer to implement than the Cleversafe IDA. It's not hard to use Reed-Solomon or some other algorithm on a single file or a small set of files and disperse the slices by hand onto several different systems (or use variants of this like the 3-piece secret story with Amy, Bob, and Charlie mentioned above). It's much harder to manage this across an entire file system (with hundreds of thousands of files, or many more depending on the file system) for an unlimited number of file systems from all the various users, stored on a heterogeneous set of an unlimited number of geographically dispersed, commodity storage nodes in a completely decentralized way with no dependence on the original source of the data (e.g., you could sledgehammer your laptop and not lose any data that's stored on our grid/storage service). (I apologize for that run-on sentence.)

    Further, dispersed-storage systems do not require replication. (Dispersal systems may replicate data for performance purposes, if at all, depending on the application/configuration/installation/context.) If a system replicates entire copies of the data (be they encrypted or not), then it is, by (our) definition, not a dispersed-storage system. So a continual question I have when evaluating other systems: do they replicate the data in whole or not? Most systems replicate.

    Cleversafe is not the first to present a dispersal system, but we like to think we are the first to make it broadly usable by people and interoperable with other systems. See our cmdline client (which will soon have continuous backup and XML-programmable policy management), our Dispersed Storage API, our dsgfs file system, a soon-to-be-released GUI client, and future "connectors" (what we call the applications that leverage our technology) to come, all available at http://www.cleversafe.org.

    A side note: "revision management" is built into the Cleversafe system to address what I call "soft" failures (accidental deletes, application failures, etc) vs. "hard" failures (hard disk crashes) as well as archival requirements.

    I believe that the concept of "dispersed storage" will eventually change how the world thinks about storage systems--regardless of whether or not these are Cleversafe-based systems (I think Cleversafe presents the best such system, but I of course am biased).

  53. Shared secret by hoggoth · · Score: 1

    There is prior art:

    "Blondie, what did he tell you? I know which graveyard the money is buried in. Don't die on me Blondie. What did he tell you?"

    "A name... a name on a gravestone..."

    "Ah! We are partners! I know the graveyard, you know the name! Partners just like good old times, eh?!"

    --
    - For the complete works of Shakespeare: cat /dev/random (may take some time)
  54. Use more than one pad? by GrEp · · Score: 1

    (Didn't bother to RTFA but..)
    Why not just generate K random sequences and XOR them together to get a one-time pad? Then encrypt the data with it and store the result in public view. You would need ALL the pads to unlock it.
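A minimal sketch of the scheme proposed above, assuming the pads are generated with Python's `secrets` module and that "encrypt" means XOR with the combined pad. The function names `split_with_pads` and `recover` are hypothetical. This is a K-of-K split: every pad is as long as the data, and all K pads are required.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_pads(data: bytes, k: int) -> tuple[bytes, list[bytes]]:
    """XOR k random pads into one one-time pad and encrypt data with it.
    Returns the ciphertext (safe to publish) and the k pads to keep apart."""
    if k < 1:
        raise ValueError("need at least one pad")
    pads = [secrets.token_bytes(len(data)) for _ in range(k)]
    combined = reduce(xor_bytes, pads)
    return xor_bytes(data, combined), pads


def recover(ciphertext: bytes, pads: list[bytes]) -> bytes:
    """XOR every pad back into the ciphertext; only correct if all k are present."""
    return reduce(xor_bytes, pads, ciphertext)


ciphertext, pads = split_with_pads(b"attack at dawn", 3)
assert recover(ciphertext, pads) == b"attack at dawn"
```

With one pad missing, `recover` just returns the data XORed with the missing pad, which is indistinguishable from random bytes. That is the reliability and space cost the reply below points out: K pads mean K times the storage and no tolerance for losing any of them.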

    --

    bash-2.04$
    bash-2.04$yes "Don't you hate dialup connections?"| write USERNAME
    1. Re:Use more than one pad? by Anonymous Coward · · Score: 1, Insightful

      Using K one-time pads to encrypt data would be highly secure, but it would not be very reliable: losing any one of the K one-time pads would mean losing your data. It would also not be very efficient, since each of the K one-time pads takes up as much space as the original file. The Cleversafe IDA can recover the data even if 5 out of 11 sites are irrecoverably destroyed, which makes it highly reliable. Each slice is approximately 1/6th the size of the original data, so all 11 slices together take up about 11/6ths the size of the original data (slightly less than storing the original plus one extra copy). This is what makes IDAs so promising for storage: you get extreme reliability along with extreme efficiency.
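The storage arithmetic above can be checked with a short sketch, assuming (as the comment says) 11 slices of which any 6 suffice, and ignoring metadata and any per-slice overhead. The function names are illustrative only.

```python
from fractions import Fraction


def ida_overhead(n: int, k: int) -> Fraction:
    """Total stored size divided by original size when data is dispersed
    into n slices, any k of which rebuild it (each slice is ~1/k of the data)."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return Fraction(n, k)


def replication_overhead(losses_tolerated: int) -> int:
    """Full copies needed so the data survives that many lost sites."""
    return losses_tolerated + 1


n, k = 11, 6
print("dispersal:", ida_overhead(n, k))               # dispersal: 11/6
print("replication:", replication_overhead(n - k))    # replication: 6
```

Surviving the same five site losses with plain replication would take six full copies, more than three times the space of the 11/6 dispersal.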

  55. It's an RPG quest! by Psychochild · · Score: 1

    So this is why there were so many of those "scattered items" type quests in console RPGs.

    Elder: "We need the sacred information of Pr0n!"
    Elder: "Unfortunately, the dastardly Cleversafe has scattered this information into 12 parts."
    Elder: "You must go to each of the 12 ancient ruins and collect the sacred information for us!"
    Player: "This quest sucks."

    Makes sense now....

    --
    Brian "Psychochild" Green
    MMO developer's blog
  56. Monty Python's Joke Warfare did it already by MarkCarson · · Score: 1

    Way back when the earth was flat, Monty Python had a skit about a joke so funny that you'd die laughing, literally. The military wanted to use it against their enemies, so they had to translate it into the enemy's language (probably German in the skit, I don't recall), but if a single translator were given the whole joke, he would die laughing before he could write down the translation. So they cut the printed copy of the joke into smaller (non-lethal) sections and had a group of translators each translate one section, and the sections were later reassembled into final form. Same principle.

    --
    I'm scared of world leaders who think locally and act globally.
  57. re: redundant slices by Anonymous Coward · · Score: 0

    I do see a possible implementation flaw. The data is encrypted and distributed across multiple locations, but relying on a single bandwidth provider or a small number of them removes some of the benefit of this technology. You've got redundant encrypted storage, but a single provider (or a group working together) could aggregate all of the pieces. How do you get all of the pieces distributed without moving the data over a compromised channel? Isn't this a geo-transposition cipher over an unsecured channel wrapped around

    Any ideas?

  58. Potentially great for internal use... by WoTG · · Score: 2, Interesting

    I've often wondered when someone would get around to perfecting a dispersed backup system for LANs. With the average workstation toting a 100GB drive and typically using only a handful of GBs, there seems to be a surplus of cheap disk space on the LAN, at least compared to backup tapes or other media. Though, in hindsight, I guess a single fire or building disaster would still be catastrophic...

    1. Re:Potentially great for internal use... by Anonymous Coward · · Score: 0

      See http://gridblocks.sourceforge.net/ and their gbdisk.

      ac because of open wlan....

  59. wasn't this an item in slashdot a while ago? by bobp0303 · · Score: 1

    The focus was on security rather than data storage, but I'm sure I read about it on slashdot...